Some Highly Complex Statistical Models in Two Different Scenarios in Life Sciences

Rodrigo Labouriau
(Department of Mathematics, Aarhus University)
Thiele Seminar
Thursday, 9 April, 2015, at 13:15, in Koll. D (1531-211)
Abstract:

Some highly complex statistical models pertaining to two typical scenarios in life sciences will be presented. The scenarios are of a rather different nature: in the first, the complexity arises from a very large number of observations of relatively few variables (typically observed with noise), while in the second, the complexity is due to a large number of variables observed in a relatively small number of observational units. The challenges posed by these two scenarios include constructing computationally feasible inference procedures and efficient algorithms, and obtaining stable inference and prediction procedures.

The models illustrating the first scenario are multivariate generalized linear mixed models (e.g. multivariate binomial models for describing the genetic determination of metritis, based on around 900,000 cows) and complex Gaussian frailty models for right-censored variables (e.g. models for describing the genetic determination of longevity and fertility of cows, based on around 800,000 animals and more than 2,000,000 observations, and the Nordic Cattle Genetic Evaluation, which involves the simultaneous analysis of several traits using more than 40 million observations).

The examples of the second scenario that I will present are high-dimensional graphical models, i.e. multivariate models in which the restrictions imposed on the covariance structure are encoded by a graph. Typical examples are models for the transcription factors of, say, around 50,000 genes measured simultaneously in hundreds of individuals, but there are many other examples from modern molecular biology. Here, imposing restrictions on the topology of the associated graph allows us to build efficient algorithms, perform local inference to answer specific biological questions, and define suitable notions of information content with biological meaning. A new theorem characterizing the topology of a graph associated with data on evolutionary trees (involving the so-called junction tree of triangulated graphs) will be presented.

Organised by: The T.N. Thiele Centre
Contact person: Søren Asmussen