Background

In biomedicine, structured data in the form of image stacks, genome sequences or time series are collected in increasing number and volume by high-throughput technologies. Such data are characterised by inherent interdependencies between measurements, an often non-vector nature, and the presence of confounding influences and sampling biases. For example, population structure, systematic measurement error, non-independent sampling, or different age distributions across groups can lead to spurious results if not accounted for.

Deep learning (DL) excels in many applications on structured data and allows for accurate predictions due to its ability to capture complex dependencies within and between inputs and outputs. Despite recent advances, DL still has limitations in terms of uncertainty assessment, interpretability, and validation. Yet these three capabilities are essential for going beyond prediction towards understanding the underlying biology.

To this end, statistics has traditionally been used in the biomedical sciences to obtain interpretable model results and statistical inference, i.e., to quantify uncertainty, correct for confounding, and test hypotheses with statistical error control. However, classic statistical methods are limited in their modelling flexibility for structured data and in their ability to capture complex nonlinearities in a data-driven manner.

Thus, modern methodological tools combining advantages of DL and statistics for the analysis of structured biomedical datasets are urgently needed.

Our Aim

In this research unit, we bring together experts from machine learning and statistics with a track record in biomedical applications to develop methods that integrate DL and statistics. We aim to improve interpretability, uncertainty quantification and statistical inference for DL, and to improve the modeling flexibility of statistical methods for structured data. In particular, we will develop methods that provide statistical inference for structured data through quantification of uncertainty, testing of hypotheses and adjustment for confounders. We will also develop methods that improve explanations of structured data through hybrid statistical and deep learning models, population- and distribution-level explanations, and robust sparse explanations.

We develop all methods in a feedback loop with biomedical applications, both to account for their data analysis needs during method development and to directly generate novel biomedical insights using our newly developed methods. Applications include analysis of MRI, fMRI and microscopy images, longitudinal disease progression modeling, DNA sequence analysis, and genetic association studies. Furthermore, the methods will be relevant far beyond these important use cases.


DFG KI-FOR-5363 DeSBi

Spokesperson:
Prof. Dr. Sonja Greven (HU Berlin)

Co-Spokesperson:
Prof. Dr. Christoph Lippert (University of Potsdam/HPI)

Address

KI-FOR-5363 DeSBi
School of Business and Economics
Humboldt-Universität zu Berlin
Spandauer Straße 1, D-10178 Berlin
Germany

Contact Us

Eliza Mandieva
Project Coordinator
Chair of Statistics
School of Business and Economics
Humboldt-Universität zu Berlin
Spandauer Straße 1, D-10178 Berlin
E-Mail: eliza.mandieva@hu-berlin.de
Phone: +49-30-2093-99489