DeSBi

DeSBi Events

June 17, 2024

BIO-LLM Symposium

Organized by: Bernhard Renard (HPI) in partnership with the Massachusetts Institute of Technology (MIT)

Information: 

This event is hosted by the research group of Bernhard Renard, Data Analytics and Computational Statistics at the Hasso Plattner Institute (HPI), in partnership with the Massachusetts Institute of Technology (MIT). It provides a unique opportunity to engage with leading researchers from both HPI and MIT working on BIO-LLM, fostering an environment of collaboration and innovation.

Access:
BIO-LLM Symposium website

June 13, 2024

Neural Regression Models

Speaker:
David Rügamer

Invited by, Seminar series:
Sonja Greven, DeSBi/ESS

Abstract:
In this talk, I will present my group’s research on neural regression models, i.e., regression models embedded in neural networks, and discuss their applications, advantages, and challenges. Integrating statistical regression into neural networks can be done for several reasons. Within a neural network framework, it is possible, for example, to jointly learn interpretable structured effects of tabular data and additional unstructured effects modeled via a deep neural network. Another application is distributional regression, where transformation functions can be learned through monotonic neural networks while learned effects on the shift and scale of the distribution remain interpretable. To better understand such use cases from a statistical perspective, it is important to address open challenges that exist even for the simpler model class of structured regression models trained in a neural network. This includes optimization, sparsity, smoothing, and statistical inference, which I will touch upon during the talk.
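The core idea of a semi-structured model (an interpretable linear effect learned jointly with a deep effect) can be sketched in a few lines of numpy. This is a toy illustration with made-up data and variable names, not the speaker's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a tabular feature x with a structured (linear) effect and an
# input z acting through a nonlinear "deep" effect.
n = 500
x = rng.normal(size=(n, 1))            # structured input
z = rng.uniform(-2, 2, size=(n, 1))    # unstructured input
y = 2.0 * x[:, 0] + np.sin(3 * z[:, 0]) + 0.1 * rng.normal(size=n)

# Model: y_hat = x * beta + MLP(z); beta stays interpretable.
beta = np.zeros((1, 1))
W1, b1 = 0.5 * rng.normal(size=(1, 16)), np.zeros(16)
W2, b2 = 0.5 * rng.normal(size=(16, 1)), np.zeros(1)

lr = 0.05
for _ in range(3000):
    h = np.tanh(z @ W1 + b1)                 # hidden layer of the deep part
    y_hat = (x @ beta + h @ W2 + b2)[:, 0]
    g = 2 * (y_hat - y)[:, None] / n         # dL/dy_hat for the MSE loss
    # One joint gradient step on the structured and deep parts.
    d_h = g @ W2.T
    d_pre = d_h * (1 - h ** 2)
    beta -= lr * (x.T @ g)
    W2 -= lr * (h.T @ g)
    b2 -= lr * g.sum(axis=0)
    W1 -= lr * (z.T @ d_pre)
    b1 -= lr * d_pre.sum(axis=0)

print(round(float(beta[0, 0]), 2))  # close to the true coefficient 2.0
```

Because both parts are trained jointly in one network, the coefficient of the structured part remains directly readable after training.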

April 30, 2024

AICO: Model-Agnostic Feature Significance Tests for Machine Learning

Speaker:
Kay Giesecke (Stanford University)

Invited by, Seminar series:
Sonja Greven

Abstract:
Supervised learning algorithms are increasingly being used to guide economic and social decisions. The opaqueness of these algorithms is a key challenge, especially in highly regulated domains including financial services, healthcare, and the judicial sector, where transparency is crucial. This paper provides statistical tools to improve model transparency. We develop a family of hypothesis tests for testing the significance of the input features (variables) of a supervised learning algorithm. The key idea is to evaluate a feature’s incremental effect on model predictive or decision performance, relative to a baseline that does not permit variability of any feature. A null feature’s incremental effect has a distribution across a test sample whose center is non-positive. This formulation leads to simple, assumption-light, uniformly most powerful sign and other tests that deliver exact, non-asymptotic p-values. These tests are model-agnostic and apply to any regression or classification model, including ensembles. They do not require retraining or other computationally costly procedures. Byproducts of the tests are interpretable measures of feature importance, including exact confidence intervals that are robust to correlation among features. A hierarchical formulation can discover higher-order structure tied to subsets of features, including feature interactions. A simulation study is used to demonstrate the performance of our tests and alternative approaches. An empirical study tests feature significance for a deep learning ensemble model of US home prices.
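The exact sign test at the core of this approach is simple enough to sketch. Everything below (the quadratic stand-in losses, the baseline value, the synthetic sample) is a hypothetical illustration, not the paper's model or data:

```python
import math

import numpy as np

def sign_test_pvalue(deltas):
    """One-sided exact sign test. H0: the center of the incremental-effect
    distribution is non-positive (a null feature). Ties are dropped, and the
    p-value P(K >= k) comes from Binomial(n, 1/2), so it is non-asymptotic
    and requires no distributional assumptions on the effects themselves."""
    d = np.asarray(deltas)
    d = d[d != 0]
    n, k = d.size, int((d > 0).sum())
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n

rng = np.random.default_rng(1)
x = rng.normal(size=200)                 # synthetic test sample

# Incremental effect per observation: loss with the feature held at a fixed
# baseline minus loss when it is allowed to vary (made-up stand-in losses).
deltas = (x - 1.0) ** 2 - x ** 2

print(sign_test_pvalue(deltas) < 0.05)   # the feature is significant here
```

No retraining is needed: the test only requires the per-observation performance deltas, which is what makes it cheap and model-agnostic.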

Email:
giesecke@stanford.edu

April 9, 2024

Uncertainties in image segmentation and beyond

Speaker:
Aasa Feragen

Invited by, Seminar series:
Dagmar Kainmüller

Introduced by:
Claudia Winklmayr

Location/Time:
Elsa Neumann room, MDC-BIMSB, Hannoversche Straße 28, 10115 Berlin

Abstract:
Quantification of image segmentation uncertainty has seen considerable attention over the past years, with particular emphasis on the aleatoric uncertainty stemming from ambiguities in the data. While there is a wealth of papers on the topic, the actual modelling objectives vary considerably, and as a consequence, the validation of segmentation uncertainty quantification also remains a partially open problem. In this talk we will take a bird's-eye view on segmentation uncertainty, discussing fundamental modelling challenges, partial solutions, open problems, and links to related topics such as generative models and explainable AI.
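As a minimal, generic illustration of pixelwise uncertainty (not the speaker's framework), a segmentation model's class probabilities can be mapped to a per-pixel entropy map; the probability maps below are synthetic stand-ins for softmax outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a segmentation model's softmax output:
# a tiny 4x4 "image" with 3 classes.
H, W, C = 4, 4, 3
logits = rng.normal(size=(H, W, C))
probs = np.exp(logits)
probs /= probs.sum(axis=-1, keepdims=True)

# Per-pixel predictive entropy: 0 for a fully confident pixel,
# log(C) for a maximally ambiguous one.
entropy = -(probs * np.log(probs)).sum(axis=-1)

print(entropy.shape)  # one uncertainty value per pixel
```

High-entropy regions typically concentrate along object boundaries, which is one place where the data ambiguity the abstract mentions shows up in practice.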

March 20, 2024

Hide and Seek with Hidden Confounders

Speaker:
Jonas Peters (Workshop)

Invited by, Seminar series:
Sonja Greven

Location/Time:
Faculty of Economics, Humboldt-Universität zu Berlin, Spandauer Straße 1, 10178 Berlin, Room 23.

February 27, 2024

Large-scale sequence models for biology and genomics

Speaker:
Volodymyr Kuleshov

Invited by, Seminar series:
Christoph Lippert

Abstract:
Large-scale sequence models have sparked rapid progress in machine learning, bringing about advances that extend beyond natural language processing (NLP) into science, biology, and medicine. Within the biological sciences, these models have enabled predicting protein structures from sequences, deciphering the functions and interactions of molecules, and crafting new molecules not found in nature. As the cost of compute decreases, foundation models are poised to have an increasingly important impact in science. This talk will explore how foundation models can help advance biosciences and healthcare, with a particular focus on DNA language models (LMs). Biological applications often have different characteristics from language or vision: these include long sequence lengths, unique notions of data equivariance, and various needs for model robustness and reliability. This talk will first outline recent work on DNA LMs motivated by the above challenges, and introduce ongoing work on the creation of a robust benchmark for assessing and evaluating their performance. Then, I will describe new architectures for modeling DNA based on state space model (SSM) approaches, which naturally handle long sequences of over hundreds of thousands of nucleotides without the quadratic computation cost of attention-based architectures. Specifically, I will introduce extensions of recent Mamba SSMs to support bi-directionality and reverse complement (RC) equivariance, and use them as the basis of Caduceus, a family of long-range DNA LMs. We applied Caduceus to variant effect prediction (VEP), a task that seeks to determine whether a genetic mutation influences a phenotype, and we found that Caduceus attains new state-of-the-art performance on this task.
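The reverse-complement symmetry mentioned above can be illustrated with a toy scoring function (a made-up dimer count, not Caduceus): averaging a model's output over a sequence and its reverse complement yields the simpler, related property of identical predictions for both strands:

```python
# DNA complement table: A<->T, C<->G.
COMP = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """The same locus read from the opposite DNA strand."""
    return seq.translate(COMP)[::-1]

def dimer_score(seq: str) -> float:
    """Hypothetical model output; counting 'AG' dimers is NOT RC-invariant."""
    return seq.count("AG") / len(seq)

def rc_averaged_score(seq: str) -> float:
    # Averaging over both strands makes any scoring function RC-invariant.
    return 0.5 * (dimer_score(seq) + dimer_score(reverse_complement(seq)))

s = "AAGT"
print(dimer_score(s) == dimer_score(reverse_complement(s)))            # False
print(rc_averaged_score(s) == rc_averaged_score(reverse_complement(s)))  # True
```

Building the symmetry into the architecture, as the talk describes, goes further than this post-hoc averaging: the model's internal representations respect strand symmetry rather than just its final score.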

January 16, 2024

Causal Inference on Variable Groups and Time Series

Speaker:
Dr. Jonas Wahl (TU Berlin)

Abstract:
I will present an introduction to graphical causal models over nodes that correspond to variable groups or to stochastic processes. These are often the targets of causal questions in the Earth Sciences as well as in other scientific domains. We will discuss the general modeling framework in a variable group/time series context, the main challenges to good model performance, and the interplay of dimension reduction and causal inference. The talk will be based on recent works done in collaboration with colleagues at TU Berlin and DLR Jena.

Email:
wahl@tu-berlin.de

October 17, 2023

From Explainable AI to Generative Modeling with Tree-Based Machine Learning

Speaker:
Marvin Wright

Invited by, Seminar series:
Sonja Greven (P1, P7), Statistics and Econometrics Seminar

Location/Time:
HU Berlin, Spandauer Str. 1, 12–1 pm (introduction 11 am–12 pm).

Abstract:
Despite the success of deep learning, tree-based machine learning is still competitive in many application domains. Recent literature shows that, e.g., on tabular data, tree-based methods still outperform neural networks while being faster, easier to apply, and requiring less tuning. However, this research focuses exclusively on discriminative models and their prediction performance. In this talk, I present two recent tree-based methods that go beyond predictive performance. First, an explanation method that provides a global representation of a prediction function by decomposing it into the sum of main and interaction components of arbitrary order. Our method extends Shapley values to higher-order interactions and is applicable to tree-based methods that consist of ensembles of low-dimensional structures, such as gradient-boosted trees. Second, I present adversarial random forests, a provably consistent method for density estimation and generative modeling. With the new method, we achieve comparable or superior performance to state-of-the-art deep learning models on various tabular data benchmarks while executing about two orders of magnitude faster.
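The decomposition into main and interaction components can be illustrated with a generic functional-ANOVA toy example on a grid; this is the classical construction, not the paper's tree-based algorithm:

```python
import numpy as np

# Decompose f(x1, x2) = x1 + 2*x2 + x1*x2 into intercept, main effects,
# and an interaction remainder, using grid averages.
x1 = np.linspace(-1, 1, 51)
x2 = np.linspace(-1, 1, 51)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
F = X1 + 2 * X2 + X1 * X2

f0 = F.mean()                                # intercept (global average)
f1 = F.mean(axis=1) - f0                     # main effect of x1
f2 = F.mean(axis=0) - f0                     # main effect of x2
f12 = F - f0 - f1[:, None] - f2[None, :]     # second-order interaction

# The components sum back to the original function exactly, and the
# recovered interaction is precisely the x1*x2 term.
print(np.allclose(f0 + f1[:, None] + f2[None, :] + f12, F))  # True
```

The method from the talk produces such main and interaction components of arbitrary order directly from a fitted tree ensemble, rather than by averaging over a grid as in this sketch.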

Access:
Hiabu, Meyer, Wright (2023). Unifying local and global model explanations by functional decomposition of low dimensional structures. AISTATS 2023
Watson, Blesch, Kapar, Wright (2023). Adversarial random forests for density estimation and generative modeling. AISTATS 2023

Email:
wright@leibniz-bips.de