DeSBi

P4: Regularization Strategies for Interpretable Deep Models and Robust Explanations with Application to Genomics

Project Summary

Deep networks have driven significant progress in many areas, but their hierarchical, non-linear structure makes it difficult to understand how they arrive at their decisions. One successful approach to this problem traces the output score back through the network and combines relevance scores to attribute the prediction to the input features. Project P4 expands on this work in three ways: improving the robustness of XAI techniques used to compute population-level explanations, developing training objectives and regularization techniques that enhance interpretability, and applying both to genomic and bioimage data. To evaluate pattern sharing between protein targets, this project will use the statistical methods for explanations developed in P2 as well as the deep-learning-based conditional independence tests from P1. In addition, we will collaborate with P3 to develop explanations under model uncertainty. The regularization and visual interpretation methods developed in this project will also be used in P5.
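
The following is a minimal, illustrative sketch of the idea of tracing an output score back to the input features. It uses gradient x input attribution as a simple stand-in for relevance propagation; the model, the feature count, and all variable names are placeholder assumptions, not the project's actual method or code.

```python
import torch
import torch.nn as nn

# Toy classifier over 100 input features (e.g. genomic features);
# the architecture is a placeholder, not a model from the project.
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 2))

x = torch.randn(1, 100, requires_grad=True)   # one input sample
logits = model(x)
score = logits[0, logits.argmax()]            # output score of the predicted class
score.backward()                              # trace the score back through the network

relevance = (x.grad * x).detach()             # per-feature relevance (gradient x input)
print(relevance.shape)                        # -> torch.Size([1, 100])
```

In practice, methods such as layer-wise relevance propagation redistribute the score layer by layer under conservation rules rather than using raw gradients, but the overall flow (forward pass, backward pass, per-feature scores) is the same.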

Research Question(s)

How can we enhance the robustness and interpretability of explainable AI (XAI) techniques for deep networks, particularly in the context of genomic and bioimage data?

Research Framework

The research plan aims to enhance the interpretability and robustness of deep models through several lines of work, including prototype-based explanations and outlier detection, model correction, and data attribution techniques. Our methods are evaluated on biomedical data and images. Overall, this contributes to building more reliable and transparent AI systems for high-stakes medical scenarios.

Main Contribution

We follow several lines of work on regularization strategies for interpretable deep models and on increasing the robustness of explanations, especially in the context of biomedical data. Significant progress has been made on prototypical explanations [1], enabling visualisations of clusters of latent relevances and, for example, increased robustness through outlier detection. A novel concept-level correction method for deep neural networks that reduces sensitivity to biases through gradient penalization has been proposed in [4], addressing the risk of biased predictions in high-stakes applications such as medical decision-making. By reducing the model's sensitivity towards biases via gradient penalization in latent space, the method effectively mitigates biases in both controlled and real-world scenarios, including biomedical images. Another work on model correction is [2], which addresses the issue of deep neural networks relying on spurious correlations in high-risk applications. To prevent overcorrection, it introduces a method that minimizes the detrimental side effects of corrections while reducing reliance on harmful features. "DualView", an efficient and scalable alternative to existing data attribution methods, is proposed in [3]. It uses an interpretable surrogate model to generate explanations hundreds of thousands of times faster, while producing explanations of quality comparable to other methods across many different criteria. In [5], we present a method for disentangling polysemantic neurons in deep neural networks by decomposing them into monosemantic "virtual" neurons, enabling the isolation of sub-graphs for distinct features.
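
To make the concept-level correction idea concrete, here is a hedged sketch of gradient penalization against a known bias direction in latent space, loosely in the spirit of [4] but not the authors' implementation. The split into a feature extractor and classifier head, the toy batch, and the names `bias_cav` and `lambda_penalty` are illustrative assumptions; in practice the bias direction would be estimated from labeled concept data.

```python
import torch
import torch.nn as nn

# Placeholder model split into a feature extractor and a classifier head.
features = nn.Sequential(nn.Linear(100, 64), nn.ReLU())
head = nn.Linear(64, 2)

# Assumed unit vector pointing in the direction of a known bias concept
# in latent space (in practice estimated from concept-labeled data).
bias_cav = torch.randn(64)
bias_cav = bias_cav / bias_cav.norm()
lambda_penalty = 1.0                     # assumed weighting of the penalty term

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(features.parameters()) + list(head.parameters()), lr=1e-3)

x, y = torch.randn(8, 100), torch.randint(0, 2, (8,))   # toy batch
z = features(x)                                          # latent activations
logits = head(z)
task_loss = criterion(logits, y)

# Gradient of the target-class logits with respect to the latent activations.
target_logits = logits.gather(1, y.unsqueeze(1)).sum()
latent_grads = torch.autograd.grad(target_logits, z, create_graph=True)[0]

# Penalize the gradient component along the bias direction, pushing the
# model's output to become insensitive to that concept.
penalty = (latent_grads @ bias_cav).pow(2).mean()
loss = task_loss + lambda_penalty * penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The key design choice is that the penalty acts on latent-space gradients rather than on the weights directly, so the correction targets the model's use of a specific concept while leaving task-relevant features largely intact.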

Publications

  • [1] Dreyer, Maximilian, et al. "Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-Based Explanations." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
  • [2] Bareeva, Dilyara, et al. "Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
  • [3] Yolcu, Galip Ümit, et al. "DualView: Data Attribution from the Dual Perspective." arXiv preprint arXiv:2402.12118 (2024).
  • [4] Dreyer, Maximilian, et al. "From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 19. 2024.
  • [5] Dreyer, Maximilian, et al. "PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits." CVPR Workshop.

Principal Investigators

Uwe Ohler (HU Berlin / MDC Berlin)

Wojciech Samek (TU Berlin / Fraunhofer HHI)

Project Researchers

Jim Berend (Fraunhofer HHI)

Lorenz Hufe (Fraunhofer HHI)

Sahar Irvanani (MDC Berlin)