Paper ID | TN2.3
Paper Title | Rate-Distortion Explanation (RDE): A Theoretical Framework for Interpreting Neural Network Decisions
Authors | Jan Macdonald, Stephan Wäldchen, Sascha Hauch, Gitta Kutyniok, TU Berlin, Germany
Session | TN2: Mathematics of Deep Learning |
Location | Salle Route du Rhum |
Session Time | Tuesday, 17 December, 16:00 - 17:20 |
Presentation Time | Tuesday, 17 December, 16:40 - 17:00 |
Presentation | Lecture
Topic | Special Sessions: Mathematical Foundations of Deep Learning
Abstract |
Despite the outstanding success of deep neural networks in real-world applications, most of the related research is empirically driven, and a mathematical foundation of expressivity, learning, and generalization is still almost completely missing. The novel research direction of interpretability takes a slightly different viewpoint by assuming that we are already given a trained neural network without knowledge of how it was trained -- a situation one will encounter numerous times in the future. One key question is then to identify those features of the input that are most crucial for the observed output, with the long-term vision of deriving an explanation of a network decision that is indistinguishable from a human explanation.
In this talk, we provide a theoretical framework for interpreting neural network decisions by formalizing the problem in a rate-distortion framework. This setting allows us to analyze the computational complexity of the related optimization problem and show that it is complete for $NP^{PP}$, a complexity class frequently encountered in AI tasks. A relaxed version of this optimization problem, which we coin Rate-Distortion Explanation (RDE), is then discussed and analyzed. Finally, we present numerical experiments showing that our algorithmic approach outperforms established methods, in particular for sparse explanations of neural network decisions.
This is joint work with Jan Macdonald, Stephan Wäldchen, and Sascha Hauch (TU Berlin).
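To make the rate-distortion viewpoint concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It uses assumed simplifications throughout: a fixed linear scorer `phi(x) = w·x` in place of a trained network, i.i.d. Gaussian obfuscation of the components that are not kept, and an $\ell_1$ relaxation of the sparsity (rate) constraint with a hypothetical weight `lam`. The continuous mask `s` plays the role of the selected feature set: components with `s_i` near 1 are "kept" and constitute the explanation, while the rest are replaced by noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: a fixed linear scorer Phi(x) = w.x.
# (w, x, lam, and the N(0,1) obfuscation are all illustrative assumptions.)
d = 8
w = rng.normal(size=d)   # "network" weights
x = rng.normal(size=d)   # input whose decision we want to explain

def distortion(s):
    """Expected squared output distortion when component i is kept with
    weight s_i and otherwise replaced by independent N(0,1) noise.
    For a linear scorer this expectation has a closed form:
    (mean shift of Phi)^2 + variance injected by the noise."""
    bias = np.sum(w * (1 - s) * x)       # mean shift of Phi under obfuscation
    var = np.sum(w**2 * (1 - s)**2)      # variance contributed by the noise
    return bias**2 + var

def rde_relaxed(lam=0.1, lr=0.02, steps=1500):
    """Projected gradient descent on distortion(s) + lam * ||s||_1
    over the box s in [0,1]^d (a relaxed sparse-explanation objective)."""
    s = np.full(d, 0.5)
    for _ in range(steps):
        bias = np.sum(w * (1 - s) * x)
        # analytic gradient of the closed-form distortion, plus the l1 term
        grad = -2.0 * bias * w * x - 2.0 * w**2 * (1 - s) + lam
        s = np.clip(s - lr * grad, 0.0, 1.0)
    return s

s = rde_relaxed()
# Components whose removal would distort the output most (here, those with
# large |w_i|) should end up with mask values near 1; weakly influential
# components are pushed toward 0 by the sparsity penalty.
```

In this toy setting the trade-off is visible directly: lowering `lam` keeps more components (lower distortion, higher rate), while raising it yields a sparser mask at the cost of a larger expected output distortion.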