Technical Program

Paper Detail

Paper ID	TN2.3
Paper Title	Rate-Distortion Explanation (RDE): A Theoretical Framework for Interpreting Neural Network Decisions
Authors	Jan Macdonald, Stephan Wäldchen, Sascha Hauch, Gitta Kutyniok, TU Berlin, Germany
Session	TN2: Mathematics of Deep Learning
Location	Salle Route du Rhum
Session Time	Tuesday, 17 December, 16:00 - 17:20
Presentation Time	Tuesday, 17 December, 16:40 - 17:00
Presentation	Lecture
Topic	Special Sessions: Mathematical Foundations of Deep Learning
Abstract	Despite the outstanding success of deep neural networks in real-world applications, most of the related research is empirically driven, and a mathematical foundation of expressivity, learning, and generalization is still almost completely missing. The novel research direction of interpretability takes a slightly different viewpoint by assuming that we are already given a trained neural network without knowledge of how it was trained -- a situation one will encounter numerous times in the future. One key question is then to identify those features of the input that are most crucial for the observed output, with the long-term vision of deriving an explanation of a network decision that is indistinguishable from a human explanation. In this talk, we provide a theoretical framework for interpreting neural network decisions by formalizing the problem in a rate-distortion setting. This formulation allows us to analyze the computational complexity of the related optimization problem and to show that it is complete for $NP^{PP}$, a complexity class frequently utilized for AI tasks. A relaxed version of this optimization problem, which we call Rate-Distortion Explanation (RDE), is then discussed and analyzed. Finally, we present numerical experiments showing that our algorithmic approach outperforms established methods, in particular for sparse explanations of neural network decisions. This is joint work with Jan Macdonald, Stephan Wäldchen, and Sascha Hauch (TU Berlin).
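The abstract does not spell out the rate-distortion formulation, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that a candidate explanation is a mask over input components, that masked-out components are replaced by standard Gaussian noise, and that distortion is the expected squared change in the model output, estimated here by Monte Carlo sampling. The names `distortion` and `toy_model` are hypothetical and chosen for this example.

```python
import numpy as np


def distortion(model, x, mask, n_samples=256, rng=None):
    """Monte Carlo estimate of the expected squared output change when
    components with mask == 0 are replaced by standard Gaussian noise.
    (Assumed noise model and distortion measure; not from the abstract.)"""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal((n_samples, x.size))
    # Keep the masked-in components of x, randomize the rest.
    perturbed = mask * x + (1.0 - mask) * noise
    reference = model(x[None, :])
    return float(np.mean((model(perturbed) - reference) ** 2))


def toy_model(batch):
    # Hypothetical stand-in for a trained network: the output depends
    # only on the first two input components.
    return batch[:, 0] + 2.0 * batch[:, 1]


x = np.array([1.0, -1.0, 0.5, 3.0])
keep_relevant = np.array([1.0, 1.0, 0.0, 0.0])    # fixes components 0 and 1
keep_irrelevant = np.array([0.0, 0.0, 1.0, 1.0])  # fixes components 2 and 3

d_relevant = distortion(toy_model, x, keep_relevant)
d_irrelevant = distortion(toy_model, x, keep_irrelevant)
print(d_relevant, d_irrelevant)
```

In this toy setting, fixing the two components the model actually uses leaves the output unchanged, while fixing the unused ones does not. A sparse explanation in this sense would be a mask with few nonzero entries that still keeps the distortion small.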