Exercises for the course
Machine Learning 1
Winter semester 2020/21
Abteilung Maschinelles Lernen, Institut für Softwaretechnik und theoretische Informatik, Fakultät IV, Technische Universität Berlin, Prof. Dr. Klaus-Robert Müller, Email: klaus-robert.mueller@tu-berlin.de
Exercise Sheet 14
Exercise 1: Class Prototypes (25 P)
Consider the linear model f(x) = w⊤x + b mapping some input x to an output f(x). We would like to interpret the function f by building a prototype x⋆ in the input domain that produces a large value of f. Activation maximization produces such an interpretation by optimizing
max_x f(x) + Ω(x).

Find the prototype x⋆ obtained by activation maximization subject to Ω(x) = log p(x) with x ∼ N(μ, Σ), where μ and Σ are the mean and covariance.
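For reference, a minimal numerical sketch of this setup can be written in numpy: it performs plain gradient ascent on f(x) + Ω(x) for an assumed choice of w, b, μ and Σ (these values are illustrative, not part of the exercise), and the resulting optimum can be compared against the closed-form prototype derived on paper.

```python
import numpy as np

# Illustrative toy parameters (assumed, not given in the exercise)
w = np.array([1.0, -2.0])
b = 0.5
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Gradient ascent on f(x) + log p(x); the gradient of the Gaussian
# log-density with respect to x is -Sigma^{-1} (x - mu).
x = mu.copy()
for _ in range(5000):
    grad = w - Sigma_inv @ (x - mu)
    x = x + 0.01 * grad

print("numerical prototype x*:", x)  # compare with the closed form derived by hand
```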
Exercise 2: Shapley Values (25 P)
Consider the function f(x) = min(x1, max(x2, x3)). Compute the Shapley values φ1, φ2, φ3 for the prediction f(x) with x = (1, 1, 1). (We assume a reference point x̃ = 0, i.e. we set features to zero when removing them from the coalition.)
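A brute-force check of the result is straightforward, since the coalition game has only three players: the sketch below (plain Python/numpy, written for this specific f and reference point) enumerates all coalitions and applies the Shapley formula directly.

```python
import numpy as np
from itertools import combinations
from math import factorial

# Function and point to explain, as given in the exercise
f = lambda z: min(z[0], max(z[1], z[2]))
x = np.array([1.0, 1.0, 1.0])
n = 3

def value(S):
    """Evaluate f with the features outside the coalition S set to the reference 0."""
    z = np.zeros(n)
    z[list(S)] = x[list(S)]
    return f(z)

phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi[i] += weight * (value(S + (i,)) - value(S))

print("phi_1, phi_2, phi_3 =", phi)  # the values should sum to f(x) - f(0)
```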
Exercise 3: Taylor Expansions (25 P)
Consider the simple radial basis function

f(x) = ∥x − μ∥ − θ

with θ > 0. For the purpose of extracting an explanation, we would like to build a first-order Taylor expansion of the function at some root point x̃. We choose this root point to be taken on the segment connecting μ and x (we assume that f(x) > 0 so that there is always a root point on this segment).

Show that the first-order terms of the Taylor expansion are given by

φi = ((xi − μi)² / ∥x − μ∥²) · (∥x − μ∥ − θ)
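One way to sanity-check the claimed expression is to compute the Taylor terms numerically for an arbitrary instance; the sketch below does this for assumed values of μ, θ and x (chosen so that f(x) > 0), constructing the root point on the segment between μ and x and evaluating the first-order terms at that point.

```python
import numpy as np

# Assumed toy instance (not part of the exercise): here f(x) = 5 - 1 = 4 > 0
mu = np.array([0.0, 0.0])
theta = 1.0
x = np.array([3.0, 4.0])

f = lambda z: np.linalg.norm(z - mu) - theta

# Root point on the segment connecting mu and x: at distance theta from mu
x_root = mu + theta * (x - mu) / np.linalg.norm(x - mu)

# First-order Taylor terms: phi_i = [grad f(x_root)]_i * (x_i - x_root_i)
grad_root = (x_root - mu) / np.linalg.norm(x_root - mu)
phi = grad_root * (x - x_root)

# Claimed closed form from the exercise, for comparison
phi_claimed = (x - mu) ** 2 / np.linalg.norm(x - mu) ** 2 * (np.linalg.norm(x - mu) - theta)

print("numerical phi:", phi)
print("closed form  :", phi_claimed)
```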
Exercise 4: Layer-Wise Relevance Propagation (25 P)

We would like to test the dependence of layer-wise relevance propagation (LRP) on the structure of the neural network. For this, we consider the function y = max(x1, x2), where x1, x2 ∈ R+ are the input activations. This function can be implemented as a ReLU network in multiple ways. Three examples are given below.
[Figure: three ReLU network implementations (a), (b) and (c) of y = max(x1, x2); each maps the inputs x1, x2 through hidden ReLU neurons a3, a4 (and a5 in one of the networks) to the output yout, with connection weights 1, −1 and 0.5.]
We consider the propagation rule

Rj = Σk ( aj w⁺jk / Σj aj w⁺jk ) Rk

where j and k are indices for two consecutive layers and where (·)⁺ denotes the positive part. This propagation rule is applied to both layers.

Give for each network the computational steps that lead to the scores R1 and R2, and the obtained relevance values. More specifically, express R1 and R2 as a function of R3 and R4 (and R5), and express the latter relevances as a function of Rout = y.
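The propagation rule itself is easy to implement generically. The sketch below applies it to one possible ReLU implementation of max(x1, x2); the weight matrices are an assumption for illustration and are not necessarily those shown in the figure. It propagates Rout = y back through both layers to obtain R1 and R2.

```python
import numpy as np

def lrp_zplus(a, W, R_upper):
    """One step of the rule R_j = sum_k a_j w_jk^+ / (sum_j a_j w_jk^+) * R_k."""
    Wp = np.maximum(W, 0)                    # positive part of the weights
    z = a @ Wp                               # one denominator per upper-layer neuron k
    s = R_upper / np.where(z == 0, 1.0, z)   # guard against division by zero
    return a * (Wp @ s)                      # relevances of the lower layer

# Assumed example network (not necessarily one of the three in the figure):
# a3 = ReLU(x1 - x2), a4 = ReLU(x2), y = a3 + a4 = max(x1, x2) for x1, x2 >= 0
x = np.array([3.0, 1.0])
W1 = np.array([[1.0, 0.0],
               [-1.0, 1.0]])
W2 = np.array([[1.0],
               [1.0]])

a = np.maximum(x @ W1, 0)   # hidden activations a3, a4
y = a @ W2                  # output yout

R_out = y                          # start from Rout = y
R_hidden = lrp_zplus(a, W2, R_out)  # relevances R3, R4
R_input = lrp_zplus(x, W1, R_hidden)  # relevances R1, R2
print("R3, R4 =", R_hidden, " R1, R2 =", R_input)
```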