King’s College London
Department of Mathematics
MSc in Mathematical Finance Academic year 2019–2020, Spring term
7CCMFM18 Machine Learning Problem sheet 1,
Solutions will be discussed between 20 January and.
Problem 1
Consider the logical operations
OR(x1,x2):=
NAND(x1, x2) :=
0, x1=1,×2 =0, 0, x =0,x =1,
0, x1 =0,×2 =0,
1, x1=1,×2=0, 1, x =0,x =1,
0, x1=0,×2 =0,
AND(x1,x2):=
1 2
12
1, x1=1,×2=1, 1, x1 =0,×2 =0,
1, x1=1,×2=0,
1, x =0,x =1, 1 2
0, x1=1,×2=1,
XOR(x1,x2):=
1, x1=1,×2 =1, 0, x1=0,×2 =0,
1, x1=1,×2 =0,
1, x =0,x =1, 12
0, x1=1,×2 =1.
Recall also the two-input perceptron
f (x1,x2;w1,w2,b):= H(w1x1 +w2x2 +b),
with weights w1,w2 ∈ R and bias b ∈ R, where H is the Heaviside function (as defined on page 13 of lecture notes).
(a) ShowthateachoftheoperationsOR,ANDandNANDcanberepresentedby theperceptronwithsomeweightsw1,w2 ∈{−1,1}andbiasb∈R.
(b) ShowthattheoperationXORcannotberepresentedbytheperceptronwith any choice of weights w1, w2 ∈ R and bias b ∈ R.
1
Problem 2
(a) ProveProposition2.5inlecturenotes.
(b) Prove Proposition 2.6 in lecture notes. Hint: Consider first the special case wheredk′ =dk+1forsomek=1,…,r−1whiledi′ =di fori̸=k. Usethis then to deduce the general statement.
Problem 3
Let r ≥ 2 and I,d1,…,dr−1,O ∈ N. Additionally, let σ be a function RO → RO. Show that there exists d ∈ N such that
Nr (I,d1,…,dr−1,O;ReLU,…,ReLU,σ) ⊂Nr+1(I,d1,…,dr−1,d,O;ReLU,…,ReLU,σ).
Hint: A linear combination of two ReLUs can represent the identity function. This way you can add a “dummy” layer to the network that does nothing.
Problem 4
(a) Let d ≥ 2 and let Softmaxd : Rd → Rd denote the d-dimensional softmax function. Suppose that x = (x1,…,xd) ∈ Rd is such that there exists i = 1,…,d for which xi > xj for all j ̸= i (that is, there is unique maximal com- ponent). For scalar λ > 0, determine the limit
lim Softmaxd(λx). λ→∞
Does this explain why the function is called “softmax”?
(b) Represent the convolutional layer C k (i.e., with stride = 1) of Definition 2.12 in lecture notes in matrix form,
Ck(x)=Wx, x∈Rd, withweightmatrixW ∈R(d−l−l′)×d determinedexplicitly.
2