程序代写代做 King’s College London

King’s College London
Department of Mathematics Academic year 2019–2020, Spring term Dr Blanka Horvath
7CCMFM18 Machine Learning Problem sheet 1 — Solutions
Problem 1
(a) Briefinspectionsuggeststhatforx1,x2∈{0,1}, 
OR(x1, x2) = 0, 1,

NAND(x1,x2)=0, 1,
 =0,
1,
x1+x2 <1, x1+x2≥1, x1+x2≥2, x1+x2<2, −x1−x2≤−2, −x1−x2>−2.
x1+x2 <2, 1, x1+x2≥2, Therefore, say, OR(x1,x2) = f (x1,x2;1,1,−1), NAND(x1,x2)= f (x1,x2;−1,−1,1.5), OR(x1,x2) = f (x1,x2;1,1,−2), x1,x2 ∈{0,1}. (b) Supposethatthereexistw1,w2∈Randb∈Rsuchthat XOR(x1,x2)=f(x1,x2;w1,w2,b), x1,x2∈{0,1}. By the definition of XOR,  w1·0+w2·0+b<0  b<0   w1·1+w2·0+b≥0  w1 + b ≥ 0 ⇐⇒ .  w1·0+w2·1+b≥0  w2 + b ≥ 0    w 1 · 1 + w 2 · 1 + b < 0    w 1 + w 2 + b < 0 The first and second inequalities imply that w1 ≥ −b > 0. Thus, by the third inequality, w1 +(w2 +b) ≥ 0, which contradicts with the fourth inequality. This shows that we cannot represent XOR with the perceptron.
1
AND(x1,x2)=0,

Problem 2
(a) Afunction f ∈Nr(O,d1,…,dr−1,O;σ1,…,σd)isexpressedas f =σr ◦Lr ◦···◦σ1◦L1,
where the affine functions L1,…,Lr characterise f . For each i = 1,…,r, the function Li is parameterised by a di × di −1 weight matrix and di -dimensional bias vector, that is, by di−1di +di = (di−1 +1)di parameters. Summing up, we get
r
􏰂(di−1 +1)di i=1
parameters in total, as claimed.
(b) Following the hint, consider f ∈ Nr (I,d1,…,dr−1,O,g1,…,gr−1,σr ) and let
di′=di+1andd′j =dj forj̸=i.Letuslookinto
Li(x)= Wi x+ bi , Li+1(x)= Wi+1 x+ bi+1 .
􏰎􏰍􏰌􏰏 􏰎􏰍􏰌􏰏 􏰎 􏰍􏰌 􏰏 􏰎􏰍􏰌􏰏
∈Rdi ×di−1 ∈Rdi ∈Rdi+1×di ∈Rdi+1 Definefirst,using0d =(0,…,0)∈Rd,
􏰈Wi􏰉′ i􏰈bi􏰉′ 􏰙 􏰚 ′
Wi:= ∈Rd×d ,
and then L􏰆i : Rdi−1 → Rdi′ and L􏰆i+1 : Rdi′ → Rdi+1 by
×d 􏰘 ii−1􏰆 i􏰘 d i+1i
Wi+1:= Wi+1 0 ∈Rd 0 ′d i − 1 0 i + 1
b := ∈Rd,
Li(x):=Wix+bi, Li+1(x):=Wi+1x+bi+1. By construction,
di′ di
􏰌 􏰏􏰎 􏰍 􏰌 􏰏􏰎 􏰍
L􏰆i+1◦(gi,…,gi)◦L􏰆i =Li+1◦(gi,…,gi)◦Li, 􏰎 􏰍􏰌 􏰏 􏰎 􏰍􏰌 􏰏
􏰆􏰘􏰆􏰆􏰘
: = g􏰆 i
:=gi
whereby
f =σr ◦Lr ◦gr−1◦Lr−1◦···◦Li+1◦gi ◦Li ◦···◦g1◦L1
= σ r ◦ L r ◦ g r − 1 ◦ L r − 1 ◦ · · · ◦ L􏰆 i + 1 ◦ g􏰆 i ◦ L􏰆 i ◦ · · · ◦ g 1 ◦ L 1 = : 􏰆f ,
while 􏰆f ∈ Nr (I,d′ ,…,d′ ,O,g1,…,gr−1,σr ). In the general case we simply
1 r−1
apply the result we just proved several times to increase the number of units
until we reach the specified amount in each hidden layer.
2

Problem 3
As suggested in the hint, the crucial observation is that Id(x)=x=ReLU(x)−ReLU(−x), x∈R.
To make use of the identity (1), define 1
−1 0  
1  −1 

(1)
while
 1  Md := ∈R
and,foranyW =[Wi,j]1≤i≤d′,1≤j≤d ∈Rd′×d,
W −W W −W ··· W
−W  1,d
 −1   … 
  ..
. 0 1

−1
1,1 1,1 1,2 1,2 1,d W −W W −W ··· W
−W  2,d 
 2,1 2,1
W:=. . . . . . .∈R .
2,2 2,2 2,d
 . . . . .. . . 
d′×2d
2d ×d

Wd′,1 −Wd′,1 Wd′,2 −Wd′,2 ··· Wd′,d −Wd′,d Moreover,writeReLUd(x):=􏰄ReLU(x1),…,ReLU(xd)􏰅∈Rd forx=(x1,…,xd)∈Rd.
Now note that ReLU2d(Mdx)=􏰄ReLU(x1),ReLU(−x1),…,ReLU(xd),ReLU(−xd)􏰅,
W ReLU2d(Mdx)=W ReLUd(x)−W ReLUd(−x) =W􏰄ReLUd(x)−ReLUd(−x)􏰅 (2)
=Wx,
by (1). Consider now
f =σ◦Lr◦ReLUdr−1◦Lr−1◦···◦ReLUd1◦L1 ∈Nr(I,d1,…,dr−1,O;ReLU,…,ReLU,σ),
whereLr(x)=Wrx+br,x∈Rdr−1.DefineL􏰆r :Rdr−1 →R2dr−1 by L􏰆r(x):=Mdr−1x
3

andL􏰆r+1 :R2dr−1 →Rdr by Thanks to (2),
L􏰆r+1(x):=Wrx+br.
L 􏰓ReLU 􏰄L (x)􏰅􏰔=Wrx+bi =L (x), x∈Rdr−1.
􏰆r+1 2dr−1 􏰆r r Consequently,
f =σ◦L􏰆r+1◦ReLU2dr−1 ◦L􏰆r ◦ReLUdr−1 ◦Lr−1◦···◦ReLUd1 ◦L1, 􏰎 􏰍􏰌 􏰏
=Lr
whereby
f ∈Nr+1(I,d1,…,dr−1,2dr−1,O;ReLU,…,ReLU,σ). Problem 4
(a) Notefirstthat
eλxi eλxi 1
􏰁dj=1 eλxj = eλxi 􏰁dj=1 eλ(xj −xi ) = 1+􏰁j̸=i eλ(xj −xi ) ,
where x j − xi < 0 for any j ̸= i , whereby lim 􏰂eλ(xj−xi)=0, so that λ→∞ j ̸=i eλxi lim􏰁d λx =1. λ→∞ j=1e j Since the components of softmax sum up to one, it follows that eλxk lim 􏰁d λx =0 fork̸=i. λ→∞ j=1e j Thus,limλ→∞Softmaxd(λx1,...,λxd)equalstheargmaxfunctionof(x1,...,xd), which returns one for the largest coordinate and zero for others — softmax can thus be seen as a smoothed version of argmax, as the latter is the limit of the former when the scaling of the inputs is blown up. (b) ThelayerCk withk=(k−l′,...,kl)canberepresentedwith k−l′ k−l′+1 ··· kl−1 kl 0  k′k′···kk   −l −l +1 l−1 W:= . . . . ∈R . l  (d−l−l′)×d  .. .. ··· .. ..   0 k−l′ k−l′+1 ··· kl−1 kl 4