School of Computing and Information Systems The University of Melbourne
COMP90049 Introduction to Machine Learning (Semester 1, 2022) Week 10: Solutions
1. Consider the two-level network illustrated below. It is composed of three perceptrons. The two perceptrons of the first level implement the AND and OR functions, respectively.
Determine the weights θ1, θ2 and the bias θ0 such that the network implements the XOR function. The initial weights are set to zero, i.e., θ0 = θ1 = θ2 = 0, and the learning rate η (eta) is set to 0.1.
• The input function for the perceptron on level 2 is the weighted sum (Σ) of its inputs.
• The activation function f for the perceptron on level 2 is a step function: f(x) = 1 if x > 0, and f(x) = 0 otherwise.
• Assume that the weights for the perceptrons of the first level are given.
Learning Algorithm for XOR:
For each training example x:
    P ← (−1, f_AND(x), f_OR(x))
    ŷ ← f(Σ_i θ_i p_i)
    y ← target of x
    For i = 0, …, n:
        Δθ_i ← η (y − ŷ) p_i
        θ_i ← θ_i + Δθ_i
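As a complement to the pseudocode, here is a minimal Python sketch of the same learning loop (the helper names step and train_perceptron are illustrative, not part of the exercise):

```python
# Perceptron learning for the level-2 unit.
# Each training example is P = (-1, AND(x1, x2), OR(x1, x2)) with target y,
# and f is the step activation (1 if the weighted sum is positive, 0 otherwise).

def step(z):
    return 1 if z > 0 else 0

def train_perceptron(data, theta, eta=0.1, max_epochs=100):
    """data: list of (P, y) pairs; theta: initial weights [theta0, theta1, theta2]."""
    for _ in range(max_epochs):
        changed = False
        for P, y in data:
            y_hat = step(sum(t * p for t, p in zip(theta, P)))
            if y_hat != y:
                theta = [t + eta * (y - y_hat) * p for t, p in zip(theta, P)]
                changed = True
        if not changed:          # a full epoch with no updates: converged
            break
    return theta
```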
To calculate the outputs of the two level-1 perceptrons, we can use the following table (remember: since the weights for these perceptrons are given, we don't need to do any iterative learning):
Train instance #   x1   x2   p1 = f_AND(x)   p2 = f_OR(x)   y = x1 XOR x2
1                  1    0    0               1              1
2                  0    1    0               1              1
3                  1    1    1               1              0
4                  0    0    0               0              0
Based on the results from the above table, our input signals (training instances) for the level-2 perceptron (XOR) are P = <p0, p1, p2> = <−1, p1, p2>, and the parameters are θ = <θ0, θ1, θ2> = <0, 0, 0>.
So, for the first epoch we have:

Epoch 1 (starting from θ = <0, 0, 0>):
• Instance 1: P = <−1, 0, 1>, y = 1
  Σ = 0×(−1) + 0×0 + 0×1 = 0, ŷ = f(0) = 0
  Δθ = 0.1 × (1 − 0) × <−1, 0, 1> = <−0.1, 0, 0.1>; update θ to <0, 0, 0> + <−0.1, 0, 0.1> = <−0.1, 0, 0.1>
• Instance 2: P = <−1, 0, 1>, y = 1
  Σ = (−0.1)×(−1) + 0×0 + 0.1×1 = 0.2, ŷ = f(0.2) = 1; no change required
• Instance 3: P = <−1, 1, 1>, y = 0
  Σ = (−0.1)×(−1) + 0×1 + 0.1×1 = 0.2, ŷ = f(0.2) = 1
  Δθ = 0.1 × (0 − 1) × <−1, 1, 1> = <0.1, −0.1, −0.1>; update θ to <−0.1, 0, 0.1> + <0.1, −0.1, −0.1> = <0, −0.1, 0>
• Instance 4: P = <−1, 0, 0>, y = 0
  Σ = 0×(−1) + (−0.1)×0 + 0×0 = 0, ŷ = f(0) = 0; no change required

Epoch 2 (θ = <0, −0.1, 0>):
• Instance 1: P = <−1, 0, 1>, y = 1
  Σ = 0×(−1) + (−0.1)×0 + 0×1 = 0, ŷ = f(0) = 0
  Δθ = 0.1 × (1 − 0) × <−1, 0, 1> = <−0.1, 0, 0.1>; update θ to <0, −0.1, 0> + <−0.1, 0, 0.1> = <−0.1, −0.1, 0.1>
• Instance 2: P = <−1, 0, 1>, y = 1
  Σ = (−0.1)×(−1) + (−0.1)×0 + 0.1×1 = 0.2, ŷ = f(0.2) = 1; no change required
• Instance 3: P = <−1, 1, 1>, y = 0
  Σ = (−0.1)×(−1) + (−0.1)×1 + 0.1×1 = 0.1, ŷ = f(0.1) = 1
  Δθ = 0.1 × (0 − 1) × <−1, 1, 1> = <0.1, −0.1, −0.1>; update θ to <−0.1, −0.1, 0.1> + <0.1, −0.1, −0.1> = <0, −0.2, 0>
• Instance 4: P = <−1, 0, 0>, y = 0
  Σ = 0×(−1) + (−0.2)×0 + 0×0 = 0, ŷ = f(0) = 0; no change required

Epoch 3 (θ = <0, −0.2, 0>):
• Instance 1: P = <−1, 0, 1>, y = 1
  Σ = 0×(−1) + (−0.2)×0 + 0×1 = 0, ŷ = f(0) = 0
  Δθ = 0.1 × (1 − 0) × <−1, 0, 1> = <−0.1, 0, 0.1>; update θ to <0, −0.2, 0> + <−0.1, 0, 0.1> = <−0.1, −0.2, 0.1>
• Instance 2: P = <−1, 0, 1>, y = 1
  Σ = (−0.1)×(−1) + (−0.2)×0 + 0.1×1 = 0.2, ŷ = f(0.2) = 1; no change required
• Instance 3: P = <−1, 1, 1>, y = 0
  Σ = (−0.1)×(−1) + (−0.2)×1 + 0.1×1 = 0, ŷ = f(0) = 0; no change required
• Instance 4: P = <−1, 0, 0>, y = 0
  Σ = (−0.1)×(−1) + (−0.2)×0 + 0.1×0 = 0.1, ŷ = f(0.1) = 1
  Δθ = 0.1 × (0 − 1) × <−1, 0, 0> = <0.1, 0, 0>; update θ to <−0.1, −0.2, 0.1> + <0.1, 0, 0> = <0, −0.2, 0.1>

Epoch 4 (θ = <0, −0.2, 0.1>):
• Instance 1: P = <−1, 0, 1>, y = 1: Σ = 0×(−1) + (−0.2)×0 + 0.1×1 = 0.1, ŷ = f(0.1) = 1; no change required
• Instance 2: P = <−1, 0, 1>, y = 1: Σ = 0×(−1) + (−0.2)×0 + 0.1×1 = 0.1, ŷ = f(0.1) = 1; no change required
• Instance 3: P = <−1, 1, 1>, y = 0: Σ = 0×(−1) + (−0.2)×1 + 0.1×1 = −0.1, ŷ = f(−0.1) = 0; no change required
• Instance 4: P = <−1, 0, 0>, y = 0: Σ = 0×(−1) + (−0.2)×0 + 0.1×0 = 0, ŷ = f(0) = 0; no change required

Since in the last epoch we didn't have any update to our weights (θ), the algorithm converges here. So, for our network to perform the XOR function, the final weights would be θ = <0, −0.2, 0.1>; in other words, θ0 = 0, θ1 = −0.2 and θ2 = 0.1.
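As a sanity check, the short sketch below (reusing the illustrative step and train_perceptron helpers from above) reruns this training and confirms that the learned level-2 unit reproduces XOR on all four instances:

```python
# Level-2 training data: P = (-1, AND, OR) with target y = x1 XOR x2.
data = [((-1, 0, 1), 1),   # x = (1, 0)
        ((-1, 0, 1), 1),   # x = (0, 1)
        ((-1, 1, 1), 0),   # x = (1, 1)
        ((-1, 0, 0), 0)]   # x = (0, 0)

theta = train_perceptron(data, [0.0, 0.0, 0.0], eta=0.1)
print(theta)   # [0.0, -0.2, 0.1] (up to floating-point rounding)

for P, y in data:
    assert step(sum(t * p for t, p in zip(theta, P))) == y
```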
2. Consider the following multilayer perceptron.
The network should implement the XOR function. Perform one epoch of backpropagation as introduced in the lecture on multilayer perceptron.
The activation function f for a perceptron is the sigmoid function: f(x) = 1 / (1 + e^(−x))
The bias nodes are set to −1; they are not shown in the network. Use the following initial parameter values:

θ_01^(1) = 2    θ_11^(1) = 6    θ_21^(1) = −6
θ_02^(1) = −1   θ_12^(1) = 8    θ_22^(1) = −8
θ_01^(2) = −2   θ_11^(2) = 6    θ_21^(2) = −6

• The learning rate is set to η = 0.7

i. Compute the activations of the hidden and output neurons.

Therefore, we will have:
Since our activation function here is the sigmoid (σ) function, in each node (neuron) we can calculate the output by applying the sigmoid function (σ(x) = 1 / (1 + e^(−x))) to the weighted sum (Σ) of its inputs, which we usually represent by z_b^(a), where a is the level of the neuron (e.g., level 1, 2, 3) and b is its index.
So, for neuron i in level 1, we will have:

a_i^(1) = σ(z_i^(1)) = σ(θ_i^(1) · X) = σ(θ_0i^(1) × x_0 + θ_1i^(1) × x_1 + θ_2i^(1) × x_2)

a_1^(1) = σ(θ_01^(1) × x_0 + θ_11^(1) × x_1 + θ_21^(1) × x_2) = σ(2 × (−1) + 6x_1 − 6x_2)

a_2^(1) = σ(θ_02^(1) × x_0 + θ_12^(1) × x_1 + θ_22^(1) × x_2) = σ((−1) × (−1) + 8x_1 − 8x_2)

Now the output of our network is obtained by simply applying the same rule to the inputs of our last neuron:

ŷ = a_1^(2) = σ(z_1^(2)) = σ(θ_1^(2) · A^(1)) = σ(θ_01^(2) × a_0^(1) + θ_11^(2) × a_1^(1) + θ_21^(2) × a_2^(1)) = σ((−2) × (−1) + 6a_1^(1) − 6a_2^(1))
You can find the schematic representation of these calculations in the following network.
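The following Python sketch mirrors this forward pass with the given initial weights; the helper names (sigmoid, forward) and the list layout of the parameters are illustrative choices, not part of the question:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Initial weights; each list is [bias weight, weight from x1/a1, weight from x2/a2].
theta1 = {1: [2.0, 6.0, -6.0],    # hidden unit 1: theta_01^(1), theta_11^(1), theta_21^(1)
          2: [-1.0, 8.0, -8.0]}   # hidden unit 2: theta_02^(1), theta_12^(1), theta_22^(1)
theta2 = [-2.0, 6.0, -6.0]        # output unit:   theta_01^(2), theta_11^(2), theta_21^(2)

def forward(x1, x2):
    """Forward pass; every unit sees a bias node with constant value -1."""
    a1 = sigmoid(theta1[1][0] * -1 + theta1[1][1] * x1 + theta1[1][2] * x2)
    a2 = sigmoid(theta1[2][0] * -1 + theta1[2][1] * x1 + theta1[2][2] * x2)
    y_hat = sigmoid(theta2[0] * -1 + theta2[1] * a1 + theta2[2] * a2)
    return a1, a2, y_hat

a1, a2, y_hat = forward(1, 0)
print(round(a1, 4), round(a2, 4), round(y_hat, 4))   # approx. 0.982 0.9999 0.8691
```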
ii. Compute the error of the network.
E = ½ (y − ŷ)² = ½ (y − a_1^(2))²
iii. Backpropagate the error to determine Δθ_ij for all weights θ_ij and update the weights θ_ij.
In neural networks with backpropagation, we want to minimize the error of our network by finding the optimum weights (θij) for our network. To do so we want to find the relation (dependency) between the error and the weights in each layer. Therefore, we use the derivatives of our error function.
θ_ij^(l) ← θ_ij^(l) + Δθ_ij^(l)

where: Δθ_ij^(l) = −η ∂E/∂θ_ij^(l) = η δ_j^(l) a_i^(l−1)

δ_j^(l) = σ(z_j^(l)) (1 − σ(z_j^(l))) (y − a_j^(l))               for the last layer

δ_j^(l) = σ(z_j^(l)) (1 − σ(z_j^(l))) θ_jk^(l+1) δ_k^(l+1)        for the layer before (summed over the neurons k of layer l+1; here there is a single output neuron)

Now we use our first training instance (1, 0) to train the model:
a_1^(1) = σ(z_1^(1)) = σ(6x_1 − 6x_2 − 2) = σ(6×1 − 6×0 − 2) = σ(4) ≅ 0.982

a_2^(1) = σ(z_2^(1)) = σ(8x_1 − 8x_2 + 1) = σ(8×1 − 8×0 + 1) = σ(9) ≅ 0.999

a_1^(2) = σ(z_1^(2)) = σ(2 + 6a_1^(1) − 6a_2^(1)) = σ(2 + 6 × 0.982 − 6 × 0.999) = σ(1.898) ≅ 0.8691
E = ½ (1 − 0.8691)² = 0.0086
x1   x2   a_1^(1)         a_2^(1)         a_1^(2)   Y (XOR)   E(θ) = ½ (y − ŷ)²
1    0    σ(4) = 0.982    σ(9) = 0.999    0.8691    1         0.0086

Now we calculate the backpropagation error. Starting from the last layer, we will have:
δ_1^(2) = σ(z_1^(2)) (1 − σ(z_1^(2))) (y − a_1^(2))
        = σ(1.898) (1 − σ(1.898)) (1 − 0.8691) = 0.8691 × (1 − 0.8691) × 0.1309 = 0.0149

δ_1^(1) = σ(z_1^(1)) (1 − σ(z_1^(1))) θ_11^(2) δ_1^(2)
        = σ(4) (1 − σ(4)) × 6 × 0.0149 = 0.982 × (1 − 0.982) × 0.0894 = 0.0016

δ_2^(1) = σ(z_2^(1)) (1 − σ(z_2^(1))) θ_21^(2) δ_1^(2)
        = σ(9) (1 − σ(9)) × (−6) × 0.0149 ≅ −1.1 × 10⁻⁵ ≅ 0
Using the learning rate η = 0.7, we can now calculate the Δθ_ij^(l):

Δθ_01^(2) = η δ_1^(2) a_0^(1) = 0.7 × 0.0149 × (−1) = −0.0104
Δθ_11^(2) = η δ_1^(2) a_1^(1) = 0.7 × 0.0149 × 0.982 = 0.0102
Δθ_21^(2) = η δ_1^(2) a_2^(1) = 0.7 × 0.0149 × 0.999 = 0.0104

Δθ_01^(1) = η δ_1^(1) x_0 = 0.7 × 0.0016 × (−1) = −0.0011
Δθ_11^(1) = η δ_1^(1) x_1 = 0.7 × 0.0016 × 1 = 0.0011
Δθ_21^(1) = η δ_1^(1) x_2 = 0.7 × 0.0016 × 0 = 0
Δθ_02^(1) = η δ_2^(1) x_0 = 0.7 × (−1.1 × 10⁻⁵) × (−1) ≅ 0
Δθ_12^(1) = η δ_2^(1) x_1 = 0.7 × (−1.1 × 10⁻⁵) × 1 ≅ 0
Δθ_22^(1) = η δ_2^(1) x_2 = 0.7 × (−1.1 × 10⁻⁵) × 0 = 0
Based on these results we can update the network weights:
θ_01^(2) = θ_01^(2) + Δθ_01^(2) = −2 + (−0.0104) = −2.0104
θ_11^(2) = θ_11^(2) + Δθ_11^(2) = 6 + 0.0102 = 6.0102
θ_21^(2) = θ_21^(2) + Δθ_21^(2) = −6 + 0.0104 = −5.9896
θ_01^(1) = θ_01^(1) + Δθ_01^(1) = 2 + (−0.0011) = 1.9989
θ_11^(1) = θ_11^(1) + Δθ_11^(1) = 6 + 0.0011 = 6.0011

The rest of the weights do not change, i.e.
θ_02^(1) = −1,  θ_21^(1) = −6,  θ_22^(1) = −8,  θ_12^(1) = 8
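For reference, here is a hedged Python sketch of this single online update on x = (1, 0), reusing the sigmoid/forward helpers and the theta1/theta2 layout from the earlier sketch; it reproduces the hand-computed numbers above up to rounding:

```python
eta = 0.7
x0, x1, x2, y = -1, 1, 0, 1        # bias input and the training instance (1, 0) with target 1

a1, a2, y_hat = forward(x1, x2)

# Output-layer delta: sigma(z)(1 - sigma(z))(y - a), where sigma(z) is just y_hat.
delta_out = y_hat * (1 - y_hat) * (y - y_hat)           # approx. 0.0149

# Hidden-layer deltas use the (not yet updated) second-layer weights.
delta_h1 = a1 * (1 - a1) * theta2[1] * delta_out        # approx. 0.0016
delta_h2 = a2 * (1 - a2) * theta2[2] * delta_out        # approx. -1e-5, treated as 0

# Weight updates: delta_theta = eta * delta * (value feeding that weight).
theta2 = [t + eta * delta_out * v for t, v in zip(theta2, (-1, a1, a2))]
theta1[1] = [t + eta * delta_h1 * v for t, v in zip(theta1[1], (x0, x1, x2))]
theta1[2] = [t + eta * delta_h2 * v for t, v in zip(theta1[2], (x0, x1, x2))]

print([round(t, 4) for t in theta2])      # approx. [-2.0104, 6.0102, -5.9896]
print([round(t, 4) for t in theta1[1]])   # approx. [1.9989, 6.0011, -6.0]
```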
Now we use our next training instance (0,1):
a_1^(1) = σ(z_1^(1)) = σ(6.0011x_1 − 6x_2 − 1.9989) = σ(6.0011 × 0 − 6 × 1 − 1.9989) = σ(−7.9989) ≅ 3.387e−4

a_2^(1) = σ(z_2^(1)) = σ(8x_1 − 8x_2 + 1) = σ(8 × 0 − 8 × 1 + 1) = σ(−7) ≅ 9.11e−4

a_1^(2) = σ(z_1^(2)) = σ(2.0104 + 6.0102 × σ(−7.9989) − 5.9896 × σ(−7)) = σ(2.007) ≅ 0.8815
E = ½ (1 − 0.8815)² = 0.007
x1   x2   a_1^(1)      a_2^(1)   a_1^(2)   Y (XOR)   E(θ) = ½ (y − ŷ)²
1    0    σ(4)         σ(9)      0.8691    1         0.0086
0    1    σ(−7.9989)   σ(−7)     0.8815    1         0.007
To calculate the backpropagation error, we will have¹:
δ_1^(2) = σ(z_1^(2)) (1 − σ(z_1^(2))) (y − a_1^(2))
        = σ(2.007) (1 − σ(2.007)) (1 − 0.8815) = 0.0124
δ_1^(1) = σ(z_1^(1)) (1 − σ(z_1^(1))) θ_11^(2) δ_1^(2)
        = σ(−7.9989) (1 − σ(−7.9989)) × 6.0102 × 0.0124 ≅ 2.496e−5 ≅ 0

δ_2^(1) = σ(z_2^(1)) (1 − σ(z_2^(1))) θ_21^(2) δ_1^(2)
        = σ(−7) (1 − σ(−7)) × (−5.9896) × 0.0124 ≅ −6.7454e−5 ≅ 0
Δθ_01^(2) = η δ_1^(2) a_0^(1) = 0.7 × 0.0124 × (−1) = −0.0087
Δθ_11^(2) = η δ_1^(2) a_1^(1) = 0.7 × 0.0124 × 3.387e−4 ≅ 0
Δθ_21^(2) = η δ_1^(2) a_2^(1) = 0.7 × 0.0124 × 9.11e−4 ≅ 0
Δθ_01^(1) = η δ_1^(1) x_0 = 0.7 × (2.496e−5) × (−1) ≅ 0
Δθ_11^(1) = η δ_1^(1) x_1 = 0.7 × (2.496e−5) × 0 = 0
Δθ_21^(1) = η δ_1^(1) x_2 = 0.7 × (2.496e−5) × 1 ≅ 0
Δθ_02^(1) = η δ_2^(1) x_0 = 0.7 × (−6.7454e−5) × (−1) ≅ 0
Δθ_12^(1) = η δ_2^(1) x_1 = 0.7 × (−6.7454e−5) × 0 = 0
Δθ_22^(1) = η δ_2^(1) x_2 = 0.7 × (−6.7454e−5) × 1 ≅ 0
Based on these results we can update the network weights:
θ_01^(2) = θ_01^(2) + Δθ_01^(2) = −2.0104 + (−0.0087) ≅ −2.0191
Since the other values are very small, the rest of the weights do not change. Now we use our next training instance (1, 1):
a_1^(1) = σ(z_1^(1)) = σ(6.0011x_1 − 6x_2 − 1.9989)
        = σ(6.0011 × 1 − 6 × 1 − 1.9989) = σ(−1.9978)

a_2^(1) = σ(z_2^(1)) = σ(8x_1 − 8x_2 + 0.9999) = σ(8 × 1 − 8 × 1 + 0.9999) = σ(0.9999)

a_1^(2) = σ(z_1^(2)) = σ(2.0191 + 6.0102 × σ(−1.9978) − 5.9896 × σ(0.9999)) = σ(−1.6416) = 0.1622
E = ½ (0 − 0.1622)² = 0.0132
x1   x2   a_1^(1)      a_2^(1)     a_1^(2)   Y (XOR)   E(θ) = ½ (y − ŷ)²
1    0    σ(4)         σ(9)        0.8691    1         0.0086
0    1    σ(−7.9989)   σ(−7)       0.8815    1         0.007
1    1    σ(−1.9978)   σ(0.9999)   0.1622    0         0.0132
¹ In this assignment, we make the simplifying assumption that any update < 1e−4 ≅ 0 to keep the computations feasible.
To calculate the backpropagation error, we will have:
δ_1^(2) = σ(z_1^(2)) (1 − σ(z_1^(2))) (y − a_1^(2)) = −0.0221

δ_1^(1) = σ(z_1^(1)) (1 − σ(z_1^(1))) θ_11^(2) δ_1^(2) = −0.0139

δ_2^(1) = σ(z_2^(1)) (1 − σ(z_2^(1))) θ_21^(2) δ_1^(2) = 0.0260
Δθ_01^(2) = η δ_1^(2) a_0^(1) = 0.7 × (−0.0221) × (−1) = 0.0154
Δθ_11^(2) = η δ_1^(2) a_1^(1) = 0.7 × (−0.0221) × σ(−1.9978) = −0.0018
Δθ_21^(2) = η δ_1^(2) a_2^(1) = 0.7 × (−0.0221) × σ(0.9999) = −0.0113

Δθ_01^(1) = η δ_1^(1) x_0 = 0.7 × (−0.0139) × (−1) ≅ 0.0097
Δθ_11^(1) = η δ_1^(1) x_1 = 0.7 × (−0.0139) × 1 ≅ −0.0098
Δθ_21^(1) = η δ_1^(1) x_2 = 0.7 × (−0.0139) × 1 ≅ −0.0097
Δθ_02^(1) = η δ_2^(1) x_0 = 0.7 × 0.0260 × (−1) ≅ −0.0181
Δθ_12^(1) = η δ_2^(1) x_1 = 0.7 × 0.0260 × 1 ≅ 0.0182
Δθ_22^(1) = η δ_2^(1) x_2 = 0.7 × 0.0260 × 1 ≅ 0.0181
Based on these results we can update the network weights:
θ_01^(2) = θ_01^(2) + Δθ_01^(2) = −2.0191 + 0.0154 = −2.0037
θ_11^(2) = θ_11^(2) + Δθ_11^(2) = 6.0102 + (−0.0018) = 6.0084
θ_21^(2) = θ_21^(2) + Δθ_21^(2) = −5.9896 + (−0.0113) = −6.0009
θ_01^(1) = θ_01^(1) + Δθ_01^(1) = 1.9989 + 0.0097 = 2.0086
θ_11^(1) = θ_11^(1) + Δθ_11^(1) = 6.0011 + (−0.0098) = 5.9913
θ_21^(1) = θ_21^(1) + Δθ_21^(1) = −6 + (−0.0097) = −6.0097
θ_02^(1) = θ_02^(1) + Δθ_02^(1) = −1 + (−0.0181) = −1.0181
θ_12^(1) = θ_12^(1) + Δθ_12^(1) = 8 + 0.0182 = 8.0182
θ_22^(1) = θ_22^(1) + Δθ_22^(1) = −8 + 0.0181 = −7.9819
Now we use our next training instance (0, 0):
a_1^(1) = σ(z_1^(1)) = σ(5.9913x_1 − 6.0097x_2 − 2.0086)
        = σ(5.9913 × 0 − 6.0097 × 0 − 2.0086) = σ(−2.0086)

a_2^(1) = σ(z_2^(1)) = σ(8.0182x_1 − 7.9819x_2 + 1.0181) = σ(8.0182 × 0 − 7.9819 × 0 + 1.0181) = σ(1.0181)

a_1^(2) = σ(z_1^(2)) = σ(2.0037 + 6.0084 × σ(−2.0086) − 6.0009 × σ(1.0181)) = σ(−1.6938) = 0.1553
E = ½ (0 − 0.1553)² = 0.0121
x1   x2   a_1^(1)      a_2^(1)     a_1^(2)   Y (XOR)   E(θ) = ½ (y − ŷ)²
1    0    σ(4)         σ(9)        0.8691    1         0.0086
0    1    σ(−7.9989)   σ(−7)       0.8815    1         0.007
1    1    σ(−1.9978)   σ(0.9999)   0.1622    0         0.0132
0    0    σ(−2.0086)   σ(1.0181)   0.1553    0         0.0121
To calculate the backpropagation error, we will have:
δ_1^(2) = σ(z_1^(2)) (1 − σ(z_1^(2))) (y − a_1^(2)) = −0.0204

δ_1^(1) = σ(z_1^(1)) (1 − σ(z_1^(1))) θ_11^(2) δ_1^(2) = −0.0128

δ_2^(1) = σ(z_2^(1)) (1 − σ(z_2^(1))) θ_21^(2) δ_1^(2) = 0.0238
Δθ_01^(2) = η δ_1^(2) a_0^(1) = 0.7 × (−0.0204) × (−1) = 0.0143
Δθ_11^(2) = η δ_1^(2) a_1^(1) = 0.7 × (−0.0204) × σ(−2.0086) = −0.0017
Δθ_21^(2) = η δ_1^(2) a_2^(1) = 0.7 × (−0.0204) × σ(1.0181) = −0.0105
Δθ_01^(1) = η δ_1^(1) x_0 = 0.7 × (−0.0128) × (−1) = 0.0090
Δθ_11^(1) = η δ_1^(1) x_1 = 0.7 × (−0.0128) × 0 = 0
Δθ_21^(1) = η δ_1^(1) x_2 = 0.7 × (−0.0128) × 0 = 0
Δθ_02^(1) = η δ_2^(1) x_0 = 0.7 × 0.0238 × (−1) = −0.0167
Δθ_12^(1) = η δ_2^(1) x_1 = 0.7 × 0.0238 × 0 = 0
Δθ_22^(1) = η δ_2^(1) x_2 = 0.7 × 0.0238 × 0 = 0
Based on these results we can update the network weights:
θ_01^(2) = θ_01^(2) + Δθ_01^(2) = −2.0037 + 0.0143 = −1.9894
θ_11^(2) = θ_11^(2) + Δθ_11^(2) = 6.0084 + (−0.0017) = 6.0067
θ_21^(2) = θ_21^(2) + Δθ_21^(2) = −6.0009 + (−0.0105) = −6.0114
θ_01^(1) = θ_01^(1) + Δθ_01^(1) = 2.0086 + 0.0090 = 2.0176
θ_11^(1) = θ_11^(1) + Δθ_11^(1) = 5.9913 + 0 = 5.9913
θ_21^(1) = θ_21^(1) + Δθ_21^(1) = −6.0097 + 0 = −6.0097
θ_02^(1) = θ_02^(1) + Δθ_02^(1) = −1.0181 + (−0.0167) = −1.0348
θ_12^(1) = θ_12^(1) + Δθ_12^(1) = 8.0182 + 0 = 8.0182
θ_22^(1) = θ_22^(1) + Δθ_22^(1) = −7.9819 + 0 = −7.9819
And this would be the end of epoch one!
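Finally, a compact sketch of the whole first epoch (online updates over the four instances), again assuming the sigmoid/forward helpers from the earlier sketch; because it keeps full precision and does not zero out tiny updates, its final digits differ slightly from the hand-rounded values above:

```python
# Re-initialise the weights and run one online epoch of backpropagation.
theta1 = {1: [2.0, 6.0, -6.0], 2: [-1.0, 8.0, -8.0]}
theta2 = [-2.0, 6.0, -6.0]
eta = 0.7

epoch = [((1, 0), 1), ((0, 1), 1), ((1, 1), 0), ((0, 0), 0)]   # (x1, x2), target

for (x1, x2), y in epoch:
    a1, a2, y_hat = forward(x1, x2)
    delta_out = y_hat * (1 - y_hat) * (y - y_hat)
    delta_h1 = a1 * (1 - a1) * theta2[1] * delta_out
    delta_h2 = a2 * (1 - a2) * theta2[2] * delta_out
    theta2 = [t + eta * delta_out * v for t, v in zip(theta2, (-1, a1, a2))]
    theta1[1] = [t + eta * delta_h1 * v for t, v in zip(theta1[1], (-1, x1, x2))]
    theta1[2] = [t + eta * delta_h2 * v for t, v in zip(theta1[2], (-1, x1, x2))]

print([round(t, 4) for t in theta2])                              # output-layer weights
print({k: [round(t, 4) for t in v] for k, v in theta1.items()})   # hidden-layer weights
```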