School of Computing and Information Systems (CIS) The University of Melbourne COMP90073
Security Analytics
Tutorial exercises: Week 8
1. State some relations between autoencoders and PCA.
Solution: Both are feature-representation learning methods. PCA applies only a linear transformation to project the data onto a subspace, whereas an autoencoder applies a nonlinear transformation to obtain the hidden-unit representation. If the autoencoder's activation functions are linear (and it is trained to minimise squared reconstruction error), it behaves very similarly to PCA.
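To make this concrete, here is a minimal sketch (not part of the tutorial; NumPy only, with made-up toy data and hyperparameters) that computes the PCA reconstruction via the SVD and trains a single-hidden-layer linear autoencoder by gradient descent on the same centred data. With enough iterations the two reconstruction errors should be close, since the linear autoencoder learns the same subspace as PCA (though not necessarily the orthonormal principal directions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # correlated toy data
X = X - X.mean(axis=0)                                      # centre the data
P, H = X.shape[1], 3

# PCA reconstruction onto the top-H principal components (via SVD)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:H].T @ Vt[:H]

# Linear autoencoder: encoder We (P x H), decoder Wd (H x P), no activation
We = rng.normal(scale=0.1, size=(P, H))
Wd = rng.normal(scale=0.1, size=(H, P))
lr = 5e-3
for _ in range(20000):
    Hcode = X @ We                  # hidden code
    X_hat = Hcode @ Wd              # reconstruction
    G = 2 * (X_hat - X) / len(X)    # gradient of the mean squared error
    Wd -= lr * Hcode.T @ G
    We -= lr * X.T @ (G @ Wd.T)

X_ae = (X @ We) @ Wd
print("PCA reconstruction MSE:  ", np.mean((X - X_pca) ** 2))
print("Linear autoencoder MSE:  ", np.mean((X - X_ae) ** 2))
```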
2. What is the complexity of the back-propagation algorithm for an autoencoder with L layers and K nodes per layer?
Solution: $O(K^2 L)$. The dominant operation is the multiplication of a vector by a $K \times K$ weight matrix, which costs $O(K^2)$ and has to be done in each of the $L$ layers.
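As a small illustration (a sketch only; the width $K$ and depth $L$ below are toy values invented for this example), the backward pass consists of one $K \times K$ vector–matrix product per layer, so the multiply-add count grows as $K^2 L$:

```python
import numpy as np

K, L = 64, 5                                        # toy layer width and depth
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(K, K)) for _ in range(L)]    # one K x K weight matrix per layer

def backward_pass(delta_out):
    """Propagate an error signal through L layers; each step is a K x K mat-vec."""
    ops, delta = 0, delta_out
    for W in reversed(Ws):
        delta = W.T @ delta       # K^2 multiply-adds (activation derivative ignored)
        ops += K * K
    return delta, ops

_, ops = backward_pass(rng.normal(size=K))
print(f"multiply-adds: {ops}  (= K^2 * L = {K * K * L})")
```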
3. Assume that you initialize all weights in a neural net to the same value and you do the same for the bias terms. Is this a good idea? Justify your answer.
Solution: This is a bad idea: with identical weights and biases, every node in a given layer computes the same output and receives the same gradient, so after every update the nodes remain identical and the whole layer effectively learns a single feature. The symmetry must be broken, e.g. by random initialization.
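A minimal sketch of this symmetry effect (not from the tutorial; the toy data, layer sizes, sigmoid activation and the constant 0.5 are invented for the illustration): after one backward pass, every hidden unit has received exactly the same gradient, so they can never diverge from each other.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))            # toy inputs
y = rng.normal(size=(32, 1))            # toy regression targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# All weights set to the same constant (biases folded in would be constant too)
W1 = np.full((4, 8), 0.5)               # input -> hidden
W2 = np.full((8, 1), 0.5)               # hidden -> output

# Forward pass
H = sigmoid(X @ W1)
y_hat = H @ W2

# Backward pass for squared error
g_out = 2 * (y_hat - y) / len(X)        # dJ/dy_hat
dW2 = H.T @ g_out                       # gradient for W2
dH = g_out @ W2.T
dW1 = X.T @ (dH * H * (1 - H))          # gradient for W1 (sigmoid derivative)

# Every hidden unit gets an identical gradient -> the units stay identical forever
print(np.allclose(dW1, dW1[:, [0]]))    # True: all columns of dW1 are equal
print(np.allclose(dW2, dW2[0]))         # True: all rows of dW2 are equal
```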
4. An autoencoder is a neural network designed to learn feature representations in an unsupervised manner. Unlike a standard multi-layer network, an autoencoder has the same number of nodes in its output layer as in its input layer, and it is trained to reconstruct its own input $x$, i.e. to minimize the reconstruction error. Such an autoencoder is shown below.
[Figure: a three-layer autoencoder with input layer $L_1$, hidden layer $L_2$, and output layer $L_3$.]

Suppose the input is a set of $P$-dimensional unlabelled data $\{x^{(i)}\}_{i=1}^{N}$. Consider an autoencoder with $H$ hidden units in the second layer $L_2$. We will use the following notation for this autoencoder:
• $W^e$ denotes the $P \times H$ weight matrix between $L_1$ and $L_2$
• $W^d$ denotes the $H \times P$ weight matrix between $L_2$ and $L_3$
• $\sigma$ denotes the activation function for $L_2$ and $L_3$
• $s_j^{(i)} = \sum_{k=1}^{P} W^e_{kj}\, x_k^{(i)}$
• $h_j^{(i)} = \sigma\!\left(\sum_{k=1}^{P} W^e_{kj}\, x_k^{(i)}\right)$
• $t_j^{(i)} = \sum_{k=1}^{H} W^d_{kj}\, h_k^{(i)}$
• $\hat{x}_j^{(i)} = \sigma\!\left(\sum_{k=1}^{H} W^d_{kj}\, h_k^{(i)}\right)$
• $J(W^e, W^d)^{(i)} = \left\lVert x^{(i)} - \hat{x}^{(i)} \right\rVert^2 = \sum_{j=1}^{P} \left(x_j^{(i)} - \hat{x}_j^{(i)}\right)^2$ is the reconstruction error for example $x^{(i)}$
• $J(W^e, W^d) = \sum_{i=1}^{N} J(W^e, W^d)^{(i)}$ is the total reconstruction error
• (We append a constant element 1 to the input layer and the hidden layer so that no bias terms have to be considered.)
Fill in the following derivative equations for $W^e$ and $W^d$. Use the notation defined above; there should be no new notation needed.

$\dfrac{\partial J^{(i)}}{\partial W^d_{kj}} = \dfrac{\partial J^{(i)}}{\partial \hat{x}_j^{(i)}} \cdot \dfrac{\partial \hat{x}_j^{(i)}}{\partial W^d_{kj}}$, where

$\dfrac{\partial J^{(i)}}{\partial \hat{x}_j^{(i)}} = \underline{\qquad\qquad}$

$\dfrac{\partial \hat{x}_j^{(i)}}{\partial W^d_{kj}} = \sigma'\!\left(\sum_{k=1}^{H} W^d_{kj}\, h_k^{(i)}\right) \cdot h_k^{(i)}$

$\dfrac{\partial J^{(i)}}{\partial W^e_{kj}} = \dfrac{\partial J^{(i)}}{\partial s_j^{(i)}} \cdot \dfrac{\partial s_j^{(i)}}{\partial W^e_{kj}}$, where

$\dfrac{\partial J^{(i)}}{\partial s_j^{(i)}} = \left[\sum_{k=1}^{P} \dfrac{\partial J^{(i)}}{\partial t_k^{(i)}} \cdot \underline{\qquad\qquad}\right] \cdot \sigma'\!\left(s_j^{(i)}\right)$

$\dfrac{\partial s_j^{(i)}}{\partial W^e_{kj}} = \underline{\qquad\qquad}$
Solution:
• $\dfrac{\partial J^{(i)}}{\partial \hat{x}_j^{(i)}} = 2\left(\hat{x}_j^{(i)} - x_j^{(i)}\right)$
• $\dfrac{\partial t_k^{(i)}}{\partial h_j^{(i)}} = W^d_{jk}$
• $\dfrac{\partial s_j^{(i)}}{\partial W^e_{kj}} = x_k^{(i)}$
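These derivatives can be sanity-checked numerically. The sketch below (a minimal illustration, not part of the tutorial; it assumes a sigmoid activation for $\sigma$ and uses tiny made-up data) implements the forward pass from the notation above, computes the analytic gradients given by the chain rule, and compares them against finite-difference estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
P, H = 5, 3
x = rng.normal(size=P)                  # one example x^(i)
We = rng.normal(scale=0.5, size=(P, H))
Wd = rng.normal(scale=0.5, size=(H, P))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(We, Wd, x):
    s = x @ We                          # s_j = sum_k We_kj x_k
    h = sigmoid(s)                      # hidden code
    t = h @ Wd                          # t_j = sum_k Wd_kj h_k
    return s, h, t, sigmoid(t)          # ..., reconstruction x_hat

def loss(We, Wd, x):
    return np.sum((x - forward(We, Wd, x)[3]) ** 2)

# Analytic gradients from the chain rule derived above
s, h, t, x_hat = forward(We, Wd, x)
dJ_dxhat = 2 * (x_hat - x)                      # dJ/dx_hat_j
dJ_dt = dJ_dxhat * x_hat * (1 - x_hat)          # sigma'(t_j) = x_hat_j (1 - x_hat_j)
dWd = np.outer(h, dJ_dt)                        # dJ/dWd_kj = dJ/dt_j * h_k
dJ_ds = (Wd @ dJ_dt) * h * (1 - h)              # [sum_k dJ/dt_k * Wd_jk] * sigma'(s_j)
dWe = np.outer(x, dJ_ds)                        # dJ/dWe_kj = dJ/ds_j * x_k

# Finite-difference check
def numeric_grad(f, W, eps=1e-6):
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        G[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return G

print(np.allclose(dWd, numeric_grad(lambda W: loss(We, W, x), Wd)))  # True
print(np.allclose(dWe, numeric_grad(lambda W: loss(W, Wd, x), We)))  # True
```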
5. The $3\sigma$ rule is a common technique for anomaly detection. What is the intuition behind this rule for anomaly detection? How would the results be affected if we used a different threshold (e.g., $2\sigma$ or $4\sigma$)?

Solution: A clear description can be found at
https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule
For (approximately) normally distributed data, about 99.7% of observations fall within three standard deviations of the mean, so a point lying more than $3\sigma$ from the mean is rare enough to be flagged as an anomaly. Lowering the threshold to $2\sigma$ flags more points (only about 95% of normal data lies within $2\sigma$), which increases false positives; raising it to $4\sigma$ flags fewer points, which reduces false positives but makes it easier to miss true anomalies.
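A minimal sketch of the rule on one-dimensional data (an illustration only; the benign data, injected outliers and threshold values are made up here) showing how the number of flagged points changes with the threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=10.0, scale=2.0, size=1000)   # benign observations
outliers = np.array([25.0, -3.0, 30.0])                    # injected anomalies
data = np.concatenate([normal_data, outliers])

mu, sigma = data.mean(), data.std()

for k in (2, 3, 4):                                        # 2-, 3- and 4-sigma thresholds
    flagged = np.abs(data - mu) > k * sigma
    print(f"{k}-sigma rule: flagged {flagged.sum()} of {len(data)} points")
```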
6. In a VAE, how does sampling of the latent code differ between training and generation (i.e., generating a new sample)?
Solution: During training we draw samples from the (approximate) posterior distribution, because we are trying to reconstruct a specific data point; during generation we draw samples from the prior distribution over latent codes.
During training we draw $h \sim P(h \mid x)$ and then decode with $\hat{x} = g(h)$; during generation we draw $h \sim P(h)$ and then decode $x = g(h)$.
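A minimal sketch of the two sampling paths (illustration only; the linear encoder/decoder, their weight matrices and the latent size are invented for this example, and the posterior is sampled with the usual reparameterization trick):

```python
import numpy as np

rng = np.random.default_rng(0)
P, H = 8, 2                                   # data and latent dimensions
W_mu = rng.normal(scale=0.3, size=(P, H))     # toy encoder weights for the mean
W_log = rng.normal(scale=0.3, size=(P, H))    # toy encoder weights for log-variance
W_dec = rng.normal(scale=0.3, size=(H, P))    # toy decoder weights

def encode(x):
    """Toy encoder: parameters of the approximate posterior P(h | x)."""
    return x @ W_mu, x @ W_log                # mu(x), log_var(x)

def decode(h):
    """Toy decoder g(h)."""
    return h @ W_dec

# --- Training: sample the latent code from the posterior P(h | x) ---
x = rng.normal(size=P)                        # a specific data point to reconstruct
mu, log_var = encode(x)
eps = rng.normal(size=H)
h_post = mu + np.exp(0.5 * log_var) * eps     # reparameterization: h ~ P(h | x)
x_hat = decode(h_post)                        # reconstruction of x

# --- Generation: sample the latent code from the prior P(h) = N(0, I) ---
h_prior = rng.normal(size=H)                  # h ~ P(h)
x_new = decode(h_prior)                       # a newly generated sample

print("reconstruction error:", np.sum((x - x_hat) ** 2))
print("generated sample:", np.round(x_new, 3))
```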