School of Computing and Information Systems (CIS) The University of Melbourne COMP90073
Security Analytics
Tutorial exercises: Week 8
1. State some relations between autoencoders and PCA.
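For intuition, here is a minimal sketch (numpy, with toy dimensions, learning rate, and a tied-weight design that are our own choices, not part of the exercise) showing that a linear autoencoder trained to minimise reconstruction error learns the same subspace as the top principal components:

```python
# Illustrative sketch (our own toy setup): a linear autoencoder with tied
# weights, trained to minimise reconstruction error, ends up spanning the
# same subspace as the top principal components.
import numpy as np

rng = np.random.default_rng(0)
N, P, H = 500, 10, 3                          # samples, input dim, hidden dim
X = rng.normal(size=(N, P)) * np.linspace(3.0, 0.5, P)  # anisotropic data
X -= X.mean(axis=0)                           # centre, as PCA does

# PCA basis: top-H right singular vectors of the centred data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:H].T                          # P x H

# Linear autoencoder, tied weights: x_hat = x W W^T; minimise ||X - X W W^T||^2.
W = rng.normal(scale=0.1, size=(P, H))
for _ in range(5000):
    E = X @ W @ W.T - X                       # reconstruction residual, N x P
    grad = (2 / N) * (X.T @ E @ W + E.T @ X @ W)
    W -= 0.01 * grad

# Principal angles between span(W) and the PCA subspace: cosines near 1
# mean the two subspaces coincide (up to rotation within the subspace).
Q, _ = np.linalg.qr(W)
cosines = np.linalg.svd(Q.T @ pca_basis, compute_uv=False)
print("cosines of principal angles:", np.round(cosines, 3))
```

Note that the autoencoder recovers the principal subspace, not the ordered orthogonal components themselves; with a nonlinear activation or more layers it can learn representations beyond PCA.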
2. What is the complexity of the back-propagation algorithm for an autoencoder with $L$ layers and $K$ nodes per layer?
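As a hint toward the counting argument (the asymptotic summary below is ours, not the sheet's), consider the per-layer cost of one gradient step on a single example:

```latex
% One gradient step on a single example, for L layers of K nodes each:
% - forward pass: (L-1) matrix-vector products with K x K weight matrices,
%   so O((L-1)K^2) = O(L K^2) operations;
% - backward pass: propagating the error signal and accumulating the
%   gradient of each K x K weight matrix is again O(K^2) per layer.
\[
  T_{\text{backprop}} = O(L K^2) \text{ per example}, \qquad
  O(N L K^2) \text{ per epoch over } N \text{ examples.}
\]
```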
3. Assume that you initialize all weights in a neural net to the same value, and you do the same for the bias terms. Is this a good idea? Justify your answer.
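The sketch below (our own toy setup, not part of the question) demonstrates the effect to look for: with every weight set to the same constant, all hidden units receive identical gradients and therefore stay identical throughout training.

```python
# Illustrative sketch: with all weights initialised to the same constant,
# hidden units compute the same function and get the same gradient, so they
# remain identical -- the "symmetry" problem.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                 # toy inputs
y = rng.normal(size=(100, 1))                 # toy targets

W1 = np.full((4, 8), 0.5)                     # every weight the same value
W2 = np.full((8, 1), 0.5)
for _ in range(100):                          # plain gradient descent, MSE loss
    h = np.tanh(X @ W1)                       # hidden layer
    err = h @ W2 - y
    gW2 = h.T @ err / len(X)
    gW1 = X.T @ ((err @ W2.T) * (1 - h**2)) / len(X)
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2

# All columns of W1 are still identical: the net behaves like a single unit.
print(np.allclose(W1, W1[:, :1]))             # True
```

Because every hidden unit stays a copy of the others, the weights need random (symmetry-breaking) initialization; the biases can safely start at zero.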
4. An autoencoder is a neural network designed to learn feature representations in an unsupervised manner. Unlike a standard multi-layer network, an autoencoder has the same number of nodes in its output layer as its input layer. An autoencoder is trained to reconstruct its own input $x$, i.e. to minimize the reconstruction error. An autoencoder is shown below.
[Figure: a three-layer autoencoder with input layer $L_1$, hidden layer $L_2$, and output layer $L_3$.]
Suppose the input is a set of $P$-dimensional unlabelled data $\{x^{(i)}\}_{i=1}^{N}$. Consider an autoencoder with $H$ hidden units in the second layer $L_2$. We will use the following notation for this autoencoder:
• $W^e$ denotes the $P \times H$ weight matrix between $L_1$ and $L_2$
• $W^d$ denotes the $H \times P$ weight matrix between $L_2$ and $L_3$
• $\sigma$ denotes the activation function for $L_2$ and $L_3$
• $s_j^{(i)} = \sum_{k=1}^{P} W^e_{kj}\, x_k^{(i)}$
• $h_j^{(i)} = \sigma\!\left(\sum_{k=1}^{P} W^e_{kj}\, x_k^{(i)}\right)$
• $t_j^{(i)} = \sum_{k=1}^{H} W^d_{kj}\, h_k^{(i)}$
• $\hat{x}_j^{(i)} = \sigma\!\left(\sum_{k=1}^{H} W^d_{kj}\, h_k^{(i)}\right)$
• $J(W^e, W^d)^{(i)} = \left\|x^{(i)} - \hat{x}^{(i)}\right\|^2 = \sum_{j=1}^{P}\left(x_j^{(i)} - \hat{x}_j^{(i)}\right)^2$ is the reconstruction error for example $x^{(i)}$
• $J(W^e, W^d) = \sum_{i=1}^{N} J(W^e, W^d)^{(i)}$ is the total reconstruction error
• (We append a constant element 1 to the input layer and hidden layer so that no separate bias terms need to be considered.)
Fill in the following derivative equations for $W^e$ and $W^d$. Use the notation defined above; no new notation should be needed.
\[
\frac{\partial J^{(i)}}{\partial W^{d}_{kj}}
= \frac{\partial J^{(i)}}{\partial \hat{x}^{(i)}_{j}}
\cdot \frac{\partial \hat{x}^{(i)}_{j}}{\partial W^{d}_{kj}}
\]
\[
\frac{\partial \hat{x}^{(i)}_{j}}{\partial W^{d}_{kj}}
= \sigma'\!\left(\sum_{k=1}^{H} W^{d}_{kj}\, h^{(i)}_{k}\right) \cdot \underline{\qquad\qquad}
\]
\[
\frac{\partial J^{(i)}}{\partial W^{e}_{kj}}
= \frac{\partial J^{(i)}}{\partial s^{(i)}_{j}} \cdot \underline{\qquad\qquad}
\]
\[
\frac{\partial J^{(i)}}{\partial s^{(i)}_{j}}
= \left(\sum_{k=1}^{P} \frac{\partial J^{(i)}}{\partial t^{(i)}_{k}} \cdot \underline{\qquad\qquad}\right)
\cdot \sigma'\!\left(s^{(i)}_{j}\right)
\]
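A useful way to check whatever you fill in is a finite-difference gradient check. Below is a minimal numpy sketch (the logistic sigmoid, the dimensions, and all variable names are our assumptions, not fixed by the question) comparing the analytic gradient for $W^d$ against a numerical estimate; the same loop applied to $W^e$ validates the encoder derivatives.

```python
# Minimal numpy sketch for checking derived gradients numerically.
# Assumptions (ours, not the sheet's): sigma is the logistic sigmoid,
# P = 5 inputs, H = 3 hidden units, bias terms omitted as in the question.
import numpy as np

rng = np.random.default_rng(2)
P, H = 5, 3
x = rng.normal(size=P)
We = rng.normal(scale=0.5, size=(P, H))   # encoder weights, P x H
Wd = rng.normal(scale=0.5, size=(H, P))   # decoder weights, H x P

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss(We, Wd, x):
    h = sigma(x @ We)            # h_j = sigma(sum_k We_kj x_k)
    x_hat = sigma(h @ Wd)        # x_hat_j = sigma(sum_k Wd_kj h_k)
    return np.sum((x - x_hat) ** 2), h, x_hat

J, h, x_hat = loss(We, Wd, x)

# Analytic gradient for Wd via the chain rule:
# dJ/dx_hat_j = -2 (x_j - x_hat_j);  dx_hat_j/dWd_kj = sigma'(t_j) h_k
delta = -2 * (x - x_hat) * x_hat * (1 - x_hat)   # sigma'(t) = x_hat(1 - x_hat)
grad_Wd = np.outer(h, delta)                      # H x P

# Numerical gradient by central differences.
eps, num = 1e-6, np.zeros_like(Wd)
for k in range(H):
    for j in range(P):
        Wp, Wm = Wd.copy(), Wd.copy()
        Wp[k, j] += eps; Wm[k, j] -= eps
        num[k, j] = (loss(We, Wp, x)[0] - loss(We, Wm, x)[0]) / (2 * eps)

print("max |analytic - numerical|:", np.abs(grad_Wd - num).max())  # ~1e-9
```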
5. The $3\sigma$ rule is a common technique for anomaly detection. What is the intuition behind this rule? How would the result be affected if we used other thresholds (e.g., $2\sigma$ or $4\sigma$)? A small sketch contrasting the thresholds follows.
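For concreteness, here is a small sketch (the synthetic data and planted anomalies are our own) showing how the number of flagged points changes with the threshold:

```python
# Illustrative sketch of the k-sigma rule: flag points more than k standard
# deviations from the mean, for several choices of k.
import numpy as np

rng = np.random.default_rng(3)
scores = np.concatenate([rng.normal(0, 1, 1000), [6.0, -5.5]])  # 2 planted anomalies

mu, sd = scores.mean(), scores.std()
for k in (2, 3, 4):                       # try 2-, 3- and 4-sigma thresholds
    flagged = np.abs(scores - mu) > k * sd
    print(f"{k}-sigma: flagged {flagged.sum()} points")
# Smaller k: more detections but more false alarms; larger k: fewer false
# alarms, but anomalies must be more extreme to be caught.
```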
6. In a VAE, how does sampling of the latent code differ between training and generation (i.e., generating a new sample)?
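The contrast can be made concrete with a minimal sketch (the stand-in encoder/decoder and the numpy setup are our assumptions, not a real VAE implementation):

```python
# Illustrative contrast between latent sampling at training time and at
# generation time. The encoder/decoder below are trivial stand-ins for
# real networks.
import numpy as np

rng = np.random.default_rng(4)
latent_dim = 2

def encoder(x):
    # Stand-in: a real encoder network would output mu and log-variance.
    return np.zeros(latent_dim), np.zeros(latent_dim)   # mu, log_var

def decoder(z):
    # Stand-in for the decoder network.
    return z

# Training: sample from the *approximate posterior* q(z|x) with the
# reparameterisation trick, so gradients can flow through mu and log_var.
x = rng.normal(size=4)
mu, log_var = encoder(x)
eps = rng.standard_normal(latent_dim)
z_train = mu + np.exp(0.5 * log_var) * eps    # z = mu + sigma * eps

# Generation: there is no input x, so sample directly from the *prior*
# p(z) = N(0, I) and decode.
z_gen = rng.standard_normal(latent_dim)
x_new = decoder(z_gen)
print(z_train, x_new)
```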