COMP9444 Neural Networks and Deep Learning
Quiz 8
Deep RL and Unsupervised Learning
This is an optional quiz to test your understanding of Deep RL and Unsupervised Learning.
1. Write out the steps in the REINFORCE algorithm, making sure to define any symbols you use.
for each trial
    run trial and collect states s_t, actions a_t and reward r_total
    for t = 1 to length(trial)
        θ ← θ + η (r_total – b) ∇θ log πθ(a_t | s_t)
    end
end
θ = parameters of policy, η = learning rate, r_total = total reward received during trial,
b = baseline (constant), ∇θ = gradient with respect to θ, πθ(a_t | s_t) = probability of performing action a_t in state s_t.
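The REINFORCE update can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy task (one state, two actions, reward 1 each time action 1 is chosen), the tabular softmax policy, and the names `run_trial`, `reinforce` and `p_action1` are all hypothetical and not part of the quiz. The sketch relies on the fact that, for a softmax policy, the gradient of log πθ(a | s) with respect to the logits is one_hot(a) minus the action probabilities.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def run_trial(theta, rng, length=5):
    """Hypothetical toy task: a single state 0; action 1 earns reward 1."""
    states, actions, r_total = [], [], 0.0
    for _ in range(length):
        s = 0
        probs = softmax(theta[s])
        a = 0 if rng.random() < probs[0] else 1
        states.append(s)
        actions.append(a)
        r_total += float(a)
    return states, actions, r_total


def reinforce(num_trials=500, eta=0.05, b=2.5, seed=0):
    """REINFORCE with a constant baseline b (assumed values, chosen for the toy task)."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0]]  # theta[s][a]: one logit per action in each state
    for _ in range(num_trials):
        states, actions, r_total = run_trial(theta, rng)
        for s, a in zip(states, actions):
            probs = softmax(theta[s])
            for k in range(len(probs)):
                # d/d theta[s][k] of log pi(a | s) = one_hot(a)[k] - probs[k]
                grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
                theta[s][k] += eta * (r_total - b) * grad_log_pi
    return theta


theta = reinforce()
p_action1 = softmax(theta[0])[1]  # probability of the rewarded action after training
```

Printing `p_action1` after training should show that the policy has shifted towards the rewarded action. The baseline of 2.5 is simply the expected total reward of a uniform random policy on this toy task; other constants would also work, though with different variance.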
2. In the context of Deep Q-Learning, explain the following:
a. Experience Replay
The agent(s) choose actions according to their current Q-function, using an ε-greedy strategy, and contribute to a central database of experiences in the form (s_t, a_t, r_t, s_t+1). Another thread samples experiences asynchronously from the experience database, and updates the Q-function by gradient descent, to minimize
[ r_t + γ max_b Q_w(s_t+1, b) – Q_w(s_t, a_t) ]²
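A minimal sketch of experience replay follows, under several simplifying assumptions that are not in the quiz: the Q-function is a small table rather than a deep network, acting and replaying are interleaved in one thread instead of running asynchronously, and the two-state environment in `step` is hypothetical. With a table, a gradient step on the squared error above (treating the target as fixed) reduces to moving Q(s_t, a_t) a fraction `lr` of the way towards the target.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of experiences (s, a, r, s_next)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng):
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)


def epsilon_greedy(q, s, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    return max(range(len(q[s])), key=lambda a: q[s][a])


def replay_update(q, batch, gamma, lr):
    """One step per sampled experience on [r + gamma * max_b Q(s', b) - Q(s, a)]^2."""
    for s, a, r, s_next in batch:
        target = r + gamma * max(q[s_next])  # target is held fixed for this step
        q[s][a] += lr * (target - q[s][a])   # factor of 2 absorbed into lr


def step(s, a):
    """Hypothetical toy dynamics: action a leads to state a; arriving in state 1 pays 1."""
    s_next = a
    r = 1.0 if s_next == 1 else 0.0
    return r, s_next


def train(steps=5000, gamma=0.9, lr=0.1, epsilon=0.1, batch_size=8, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[s][a]
    buffer = ReplayBuffer(capacity=500)
    s = 0
    for _ in range(steps):
        a = epsilon_greedy(q, s, epsilon, rng)
        r, s_next = step(s, a)
        buffer.add(s, a, r, s_next)  # contribute to the experience database
        replay_update(q, buffer.sample(batch_size, rng), gamma, lr)
        s = s_next
    return q


q = train()
```

Sampling from the buffer, rather than learning only from the most recent transition, lets each experience be reused many times and breaks up the correlation between consecutive updates.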
b. Double Q-Learning
Two sets of Q values are maintained. The current Q-network w is used to select actions, and a slightly older Q-network w̄ is used for the target value.
3. What is the Energy function for these architectures:
a. Boltzmann Machine
b. Restricted Boltzmann Machine
Remember to define any variables you use.
a. Boltzmann Machine
