HW09
The instructions are very simple: download the file and fill in the TODO items. You will submit your solution as a .py file.
Several notes:
1. The TODO items are the least tricky parts of the code and have substantial hints. Also, there aren’t very many of them.
2. At the same time, there’s some fairly interesting/useful stuff in there, like how to work with gradient values manually in Keras, which is surprisingly non-trivial to figure out from scratch (see the sketch at the end of these notes), for those who might want to spend more time investigating and experimenting with the code.
3. Even when you implement all of the methods correctly, the algorithm won’t perform well most of the time. In my experience it occasionally gets significant traction (which shows there’s not just a gradient ascent sign error or something), but the outcome seems highly dependent on the luck of the weight initialization. You are not responsible for making the algorithm perform well; you are only responsible for implementing it correctly. However, if someone does figure out how to tweak the hyper-parameters or otherwise modify the code to get better results, please share that information with the rest of us.
4. There are some sample method calls in the last function to help you with incremental troubleshooting.
5. You can call A.diagnostics() to see some information about what’s going on during training. This is not necessary for the assignment, but it is helpful if you want to troubleshoot performance.
6. There’s a file on the internet that bears some superficial similarity to this one. It doesn’t pose an academic dishonesty risk, though, because it will be seriously useless for this assignment. Just a heads up.
7. The code is mainly intended to be run in a Jupyter notebook, because of the fancy progress meter. If you want to run it in another context, replace the tqdm import with from tqdm import tqdm (see the snippet at the end of these notes for an import that works in both contexts).
8. I updated the last page of the RL lesson on Data Gymnasia to reflect the specific method used in this homework. This is also not really necessary but might be helpful.
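
Appendix to note 2: here is a minimal sketch of the manual-gradient pattern in Keras, assuming TensorFlow 2. This is not the homework's code; the model, data, and objective below are all placeholders. The point is just the mechanics: compute gradients with tf.GradientTape, then negate them before handing them to the optimizer, since Keras optimizers minimize and gradient ascent on an objective requires the opposite sign.

    import tensorflow as tf

    # Placeholder network and optimizer; the assignment's actual model differs.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

    x = tf.random.normal((8, 4))  # placeholder batch of observations

    with tf.GradientTape() as tape:
        probs = model(x)
        # Placeholder objective; in policy-gradient methods this would be
        # the log-probabilities of chosen actions weighted by returns.
        objective = tf.reduce_mean(tf.math.log(probs + 1e-8))

    # Gradients of the objective with respect to the weights.
    grads = tape.gradient(objective, model.trainable_variables)

    # Keras optimizers take descent steps, so ascent on the objective
    # means applying the negated gradients.
    optimizer.apply_gradients(
        [(-g, v) for g, v in zip(grads, model.trainable_variables)]
    )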
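
Appendix to note 7: if you would rather not edit the import by hand, recent versions of tqdm include an auto-selecting variant that uses the notebook widget inside Jupyter and falls back to the plain-text meter elsewhere, so the same file runs unchanged in both contexts:

    # Assumes a reasonably recent tqdm.
    from tqdm.auto import tqdm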