http://cs229.stanford.edu/notes/cs229-notes12.pdf
“There are 4 locations (labeled by different letters), and our job is to pick up the passenger at one location and drop him off at another. We receive +20 points for a successful drop-off and lose 1 point for every time-step it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions.”
https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/
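The reward scheme quoted above can be sketched in a few lines of plain Python. This is only an illustration, not the environment's actual implementation: the names `taxi_reward` and `episode_return` are hypothetical, and it assumes each time-step yields exactly one of the three values (+20, -10, or -1), which the quote does not state explicitly.

```python
# Reward values taken from the quoted description.
DROPOFF_REWARD = 20           # successful drop-off
STEP_PENALTY = -1             # every ordinary time-step
ILLEGAL_ACTION_PENALTY = -10  # illegal pick-up or drop-off


def taxi_reward(successful_dropoff: bool, illegal_action: bool) -> int:
    """Hypothetical helper: reward for a single time-step.

    Assumption: a step earns exactly one of the three rewards.
    """
    if successful_dropoff:
        return DROPOFF_REWARD
    if illegal_action:
        return ILLEGAL_ACTION_PENALTY
    return STEP_PENALTY


def episode_return(rewards):
    """Undiscounted sum of the per-step rewards of one episode."""
    return sum(rewards)


# Example episode: 9 ordinary moves, 1 illegal pick-up, 1 successful drop-off.
rewards = (
    [taxi_reward(False, False)] * 9
    + [taxi_reward(False, True)]
    + [taxi_reward(True, False)]
)
total = episode_return(rewards)
print(total)
```

Under these assumptions, a shorter route and fewer illegal actions both raise the episode return, which is the behavior the agent is meant to learn.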
https://star-ai.github.io/Rendering-OpenAi-Gym-in-Colaboratory/
https://colab.research.google.com/drive/18LdlDDT87eb8cCTHZsXyS9ksQPzL3i6H
https://github.com/openai/mujoco-py
30-day free trial: http://mujoco.org/