Deep Reinforcement Learning
ECS170 W22
Review – Machine Learning
Types of machine learning?
● Supervised learning: given X and labels y, train f so that f(X′) → y′ ≈ y
● Unsupervised learning: given X, no labels y
● Representation learning: X ∈ R^n, f(X) → X′ ∈ R^m, with m < n
● Reinforcement learning: an agent interacts with an environment and learns which action maximizes reward
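To make the four paradigms concrete, here is a minimal sketch of their "signatures" in Python. All names and types are illustrative stubs, not anything from the slides:

```python
import numpy as np

# Supervised: learn f from labeled pairs (X, y), then predict y' ~ y on new X'
def supervised_fit(X: np.ndarray, y: np.ndarray):
    ...

# Unsupervised: only X is available -- find structure (clusters, density, ...)
def unsupervised_fit(X: np.ndarray):
    ...

# Representation learning: map X in R^n to a smaller X' in R^m, m < n
def encode(X: np.ndarray, m: int) -> np.ndarray:
    ...

# Reinforcement learning: no dataset up front -- an agent picks actions
# and learns from the rewards the environment returns
def act(state):
    ...
```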
Review – Reinforcement Learning
● Q(s_t, a_t) = r_t + γ max_{a′} Q(s_{t+1}, a′)
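A toy numeric check of the recursion, with all values made up for illustration: with γ = 0.9, an immediate reward of 1, and three possible next actions,

```python
gamma = 0.9
r_t = 1.0                  # immediate reward (made-up value)
q_next = [0.0, 2.0, 0.5]   # Q(s_{t+1}, a') for each action a' (made-up values)

q_sa = r_t + gamma * max(q_next)
print(q_sa)                # 1 + 0.9 * 2.0 = 2.8
```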
Review – Reinforcement Learning
● Q(s_t, a_t) ← (1 − α)·Q(s_t, a_t) + α·(r_t + γ max_{a′} Q(s_{t+1}, a′))
● Algorithm:
○ Select action a
○ Transition s –a→s’
○ Update Q-table
Review – Reinforcement Learning
● Q(s_t, a_t) ← (1 − α)·Q(s_t, a_t) + α·(r_t + γ max_{a′} Q(s_{t+1}, a′))
● Algorithm (with ε-greedy exploration):
○ Draw a random number; if > ε exploit (take the best-known action), otherwise explore (take a random action)
○ Transition s –a→s’
○ Update Q-table
○ Decrement ε
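Putting the update rule and the ε-greedy loop together, a minimal tabular sketch in Python. The `env` object with `reset()`/`step(a)` is an assumed Gym-style interface, and all hyperparameter values are placeholders:

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, eps=1.0, eps_decay=0.995):
    """Tabular Q-learning with epsilon-greedy exploration.

    Assumes a Gym-style env: reset() -> state, step(a) -> (state, reward, done).
    """
    Q = defaultdict(lambda: [0.0] * n_actions)     # Q-table, zeros by default
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # draw a random number: if > eps exploit, otherwise explore
            if random.random() > eps:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            else:
                a = random.randrange(n_actions)
            s2, r, done = env.step(a)              # transition s --a--> s'
            # Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target
            s = s2
        eps *= eps_decay                           # decrement epsilon each episode
    return Q
```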
Review – Limitations?
● We have to stumble upon a reward before any learning can happen
● What to do about big problems?
● Learning is too specific
Moving Forward
● Grounding Reinforcement Learning
● Neural Networks
● Q-Networks
Different Perspectives On Q-Learning
● Learning the rules of the game
Different Perspectives On Q-Learning
● Dynamic programming
○ Break big problem up into subproblems
○ Solve a subproblem optimally
○ Construct an optimal solution based on optimal subproblems
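The same subproblem structure appears in value iteration, the classic dynamic-programming approach to MDPs: each Q(s, a) is built from the already-computed values of its successor states. A minimal sketch, assuming (unlike Q-learning) that the transition model P and rewards R are known; the data-structure layout is an illustrative choice:

```python
def value_iteration(states, actions, P, R, gamma=0.9, iters=100):
    """Dynamic programming for a known MDP.

    P[s][a] -> list of (prob, next_state); R[s][a] -> immediate reward.
    Each sweep solves every (s, a) subproblem from the current
    estimates of its successor subproblems.
    """
    Q = {s: {a: 0.0 for a in actions} for s in states}
    for _ in range(iters):
        for s in states:
            for a in actions:
                Q[s][a] = R[s][a] + gamma * sum(
                    p * max(Q[s2].values()) for p, s2 in P[s][a]
                )
    return Q
```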
Different Perspectives On Q-Learning
● Analogous to psychology
○ Operant conditioning
■ Positive feedback to encourage actions
■ Negative feedback to discourage actions
Proof Of Optimality – A Unique Circumstance
● Stationary distribution assumption
● Overtraining
● Q′ monotonically improves
● Q′ never overestimates Q
● Q′ gets better and better with more training, even playing against low-skill opponents
● Formally: lim_{n→∞} Q′ = Q
(Reference: Wilson et al. 2019, “Predictive Inequity in Object Detection”)
Proof Of Convergence
● Formally: lim_{n→∞} Q′ = Q
● Δ_n = max_{s,a} |Q′_n(s,a) − Q(s,a)| – the biggest mistake after n epochs
● Bound the error after one more update:
○ |Q′_{n+1}(s,a) − Q(s,a)|
○ = |r_t + γ max_{a′} Q′_n(s′,a′) − (r_t + γ max_{a′} Q(s′,a′))|
○ = |γ max_{a′} Q′_n(s′,a′) − γ max_{a′} Q(s′,a′)|
○ = γ |max_{a′} Q′_n(s′,a′) − max_{a′} Q(s′,a′)|
○ ≤ γ max_{a′} |Q′_n(s′,a′) − Q(s′,a′)|
○ ≤ γ max_{s′′,a′} |Q′_n(s′′,a′) − Q(s′′,a′)| = γΔ_n
● So Δ_{n+1} ≤ γΔ_n, and applying the bound i times, Δ_{n+i} ≤ γ^i Δ_n
● Since γ < 1, γ^i → 0: the worst-case error vanishes, so Q′ converges to Q
● Assumptions?
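The bound Δ_{n+i} ≤ γ^i Δ_n says the worst-case error shrinks geometrically. A toy check, assuming γ = 0.9 and an initial worst-case error Δ_0 = 10 (both numbers made up):

```python
gamma, delta0 = 0.9, 10.0        # made-up values for illustration
for i in range(0, 50, 10):
    print(i, gamma**i * delta0)  # upper bound on the error after i more epochs
# the bound shrinks toward 0, so Q' -> Q as n -> infinity
```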
Group Discussion
Break into groups of 2-3 and decide what you feel is the biggest limitation of Q-learning, and why.
Did you come up with any new limitations? How could a deep learner help address them?
Neural Network – Terminology & Structure
Neural Network – How Decisions Are Made
[Figure walkthrough: forward pass on a handwritten digit, layer by layer. Output – 6]
[Figure walkthrough: forward pass on an image. Output – Dog]
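A decision is just a forward pass: each layer applies weights, a bias, and a nonlinearity, and the largest output wins. A minimal numpy sketch with made-up sizes (784 inputs as for a flattened 28×28 digit image, 10 class outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
# made-up layer sizes: 784 inputs -> 32 hidden units -> 10 classes
W1, b1 = rng.normal(size=(32, 784)) * 0.01, np.zeros(32)
W2, b2 = rng.normal(size=(10, 32)) * 0.01, np.zeros(10)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)    # hidden layer with ReLU
    return W2 @ h + b2                  # one score per class

x = rng.normal(size=784)                # stand-in for a flattened image
scores = forward(x)
print("Output -", scores.argmax())      # predicted class, e.g. "6"
```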
Is this based in reality?
Mahendran et al. 2014 – frequency penalization; Mordvintsev et al. 2016 – transform robustness
Neural Network – How Are Networks Trained?
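Training means nudging the weights downhill on a loss. A minimal sketch of gradient descent on the same two-layer network as above, using mean squared error and hand-derived chain-rule gradients; sizes and learning rate are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 784)) * 0.01, np.zeros(32)
W2, b2 = rng.normal(size=(10, 32)) * 0.01, np.zeros(10)
lr = 0.01                                # placeholder learning rate

def train_step(x, target):
    global W1, b1, W2, b2
    # forward pass
    h_pre = W1 @ x + b1
    h = np.maximum(0.0, h_pre)           # ReLU
    y = W2 @ h + b2
    # backward pass for 0.5 * ||y - target||^2 (chain rule, layer by layer)
    dy = y - target
    dW2, db2 = np.outer(dy, h), dy
    dh = W2.T @ dy
    dpre = dh * (h_pre > 0)              # ReLU gradient
    dW1, db1 = np.outer(dpre, x), dpre
    # gradient descent step
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    return 0.5 * np.sum(dy ** 2)         # loss before the step
```

Each call does one forward pass, one backward pass, and one descent step; repeating it over many labeled examples is the whole training loop.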
Let’s Make One!
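As a preview of where this is headed, a minimal Q-network sketch in PyTorch: the Q-table is replaced by a network mapping a state to one Q-value per action, trained toward the same target r + γ max_{a′} Q(s′, a′). Sizes and hyperparameters are placeholders, and this omits the replay buffer and target network of a full DQN:

```python
import torch
import torch.nn as nn

n_obs, n_actions, gamma = 4, 2, 0.99      # placeholder sizes

qnet = nn.Sequential(
    nn.Linear(n_obs, 64), nn.ReLU(),
    nn.Linear(64, n_actions),             # one Q-value per action
)
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def update(s, a, r, s2, done):
    """One Q-learning step on a single transition (states as float tensors)."""
    q_sa = qnet(s)[a]                     # current estimate Q(s, a)
    with torch.no_grad():                 # target: r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else gamma * qnet(s2).max())
    loss = (q_sa - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```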