Deep Reinforcement Learning
ECS170 W22
Review – Machine Learning
Types of machine learning?
● Supervised learning: given X and labels y, train f so that f(X′) → y′ ≈ y
● Unsupervised learning: given X, no labels y
● Representation learning: X ∈ R^n, f(X) → X′ ∈ R^m, with m < n
● Reinforcement learning: an agent interacts with an environment and learns which action maximizes reward
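To make the four paradigms concrete, here is a minimal sketch of their "signatures" in Python. All names and types are illustrative stubs, not anything from the slides:

```python
import numpy as np

# Supervised: learn f from labeled pairs (X, y), then predict y' ~ y on new X'
def supervised_fit(X: np.ndarray, y: np.ndarray):
    ...

# Unsupervised: only X is available -- find structure (clusters, density, ...)
def unsupervised_fit(X: np.ndarray):
    ...

# Representation learning: map X in R^n to a smaller X' in R^m, m < n
def encode(X: np.ndarray, m: int) -> np.ndarray:
    ...

# Reinforcement learning: no dataset up front -- an agent picks actions
# and learns from the rewards the environment returns
def act(state):
    ...
```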
Review – Reinforcement Learning
● Q(s_t, a_t) = r_t + γ max_{a′} Q(s_{t+1}, a′)
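A toy numeric check of the recursion, with all values made up for illustration: with γ = 0.9, an immediate reward of 1, and three possible next actions,

```python
gamma = 0.9
r_t = 1.0                  # immediate reward (made-up value)
q_next = [0.0, 2.0, 0.5]   # Q(s_{t+1}, a') for each action a' (made-up values)

q_sa = r_t + gamma * max(q_next)
print(q_sa)                # 1 + 0.9 * 2.0 = 2.8
```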
Review – Reinforcement Learning
● Q(s_t, a_t) ← (1 − α)·Q(s_t, a_t) + α·(r_t + γ max_{a′} Q(s_{t+1}, a′))
● Algorithm:
○ Select action a
○ Transition s –a→s’
○ Update Q-table
Review – Reinforcement Learning
● Q(s_t, a_t) ← (1 − α)·Q(s_t, a_t) + α·(r_t + γ max_{a′} Q(s_{t+1}, a′))
● Algorithm (with ε-greedy exploration):
○ Draw a random number; if > ε exploit (take the best-known action), otherwise explore (take a random action)
○ Transition s –a→s’
○ Update Q-table
○ Decrement ε
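Putting the update rule and the ε-greedy loop together, a minimal tabular sketch in Python. The `env` object with `reset()`/`step(a)` is an assumed Gym-style interface, and all hyperparameter values are placeholders:

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, eps=1.0, eps_decay=0.995):
    """Tabular Q-learning with epsilon-greedy exploration.

    Assumes a Gym-style env: reset() -> state, step(a) -> (state, reward, done).
    """
    Q = defaultdict(lambda: [0.0] * n_actions)     # Q-table, zeros by default
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # draw a random number: if > eps exploit, otherwise explore
            if random.random() > eps:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            else:
                a = random.randrange(n_actions)
            s2, r, done = env.step(a)              # transition s --a--> s'
            # Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target
            s = s2
        eps *= eps_decay                           # decrement epsilon each episode
    return Q
```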
Review – Limitations?
● We have to stumble upon a reward before any learning can happen
● What to do about big problems?
● Learning is too specific
Moving Forward
● Grounding Reinforcement Learning
● Neural Networks
● Q-Networks
Different Perspectives On Q-Learning
● Learning the rules of the game
Different Perspectives On Q-Learning
● Dynamic programming
○ Break big problem up into subproblems
○ Solve a subproblem optimally
○ Construct an optimal solution based on optimal subproblems
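The same subproblem structure appears in value iteration, the classic dynamic-programming approach to MDPs: each Q(s, a) is built from the already-computed values of its successor states. A minimal sketch, assuming (unlike Q-learning) that the transition model P and rewards R are known; the data-structure layout is an illustrative choice:

```python
def value_iteration(states, actions, P, R, gamma=0.9, iters=100):
    """Dynamic programming for a known MDP.

    P[s][a] -> list of (prob, next_state); R[s][a] -> immediate reward.
    Each sweep solves every (s, a) subproblem from the current
    estimates of its successor subproblems.
    """
    Q = {s: {a: 0.0 for a in actions} for s in states}
    for _ in range(iters):
        for s in states:
            for a in actions:
                Q[s][a] = R[s][a] + gamma * sum(
                    p * max(Q[s2].values()) for p, s2 in P[s][a]
                )
    return Q
```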
Different Perspectives On Q-Learning
● Analogous to psychology
○ Operant conditioning
■ Positive feedback to encourage actions
■ Negative feedback to discourage actions
Proof Of Optimality – A Unique Circumstance
● Stationary distribution assumption
● Overtraining
● Q′ monotonically improves
● Q′ never overestimates Q
● Q′ gets better and better with more training, even playing against low-skill opponents
● Formally: lim_{n→∞} Q′ = Q
(Reference: Wilson et al. 2019, “Predictive Inequity in Object Detection”)
Proof Of Convergence
● Formally: lim_{n→∞} Q′ = Q
● Δ_n = max_{s,a} |Q′_n(s,a) − Q(s,a)| – the biggest mistake after n epochs
● Bound the error after one more update:
○ |Q′_{n+1}(s,a) − Q(s,a)|
○ = |r_t + γ max_{a′} Q′_n(s′,a′) − (r_t + γ max_{a′} Q(s′,a′))|
○ = |γ max_{a′} Q′_n(s′,a′) − γ max_{a′} Q(s′,a′)|
○ = γ |max_{a′} Q′_n(s′,a′) − max_{a′} Q(s′,a′)|
○ ≤ γ max_{a′} |Q′_n(s′,a′) − Q(s′,a′)|
○ ≤ γ max_{s′′,a′} |Q′_n(s′′,a′) − Q(s′′,a′)| = γΔ_n
● So Δ_{n+1} ≤ γΔ_n, and applying the bound i times, Δ_{n+i} ≤ γ^i Δ_n
● Since γ < 1, γ^i → 0: the worst-case error vanishes, so Q′ converges to Q
● Assumptions?
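The bound Δ_{n+i} ≤ γ^i Δ_n says the worst-case error shrinks geometrically. A toy check, assuming γ = 0.9 and an initial worst-case error Δ_0 = 10 (both numbers made up):

```python
gamma, delta0 = 0.9, 10.0        # made-up values for illustration
for i in range(0, 50, 10):
    print(i, gamma**i * delta0)  # upper bound on the error after i more epochs
# the bound shrinks toward 0, so Q' -> Q as n -> infinity
```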
Group Discussion
Break into groups of 2-3 and decide what you feel is the biggest limitation of Q-learning, and why.
Did you come up with any new limitations? How could a deep learner help address them?
Neural Network – Terminology & Structure
Neural Network – How Decisions Are Made
[Figure walkthrough: forward pass on a handwritten digit, layer by layer. Output – 6]
[Figure walkthrough: forward pass on an image. Output – Dog]
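A decision is just a forward pass: each layer applies weights, a bias, and a nonlinearity, and the largest output wins. A minimal numpy sketch with made-up sizes (784 inputs as for a flattened 28×28 digit image, 10 class outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
# made-up layer sizes: 784 inputs -> 32 hidden units -> 10 classes
W1, b1 = rng.normal(size=(32, 784)) * 0.01, np.zeros(32)
W2, b2 = rng.normal(size=(10, 32)) * 0.01, np.zeros(10)

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)    # hidden layer with ReLU
    return W2 @ h + b2                  # one score per class

x = rng.normal(size=784)                # stand-in for a flattened image
scores = forward(x)
print("Output -", scores.argmax())      # predicted class, e.g. "6"
```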
Is this based in reality?
Mahendran et al. 2014 – frequency penalization; Mordvintsev et al. 2016 – transform robustness
Neural Network – How Are Networks Trained?
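Training means nudging the weights downhill on a loss. A minimal sketch of gradient descent on the same two-layer network as above, using mean squared error and hand-derived chain-rule gradients; sizes and learning rate are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 784)) * 0.01, np.zeros(32)
W2, b2 = rng.normal(size=(10, 32)) * 0.01, np.zeros(10)
lr = 0.01                                # placeholder learning rate

def train_step(x, target):
    global W1, b1, W2, b2
    # forward pass
    h_pre = W1 @ x + b1
    h = np.maximum(0.0, h_pre)           # ReLU
    y = W2 @ h + b2
    # backward pass for 0.5 * ||y - target||^2 (chain rule, layer by layer)
    dy = y - target
    dW2, db2 = np.outer(dy, h), dy
    dh = W2.T @ dy
    dpre = dh * (h_pre > 0)              # ReLU gradient
    dW1, db1 = np.outer(dpre, x), dpre
    # gradient descent step
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
    return 0.5 * np.sum(dy ** 2)         # loss before the step
```

Each call does one forward pass, one backward pass, and one descent step; repeating it over many labeled examples is the whole training loop.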
Let’s Make One!
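As a preview of where this is headed, a minimal Q-network sketch in PyTorch: the Q-table is replaced by a network mapping a state to one Q-value per action, trained toward the same target r + γ max_{a′} Q(s′, a′). Sizes and hyperparameters are placeholders, and this omits the replay buffer and target network of a full DQN:

```python
import torch
import torch.nn as nn

n_obs, n_actions, gamma = 4, 2, 0.99      # placeholder sizes

qnet = nn.Sequential(
    nn.Linear(n_obs, 64), nn.ReLU(),
    nn.Linear(64, n_actions),             # one Q-value per action
)
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def update(s, a, r, s2, done):
    """One Q-learning step on a single transition (states as float tensors)."""
    q_sa = qnet(s)[a]                     # current estimate Q(s, a)
    with torch.no_grad():                 # target: r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else gamma * qnet(s2).max())
    loss = (q_sa - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```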