
CMPUT 366 F20: More on RNN & Learning Outcomes
Vadim Bulitko & James Wright
December 1, 2020

Lecture Outline
More on RNNs
PM 7.1–7.2; GBC 10
Final exam details
Learning outcomes

RNN: Overview

RNN: Details
Forward pass:
Negative log-likelihood loss:
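The forward pass can be sketched as follows. This is the standard vanilla-RNN formulation (tanh hidden update, softmax output); the weight names W_xh, W_hh, W_hy are illustrative, not taken from the slide.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def softmax(z):
    m = max(z)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One forward step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h),
    y_t = softmax(W_hy h_t + b_y)."""
    pre = vadd(vadd(matvec(W_hh, h_prev), matvec(W_xh, x_t)), b_h)
    h_t = [math.tanh(a) for a in pre]
    y_t = softmax(vadd(matvec(W_hy, h_t), b_y))
    return h_t, y_t
```

Note that the same weight matrices are reused at every time step (parameter sharing across time).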

Softmax (GBC 6.2.2.3)
A continuous and differentiable version of argmax:
x = (x₁, …, xᵢ, …, xₙ)
softmax(x)ᵢ = e^{xᵢ} / ∑_{j=1}^n e^{xⱼ}
softmax(x) = (softmax(x)₁, …, softmax(x)ₙ)
Each component of the softmax output vector is in [0, 1] and the components sum to 1, defining a probability distribution over n categories
Since argmax itself would output a one-hot encoding, softmax is really a soft version of argmax, so “soft arg max” would be a better name
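A minimal sketch of the definition above; subtracting max(x) before exponentiating is a standard trick to avoid overflow and does not change the result.

```python
import math

def softmax(x):
    """softmax(x)_i = e^{x_i} / sum_j e^{x_j}."""
    m = max(x)                          # shift for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# Each component lies in [0, 1] and the components sum to 1,
# so the output is a probability distribution over the categories.
```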

Probabilistic Predictions
Rather than predicting exactly what a target value will be, many common algorithms predict a probability distribution over possible values
especially for classification tasks
One-hot encoding is the common data representation for this scheme:
target features of training examples have a single 1 for the true value
target values predicted by f are probabilities that sum to 1
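A small sketch of this representation; the prediction vector below is a hypothetical model output.

```python
def one_hot(index, n_classes):
    """Return a length-n_classes vector with a single 1 at position index."""
    v = [0.0] * n_classes
    v[index] = 1.0
    return v

target = one_hot(2, 4)              # true class 2 of 4 -> [0, 0, 1, 0]
prediction = [0.1, 0.2, 0.6, 0.1]   # hypothetical probabilities, sum to 1
```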

Likelihood
The likelihood for a dataset E of examples and hypothesis f is the probability of independently observing the examples according to the probabilities assigned by the hypothesis:
P(E|f) = ∏_{e∈E} P(e|f)
This has a clear interpretation in a Bayesian sense
Numerical stability issues:
product of probabilities shrinks exponentially
floating point underflows almost immediately
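The underflow can be demonstrated directly: a product of many probabilities collapses to 0.0 in double-precision floating point, while the corresponding sum of logs stays finite.

```python
import math

p = 0.5
product = 1.0
log_sum = 0.0
for _ in range(2000):       # 2000 examples, each with probability 0.5
    product *= p            # shrinks exponentially, underflows to 0.0
    log_sum += math.log(p)  # shrinks only linearly, stays representable
```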

Log-Likelihood & Log loss
The log-likelihood for a dataset E of examples and hypothesis f is the log-probability of independently observing the examples according to the probabilities assigned by the hypothesis:
log P(E|f) = log ∏_{e∈E} P(e|f) = ∑_{e∈E} log P(e|f)
The underflow issue is remedied:
sum of logs shrinks more slowly than product of probabilities
Log is monotonic so maximizing log-likelihood imposes the same preference over hypotheses as maximizing likelihood:
P(E|f₁) > P(E|f₂) ⇐⇒ log P(E|f₁) > log P(E|f₂)
Log loss is negative of log likelihood divided by the number of examples:
− (1/|E|) ∑_{e∈E} log P(e|f)
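A sketch of log loss for probabilistic predictions: each example contributes the negative log of the probability the hypothesis assigned to its true class. The predictions below are hypothetical.

```python
import math

def log_loss(true_classes, predicted_dists):
    """Average negative log-probability of the true class over all examples."""
    n = len(true_classes)
    total = sum(-math.log(dist[c])
                for c, dist in zip(true_classes, predicted_dists))
    return total / n

# Two examples over three classes; true classes are 0 and 1:
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = log_loss([0, 1], preds)
```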

RNN: Encoder – Decoder

Final Exam Details
9am – 11am (Edmonton time) on Thursday, December 17, 2020
On e-Class
Questions from a pool
As a courtesy, you can take the final within this 24-hour window:
9am Dec 17 – 9am Dec 18, Edmonton time
But our support will happen only 9 – 11am Edmonton time
Open book
Solo effort, no collaboration of any kind
Covers all material
Practice questions: midterm and assignments
Learning outcomes listed below

Introduction to AI
define the major representational dimensions
classify a problem by representational dimensions
map problem characteristics to desired behaviour specifications

Search
define a directed graph
represent a problem as a state-space graph
explain how a generic searching algorithm works

Search
demonstrate how depth-first search will work on a graph
demonstrate how breadth-first search will work on a graph
demonstrate how iterative deepening DFS will work
demonstrate how least cost first search will work on a graph
predict the space and time requirements for depth-first and breadth-first searches
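The outcomes above can be illustrated with one generic search procedure: DFS and BFS differ only in the frontier discipline (stack vs queue). The graph below is a hypothetical example.

```python
from collections import deque

def search(graph, start, goal, frontier_is_stack):
    """Generic graph search; stack frontier gives DFS, queue gives BFS."""
    frontier = deque([start])
    visited = {start}
    order = []                      # expansion order, to compare strategies
    while frontier:
        node = frontier.pop() if frontier_is_stack else frontier.popleft()
        order.append(node)
        if node == goal:
            return order
        for nbr in graph.get(node, []):
            if nbr not in visited:  # avoid re-adding expanded/queued nodes
                visited.add(nbr)
                frontier.append(nbr)
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
dfs_order = search(graph, "A", "D", frontier_is_stack=True)
bfs_order = search(graph, "A", "D", frontier_is_stack=False)
```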

Search
devise a useful heuristic function for a problem
demonstrate how A* search will work on a graph
predict the space and time requirements for A* search
demonstrate how iterative-deepening works for a particular problem
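A minimal sketch of A*: always expand the frontier node minimizing f(n) = g(n) + h(n). The graph, edge costs, and (admissible) heuristic below are hypothetical.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search; frontier entries are (f, g, node, path)."""
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None, float("inf")

graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)]}
h = {"S": 2, "A": 4, "B": 1, "G": 0}
path, cost = a_star(graph, h, "S", "G")
```

Note A* finds the cheaper S–B–G route (cost 5) even though S–A looks better on g alone.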

Agent-centered Search
discuss differences between off-line and on-line search
discuss agent-centered search
define hill climbing
explain why hill climbing is not complete
define real-time heuristic search
define LRTA*
explain why LRTA* is complete
come up with a search graph which shows LRTA*’s scrubbing
discuss ways to remedy scrubbing

Reinforcement Learning
define a Markov decision process and a policy
define and give expressions for the state-value function and the action-value function
state the Bellman optimality equations
define returns and give expressions for episodic and discounted continuing returns
represent a problem as a Markov decision process
discuss sources of a reward function
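The return definitions above can be sketched in one function: the episodic return is the discounted return with γ = 1. The reward sequence is hypothetical.

```python
def discounted_return(rewards, gamma):
    """G_0 = sum_k gamma^k * R_{k+1}."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

rewards = [1.0, 0.0, 2.0]
episodic = discounted_return(rewards, gamma=1.0)    # undiscounted sum
discounted = discounted_return(rewards, gamma=0.5)  # 1 + 0 + 0.25*2
```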

Reinforcement Learning
trace an execution of iterative policy evaluation
state the policy improvement theorem and use it to improve a given policy
trace an execution of the value iteration algorithm
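A sketch of value iteration on a tiny hypothetical MDP, applying V(s) ← max_a ∑_{s'} P(s'|s,a)[r + γ V(s')]; `mdp[s][a]` is a list of (probability, next state, reward) triples.

```python
def value_iteration(mdp, gamma, iters):
    """Repeatedly apply the Bellman optimality backup to every state."""
    V = {s: 0.0 for s in mdp}
    for _ in range(iters):
        new_V = {}
        for s, actions in mdp.items():
            new_V[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
        V = new_V
    return V

# Two states: "move" from s0 reaches s1 with reward 1; s1 is absorbing.
mdp = {
    "s0": {"move": [(1.0, "s1", 1.0)]},
    "s1": {"stay": [(1.0, "s1", 0.0)]},
}
V = value_iteration(mdp, gamma=0.9, iters=50)
```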

Reinforcement Learning
explain how Monte Carlo estimation for state values works
explain the difference between prediction and control

Reinforcement Learning
define on-policy and off-policy learning
explain what exploring starts are and what purpose they serve
define ε-soft and ε-greedy policies
explain why they are useful

Reinforcement Learning
trace an execution of the TD(0) algorithm
trace an execution of the Q-learning algorithm
trace an execution of the Sarsa algorithm
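The tabular update rules behind two of these algorithms can be sketched for a single hypothetical transition (s, a, r, s'); α is the step size.

```python
def td0_update(V, s, r, s_next, alpha, gamma):
    """TD(0): V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Q-learning: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

V = {"s0": 0.0, "s1": 1.0}
td0_update(V, "s0", 0.5, "s1", alpha=0.1, gamma=0.9)
# V["s0"] becomes 0 + 0.1*(0.5 + 0.9*1.0 - 0) = 0.14
```

Sarsa differs from Q-learning only in using the action actually taken in s' instead of the max over actions.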

Uncertainty
define a random variable
describe the semantics of probability
apply the chain rule
apply Bayes’ theorem
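Applying Bayes' theorem, P(H|E) = P(E|H) P(H) / P(E), with P(E) obtained by total probability; the numbers below are a hypothetical diagnostic-test example.

```python
def bayes(p_e_given_h, p_h, p_e_given_not_h):
    """Posterior P(H|E) from likelihoods and prior."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)  # total probability
    return p_e_given_h * p_h / p_e

# Disease prior 1%, test sensitivity 95%, false-positive rate 5%:
posterior = bayes(0.95, 0.01, 0.05)
# Despite the accurate test, the posterior is only about 16%,
# because the prior is so low.
```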

Uncertainty
define a belief network
build a correct belief network for a given joint distribution
compute marginal and conditional probabilities from a joint distribution

Uncertainty
justify why a belief network is a correct encoding of a joint distribution
identify the factorization of a joint distribution encoded by a belief network
answer queries about independence based on a belief network

Uncertainty
define the factor objects and factor operations used in variable elimination
explain the origins of the efficiency improvements of variable elimination
define the high-level steps of variable elimination
trace an execution of variable elimination

Supervised Learning
define supervised learning task, classification, regression, loss function
represent categorical target values in multiple ways
identify an appropriate loss function for different tasks
explain how a separate test set estimates generalization performance
define 0/1 error, absolute error, (log-)likelihood loss, mean squared error, worst-case error

Supervised Learning
construct a decision tree using given features, splitting conditions, and stopping conditions
compute log-loss of a given decision tree
define overfitting
discuss several techniques (pseudo-counts, regularization and cross validation) to reduce overfitting

Supervised Learning
define linear regression
define linear classification and logistic regression
discuss limits of linear classification and linear separability
explain the stochastic gradient descent algorithm, including all basic concepts (e.g., partial derivatives, gradient, batch, etc.)
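Stochastic gradient descent for these models can be sketched for 1-D linear regression, f(x) = w·x + b with squared-error loss, updating on one example at a time. The data are synthetic (noise-free y = 2x + 1).

```python
import random

random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in [random.uniform(-1, 1) for _ in range(200)]]

w, b, alpha = 0.0, 0.0, 0.1          # parameters and step size
for _ in range(50):                  # epochs
    random.shuffle(data)             # "stochastic": random example order
    for x, y in data:
        err = (w * x + b) - y        # prediction error on this example
        # partial derivatives of (1/2)*err^2 with respect to w and b:
        w -= alpha * err * x
        b -= alpha * err
# w and b converge toward the true values 2 and 1.
```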

Deep Learning
define a rectified linear unit and give an expression for its value
describe how the units in a feedforward network are connected
explain at a high level what the Universal Approximation Theorem means
explain at a high level how feedforward neural networks are trained
discuss how non-linearity is used by multi-layer neural networks to learn non-linearly separable hypotheses
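The first outcome above has a one-line answer in code: relu(z) = max(0, z), applied to a unit's pre-activation w·x + b.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def unit(weights, bias, inputs):
    """One feedforward unit with a ReLU activation."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(pre_activation)

out = unit([1.0, -2.0], 0.5, [1.0, 1.0])  # pre-activation -0.5 -> output 0.0
```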

Deep Learning
define sparse interactions and parameter sharing
define the convolution operation
define the pooling operation
explain why convolutional networks are more efficient to train
describe how the units/layers in a convolutional neural network are connected
explain the assumptions of recurrent neural networks
explain teacher forcing
explain LSTM at a high level
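The convolution operation and its two efficiency properties can be sketched in 1-D: the same small kernel slides across the input (parameter sharing), and each output depends only on a local window (sparse interactions). As is common in deep learning, this is technically cross-correlation (the kernel is not flipped).

```python
def conv1d_valid(signal, kernel):
    """1-D 'valid' convolution: one output per full overlap of the kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

out = conv1d_valid([1, 2, 3, 4], [1, 0, -1])
# The 3-element kernel is reused at every position, so the layer needs
# only 3 parameters regardless of the input length.
```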