CSC 311: Introduction to Machine Learning
Lecture 1 – Introduction and Nearest Neighbors
University of Toronto, Fall 2021
This course
Broad introduction to machine learning
- Algorithms and principles for supervised learning
  - nearest neighbors, decision trees, ensembles, linear regression, logistic regression, SVMs
- Unsupervised learning: PCA, K-means, mixture models
- Basics of reinforcement learning
Coursework is aimed at advanced undergrads. We will use multivariate calculus, probability, and linear algebra.
Course Information
Course Website: https://www.cs.toronto.edu/~rgrosse/courses/csc311_f21/
The course webpage is the main source of information; check it regularly!
Announcements, grades, & links: Quercus. Did you receive the announcement?
Discussions: Piazza.
Sign up: piazza.com/utoronto.ca/fall2021/csc311
Your grade does not depend on your participation on Piazza. It’s just a good way to ask questions and discuss with your instructor, TAs, and peers. We will only allow questions that are related to the course materials/assignments/exams.
Office hours: Gather Town (see Quercus for link), Monday 10am–noon
. Krishnan, Monday 6pm-7pm , Monday 8–9pm
You only need to pay attention to the course website for content and Quercus for links.
Course Information
For the first two weeks of class, lectures will be delivered synchronously via Zoom, and recorded for asynchronous viewing by enrolled students.
Current plan is to resume in-person lectures starting Sept. 23.
- This will depend on how the COVID-19 situation evolves, as well as on guidance from the University and from public health authorities.
- One section will be held virtually. You won’t be at a disadvantage if you choose to attend the virtual section.
- If you attend in person, you must attend your assigned section. To request to switch sections, please see the course website for instructions.
Tutorials and office hours will be virtual throughout the term.
All information about attending virtual lectures, tutorials, and office hours will be sent to enrolled students through Quercus.
Last year’s video lectures are also available through Quercus.
Course Information
You may download recorded lectures for your own academic use, but you should not copy, share, or use them for any other purpose.
During lecture, please keep yourself on mute unless called upon.
In case of illness, you should fill out the absence declaration form on ACORN and notify the instructors to request special consideration.
For accessibility services: If you require additional academic accommodations, please contact UofT Accessibility Services as soon as possible, studentlife.utoronto.ca/as.
In-Person Attendance?
Ontario is still in Step 3 of its reopening plan, but educational institutions were granted an exception to capacity limits on in-person gatherings. https://www.ontario.ca/page/reopening-ontario
This decision was made from a public health perspective. That is not the same thing as safety for you as an individual.
Consider your individual risk tolerance (especially if you live with someone vulnerable!).
Continue to follow University and public health guidelines.
In-Person Attendance?
The MicroCovid Calculator (https://www.microcovid.org/) is helpful for managing your own level of risk.
Course Information
Recommended readings will be given for each lecture. But the following will be useful throughout the course:
Hastie, Tibshirani, and Friedman: “The Elements of Statistical Learning”
Bishop: “Pattern Recognition and Machine Learning”, 2006.
Murphy: “Machine Learning: a Probabilistic Perspective”, 2012.
MacKay: “Information Theory, Inference, and Learning Algorithms”, 2003.
Shalev-Shwartz & Ben-David: “Understanding Machine Learning: From Theory to Algorithms”, 2014.
Barber: “Bayesian Reasoning and Machine Learning”, 2012.
R. Sutton and A. Barto: “Reinforcement Learning: An Introduction” (2nd ed.), 2018.
There are lots of freely available, high-quality ML resources.
Requirements and Marking
(35%) 3 homework assignments
- Combination of pen & paper derivations and programming exercises
- Weighted equally
(5%) minor assignments (e.g. embedded ethics)
- Good faith effort = full credit
(20%) project
- Groups of 2–3
- Implement and evaluate some ML algorithms, try out a new idea, and write a report
(40%) midterm test and final exam
- Both online
- See website for times and dates
- Weighting: higher of (15% midterm, 25% final) or (10% midterm, 30% final)
More on Assignments
Collaboration on the assignments is not allowed. Each student is responsible for his/her own work. Discussion of assignments should be limited to clarification of the handout itself, and should not involve any sharing of pseudocode or code or simulation results. Violation of this policy is grounds for a semester grade of F, in accordance with university regulations.
The schedule of assignments will be posted on the course webpage.
Assignments should be handed in by the deadline; a late penalty of 10% per day will be assessed thereafter (up to 3 days, after which submission is blocked).
Extensions will be granted only in special situations, and you will need to complete an absence declaration form and notify us to request special consideration, or otherwise have a written request approved by the course instructors at least one week before the due date.
Related Courses
More advanced ML courses such as CSC413 (Neural Networks and Deep Learning) and CSC412 (Probabilistic Learning and Reasoning) both build upon the material in this course.
If you’ve already taken an applied statistics course, there will be some overlap.
This is the third academic year that this course has been offered as an undergrad-only course. Previously it was CSC411, which had a bit more content and a heavier workload. We borrow liberally from previous editions.
What is learning?
“The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing something.”
— dictionary definition
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
— Tom Mitchell, Machine Learning (1997)
What is machine learning?
For many problems, it’s difficult to program the correct behavior by hand
- recognizing people and objects
- understanding human speech
Machine learning approach: program an algorithm to automatically learn from data, or from experience
Why might you want to use a learning algorithm?
- hard to code up a solution by hand (e.g. vision, speech)
- system needs to adapt to a changing environment (e.g. spam detection)
- want the system to perform better than the human programmers
- privacy/fairness (e.g. ranking search results)
Relations to statistics
It’s similar to statistics…
- Both fields try to uncover patterns in data
- Both fields draw heavily on calculus, probability, and linear algebra, and share many of the same core algorithms
…it’s not exactly statistics…
- Stats is more concerned with helping scientists and policymakers draw good conclusions; ML is more concerned with building autonomous agents
- Stats puts more emphasis on interpretability and mathematical rigor; ML puts more emphasis on predictive performance, scalability, and autonomy
…but machine learning and statistics rely on similar mathematics.
What is machine learning?
Types of machine learning
- Supervised learning: have labeled examples of the correct behavior
- Reinforcement learning: learning system (agent) interacts with the world and learns to maximize a scalar reward signal
- Unsupervised learning: no labeled examples; instead, looking for “interesting” patterns in the data
History of machine learning
1957 — Perceptron algorithm (implemented as a circuit!)
1959 — Arthur Samuel wrote a learning-based checkers program that could defeat him
1969 — Minsky and Papert’s book Perceptrons (limitations of linear models)
1980s — Some foundational ideas
- Connectionist psychologists explored neural models of cognition
- 1984 — Valiant formalized the problem of learning as PAC learning
- 1988 — Backpropagation (re-)discovered by Rumelhart and colleagues
- 1988 — Judea Pearl’s book Probabilistic Reasoning in Intelligent Systems introduced Bayesian networks
History of machine learning
1990s — the “AI Winter”, a time of pessimism and low funding
But looking back, the ’90s were also sort of a golden age for ML research
- Markov chain Monte Carlo
- variational inference
- kernels and support vector machines
- boosting
- convolutional networks
- reinforcement learning
2000s — applied AI fields (vision, NLP, etc.) adopted ML
2010s — deep learning
- 2010–2012 — neural nets smashed previous records in speech-to-text and object recognition
- increasing adoption by the tech industry
- 2016 — AlphaGo defeated the human Go champion
- 2018–now — generating photorealistic images and videos
- 2020 — GPT-3 language model
now — increasing attention to ethical and societal implications
Computer vision: Object detection, semantic segmentation, pose estimation, and almost every other task is done with ML.
Instance segmentation –
Speech: Speech to text, personal assistants, speaker identification…
NLP: Machine translation, sentiment analysis, topic modeling, spam filtering.
Playing Games
E-commerce & Recommender Systems: Amazon, Netflix, …
Why this class?
Why not jump straight to CSC412/413, and learn neural nets first?
The principles you learn in this course will be essential to understand and apply neural nets.
The techniques in this course are still the first things to try for a new ML problem.
- E.g., try logistic regression before building a deep neural net!
There’s a whole world of probabilistic graphical models.
Why this class?
2017 Kaggle survey of data science and ML practitioners: what data science methods do you use at work?
Implementing machine learning systems
You will often need to derive an algorithm (with pencil and paper), and then translate the math into code.
Array processing (NumPy)
- vectorize computations (express them in terms of matrix/vector operations) to exploit hardware efficiency
- This also makes your code cleaner and more readable!
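To make the vectorization point concrete, here is a small illustrative sketch (not from the course materials; the array shapes are arbitrary assumptions) comparing a Python loop with a broadcasted NumPy computation of squared distances to a query point:

```python
import numpy as np

# Toy data: N training points in D dimensions (shapes chosen arbitrarily).
N, D = 1000, 20
X = np.random.randn(N, D)
query = np.random.randn(D)

# Loop version: one squared distance at a time.
dists_loop = np.empty(N)
for i in range(N):
    diff = X[i] - query
    dists_loop[i] = diff @ diff

# Vectorized version: broadcasting computes all N distances in one expression.
dists_vec = ((X - query) ** 2).sum(axis=1)

assert np.allclose(dists_loop, dists_vec)
```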
Implementing machine learning systems
Neural net frameworks: PyTorch, TensorFlow, JAX, etc.
- automatic differentiation
- compiling computation graphs
- libraries of algorithms and network primitives
- support for graphics processing units (GPUs)
Why take this class if these frameworks do so much for you?
- So you know what to do if something goes wrong!
- Debugging learning algorithms requires sophisticated detective work, which requires understanding what goes on beneath the hood.
- That’s why we derive things by hand in this class!
Preliminaries and Nearest Neighbor Methods
Introduction
Today (and for much of this course) we focus on supervised learning. This means we are given a training set consisting of inputs and
corresponding labels, e.g.
Task                     | Inputs         | Labels
object recognition       | image          | object category
image captioning         | image          | caption
document classification  | text           | document category
speech-to-text           | audio waveform | text
Input Vectors
What an image looks like to the computer:
[Image credit: ]
Input Vectors
Machine learning algorithms need to handle lots of types of data: images, text, audio waveforms, credit card transactions, etc.
Common strategy: represent the input as an input vector in R^d
- Representation = mapping to another space that’s easy to manipulate
- Vectors are a great representation since we can do linear algebra!
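As a tiny illustration (a sketch, not from the slides), a grayscale image stored as a 2-D array can be flattened into such a vector:

```python
import numpy as np

image = np.random.rand(28, 28)   # stand-in for a 28x28 grayscale image
x = image.reshape(-1)            # input vector in R^784
print(x.shape)                   # (784,)
```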
Input Vectors
Can use raw pixels:
Can do much better if you compute a vector of meaningful features.
Input Vectors
Mathematically, our training set consists of a collection of pairs of an input vector x ∈ R^d and its corresponding target, or label, t
- Regression: t is a real number (e.g. stock price)
- Classification: t is an element of a discrete set {1, . . . , C}
- These days, t is often a highly structured object (e.g. image)
Denote the training set {(x^(1), t^(1)), . . . , (x^(N), t^(N))}
- Note: these superscripts have nothing to do with exponentiation!
Nearest Neighbors
Suppose we’re given a novel input vector x we’d like to classify.
The idea: find the nearest input vector to x in the training set and copy
its label.
Can formalize “nearest” in terms of Euclidean distance
$$\|\mathbf{x}^{(a)} - \mathbf{x}^{(b)}\|_2 = \sqrt{\sum_{j=1}^{d} \left(x_j^{(a)} - x_j^{(b)}\right)^2}$$
Algorithm:
1. Find the example (x^*, t^*) (from the stored training set) closest to x. That is:
$$\mathbf{x}^* = \operatorname*{argmin}_{\mathbf{x}^{(i)} \in \text{training set}} \mathrm{distance}(\mathbf{x}^{(i)}, \mathbf{x})$$
2. Output y = t^*
Note: we don’t need to compute the square root. Why?
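Because the square root is monotonic, comparing squared distances gives the same argmin. A minimal 1-NN sketch along these lines (assuming an X_train array of shape N x d and a label array t_train; not the official course code):

```python
import numpy as np

def nn_predict(X_train, t_train, x):
    # Squared Euclidean distance to every training point; the sqrt is
    # omitted because it does not change which point is closest.
    sq_dists = ((X_train - x) ** 2).sum(axis=1)
    return t_train[np.argmin(sq_dists)]
```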
Nearest Neighbors: Decision Boundaries
We can visualize the behavior in the classification setting using a Voronoi diagram.
Nearest Neighbors: Decision Boundaries
Decision boundary: the boundary between regions of input space assigned to different categories.
Nearest Neighbors: Decision Boundaries
Example: 2D decision boundary
Nearest Neighbors
[Pic by ]
Nearest neighbors sensitive to noise or mis-labeled data (“class noise”). Solution?
k-Nearest Neighbors
Smooth by having k nearest neighbors vote
Algorithm (kNN):
1. Find the k examples {(x^(i), t^(i))} closest to the test instance x
2. Classification output is the majority class:
$$y = \operatorname*{argmax}_{t^{(z)}} \sum_{i=1}^{k} \mathbb{I}\{t^{(z)} = t^{(i)}\}$$
$\mathbb{I}\{\text{statement}\}$ is the indicator function and is equal to one whenever the statement is true. We could also write this as $\delta(t^{(z)}, t^{(i)})$, with $\delta(a, b) = 1$ if $a = b$, 0 otherwise.
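A corresponding k-NN sketch (illustrative only; X_train, t_train with non-negative integer class labels, and the tie-breaking behaviour of argmax are all assumptions):

```python
import numpy as np

def knn_predict(X_train, t_train, x, k=3):
    sq_dists = ((X_train - x) ** 2).sum(axis=1)
    nearest = np.argsort(sq_dists)[:k]      # indices of the k closest training points
    votes = np.bincount(t_train[nearest])   # vote count for each class label
    return np.argmax(votes)                 # majority class (ties broken toward the lowest label)
```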
K-Nearest neighbors
[Image credit: ”The Elements of Statistical Learning”]
K-Nearest neighbors
[Image credit: ”The Elements of Statistical Learning”]
k-Nearest Neighbors
Tradeoffs in choosing k?
Small k
- Good at capturing fine-grained patterns
- May overfit, i.e. be sensitive to random idiosyncrasies in the training data
Large k
- Makes stable predictions by averaging over lots of examples
- May underfit, i.e. fail to capture important regularities
Balancing k
- Optimal choice of k depends on the number of data points n.
- Nice theoretical properties if k → ∞ and k/n → 0.
- Rule of thumb: choose k < √n.
- We can choose k using a validation set (next slides).
K-Nearest neighbors
We would like our algorithm to generalize to data it hasn’t seen before. We can measure the generalization error (error rate on new examples)
using a test set.
[Image credit: ”The Elements of Statistical Learning”]
Validation and Test Sets
k is an example of a hyperparameter, something we can’t fit as part of the learning algorithm itself
We can tune hyperparameters using a validation set:
The test set is used only at the very end, to measure the generalization performance of the final configuration.
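One possible way to carry out this tuning (a sketch assuming the data has already been split and that a knn_predict function like the one above is available):

```python
import numpy as np

def choose_k(X_train, t_train, X_val, t_val, candidate_ks=(1, 3, 5, 7, 9)):
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        preds = np.array([knn_predict(X_train, t_train, x, k) for x in X_val])
        acc = np.mean(preds == t_val)      # validation accuracy for this k
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```

The test set is still only touched after k has been fixed.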
Pitfalls: The Curse of Dimensionality
Low-dimensional visualizations are misleading! In high dimensions, “most” points are far apart.
If we want the nearest neighbor of any query x to be closer than ε, how many points do we need to guarantee it?
The volume of a single ball of radius ε around each point is O(ε^d).
The total volume of [0, 1]^d is 1.
Therefore O((1/ε)^d) points are needed to cover the volume.
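As a rough back-of-the-envelope check (not from the slides): with ε = 0.1 and d = 10, this already means on the order of (1/0.1)^10 = 10^10 points.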
[Image credit: ”The Elements of Statistical Learning”]
Pitfalls: The Curse of Dimensionality
In high dimensions, “most” points are approximately the same distance.
We can show this by applying the rules of expectation and covariance of random variables in surprising ways. (“optional” homework question coming up...)
Picture to keep in mind: (figure omitted here)
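A quick NumPy experiment (an illustrative sketch, not the homework question) that hints at this: the spread of pairwise distances, relative to their mean, shrinks as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
for d in [2, 10, 100, 1000]:
    X = rng.random((n, d))                          # n points uniform in [0, 1]^d
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    dists = np.sqrt(np.maximum(sq_dists, 0.0))      # clip tiny negatives from round-off
    dists = dists[np.triu_indices(n, k=1)]          # distinct pairs only
    print(d, dists.mean().round(2), (dists.std() / dists.mean()).round(3))
```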
Pitfalls: The Curse of Dimensionality
Saving grace: some datasets (e.g. images) may have low intrinsic dimension, i.e. lie on or near a low-dimensional manifold.
Image credit: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_swiss_roll.html
The neighborhood structure (and hence the Curse of Dimensionality) depends on the intrinsic dimension.
The space of megapixel images is 3 million-dimensional. The true number of degrees of freedom is much smaller.
Pitfalls: Normalization
Nearest neighbors can be sensitive to the ranges of different features. Often, the units are arbitrary:
Simple fix: normalize each dimension to be zero mean and unit variance. I.e., compute the mean μ_j and standard deviation σ_j, and take
$$\tilde{x}_j = \frac{x_j - \mu_j}{\sigma_j}$$
Caution: depending on the problem, the scale might be important!
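A minimal standardization sketch (the toy data and feature scales are made up for illustration):

```python
import numpy as np

X_train = np.random.randn(100, 5) * [1, 10, 100, 0.01, 1]   # features on wildly different scales
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
sigma[sigma == 0] = 1.0                  # guard against constant features
X_train_norm = (X_train - mu) / sigma    # zero mean, unit variance per dimension
# At test time, reuse mu and sigma computed on the training set.
```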
Pitfalls: Computational Cost
Number of computations at training time: 0
Number of computations at test time, per query (naïve algorithm):
- Calculate D-dimensional Euclidean distances with N data points: O(ND)
- Sort the distances: O(N log N)
This must be done for each query, which is very expensive by the standards of a learning algorithm!
Need to store the entire dataset in memory!
Tons of work has gone into algorithms and data structures for efficient nearest neighbors with high dimensions and/or large datasets.
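Incidentally (a sketch, not part of the course materials): even the naïve approach does not need the full O(N log N) sort; np.argpartition selects the k nearest in O(N) time.

```python
import numpy as np

def knn_indices(X_train, x, k):
    sq_dists = ((X_train - x) ** 2).sum(axis=1)    # O(ND)
    return np.argpartition(sq_dists, k)[:k]        # O(N) selection; the k indices come back unordered
```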
Example: Digit Classification
Decent performance when lots of data
[Slide credit: D. Claus]
Example: Digit Classification
KNN can perform a lot better with a good similarity measure.
Example: shape contexts for object recognition. In order to achieve invariance to image transformations, they tried to warp one image to match the other image.
- Distance measure: average distance between corresponding points on warped images
Achieved 0.63% error on MNIST, compared with 3% for Euclidean KNN. Competitive with conv nets at the time, but required careful engineering.