Machine Learning CS229/STATS229
Instructors: Moses Charikar and Chris Ré
Hope everyone stays safe and healthy in these difficult times!
1. Administrivia
cs229.stanford.edu
(you may need to refresh to see the latest version)
2. Topics Covered in This Course
Who we are
• We have wonderful course coordinators (Swati and Amelie). They are your resource for any administrative decisions
• We have fantastic TAs! Please be kind and generous with them!
Prerequisites
• Probability (CS109 or STAT 116)
  • distribution, random variable, expectation, conditional probability, variance, density
• Linear algebra (Math 104, Math 113, or CS205)
  • matrix multiplication
  • eigenvectors
• Basic programming (in Python)
  • Will be reviewed in Friday sections (recorded)
This is a mathematically intense course. But that’s why it’s exciting and rewarding!
Honor Code
Do’s
• form study groups (of any size); discuss and work on homework problems in groups
• write down the solutions independently
• write down the names of people with whom you've discussed the homework
• read the longer description on the course website
Don’ts
• copy, refer to, or look at any official or unofficial previous years’ solutions in preparing the answers
Honor Code for Submission in Pairs
• Students submitting in a pair act as one unit
  • may share resources (such as notes) with each other and write the solutions together
• Both students should fully understand all the answers in their submission
• Each student in the pair must understand the solution well enough to reconstruct it on their own
Course Project
• We encourage you to form a group of 1–3 people
  • same grading criteria regardless of group size
• More information and previous course projects can be found on the course website
• List of potential topics
• Athletics & Sensing Devices
• Audio & Music
• Computer Vision
• Finance & Commerce
• General Machine Learning
• Life Sciences
• Natural Language
• Physical Sciences
• Theory
• Reinforcement Learning
• Covid-19
Ed:
• All announcements and questions (unless you would only reach out to a subset of course staff)
• For logistical questions, please look at the course FAQ first
• Finding study groups / friends
Ø If you enrolled in the class but do not have access to Ed, it should come within a day. If it has been more than that, send Amelie Byun an email (aebyun@stanford.edu)
Other Information on Course Website
cs229.stanford.edu
• Nooks: office hours
• Videos on Canvas: under the Panopto Videos tab (will be uploaded EOD)
• Course calendar & syllabus for deadlines
• Canvas calendar for office hours / section / lecture dates and links
• Gradescope: you will be automatically enrolled in the course Gradescope
• Late days policy
• FAQ on the course website
… Course feel …
• This class is almost all “whiteboard” and mathematical
• We try to be self-contained, but students come from a diverse set of backgrounds.
• Please ask questions! When you ask questions, we're so happy!!
• Some of you will learn from lectures, notes, each other. Find what works for you.
• Please be generous with the staff (and yourself!)
• We’re getting better (we hope) at this virtual experience.
• We really want to help you learn this material, and that’s why I love this class.
1. Administrivia
cs229.stanford.edu
2. Topics Covered in This Course
Definition of Machine Learning
Arthur Samuel (1959): Machine Learning is the field of study that gives the computer the ability to learn without being explicitly programmed.
Photos from Wikipedia
Definition of Machine Learning
Tom Mitchell (1998): a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Experience (data): games played by the program (with itself)
Performance measure: winning rate
Image from Tom Mitchell’s homepage
Taxonomy of Machine Learning (A Simplistic View Based on Tasks)
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
(these can also be viewed as tools/methods)
Supervised Learning
Housing Price Prediction
Ø Given: a dataset that contains n samples (x^(1), y^(1)), …, (x^(n), y^(n))
Ø Task: if a residence has x square feet, predict its price
(e.g., the 15th sample is (x^(15), y^(15)); predict y for x = 800)
Ø Lecture 2&3: fitting linear/quadratic functions to the dataset
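The kind of line-fitting previewed for Lectures 2&3 can be sketched in a few lines of NumPy. The dataset below is made up for illustration:

```python
import numpy as np

# Toy dataset: living area in square feet -> price in $1000s (made-up numbers)
x = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
y = np.array([200, 290, 410, 500, 590], dtype=float)

# Fit y ≈ theta1 * x + theta0 by least squares
theta1, theta0 = np.polyfit(x, y, deg=1)

# Predict the price of an 800-square-foot residence
prediction = theta1 * 800 + theta0
print(round(prediction, 1))  # ≈ 160.4
```

Replacing `deg=1` with `deg=2` fits the quadratic version mentioned in lecture.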
More Features
• Suppose we also know the lot size
• Task: find a function that maps
  (size, lot size) → price
  features/input: x ∈ R², label/output: y ∈ R
Ø Dataset: (x^(1), y^(1)), …, (x^(n), y^(n)), where x^(i) = (x_1^(i), x_2^(i))
Ø "Supervision" refers to y^(1), …, y^(n)
[Figure: price y plotted against the two features x_1 and x_2]
High-dimensional Features
• x ∈ R^d for large d
• E.g., x = (x_1, x_2, x_3, …, x_d), where
  x_1 — living size
  x_2 — lot size
  x_3 — # floors
  ⋮ — condition
  ⋮ — zip code
  and y — price
Ø Lec. 6-7: infinite-dimensional features (kernels)
Ø Lec. 10-11: select features based on data (deep learning)
Regression vs Classification
• regression: the label y ∈ R is a continuous variable
  • e.g., price prediction
• classification: the label is a discrete variable
  • e.g., the task of predicting the type of residence
    (size, lot size) → house or townhouse?
    y = house or townhouse?
Lecture 3&4: classification
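As a toy preview of classification, here is a sketch of logistic regression (one of the classifiers covered in the course) trained by batch gradient ascent. The (size, lot size) data below is made up:

```python
import numpy as np

# Toy data: (size, lot size) -> 0 = townhouse, 1 = house (made-up numbers)
X = np.array([[1000, 2000], [1200, 2500], [2500, 8000], [3000, 9000]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)

# Standardize features so gradient ascent behaves well, then add an intercept
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = np.hstack([np.ones((X.shape[0], 1)), X])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Maximize the log-likelihood by batch gradient ascent
theta = np.zeros(X.shape[1])
for _ in range(1000):
    theta += 0.1 * X.T @ (y - sigmoid(X @ theta))

preds = (sigmoid(X @ theta) >= 0.5).astype(int)
print(preds)  # matches y on this separable toy set
```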
Supervised Learning in Computer Vision
• Image Classification
• 𝑥 = raw pixels of the image, 𝑦 = the main object
ImageNet Large Scale Visual Recognition Challenge. Russakovsky et al.’2015
Supervised Learning in Computer Vision
• Object localization and detection
• 𝑥 = raw pixels of the image, 𝑦 = the bounding boxes
ImageNet Large Scale Visual Recognition Challenge. Russakovsky et al.’2015
Supervised Learning in Natural Language Processing
• Machine translation
  • x = a sentence in the source language, y = its translation
Ø Note: this course only covers the basic and fundamental techniques of supervised learning (which are not enough for solving hard vision or NLP problems.)
Ø CS224N and CS231N, if you are interested in the particular applications.
Unsupervised Learning
Unsupervised Learning
• Dataset contains no labels: x^(1), …, x^(n)
• Goal (vaguely-posed): to find interesting structures in the data
[Figure: the same dataset shown with labels (supervised) vs. without labels (unsupervised)]
Clustering
Ø Lecture 12&13: k-means clustering, mixtures of Gaussians
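A minimal sketch of the k-means procedure, alternating assignment and centroid-update steps, on made-up two-blob data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated blobs (made-up clusters around x = 0 and x = 5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

# Initialize one centroid from each end of the dataset (deterministic here)
centroids = np.array([X[0], X[-1]])
for _ in range(10):
    # Assignment step: each point joins its nearest centroid
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(np.sort(centroids[:, 0]))  # centroids end up near x = 0 and x = 5
```

In practice k-means is run from several random initializations, since it can get stuck in local optima.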
Clustering Genes
[Figure: gene-expression matrix (genes × individuals), with genes grouped into clusters such as Cluster 1 and Cluster 7]
Identifying Regulatory Mechanisms using Individual Variation Reveals Key Role for Chromatin Modification. [Su-In Lee, Dana Pe'er, Aimee M. Dudley, George M. Church and Daphne Koller. '06]
Latent Semantic Analysis (LSA)
[Figure: topic detection in a document-word matrix. Image credit: https://commons.wikimedia.org/wiki/File:Topic_detection_in_a_document-word_matrix.gif]
Ø Lecture 14: principal component analysis (used in LSA)
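Principal component analysis can be sketched via the SVD of the centered data matrix; the 2-D data below is made up so that most variance lies along the (1, 1) direction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2-D points that mostly vary along the direction (1, 1) (made up)
t = rng.normal(0, 3, 200)
X = np.column_stack([t, t]) + rng.normal(0, 0.3, (200, 2))

# PCA via SVD of the centered data matrix; rows of Vt are principal directions
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
top_component = Vt[0]  # direction of largest variance

print(np.round(np.abs(top_component), 2))  # close to [0.71, 0.71]
```

In LSA the same decomposition is applied to a (much larger) document-word matrix.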
Word Embeddings
• Represent words by vectors, learned from an unlabeled dataset
Ø word → vector
Ø relation between words → vector direction
  (e.g., the Italy→Rome, France→Paris, and Germany→Berlin directions are roughly parallel)
Word2vec [Mikolov et al.'13], GloVe [Pennington et al.'14]
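The analogy arithmetic behind that picture (Rome − Italy + France ≈ Paris) can be illustrated with tiny hand-made vectors; real word2vec or GloVe embeddings are learned from large unlabeled corpora, not written down like this:

```python
import numpy as np

# Tiny hand-made "embeddings" just to illustrate the analogy arithmetic;
# real embeddings are learned, and here capital = country + (0, 1, 0).
vec = {
    "italy":   np.array([1.0, 0.0, 0.2]),
    "rome":    np.array([1.0, 1.0, 0.2]),
    "france":  np.array([0.0, 0.0, 0.9]),
    "paris":   np.array([0.0, 1.0, 0.9]),
    "germany": np.array([-1.0, 0.0, 0.5]),
    "berlin":  np.array([-1.0, 1.0, 0.5]),
}

# Analogy query: rome - italy + france should land near paris
query = vec["rome"] - vec["italy"] + vec["france"]

def nearest(q, exclude):
    # Cosine similarity against every stored word not in the query itself
    sims = {w: q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
            for w, v in vec.items() if w not in exclude}
    return max(sims, key=sims.get)

print(nearest(query, exclude={"rome", "italy", "france"}))  # paris
```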
GPT-3
• Lecture 16
• (maybe a whole course next year!!)
• https://hai.stanford.edu/news/how-large-language-models-will-transform-science-society-and-ai
Software 2.0 is eating Software 1.0
1000x productivity: Google shrinks language translation code from 500k LoC to 500 lines of dataflow.
https://jack-clark.net/2017/10/09/import-ai-63-google-shrinks-language-translation-code-from-500000-to-500-lines-with-ai-only-25-of-surveyed-people-believe-automationbetter-jobs
AI driven by data— not the model
“Software 2.0”, Andrej Karpathy, https://medium.com/@karpathy/software-2-0-a64152b37c35
… you probably used SW2.0 in the last hour…
Lec 15: basic theory of these new systems ("weak supervision" theory). Also a new course on ML Engineering next year!
“SW2.0 will add 30 trillion dollars to public equity markets” … this is not trading advice.
Reinforcement Learning
• The algorithm can collect data interactively
• Loop: data collection (try the strategy and collect feedback) → training (improve the strategy based on the feedback) → repeat
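That collect-feedback/improve loop can be illustrated with a toy two-armed bandit; the payoff probabilities below are made up, and epsilon-greedy is just one simple strategy, not the methods covered later in the course:

```python
import random

random.seed(0)

# Toy two-armed bandit: arm 1 pays off more often (made-up environment)
def pull(arm):
    return 1.0 if random.random() < (0.3, 0.8)[arm] else 0.0

# Interactive loop: try the current strategy, collect feedback, improve
counts, values = [0, 0], [0.0, 0.0]
for t in range(2000):
    # epsilon-greedy: mostly exploit the best-looking arm, sometimes explore
    if random.random() < 0.1:
        arm = random.randrange(2)                       # explore
    else:
        arm = max((0, 1), key=lambda a: values[a])      # exploit
    reward = pull(arm)                                  # data collection
    counts[arm] += 1                                    # training: update the
    values[arm] += (reward - values[arm]) / counts[arm] # running value estimate

print(max((0, 1), key=lambda a: values[a]))  # learns that arm 1 is better
```

The key contrast with supervised learning: here the algorithm's own choices determine which data it sees.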
Taxonomy of Machine Learning (A Simplistic View Based on Tasks)
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
(these can also be viewed as tools/methods)
Other Tools/Topics In This Course
• Deep learning basics
• Introduction to learning theory
  • Bias–variance tradeoff
  • Feature selection
  • ML advice
• Broader aspects of ML
  • Robustness/fairness
Thank you!