Machine Learning Introduction
Bryan Plummer
Slides adapted from Kate Saenko
Saenko 1
8 year-gap
about me
A.S., MCC; B.S. & PhD, UIUC; at BU 2018-; Tenure Track 2020-
• Research: Artificial Intelligence – Deep Learning for Vision
– Vision and language understanding
– Representation learning, Explainable AI, Efficient Neural Networks
Today
• What is machine learning?
• Supervised learning intro
• Course logistics
Why Do We Need Machine Learning?
Machine Learning: Why do we need it?
• Help automate boring, hard tasks
• Hard to program a computer directly to do the task
• Instead, program a computer to learn from examples
• Often use “big data” examples
Machine Learning: used in lots of ways in our everyday life!
Fraud alert!!
Machine Learning in Real Life: Smart Cars
• Stanford/Google were among the first to develop self-driving cars
• Cars “see” using many sensors: radar, laser, cameras
Machine Learning in Real Life: Medical and Scientific Data
Machine Learning in Real Life: Robotics
Viereck, Ten Pas, Saenko, Platt. Learning a Visuomotor Controller for Real World Robotic Grasping… CoRL 2017
Machine Learning in Real Life: Image Classification
• handwritten digits
• face tagging on social media
Machine Learning in Real Life: Computational Finance
Figure from J. Corso
Machine Learning from Big Data
[Figure panels: Artificial Neural Network; Support Vector Machine]
Introduction: What is Machine Learning?
Machine Learning
• Branch of Artificial Intelligence
• “creating machine algorithms that can learn from data”
• Closely related to
– Pattern recognition
– Data Mining
– Big Data
– Deep learning
Types of learning
• Supervised
• Unsupervised
• Reinforcement
Supervised Learning
• Given a training set consisting of inputs and outputs, learn to map novel inputs to outputs
• The novel inputs are called a test set
• Outputs can be
– Categorical (classification)
– Continuous (regression)
Example of Supervised Learning
recognize coins
• Given training set consisting of coin denomination (penny, nickel, dime, quarter), mass and size
• Learn to predict denomination
• What is input? Output?
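One way to make the coin example concrete is a nearest-neighbour sketch, where the input is the pair (mass, size) and the output is the denomination. The reference values below are nominal US coin specifications standing in for a measured training set, and `predict_denomination` is a hypothetical helper name. This is an illustration under those assumptions, not the method the course prescribes.

```python
import math

# Assumed training set: one nominal (mass in grams, diameter in mm) example
# per denomination, used here in place of real measurements.
TRAINING_SET = {
    "penny": (2.500, 19.05),
    "nickel": (5.000, 21.21),
    "dime": (2.268, 17.91),
    "quarter": (5.670, 24.26),
}


def predict_denomination(mass_g, diameter_mm):
    """Return the label of the training example closest to the input."""
    def distance_to(label):
        ref_mass, ref_diameter = TRAINING_SET[label]
        return math.hypot(mass_g - ref_mass, diameter_mm - ref_diameter)

    return min(TRAINING_SET, key=distance_to)


# Novel inputs (a tiny "test set"): slightly noisy measurements.
print(predict_denomination(2.45, 19.0))
print(predict_denomination(5.60, 24.1))
```

Because the output is one of four categories, this is a classification problem. Predicting a continuous quantity from the same inputs would make it regression.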
Unsupervised Learning
• Given training set consisting of mass and size only (no coin denomination labels)
• Learn… something?
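One possible "something" is a grouping of similar coins. The sketch below clusters unlabeled (mass, size) measurements with k-means. The choice of k-means, the value k = 4, the farthest-first initialisation, and the measurement values are all assumptions made for illustration. The slides do not name a specific algorithm here.

```python
import math

# Hypothetical unlabeled measurements: (mass in grams, diameter in mm).
points = [
    (2.50, 19.05), (2.48, 19.00),
    (5.00, 21.21), (5.03, 21.25),
    (2.27, 17.91), (2.25, 17.95),
    (5.67, 24.26), (5.65, 24.20),
]


def closest(point, centroids):
    """Index of the centroid nearest to the point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def k_means(points, k, iterations=10):
    # Deterministic farthest-first initialisation: start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [closest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return [closest(p, centroids) for p in points]


labels = k_means(points, k=4)
print(labels)
```

The cluster indices carry no meaning by themselves. Without denominations in the training set, the learner can discover that there are groups but cannot name them.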
Reinforcement Learning
learn to pick up coins
• Given only input, but can take action
• Predict output (action), get a reward for it
Supervised Learning
Cost functions
Example: house price prediction
[Figure: Housing Prices (Portland, OR). x-axis: Size (feet²); y-axis: Price (in 1000s of dollars). Credit: Andrew Ng]
Supervised Learning
What should the learner be?
Want: input $x$ → ? → output $y$
[Plot: y versus x]
Hypothesis h
h: a function parametrized by θ
Want: input $x$ → $h_\theta$ → output $y$
[Plot: y versus x]
How to learn θ?
Given: Training Set $\{x_i, y_i\}$
Want: input $x_i$ → $h_\theta$ → output $y$
But what if $y \neq y_i$?
[Plot: y versus x]
Cost function
Given: Training Set $\{x_i, y_i\}$, cost function $\text{Cost}(y, y_i)$
learning == minimizing cost
Want: input $x_i$ → $h_\theta$ → output $y$
[Plot: y versus x]
Supervised Learning
Given: Training Set $\{x_i, y_i\}$, cost function $\text{Cost}(y, y_i)$
learning == minimizing cost
Learn θ*: $\theta^* = \arg\min_\theta \text{Cost}(h_\theta(x_i), y_i)$
Want: input $x_i$ → $h_{\theta^*}$ → output $y$
Training set:
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Notation:
m = Number of training examples
$x^{(i)}$ = “input” variable / features
$y^{(i)}$ = “output” variable / “target” variable
What should h be?
Linear hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_i$'s: Parameters
[Plot: y versus x]
What’s a good cost function for this problem?
$\min_\theta \text{Cost}(h_\theta, \{x_i, y_i\})$
[Plot: y versus x]
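How about “Sum of squared differences”?
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_i$'s: Parameters
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
[Plot: y versus x]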
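A minimal sketch of the sum-of-squared-differences cost for a linear hypothesis, evaluated on the four training examples from the housing table. The 1/(2m) scaling is an assumption (a common convention); it does not change which parameters minimize the cost. The function names `hypothesis` and `cost` are illustrative.

```python
# Training set from the slides: size in square feet (x), price in $1000s (y).
SIZES = [2104.0, 1416.0, 1534.0, 852.0]
PRICES = [460.0, 232.0, 315.0, 178.0]


def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs=SIZES, ys=PRICES):
    """Sum of squared differences between predictions and targets,
    scaled by 1/(2m) (the scaling is an assumed convention)."""
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


# Compare two candidate parameter settings: lower cost means a better fit.
print(cost(0.0, 0.0))
print(cost(0.0, 0.2))
```

Learning then amounts to searching over (theta0, theta1) for the pair with the lowest cost.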
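2-dimensional θ
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_i$'s: Parameters
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
[Plot: y versus x]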
Plotting cost for 2-dimensional θ
$h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x)
$J(\theta_0, \theta_1)$ (function of the parameters $\theta_0, \theta_1$)
Note, squared loss cost is convex in parameters
SSD cost function is convex. Minima?
Non-convex cost function. Minima?
[Surface plot: $J(\theta_0, \theta_1)$ over $\theta_0$ and $\theta_1$]
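The convexity of the squared-difference cost can be spot-checked numerically: for a convex function, the cost at the midpoint of two parameter settings never exceeds the average of their costs. The sketch below tests that inequality on random pairs, using the four housing examples. The sampling ranges, the seed, and the 1/(2m) scaling are assumptions. A passing check is evidence of convexity, not a proof.

```python
import random

# Training set from the slides: size in square feet (x), price in $1000s (y).
SIZES = [2104.0, 1416.0, 1534.0, 852.0]
PRICES = [460.0, 232.0, 315.0, 178.0]


def cost(theta0, theta1):
    """Squared-difference cost of h(x) = theta0 + theta1 * x (1/(2m) scaling assumed)."""
    m = len(SIZES)
    return sum(
        (theta0 + theta1 * x - y) ** 2 for x, y in zip(SIZES, PRICES)
    ) / (2 * m)


def count_midpoint_violations(trials=1000, seed=0):
    """Count random pairs where cost(midpoint) exceeds the average cost."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        a = (rng.uniform(-100.0, 100.0), rng.uniform(-1.0, 1.0))
        b = (rng.uniform(-100.0, 100.0), rng.uniform(-1.0, 1.0))
        midpoint = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        average = (cost(*a) + cost(*b)) / 2
        # Small tolerance so floating-point rounding is not counted as a violation.
        if cost(*midpoint) > average * (1 + 1e-12) + 1e-9:
            violations += 1
    return violations


print(count_midpoint_violations())
```

A convex cost has no local minima other than the global one, which is why the question "Minima?" has a simpler answer in the convex case than in the non-convex one.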
Later
• How to minimize the SSD cost function
– Direct solution
– Indirect solution
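As a preview, here is a minimal sketch of a direct solution, assuming the term refers to solving the least-squares problem in closed form. For a linear hypothesis with two parameters, setting both partial derivatives of the squared-difference cost to zero gives two linear equations (the normal equations), which can be solved by hand. The helper name `fit_line` is hypothetical, and the data are the four housing examples from the training set table.

```python
# Training set from the slides: size in square feet (x), price in $1000s (y).
SIZES = [2104.0, 1416.0, 1534.0, 852.0]
PRICES = [460.0, 232.0, 315.0, 178.0]


def fit_line(xs, ys):
    """Closed-form least-squares fit of h(x) = theta0 + theta1 * x."""
    m = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:
    #   m     * theta0 + sum_x  * theta1 = sum_y
    #   sum_x * theta0 + sum_xx * theta1 = sum_xy
    determinant = m * sum_xx - sum_x ** 2
    theta1 = (m * sum_xy - sum_x * sum_y) / determinant
    theta0 = (sum_y - theta1 * sum_x) / m
    return theta0, theta1


theta0, theta1 = fit_line(SIZES, PRICES)
print(theta0, theta1)
```

An indirect solution would instead approach the same minimizer step by step, which becomes attractive when solving the equations directly is too expensive.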
Introduction: Course Overview
Class website
• Main class website https://piazza.com/bu/fall2020/cs542/home
Textbook
• Required textbook
Bishop, C. M. Pattern Recognition and Machine Learning. Springer. 2007
• Other suggested textbooks
Duda, R.O., Hart, P.E., and Stork, D.G. Pattern Classification. Wiley-Interscience. 2nd Edition. 2001.
Marsland, S. Machine Learning: An Algorithmic Perspective. CRC Press. 2009.
Theodoridis, S. and Koutroumbas, K. Pattern Recognition. 4th Edition. Academic Press. 2008.
Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. 2003.
Bishop, C. M. Neural Networks for Pattern Recognition. Oxford University Press. 1995.
Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. Springer. 2001.
Koller, D. and Friedman, N. Probabilistic Graphical Models. MIT Press. 2009.
Problem Sets
• Weekly problem sets
– Python coding problems
– Written math problems
– Important to prepare you for the exams!
• Self-graded
– You will submit code, answers, and your own grade
– We will randomly check to verify
Class Challenge
Individual end-of-term project
• Based on a real-world problem, hosted as a Kaggle-like challenge for our class
• Goal is to design a machine learning approach and apply it to the problem
• Deliverables: GitHub
Lecture Class Rotations
CAS 522 – 32 student capacity
• As of yesterday, 63 students have indicated they might attend in-person (or have not responded to the poll)
• Check Piazza for rotations before coming to class as they may shift during the semester
• Wipe down chairs before sitting down
• Wear a mask and be prepared to show your badge
Discussion/Lab Rotations
• Check Piazza for rotations before coming to class as they may shift during the semester
• As of yesterday, sections A3 and A4 require rotations; A2 and A5 do not, though this may change. Email me if you would like to change sections
• Only attend the discussion section that you are registered for (especially if you want to attend in-person)
• Wipe down chairs before sitting down
• Wear a mask and be prepared to show your badge
Next Class
Preliminaries: review of expected mathematical skills for the course
• Reference reading on matrix calculus and linear algebra can be found here
• Matrix derivatives cheat sheet
• Also see http://www.matrixcalculus.org/
Questions