Introduction to Machine Learning:
Linear Classifiers and the Perceptron Algorithm
Prof. Kutty
Today’s Agenda
• Announcements
– Quiz 1 due Sunday 10pm
– HW1 out this Wednesday, due in two weeks
• early submission point!
– Alternate midterm time is set 11:20am-1:20pm 2/23; MUST SIGN UP in case of hard conflict
• form will be released after add/drop deadline.
• Deadline for submission will be specified
– Python tutorial on Thursday (recorded)
– On canvas:
• Announcements; lecture recordings; discussions and discussion recordings; Python tutorial recording; course schedule (optional reading); logistics FAQs; and more…
• Machine Learning
• Supervised Learning
• Linear Classifiers
– Linear classifiers: what and why intuitively
– Linear classifiers: algebraically and geometrically
– Classification Algorithm: Perceptron
» notion of error
» ‘ideal’ case: linear separability
» without and with offset
– Non-linear separability and linear classifiers
FAQ: How do I earn the quiz component (9.5% of overall grade)?
Your quiz score is your total quiz points as a fraction of the total points available.
To get 9.5/9.5, i.e., a perfect score on the quiz component, that fraction must reach a specified threshold.
If your score is below the threshold, your final exam will instead be worth 34.5%.
Section 1:
From Intuition to Formal Definitions
Supervised Learning - Classification
binary classification problem
• Problem: predict whether a given review is helpful or unhelpful
– Features: star rating (1–5 stars), length of review (max length of 200 words)
– Labels: helpful/unhelpful
Each example is a (datapoint, label) pair.
• Given Labeled Data:
• Predict label for new datapoint
Classification as an ML problem
Feature vector
– Features are statistics or attributes that describe the data.
– Represent data in terms of vectors.
star rating (as a fraction of 5 stars);
length of review (as a fraction of 200 words)
In general, the ith data point is represented by a feature vector x̄^(i) ∈ R^d; by convention this is a column vector.
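As a concrete illustration (not from the lecture: the function name review_features and the example values are assumptions), a review could be encoded as such a feature vector in Python using the star-rating and review-length fractions described above:

import numpy as np

def review_features(stars, num_words, max_stars=5, max_words=200):
    """Map a review to x_bar = [star fraction, length fraction]."""
    return np.array([stars / max_stars, num_words / max_words])

x_1 = review_features(stars=4, num_words=160)   # e.g., x_bar^(1) = [0.8, 0.8]
x_2 = review_features(stars=1, num_words=40)    # e.g., x_bar^(2) = [0.2, 0.2]
print(x_1, x_2)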
Classification as an ML problem
1. Feature vector
– Features are statistics or attributes that describe the data.
– Represent data in terms of vectors.
(e.g., if ≥ 10 people found the review helpful, then label +, else −)
In general, this is a classification problem: each data point is assigned a discrete label.
Classification as an ML problem
Training dataset
In general, the training data S_n = {(x̄^(i), y^(i))}_{i=1}^n consists of n labeled data points.
This is a Supervised Learning problem
Supervised Learning -
Classification
• Labeled Dataset:
– Features: star rating (1–5 stars), length of review (max length of 200 words)
– Labels: helpful/unhelpful
Here each data point is represented by its (fractional) star rating and (fractional) review length, and the label space is discrete.
Contrast with regression, where the label space may be continuous.
Problem: predict whether a (new unlabeled) review is helpful (+) or unhelpful (-)
Supervised Learning - Classification
• Geometrically:
Poll: A. Predict +   B. Predict −
https://docs.google.com/document/d/1JWGiT8gAD4-fNbq8j41xh1A-ruXlS_x7mwnOpkmN7Do/
[Figure: labeled training points plotted by review length vs. star rating, together with a new unlabeled data point]
Supervised Learning
Linear Binary Classifier
Supervised Learning - Classification
• Goal: Learn a linear decision boundary • Geometrically:
[Figure: training points with a linear decision boundary separating + from −; axes are review length and star rating]
Supervised Learning - Classification
• Geometrically:
A. Predict +
B. Predict -
https://docs.google.com/document/d/1JWGiT8gAD4-fNbq8j41xh1A-ruXlS_x7mwnOpkmN7Do/
Why a linear decision boundary?
• Let H be the set of classifiers under consideration
– e.g., H is the set of all hyperplanes (where the ambient space is R^d)
• Too many choices is not always a good thing
– may lead to overfitting
• Solution?
– constrain the possible choices, i.e., H
• Caution!
– H cannot be too constrained either
– may lead to underfitting
• This problem is called model selection
Example Dataset 1
Example Dataset 2
Section 2: Linear Classifiers
Hyperplanes in R^d
Definition: A hyperplane in R^d can be specified by a parameter vector θ̄ ∈ R^d and an offset b ∈ R.
It is the set of points x̄ ∈ R^d such that θ̄ · x̄ + b = 0, where
θ̄ · x̄ = θ_1 x_1 + ⋯ + θ_d x_d = Σ_{j=1}^d θ_j x_j
e.g., the origin (does/does not) satisfy the hyperplane specified by the line in the figure
θ̄ is orthogonal to the hyperplane θ̄ · x̄ = 0
in R^2 this is the set of lines
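As a quick illustration of the definition, the following sketch (the particular θ̄, b, and test points are made up for illustration, not from the slides) checks whether a point x̄ satisfies θ̄ · x̄ + b = 0 and, if not, which side of the hyperplane it lies on:

import numpy as np

theta = np.array([2.0, -1.0])   # parameter vector theta_bar (assumed values)
b = 3.0                         # offset (assumed value)

def side_of_hyperplane(x):
    value = theta @ x + b       # theta_bar . x_bar + b
    if np.isclose(value, 0.0):
        return "on the hyperplane"
    return "positive side" if value > 0 else "negative side"

print(side_of_hyperplane(np.array([0.0, 0.0])))   # the origin: value = b = 3, so not on the hyperplane
print(side_of_hyperplane(np.array([0.0, 3.0])))   # satisfies 2*0 - 1*3 + 3 = 0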
Linear Classifier
Goal: Learn a linear decision boundary, i.e., constrain the possible choices H to hyperplanes
simplifying assumptions:
• constrain H to be the set of all hyperplanes that go through the origin
• e.g., in R^2 this is the set of lines that go through the origin
• constrain problem to datasets that are linearly separable
a hyperplane without offset
Linear separability: intuition
[Figure: two 2D scatter plots — left: a dataset that is linearly separable; right: a dataset that is not linearly separable]
Recall from Linear Algebra
for θ̄, x̄ ∈ R^d:
• L2 norm of θ̄: ‖θ̄‖ = sqrt(θ_1^2 + ⋯ + θ_d^2)
• dot product: θ̄ · x̄ = θ_1 x_1 + ⋯ + θ_d x_d = Σ_{j=1}^d θ_j x_j = ‖θ̄‖ ‖x̄‖ cos α,
  where α is the angle between θ̄ and x̄
Question: given the angle α between θ̄ and x̄, what is the sign of θ̄ · x̄?
A. θ̄ · x̄ = 0   B. θ̄ · x̄ < 0   C. θ̄ · x̄ > 0
Case 1: 0° ≤ α < 90° or 270° < α ≤ 360°: cos α > 0, so θ̄ · x̄ = ‖θ̄‖ ‖x̄‖ cos α > 0
Case 2: 90° < α < 270°: cos α < 0, so θ̄ · x̄ < 0
Case 3: α ∈ {90°, 270°}: cos α = 0, so θ̄ · x̄ = 0
[Figure credit: Geek3, own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=10904768]
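A small numerical check of these cases (the toy vectors below are chosen purely for illustration, not from the slides): the dot product equals ‖θ̄‖ ‖x̄‖ cos α, so its sign is determined by the angle α between the two vectors.

import numpy as np

theta = np.array([1.0, 2.0])
x = np.array([3.0, -1.0])

dot = theta @ x                                   # theta_1*x_1 + ... + theta_d*x_d
norms = np.linalg.norm(theta) * np.linalg.norm(x)
cos_alpha = dot / norms                           # so dot = ||theta|| ||x|| cos(alpha)
alpha_deg = np.degrees(np.arccos(cos_alpha))

print(dot, cos_alpha, alpha_deg)
# Here alpha is below 90 degrees (Case 1), so the dot product is positive;
# between 90 and 270 degrees it would be negative (Case 2), and exactly 0 at 90/270 (Case 3).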
Linear Classification in 2D
The hyperplane θ̄ · x̄ = 0 can be described by the vector θ̄; θ̄ is orthogonal to the hyperplane θ̄ · x̄ = 0.
This suggests a natural way to define a linear classifier:
Linear Classification in 2D
More precisely,
learn θ̄ such that the hyperplane θ̄ · x̄ = 0 separates the positive from the negative examples
Define linear classifier as
h(x̄; θ̄) = sign(θ̄ · x̄)
The choice of θ̄ determines:
1. the orientation of the hyperplane (e.g., in R^2)
2. the predicted class label
Note: In d dimensions this defines a hyperplane that goes through the origin
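A minimal sketch of this classifier (the numbers are assumed for illustration, and the tie-break for points exactly on the boundary is a convention chosen here, not from the slides): h(x̄; θ̄) = sign(θ̄ · x̄), and negating θ̄ keeps the same hyperplane but flips every predicted label.

import numpy as np

def h(x, theta):
    """Linear classifier through the origin: h(x; theta) = sign(theta . x)."""
    return 1 if theta @ x > 0 else -1    # convention: points on the boundary get label -1

theta = np.array([1.0, -2.0])
x = np.array([0.5, 0.1])
print(h(x, theta))        # +1: x lies on the positive side of theta . x = 0
print(h(x, -theta))       # -1: negating theta flips the predicted label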
Goal: Learn a linear decision boundary through the origin
Selecting θ̄
• Find θ̄ that works well on training data S_n = {(x̄^(i), y^(i))}_{i=1}^n
• Training error:
E_n(θ̄) = (1/n) Σ_{i=1}^n [[ y^(i) ≠ h(x̄^(i); θ̄) ]]
Indicator function:
[[A]] = 1 if A is true, 0 otherwise
Notice that y^(i)(θ̄ · x̄^(i)) = 0 is achieved only when the data point x̄^(i) lies exactly on the decision boundary.
For this reason we can rewrite the condition “true label ≠ predicted label”, i.e., y^(i) ≠ h(x̄^(i); θ̄), as y^(i)(θ̄ · x̄^(i)) ≤ 0, as shown on the next slide.
Training error gives us the fraction of misclassified data points in a given dataset.
Linear Classifier
Given: training data S_n = {(x̄^(i), y^(i))}_{i=1}^n, where x̄^(i) ∈ R^d and y^(i) ∈ {−1, +1}
Goal: Learn a linear decision boundary (through the origin)
that minimizes training error
h(x̄; θ̄) = sign(θ̄ · x̄)
E_n(θ̄) = (1/n) Σ_{i=1}^n [[ y^(i) ≠ h(x̄^(i); θ̄) ]] = (1/n) Σ_{i=1}^n [[ y^(i)(θ̄ · x̄^(i)) ≤ 0 ]]
simplifying assumption: linear separability (so no x̄^(i) lies on the hyperplane)
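The training error above translates directly into a few lines of NumPy; the data points, labels, and θ̄ below are made up solely for illustration:

import numpy as np

def training_error(theta, X, y):
    """E_n(theta) = (1/n) * sum of [[ y^(i) (theta . x^(i)) <= 0 ]]."""
    margins = y * (X @ theta)          # y^(i) (theta . x^(i)) for every i
    return np.mean(margins <= 0)

# toy data: rows of X are feature vectors x_bar^(i), labels y^(i) in {-1, +1}
X = np.array([[0.7, 0.9], [0.2, 0.9], [0.9, 0.1], [0.2, 0.1]])
y = np.array([+1, +1, -1, -1])
theta = np.array([1.0, 0.0])
print(training_error(theta, X, y))     # 0.5: this theta misclassifies both negative points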
Linear separability: definition
Given training examples S_n = {(x̄^(i), y^(i))}_{i=1}^n, we say the data are linearly separable if there exists θ̄ ∈ R^d such that
y^(i)(θ̄ · x̄^(i)) > 0   for i = 1, …, n.
We refer to the corresponding hyperplane θ̄ · x̄ = 0 as a separating hyperplane.
[Figure: a 2D linearly separable dataset together with a separating hyperplane]
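Checking this definition for a particular θ̄ is equally direct; the toy data and the two candidate parameter vectors below are assumptions for illustration only:

import numpy as np

def separates(theta, X, y):
    """True if y^(i) (theta . x^(i)) > 0 for every training point."""
    return bool(np.all(y * (X @ theta) > 0))

X = np.array([[0.7, 0.9], [0.2, 0.9], [0.9, 0.1], [0.2, 0.1]])
y = np.array([+1, +1, -1, -1])
print(separates(np.array([1.0, 0.0]), X, y))    # False: the negative points are on the wrong side
print(separates(np.array([-1.0, 1.0]), X, y))   # True: this theta gives a separating hyperplane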
Section 3: The Algorithm
Perceptron Algorithm
On input S_n = {(x̄^(i), y^(i))}_{i=1}^n:
  initialize k = 0, θ̄^(0) = 0̄
  while there exists a point misclassified by θ̄^(k):
    for i = 1, …, n:
      if y^(i)(θ̄^(k) · x̄^(i)) ≤ 0:   (data point i is misclassified)
        θ̄^(k+1) = θ̄^(k) + y^(i) x̄^(i)
        k = k + 1
The update has two cases:
Case 1: y^(i) = +1, so θ̄^(k+1) = θ̄^(k) + x̄^(i)
Case 2: y^(i) = −1, so θ̄^(k+1) = θ̄^(k) − x̄^(i)
If the data are linearly separable
then this algorithm finds a separating hyperplane
Example run:
  while there exists a misclassified point:
    for i = 1, …, n:
      if y^(i) ≠ h(x̄^(i); θ̄^(k)): θ̄^(k+1) = θ̄^(k) + y^(i) x̄^(i)
Start with θ̄^(0) = (0, 0)^T.
k = 0, i = 1: the point is misclassified, so θ̄^(1) = θ̄^(0) + x̄^(1) = (6, 6)^T
On the next mistake: θ̄^(2) = θ̄^(1) − x̄^(2) = (−3, 5)^T