
Introduction to Machine Learning Linear Classifiers and the
Perceptron Algorithm
Prof. Kutty


Today’s Agenda
• Announcements
– Quiz 1 due Sunday 10pm
– HW1 out this Wednesday, due in two weeks
• early submission point!
– Alternate midterm time is set 11:20am-1:20pm 2/23; MUST SIGN UP in case of hard conflict
• form will be released after add/drop deadline.
• Deadline for submission will be specified
– Python tutorial on Thursday (recorded)
– On Canvas:
• Announcements; lecture recordings; discussions and discussion recordings; Python tutorial recording; course schedule (optional reading); logistics FAQs; and more…
• Machine Learning
• Supervised Learning
• Linear Classifiers
– Linear classifiers: what and why intuitively
– Linear classifiers: algebraically and geometrically
– Classification Algorithm: Perceptron
» notion of error
» 'ideal' case: linear separability
» without and with offset
– Non-linear separability and linear classifiers

FAQ: How do I earn the quiz component (9.5% of overall grade)?
To get 9.5/9.5, i.e., a perfect score on the quiz component, your total quiz points must reach a specified fraction of the total points available.
If your score falls below that threshold, your final exam will instead be worth 34.5%.

Section 1: From Intuition to Formal Definitions

Supervised Learning – Classification
• A binary classification problem: predict whether a given review is helpful or unhelpful.
– Features: star rating (1 – 5 stars), length of review (max length of 200 words)
– Labels: helpful/unhelpful
• Each labeled example is a (datapoint, label) pair.
• Given labeled data, predict the label for a new datapoint.

Classification as an ML problem
1. Feature vector
– Features are statistics or attributes that describe the data.
– Represent data in terms of vectors, e.g., star rating (as a fraction of 5 stars) and length of review (as a fraction of 200 words).
– In general, the i-th data point is represented as a feature vector x̄^(i) ∈ R^d; by convention this is a column vector.
2. Label
– e.g., label a review + if at least 10 people found it helpful, and – otherwise.
3. Training dataset
– In general, n labeled training data points S_n = {(x̄^(i), y^(i))}, i = 1, …, n.
– This is a supervised learning problem.

Supervised Learning – Classification
• Labeled dataset:
– Features: (fractional) star rating, (fractional) review length.
– Labels: helpful/unhelpful. Contrast with regression, where the label space can be continuous.
• Problem: predict whether a (new, unlabeled) review is helpful (+) or unhelpful (–).
• Geometrically: [Figure: labeled reviews plotted by review length vs. star rating, with an unlabeled data point at (0.2, 0.8) to classify]
– Poll: A. Predict +  B. Predict –
https://docs.google.com/document/d/1JWGiT8gAD4-fNbq8j41xh1A-ruXlS_x7mwnOpkmN7Do/

Supervised Learning – Linear Binary Classifier
• Goal: Learn a linear decision boundary.
• Geometrically: [Figure: a line separating the + and – reviews in the (review length, star rating) plane]
– Poll: A. Predict +  B. Predict –
https://docs.google.com/document/d/1JWGiT8gAD4-fNbq8j41xh1A-ruXlS_x7mwnOpkmN7Do/

Why a linear decision boundary?
• Let H be the set of classifiers under consideration
– e.g., H is the set of all hyperplanes (where the ambient space is R^d)
• Too many choices is not always a good thing
– May lead to overfitting
• Solution?
– Constrain the possible choices, i.e., constrain H
• Caution!
– H cannot be too constrained either
– May lead to underfitting
• This problem is called model selection
[Figures: Example Dataset 1, Example Dataset 2]

Section 2: Linear Classifiers

Hyperplanes in R^d
Definition: A hyperplane in R^d can be specified by a parameter vector θ̄ ∈ R^d and an offset b ∈ R.
It is the set of points x̄ ∈ R^d such that θ̄ ⋅ x̄ + b = 0, where
θ̄ ⋅ x̄ = θ_1 x_1 + ⋯ + θ_d x_d = Σ_{j=1}^d θ_j x_j
[Figure: a line in R^2 whose position is determined by θ̄ and the intercept b]
• Poll: e.g., the origin (does/does not) satisfy the equation of the hyperplane specified by the line in the figure.
• θ̄ is orthogonal to the hyperplane θ̄ ⋅ x̄ = 0.
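To make the hyperplane definition concrete, here is a minimal NumPy sketch (not from the course materials; the specific θ̄, b, and test points are illustrative assumptions). It evaluates θ̄ ⋅ x̄ + b to decide whether a point lies on the hyperplane or on its positive/negative side, and checks that θ̄ is orthogonal to directions lying within the hyperplane.

```python
import numpy as np

# Hypothetical hyperplane in R^2: theta . x + b = 0
theta = np.array([2.0, -1.0])   # parameter vector (normal to the hyperplane)
b = 1.0                         # offset

def hyperplane_value(x, theta, b):
    """Return theta . x + b; zero means x lies on the hyperplane."""
    return np.dot(theta, x) + b

points = [np.array([0.0, 1.0]),    # 2*0 - 1*1 + 1 = 0  -> on the hyperplane
          np.array([1.0, 0.0]),    # 2*1 - 1*0 + 1 = 3  -> positive side
          np.array([-2.0, 0.0])]   # 2*(-2) - 0 + 1 = -3 -> negative side

for x in points:
    v = hyperplane_value(x, theta, b)
    side = ("on the hyperplane" if np.isclose(v, 0)
            else "positive side" if v > 0 else "negative side")
    print(x, "->", v, side)

# theta is orthogonal to the hyperplane: a direction u along the hyperplane
# (here the line x_2 = 2*x_1 + 1 has direction (1, 2)) satisfies theta . u = 0.
u = np.array([1.0, 2.0])
print(np.dot(theta, u))  # 0.0
```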
Linear Classifier
Goal: Learn a linear decision boundary, i.e., constrain the possible choices H to hyperplanes.
Simplifying assumptions:
• constrain H to be the set of all hyperplanes that go through the origin (a hyperplane without offset)
– e.g., in R^2 this is the set of lines that go through the origin
• constrain the problem to datasets that are linearly separable

Linear separability: intuition
[Figures: a linearly separable dataset, where a line separates the + points from the – points, and a dataset that is not linearly separable]

Recall from Linear Algebra
For θ̄, x̄ ∈ R^d:
• L2 norm of θ̄: ||θ̄|| = sqrt(θ_1^2 + ⋯ + θ_d^2)
• dot product: θ̄ ⋅ x̄ = θ_1 x_1 + ⋯ + θ_d x_d = Σ_{j=1}^d θ_j x_j = ||θ̄|| ||x̄|| cos α, where α is the angle between θ̄ and x̄
• Case 1: 0° ≤ α < 90° or 270° < α ≤ 360°, so cos α > 0.
– Poll: A. θ̄ ⋅ x̄ = 0  B. θ̄ ⋅ x̄ < 0  C. θ̄ ⋅ x̄ > 0
• Case 2: 90° < α < 270°, so cos α < 0 and θ̄ ⋅ x̄ < 0.
• Case 3: α ∈ {90°, 270°}, so θ̄ ⋅ x̄ = 0.
[Image credit: By Geek3 – Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=10904768]

Linear Classification in 2D
• The hyperplane θ̄ ⋅ x̄ = 0 can be described by the vector θ̄; θ̄ is orthogonal to the hyperplane θ̄ ⋅ x̄ = 0.
• This suggests a natural way to define a linear classifier: learn θ̄ such that the hyperplane θ̄ ⋅ x̄ = 0 separates the positive from the negative examples.
• Define the linear classifier as h(x̄; θ̄) = sign(θ̄ ⋅ x̄)
• The choice of θ̄ determines:
1. the orientation of the hyperplane (e.g., of the line in R^2)
2. the predicted class label
• Note: in d dimensions this defines a hyperplane that goes through the origin.

Goal: Learn a linear decision boundary through the origin.

Selecting θ̄
• Find θ̄ that works well on training data S_n = {(x̄^(i), y^(i))}, i = 1, …, n
• Training error:
E_n(θ̄) = (1/n) Σ_{i=1}^n [[ y^(i) ≠ h(x̄^(i); θ̄) ]]
where [[A]] is the indicator function: [[A]] = 1 if A is true, 0 otherwise.
• Notice that y^(i) (θ̄ ⋅ x̄^(i)) = 0 would only be achieved when the datapoint lies on the decision boundary; for this reason we can rewrite the training error in terms of the sign of y^(i) (θ̄ ⋅ x̄^(i)), comparing the true label with the predicted label h(x̄^(i); θ̄), as shown below.
• Training error gives us the fraction of misclassified datapoints in a given dataset.

Linear Classifier
Given: training data S_n = {(x̄^(i), y^(i))}, i = 1, …, n, with x̄^(i) ∈ R^d and y^(i) ∈ {+1, −1}
Goal: Learn a linear decision boundary (through the origin) that minimizes training error
h(x̄; θ̄) = sign(θ̄ ⋅ x̄)
E_n(θ̄) = (1/n) Σ_{i=1}^n [[ y^(i) ≠ h(x̄^(i); θ̄) ]] = (1/n) Σ_{i=1}^n [[ y^(i) (θ̄ ⋅ x̄^(i)) ≤ 0 ]]
Simplifying assumption: linear separability (no datapoint in S_n lies exactly on the hyperplane).
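Here is a minimal NumPy sketch of the classifier h(x̄; θ̄) = sign(θ̄ ⋅ x̄) and the training error E_n(θ̄); the toy dataset and the choice of θ̄ are illustrative assumptions, not course data.

```python
import numpy as np

def predict(X, theta):
    """Linear classifier through the origin: h(x; theta) = sign(theta . x) for each row of X."""
    return np.sign(X @ theta)

def training_error(X, y, theta):
    """Fraction of points with y^(i) * (theta . x^(i)) <= 0 (misclassified or on the boundary)."""
    margins = y * (X @ theta)
    return np.mean(margins <= 0)

# Hypothetical toy dataset: rows of X are feature vectors x^(i), y holds labels in {+1, -1}
X = np.array([[0.8, 0.9],
              [0.9, 0.2],
              [0.1, 0.7],
              [0.3, 0.1]])
y = np.array([+1, +1, -1, -1])

theta = np.array([1.0, -0.5])
print(predict(X, theta))            # predicted labels for each training point
print(training_error(X, y, theta))  # fraction of misclassified points
```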

Linear separability: definition
Given training examples
S_n = {(x̄^(i), y^(i))}, i = 1, …, n,
we say the data are linearly separable if there exists θ̄ ∈ R^d
such that
y^(i) (θ̄ ⋅ x̄^(i)) > 0
for i = 1, …, n. We refer to the hyperplane θ̄ ⋅ x̄ = 0
as a separating hyperplane.
[Figure: a linearly separable dataset with a separating line through the origin]
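A short sketch of this definition in NumPy (the dataset and the two candidate θ̄ vectors are assumptions for illustration): it checks whether y^(i) (θ̄ ⋅ x̄^(i)) > 0 holds for every training point.

```python
import numpy as np

def separates(X, y, theta):
    """True iff y^(i) * (theta . x^(i)) > 0 for every training point,
    i.e., theta . x = 0 is a separating hyperplane (through the origin) for (X, y)."""
    return bool(np.all(y * (X @ theta) > 0))

# Hypothetical dataset and two candidate parameter vectors
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [-1.0, -1.0]])
y = np.array([+1, +1, -1])

print(separates(X, y, np.array([1.0, 1.0])))   # True: all signed margins are positive
print(separates(X, y, np.array([1.0, -2.0])))  # False: the first point is misclassified
```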

Section 3: The Algorithm

Perceptron Algorithm
On input S_n = {(x̄^(i), y^(i))}, i = 1, …, n:
Initialize k = 0, θ̄^(0) = 0̄ (the all-zeros vector)
while there exists a misclassified point:
  for i = 1, …, n:
    if y^(i) (θ̄^(k) ⋅ x̄^(i)) ≤ 0   (datapoint i is misclassified by θ̄^(k))
      θ̄^(k+1) = θ̄^(k) + y^(i) x̄^(i)
      k = k + 1
Case 1: y^(i) = +1, so the update is θ̄^(k+1) = θ̄^(k) + x̄^(i)
Case 2: y^(i) = −1, so the update is θ̄^(k+1) = θ̄^(k) − x̄^(i)
Each case corresponds to nudging θ̄ toward correctly classifying the misclassified point.
If the data are linearly separable
then this algorithm finds a separating hyperplane
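Below is a minimal NumPy sketch of the perceptron without offset, following the pseudocode above. It assumes linearly separable data; since the outer loop would otherwise never terminate, a max_updates cap (an assumption, not part of the algorithm as stated) is added as a safeguard. The function name is illustrative.

```python
import numpy as np

def perceptron_no_offset(X, y, max_updates=1000):
    """Perceptron algorithm without offset.

    X: (n, d) array whose rows are the feature vectors x^(i)
    y: (n,) array of labels in {+1, -1}
    Returns theta such that y^(i) * (theta . x^(i)) > 0 for all i,
    provided the data are linearly separable and max_updates is large enough.
    """
    n, d = X.shape
    theta = np.zeros(d)                          # theta^(0) = 0
    for _ in range(max_updates):
        mistakes = 0
        for i in range(n):                       # for i = 1, ..., n
            if y[i] * np.dot(theta, X[i]) <= 0:  # point i is misclassified by theta^(k)
                theta = theta + y[i] * X[i]      # theta^(k+1) = theta^(k) + y^(i) x^(i)
                mistakes += 1
        if mistakes == 0:                        # no misclassified point remains
            return theta
    return theta
```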

Example: tracing the perceptron updates
Initialize θ̄^(0) = (0, 0).
while there exists a misclassified point:
  for i = 1, …, n:
    if y^(i) ≠ h(x̄^(i); θ̄^(k)): θ̄^(k+1) = θ̄^(k) + y^(i) x̄^(i)
First update (k = 0, i = 1): θ̄^(1) = θ̄^(0) + x̄^(1) = (6, 6)
Second update: θ̄^(2) = θ̄^(1) − x̄^(2) = (−3, 5)
