Introduction to Machine Learning The Kernel Trick, Regularization and
Regression
Prof. Kutty
Announcements
• Midterm Conflict forms will be out today; due in 1 week
– hard conflicts only
– proof required
• Project 1 is out
– get started today (or sooner!)
Today’s Agenda
• Recap: Dual Formulation to kernelize SVMs
• Section 1: The kernel function
• Section 2: Regularization
• Section 3: Linear Regression with Squared Loss
Recap: (Efficient) Non-linear decision boundaries with SVMs
SVMs and the kernel trick
Dual formulation of Hard-Margin SVM
Assuming data are linearly separable, S_n = {(x̄^(i), y^(i))}, i = 1, …, n:

max_{ᾱ, αᵢ ≥ 0}   Σ_{i=1}^n αᵢ  −  (1/2) Σ_{i=1}^n Σ_{j=1}^n αᵢ αⱼ y^(i) y^(j) (x̄^(i) · x̄^(j))

This is the dual of the (primal) problem of finding a boundary
• that classifies the training set correctly, i.e., y^(i)(θ̄ · x̄^(i)) ≥ 1 for i ∈ {1, …, n}
• that is maximally removed from the training examples closest to the decision boundary
Feature mapping with SVM
max_{ᾱ}   Σ_{i=1}^n αᵢ  −  (1/2) Σ_{i=1}^n Σ_{j=1}^n αᵢ αⱼ y^(i) y^(j) (x̄^(i) · x̄^(j))
subject to αᵢ ≥ 0 ∀i = 1, …, n

Replacing each inner product x̄^(i) · x̄^(j) with φ(x̄^(i)) · φ(x̄^(j)):

max_{ᾱ}   Σ_{i=1}^n αᵢ  −  (1/2) Σ_{i=1}^n Σ_{j=1}^n αᵢ αⱼ y^(i) y^(j) (φ(x̄^(i)) · φ(x̄^(j)))
subject to αᵢ ≥ 0 ∀i = 1, …, n

Issue: computing the feature map φ explicitly is potentially very inefficient.
Kernels and Feature Maps
Feature map φ: X → F
takes input x̄ ∈ X (e.g., x̄ ∈ R^d) and maps it to feature space F (e.g., R^p)

Kernel function K: X × X → R
Each kernel has an associated feature mapping. Intuitively, the kernel function takes two inputs and gives their similarity in the feature space:
K(x̄, z̄) = φ(x̄) · φ(z̄)
• Sometimes it is much more efficient to compute K(x̄^(i), x̄^(j)) directly
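For a quick sanity check, here is a minimal NumPy sketch comparing the explicit feature-map computation with the direct kernel computation for the quadratic kernel K(ū, v̄) = (ū · v̄)²; the function names are illustrative, not from the lecture code.

import numpy as np

def phi_quadratic(x):
    # explicit feature map for K(u, v) = (u . v)^2 with 2-dimensional input
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def k_quadratic(u, v):
    # kernel computed directly, without ever forming the feature map
    return np.dot(u, v) ** 2

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
print(np.dot(phi_quadratic(u), phi_quadratic(v)))   # 1.0
print(k_quadratic(u, v))                            # 1.0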
Kernelized Dual SVM
max_{ᾱ}   Σ_{i=1}^n αᵢ  −  (1/2) Σ_{i=1}^n Σ_{j=1}^n αᵢ αⱼ y^(i) y^(j) (φ(x̄^(i)) · φ(x̄^(j)))
subject to αᵢ ≥ 0 ∀i = 1, …, n

becomes

max_{ᾱ}   Σ_{i=1}^n αᵢ  −  (1/2) Σ_{i=1}^n Σ_{j=1}^n αᵢ αⱼ y^(i) y^(j) K(x̄^(i), x̄^(j))
subject to αᵢ ≥ 0 ∀i = 1, …, n
Classifying a new example
Recall: θ̄* = Σ_{i=1}^n αᵢ y^(i) φ(x̄^(i))   (never computed explicitly)

θ̄* · φ(x̄) = Σ_{i=1}^n αᵢ y^(i) (φ(x̄^(i)) · φ(x̄))
           = Σ_{i=1}^n αᵢ y^(i) K(x̄^(i), x̄)
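A hedged sketch of this decision rule in NumPy, assuming the dual variables alpha have already been obtained from some dual solver (the names here are illustrative):

import numpy as np

def predict(x_new, X_train, y_train, alpha, kernel):
    # sign of sum_i alpha_i * y^(i) * K(x^(i), x_new); theta-bar is never formed explicitly
    score = sum(a * y * kernel(x_i, x_new)
                for a, y, x_i in zip(alpha, y_train, X_train))
    return np.sign(score)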
Section 1: The kernel function
Inferring the feature map
K(ū, v̄) = (ū · v̄)²  — what is the corresponding feature map φ(x̄)?

To check, expand the kernel and see whether it can be written as a dot product of feature vectors (for 2-dimensional input):
(ū · v̄)² = (u₁v₁ + u₂v₂)² = u₁²v₁² + 2u₁v₁u₂v₂ + u₂²v₂² = [u₁², √2·u₁u₂, u₂²] · [v₁², √2·v₁v₂, v₂²]
so φ(x̄) = [x₁², √2·x₁x₂, x₂²]
Exercise: Assuming the original feature space is 2-dimensional, find the kernel that corresponds to the following φ(x̄):
φ(x̄) = [r, √(2r)·x₁, √(2r)·x₂, √2·x₁x₂, x₁², x₂²]
a) K(x̄^(i), x̄^(j)) = (r + x̄^(i) · x̄^(j))²
b) K(x̄^(i), x̄^(j)) = (x̄^(i) · x̄^(j))² + r
c) K(x̄^(i), x̄^(j)) = r (x̄^(i) · x̄^(j))²
d) K(x̄^(i), x̄^(j)) = r² (x̄^(i) · x̄^(j))²
e) K(x̄^(i), x̄^(j)) = (x̄^(i) · x̄^(j))²
Hint: separate out the terms that include r from the terms that do not:
φ(ū) · φ(v̄) = r² + 2r·u₁v₁ + 2r·u₂v₂ + 2u₁v₁u₂v₂ + u₁²v₁² + u₂²v₂²
            = r² + 2r(ū · v̄) + (ū · v̄)²
            = (r + ū · v̄)²   ⟹ answer (a)
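A small numerical check of answer (a), purely illustrative: the dot product of the given feature maps should equal (r + ū · v̄)² for any inputs.

import numpy as np

def phi(x, r):
    # feature map from the exercise, for 2-dimensional input
    x1, x2 = x
    return np.array([r, np.sqrt(2 * r) * x1, np.sqrt(2 * r) * x2,
                     np.sqrt(2) * x1 * x2, x1**2, x2**2])

def k_a(u, v, r):
    # candidate (a): K(u, v) = (r + u . v)^2
    return (r + np.dot(u, v)) ** 2

u, v, r = np.array([1.0, 2.0]), np.array([0.5, -1.0]), 3.0
print(np.dot(phi(u, r), phi(v, r)))   # 2.25
print(k_a(u, v, r))                   # 2.25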
Examples of Valid Kernels
Linear Kernel
K(ū, v̄) = ū · v̄
Quadratic Kernel
K(ū, v̄) = (ū · v̄ + r)², with r ≥ 0
RBF Kernel (aka Gaussian Kernel)
K(ū, v̄) = exp(−γ‖ū − v̄‖²)
(corresponds to an infinite-dimensional feature map)
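A minimal NumPy sketch of these three kernels (the default values of r and γ below are arbitrary placeholders):

import numpy as np

def linear_kernel(u, v):
    return np.dot(u, v)

def quadratic_kernel(u, v, r=1.0):
    # valid for r >= 0
    return (np.dot(u, v) + r) ** 2

def rbf_kernel(u, v, gamma=1.0):
    return np.exp(-gamma * np.sum((u - v) ** 2))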
Gaussian Kernel: the γ parameter
K(x̄^(i), x̄^(j)) = exp(−γ‖x̄^(i) − x̄^(j)‖²)
[Visualization: SVM decision boundaries with the RBF kernel for different values of γ]
γ controls how quickly similarity decays with distance, i.e., how much the classifier should care about far-away examples; the underlying feature map corresponds to a sum of polynomial feature maps of higher and higher degree.
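To reproduce this kind of visualization yourself, scikit-learn's SVC supports the RBF kernel directly; a rough sketch on a synthetic dataset (the data and γ values here are made up for illustration):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 1.0, 1, -1)   # circular decision boundary

for gamma in (0.1, 1.0, 10.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(gamma, clf.score(X, y))   # larger gamma -> more flexible (more local) boundary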
RBF Kernels
K(ū, v̄) = exp(−γ‖ū − v̄‖²)
Taylor series: exp(z) = Σ_{n=0}^∞ zⁿ/n!
Expanding the exponential term by term shows that the RBF kernel is a weighted sum of polynomial kernels of every degree, i.e., it corresponds to an infinite-dimensional feature map.
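A numerical illustration of this expansion, assuming the factorization exp(−γ‖ū − v̄‖²) = exp(−γ‖ū‖²) exp(−γ‖v̄‖²) exp(2γ ū·v̄) and Taylor-expanding the last factor:

import numpy as np
from math import factorial

def rbf(u, v, gamma):
    return np.exp(-gamma * np.sum((u - v) ** 2))

def rbf_truncated(u, v, gamma, N):
    # exp(-g||u||^2) * exp(-g||v||^2) * sum_{n=0..N} (2g u.v)^n / n!
    # each term (u.v)^n corresponds to a polynomial kernel of degree n
    prefactor = np.exp(-gamma * np.dot(u, u)) * np.exp(-gamma * np.dot(v, v))
    series = sum((2 * gamma * np.dot(u, v)) ** n / factorial(n) for n in range(N + 1))
    return prefactor * series

u, v, gamma = np.array([0.3, -0.2]), np.array([0.1, 0.5]), 1.0
print(rbf(u, v, gamma))
print(rbf_truncated(u, v, gamma, N=10))   # approaches the exact RBF value as N grows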
Kernel Algebra
Let K₁(·,·) and K₂(·,·) be valid kernels. Then the following are also valid kernels:
• sum:            K(x̄, z̄) = K₁(x̄, z̄) + K₂(x̄, z̄)
• scalar product: K(x̄, z̄) = αK₁(x̄, z̄), with α > 0
• direct product: K(x̄, z̄) = K₁(x̄, z̄) K₂(x̄, z̄)
i.e., in all these cases there exists a feature map φ such that K(x̄, z̄) = φ(x̄) · φ(z̄)
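A sketch of how these rules let you build new kernels from old ones (the kernel choices and constants below are arbitrary):

import numpy as np

def k1(u, v):                       # linear kernel
    return np.dot(u, v)

def k2(u, v, gamma=0.5):            # RBF kernel
    return np.exp(-gamma * np.sum((u - v) ** 2))

def k_combined(u, v, alpha=2.0):
    # sum, positive scaling, and product of valid kernels are all valid kernels
    return k1(u, v) + alpha * k2(u, v) + k1(u, v) * k2(u, v)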
Mercer’s Theorem
Intuition:
A function K: R^d × R^d → R is a valid kernel if and only if
for any x̄^(1), …, x̄^(n) with x̄^(i) ∈ R^d and finite n, the n×n Gram matrix G with Gᵢⱼ = K(x̄^(i), x̄^(j)) is positive semidefinite.
That is, G is
• symmetric: G = Gᵀ
• and ∀z̄ ∈ Rⁿ: z̄ᵀGz̄ ≥ 0
In other words, for such a function K(x̄, z̄), there exists a function φ such that K(x̄, z̄) = φ(x̄) · φ(z̄).
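A quick empirical check of these two conditions on a random sample (a sanity check, not a proof of validity):

import numpy as np

def gram_matrix(X, kernel):
    # G_ij = K(x^(i), x^(j))
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

X = np.random.default_rng(1).normal(size=(5, 2))
G = gram_matrix(X, lambda u, v: np.exp(-np.sum((u - v) ** 2)))   # RBF kernel, gamma = 1

print(np.allclose(G, G.T))                        # symmetric
print(np.min(np.linalg.eigvalsh(G)) >= -1e-10)    # eigenvalues nonnegative (up to round-off)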
Section 2: Regularization
Bias Variance Tradeoff
bias: the model is not able to fully capture the patterns in the dataset
variance: the model is sensitive to noise in the dataset
Bias-Variance Tradeoff
Estimation Error (variance)
*low variance → constant function
*high variance → high-degree polynomial, RBF kernel
Structural Error (bias)
*low bias → linear regression applied to linear data, RBF kernel
*high bias → constant function, linear model applied to non-linear data
How to find boundaries that generalize well?
• Regularization
• Maximum margin separator
As we will see, these are related
Regularization
Idea: prefer a simpler hypothesis
• pushes parameters toward some default value (typically zero)
• resists setting parameters away from the default value when the data only weakly tell us otherwise
J_{n,λ}(θ̄) = λZ(θ̄) + R_n(θ̄)
where R_n(θ̄) is the training error (how well θ̄ does w.r.t. S_n), Z(θ̄) is the regularization term/penalty, and λ ≥ 0 is a hyperparameter.
What should Z(θ̄) be?
• Desirable characteristics:
– should force components of θ̄ to be small (close to zero)
– convex, smooth
• A popular choice: ℓ1 / ℓ2 norms
– Let's use the (squared) ℓ2 norm as the penalty function: Z(θ̄) = ‖θ̄‖²/2
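As a concrete, illustrative instance of J_{n,λ}, a NumPy sketch with the ℓ2 penalty Z(θ̄) = ‖θ̄‖²/2 and average squared loss as the training error (squared loss is introduced in Section 3):

import numpy as np

def regularized_objective(theta, X, y, lam):
    # J_{n,lambda}(theta) = lambda * Z(theta) + R_n(theta)
    penalty = 0.5 * lam * np.dot(theta, theta)           # Z(theta) = ||theta||^2 / 2
    empirical_risk = np.mean((y - X @ theta) ** 2) / 2   # average squared loss over S_n
    return penalty + empirical_risk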
Soft-Margin SVM: exercise
Claim: Soft-margin SVM is an optimization problem with hinge loss as the objective function and an L2-norm regularizer.

min_{θ̄, b, ξ̄}   ‖θ̄‖²/2 + C Σ_{i=1}^n ξᵢ
subject to y^(i)(θ̄ · x̄^(i) + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0 for i ∈ {1, …, n}

is equivalent (with λ = 1/(nC)) to

min_{θ̄, b}   (λ/2)‖θ̄‖² + (1/n) Σ_{i=1}^n max{1 − y^(i)(θ̄ · x̄^(i) + b), 0}

Derivation left as exercise:
• Write ξᵢ ≥ 1 − y^(i)(θ̄ · x̄^(i) + b) and ξᵢ ≥ 0
• Observe that the objective function includes the term min_{ξ̄} Σ_{i=1}^n ξᵢ
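A sketch of the regularized hinge-loss objective above, assuming the data are rows of X and the labels are in {−1, +1} (written for clarity rather than efficiency):

import numpy as np

def soft_margin_objective(theta, b, X, y, lam):
    # (lambda/2) * ||theta||^2 + (1/n) * sum_i max{1 - y^(i) (theta . x^(i) + b), 0}
    margins = y * (X @ theta + b)
    hinge_loss = np.mean(np.maximum(1.0 - margins, 0.0))
    return 0.5 * lam * np.dot(theta, theta) + hinge_loss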
Section 3: Linear Regression with Squared Loss
Supervised Learning
– Given data x̄ (in feature space) and the labels y
– Learn to predict y from x̄
• Labels could be discrete or continuous – Discrete labels: classification
– Continuous labels: regression
Regression vs Classification
Classification problem: y ∈ {−1, 1} or y ∈ {0, 1}
Regression problem: y ∈ R
Regression function f : R^d → R, where f ∈ F
Linear Regression
A linear regression function is simply a linear function of the feature vector:
f(x̄; θ̄, b) = θ̄ · x̄ + b
Learning task:
Choose parameters in response to training set
S_n = {(x̄^(i), y^(i))}_{i=1}^n, where x̄ ∈ R^d and y ∈ R
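A minimal NumPy sketch of fitting such a function by ordinary least squares on synthetic data (the data-generating parameters below are made up for illustration):

import numpy as np

# synthetic data: y = 2*x + 1 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=100)

# append a constant feature so the offset b is absorbed into theta
Xb = np.hstack([X, np.ones((len(X), 1))])

# least-squares fit: minimizes the average squared error (1/n) sum_i (y^(i) - theta . x^(i))^2
theta_hat, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(theta_hat)   # approximately [2.0, 1.0]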