Machine Learning as Optimisation Problem
• Objective function: $J(\Theta)$
• Goal: find $\Theta^* = \arg\min_{\Theta} J(\Theta)$
• Most typical form:
$$J(\Theta) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\big(h(x^{(i)}; \Theta), y^{(i)}\big) + \lambda R(\Theta)$$
where $\Theta$ denotes the model parameters.
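As a sketch, the generic objective above can be written as a higher-order function; the names `objective`, `h`, `loss`, and `reg` are illustrative (not from the slides), and NumPy is assumed:

```python
import numpy as np

def objective(Theta, X, y, h, loss, reg, lam):
    """J(Theta) = (1/n) * sum_i loss(h(x_i; Theta), y_i) + lam * R(Theta)."""
    n = len(y)
    data_term = sum(loss(h(x, Theta), yi) for x, yi in zip(X, y)) / n
    return data_term + lam * reg(Theta)

# Example instantiation: linear hypothesis, squared loss, L2 penalty
h = lambda x, th: x @ th
loss = lambda guess, actual: (guess - actual) ** 2
reg = lambda th: th @ th

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
Theta = np.zeros(2)
print(objective(Theta, X, y, h, loss, reg, lam=0.1))  # 2.5
```

Keeping `h`, `loss`, and `reg` as parameters mirrors the slide's point: different learning methods are just different choices plugged into the same template.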
Logistic regression
• Linear logistic classifier (LLC)
• Before: 0-1 loss $L_{01}(g, a) = \begin{cases} 1 & \text{if } g \neq a \\ 0 & \text{otherwise} \end{cases}$
• $LLC(x; \theta, \theta_0) = \sigma(\theta^T x + \theta_0)$
• Sigmoid: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
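A minimal NumPy sketch of the sigmoid and the LLC (function names are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^{-z}); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def llc(x, theta, theta0):
    """Linear logistic classifier: sigma(theta^T x + theta0)."""
    return sigmoid(theta @ x + theta0)

print(sigmoid(0.0))  # 0.5
print(llc(np.array([1.0, -1.0]), np.array([2.0, 1.0]), 0.5))
```

Note the output of `llc` is a number in (0, 1), not yet a class label; turning it into a classifier is the next step.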
Making LLC into a classifier
• To make a classifier, predict +1 when $\sigma(\theta^T x + \theta_0) > 0.5$
• ➔ Hypothesis class: LLC, with parameters $\theta, \theta_0$.
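The 0.5 threshold can be checked without evaluating the sigmoid at all, because $\sigma$ is monotone and $\sigma(0) = 0.5$; an illustrative sketch:

```python
import numpy as np

def predict(x, theta, theta0):
    """Predict +1 when sigma(theta^T x + theta0) > 0.5.
    Since sigma is monotone with sigma(0) = 0.5, this is
    equivalent to checking theta^T x + theta0 > 0."""
    return 1 if theta @ x + theta0 > 0 else 0

theta = np.array([1.0, -1.0])
theta0 = 0.0
print(predict(np.array([2.0, 1.0]), theta, theta0))  # 1
print(predict(np.array([0.0, 1.0]), theta, theta0))  # 0
```

So the decision boundary of an LLC is still the hyperplane $\theta^T x + \theta_0 = 0$; the sigmoid only matters for the loss, not for the final prediction.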
Negative log likelihood loss function
• Idea: the loss on all the data should be inversely related to the probability that $\theta, \theta_0$ assign to the data
• $g^{(i)} = \sigma(\theta^T x^{(i)} + \theta_0)$
• Probability assigned to the whole dataset:
$$\prod_{i=1}^{n} \begin{cases} g^{(i)} & \text{if } y^{(i)} = 1 \\ 1 - g^{(i)} & \text{if } y^{(i)} = 0 \end{cases} \;=\; \prod_{i=1}^{n} \left(g^{(i)}\right)^{y^{(i)}} \left(1 - g^{(i)}\right)^{1 - y^{(i)}}$$
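The equality of the two forms of the product can be checked numerically; this is an illustrative NumPy sketch, not part of the slides:

```python
import numpy as np

def likelihood(g, y):
    """Case form: pick g_i when y_i = 1, else 1 - g_i, then multiply."""
    return np.prod(np.where(y == 1, g, 1 - g))

def likelihood_closed(g, y):
    """Closed form: g_i^{y_i} * (1 - g_i)^{1 - y_i}, then multiply.
    The exponent y_i in {0, 1} simply selects the right factor."""
    return np.prod(g**y * (1 - g)**(1 - y))

g = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])
print(likelihood(g, y), likelihood_closed(g, y))  # both ~ 0.504
```

The closed form is the one worth keeping: it is differentiable in $g$, which is exactly what gradient-based optimisation needs.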
Negative log likelihood loss function
Taking the negative log of the likelihood turns the product into a sum:
$$-\log \prod_{i=1}^{n} \left(g^{(i)}\right)^{y^{(i)}} \left(1 - g^{(i)}\right)^{1 - y^{(i)}}$$
$$= -\sum_{i=1}^{n} \left( y^{(i)} \log g^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - g^{(i)}\right) \right)$$
$$= \sum_{i=1}^{n} \mathcal{L}_{NLL}\left(g^{(i)}, y^{(i)}\right)$$
Also called log loss or cross-entropy loss:
$$\mathcal{L}_{NLL}(g, y) = -\left(y \log g + (1 - y) \log(1 - g)\right)$$
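A direct NumPy transcription of $\mathcal{L}_{NLL}$ (illustrative; summing the per-example losses recovers the negative log of the dataset probability):

```python
import numpy as np

def nll_loss(g, y):
    """L_NLL(g, y) = -(y log g + (1 - y) log(1 - g)), elementwise."""
    return -(y * np.log(g) + (1 - y) * np.log(1 - g))

g = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])
# Product of probabilities: 0.9 * 0.8 * 0.7 = 0.504
print(nll_loss(g, y).sum())   # equals -log(0.504), ~ 0.685
```

Because the log is monotone, minimising the summed NLL is the same as maximising the likelihood itself, but the sum is far better behaved numerically than a product of many small probabilities.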
Regularisation
$$J(\Theta) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\big(h(x^{(i)}; \Theta), y^{(i)}\big) + \lambda R(\Theta)$$
• Typical choices:
• $R(\Theta) = \left\lVert \Theta - \Theta_{prior} \right\rVert^2$
• $R(\Theta) = \left\lVert \Theta \right\rVert^2$
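The two penalties can be sketched as follows (illustrative names, NumPy assumed); the plain $\lVert\Theta\rVert^2$ penalty is the special case where the prior is zero:

```python
import numpy as np

def l2_penalty(theta):
    """R(Theta) = ||Theta||^2."""
    return float(theta @ theta)

def prior_penalty(theta, theta_prior):
    """R(Theta) = ||Theta - Theta_prior||^2."""
    d = theta - theta_prior
    return float(d @ d)

theta = np.array([3.0, 4.0])
print(l2_penalty(theta))                  # 25.0
print(prior_penalty(theta, np.zeros(2)))  # 25.0, same as above
```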
Logistic Regression objective
$$J_{LR}(\theta, \theta_0; \mathcal{D}) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_{NLL}\left(\sigma\left(\theta^T x^{(i)} + \theta_0\right), y^{(i)}\right) + \lambda \left\lVert\theta\right\rVert^2$$
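Putting the pieces together, the full objective might be computed as in this sketch (illustrative; as a sanity check, with $\theta = 0$ every guess is 0.5, so the data term is $\log 2 \approx 0.693$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_objective(theta, theta0, X, y, lam):
    """J_LR = (1/n) sum_i NLL(sigma(theta^T x_i + theta0), y_i)
              + lam * ||theta||^2."""
    g = sigmoid(X @ theta + theta0)                    # guesses in (0, 1)
    nll = -(y * np.log(g) + (1 - y) * np.log(1 - g))   # per-example loss
    return nll.mean() + lam * float(theta @ theta)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
theta = np.zeros(2)
print(lr_objective(theta, 0.0, X, y, lam=0.1))  # log(2) ~ 0.6931
```

This is the function a gradient-based optimiser would minimise over $\theta, \theta_0$; note the regularisation term here penalises only $\theta$, not the offset $\theta_0$.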