Lecture 16. Learning with expert advice
COMP90051 Statistical Machine Learning
Lecturer: Ben Rubinstein
Copyright: University of Melbourne
This lecture
• Learning from expert advice / multiplicative weights
∗ Learner listens to some/all experts making predictions
∗ True outcomes are ADVERSARIAL!
∗ Learner updates weights over experts based on their losses
∗ The algorithms are all forms of “multiplicative weights”
∗ Nice clean bounds on total mistakes/loss via the “potential function” technique
• Infallible expert (one always perfect)
∗ Majority Vote Algorithm
• Imperfect experts (none guaranteed perfect) – increasingly better
∗ Weighted Majority Vote Algorithm by Halving
∗ Weighted Majority Voting by General Multiplicative Weights
∗ Probabilistic Experts Algorithm
An infallible expert and the Majority Algorithm
A warm-up example
Warm-up: Case of the infallible expert
• Experts $E_1, \dots, E_n$ predict the stock market daily
∗ Each expert’s prediction is binary: stocks will go up/down
• Learner’s game, daily:
∗ Observe predictions of all experts
∗ Make a prediction of its own
∗ Observe the outcome (could be anything!)
∗ Goal: minimise the total number of mistakes
• Infallible expert assumption:
∗ One or more experts make no mistakes
Infallible Expert Algorithm: Majority Vote
1. Initialise the set of experts who haven’t made mistakes: $E = \{1, \dots, n\}$
2. Repeat per round:
a) Observe predictions of $E_i$ for all $i \in \{1, \dots, n\}$
b) Make the majority prediction $\arg\max_{y \in \{-1,1\}} \sum_{i \in E} \mathbf{1}[E_i = y]$
c) Observe the correct outcome
d) Remove mistaken experts from $E$
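A minimal Python sketch of this loop (illustrative only; the function name and the (T, n) array layout are assumptions, with predictions and outcomes encoded as ±1 as above):

```python
import numpy as np

def majority_vote(expert_preds, outcomes):
    """Infallible-expert setting. expert_preds: (T, n) array of +/-1 predictions;
    outcomes: length-T array of +/-1; at least one column is assumed always correct."""
    T, n = expert_preds.shape
    alive = np.ones(n, dtype=bool)                # experts with no mistakes so far
    mistakes = 0
    for t in range(T):
        vote = expert_preds[t, alive].sum()       # unweighted majority of surviving experts
        y_hat = 1 if vote >= 0 else -1
        mistakes += int(y_hat != outcomes[t])
        alive &= (expert_preds[t] == outcomes[t])  # drop every mistaken expert
    return mistakes                               # <= log2(n) under the infallibility assumption
```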
Mistake Bound for Majority Vote
Proposition: Under the infallible expert assumption, majority vote makes total mistakes $M \le \log_2 n$.
Proof
• Loop invariant: if the algorithm makes a mistake, then at least $|E|/2$ of the surviving experts must have been wrong
• I.e. for every incorrect prediction, $|E|$ is reduced by at least half; so after $M$ mistakes, $|E| \le n/2^M$
• By infallibility, at all times $1 \le |E|$
• Combine to $1 \le |E| \le n/2^M$, then solve for $M$.
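Spelling out that final step:

```latex
1 \le |E| \le \frac{n}{2^M}
\;\Longrightarrow\; 2^M \le n
\;\Longrightarrow\; M \le \log_2 n .
```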
(Intuition: halving, as in tree data structures, so expect to see a log.)
($|E|$ is the potential function.)
How is this “online learning”?
Learning
• Weights on which experts are worth listening to
• (Infallible case: 0/1 weights)
• Making predictions / taking actions
• Incurring loss (so far 0/1)
• IID “distribution” replaced by adversarial outcomes
Online
• A repeated game
http://cesa-bianchi.di.unimi.it/predbook/
Mini Summary
• Learning with expert advice paradigm
∗ Abstraction of the online learning problem
∗ Later: Applications abound
∗ Adversarial feedback
• Bounds on mistakes (later, losses) are “easy”
∗ Involve the “potential function” technique
∗ Later: interested in scaling with the best expert’s performance
Next: Imperfect experts. Downweight, don’t drop, bad experts.
Imperfect experts and the Halving Algorithm
Similar proof technique; similar algorithm; much more interesting setting
No one’s perfect
• No more guarantee of an infallible expert
• What breaks?
∗ We could end up with $E = \emptyset$; how would we predict then?
∗ “Zero tolerance” (dropping experts on a single mistake) makes no sense
• Very general setting / very few assumptions
∗ Not assuming anything about expert error rates
∗ Not assuming anything about correlation of expert errors
∗ Not assuming anything about outcome observations: not even stochastic (could be adversarial!)
Imperfect experts: Halving Algorithm
1. Initialise weights $w_i = 1$ for each expert $E_i$
2. Repeat per round:
a) Observe predictions of $E_i$ for all $i \in \{1, \dots, n\}$
b) Make the weighted majority prediction $\arg\max_{y \in \{-1,1\}} \sum_{i=1}^{n} w_i \mathbf{1}[E_i = y]$
c) Observe the correct outcome
d) Downweigh each mistaken expert: $w_i \leftarrow w_i / 2$
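A minimal Python sketch of the halving loop (same illustrative conventions as the earlier sketch; the function name is hypothetical):

```python
import numpy as np

def halving_weighted_majority(expert_preds, outcomes):
    """Imperfect experts. expert_preds: (T, n) array of +/-1 predictions;
    outcomes: length-T array of +/-1. Mistaken experts are halved, never dropped."""
    T, n = expert_preds.shape
    w = np.ones(n)
    mistakes = 0
    for t in range(T):
        vote = np.sum(w * expert_preds[t])            # weighted majority vote
        y_hat = 1 if vote >= 0 else -1
        mistakes += int(y_hat != outcomes[t])
        w[expert_preds[t] != outcomes[t]] /= 2.0      # halve each mistaken expert's weight
    return mistakes     # roughly bounded by 2.4 (m + log2 n), m = best expert's mistakes
```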
Mistake Bound for Halving
Proposition: If the best expert makes $m$ mistakes, then weighted majority vote makes $M \le 2.4\,(m + \log_2 n)$ mistakes.
Proof
• Invariant: if the algorithm makes a mistake, then the weight of the wrong experts is at least half the total weight $W = \sum_{i=1}^{n} w_i$
• The weight of the wrong experts is halved, so the total weight drops to at most $3/4$ of its value. I.e. after $M$ mistakes, $W \le n\,(3/4)^M$
• The best expert $E_i$ has $w_i = (1/2)^m$
• Combine to $(1/2)^m = w_i \le W \le n\,(3/4)^M$
• Taking logs, $-m \le \log_2 n + M \log_2(3/4)$; solving, $M \le \frac{m + \log_2 n}{\log_2(4/3)}$
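Making the final algebra explicit:

```latex
\left(\tfrac{1}{2}\right)^m \le n\left(\tfrac{3}{4}\right)^M
\;\Longrightarrow\; -m \le \log_2 n + M\log_2\tfrac{3}{4}
\;\Longrightarrow\; M \le \frac{m + \log_2 n}{\log_2(4/3)} \approx 2.41\,(m + \log_2 n).
```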
Compare, compare: What’s going on?
• Price of imperfection (vs. infallibility) is $\mathcal{O}(m)$
∗ Infallible case: $M \in \mathcal{O}(\log n)$
∗ Imperfect case: $M \in \mathcal{O}(m + \log n)$
• Scaling to many experts is no problem
• Online learning vs. PAC frameworks:
∗ PAC: i.i.d. losses (due to e.g. Hoeffding; for ERM, see L6c). Ultimate goal: small estimation error $R(f_m) - R(f^*)$, bounded in terms of the family’s VC dimension.
∗ Online learning: adversarial/arbitrary losses. Ultimate goal: small $M - m$, bounded in terms of the number of experts.
Mini Summary
• Imperfect expert setting
∗ Don’t drop bad experts, just halve their weight
∗ Predict by weighted majority, not simple majority
∗ Mistake bound follows a similar “potential function” pattern!
• Learning with expert advice paradigm
∗ Key difference to PAC is adversarial feedback
∗ Similarity: also concerned with performance relative to “best in class”
Next: Imperfect experts continued. Generalising halving.
From Halving to Multiplying weights by $1 - \varepsilon$
Generalising weighted majority.
Useful (but otherwise boring) inequalities
• Lemma 1: For any $\varepsilon \in (0, 0.5)$, we have $-\varepsilon - \varepsilon^2 \le \log_e(1 - \varepsilon) < -\varepsilon$
• Proof:
∗ Upper bound by Taylor expansion, dropping all but the first term (the rest are negative)
∗ Lower bound by convexity of $\exp(-\varepsilon - \varepsilon^2)$
• Lemma 2: For all $\varepsilon \in [0,1]$ we have
$1 - \varepsilon x \ge (1 - \varepsilon)^x$ if $x \in [0,1]$, and $1 - \varepsilon x \ge (1 + \varepsilon)^{-x}$ if $x \in [-1,0]$
• Proof: by convexity of the right-hand-side functions (with equality at the endpoints)
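A quick numerical spot-check of both lemmas over a grid of test points (not a proof, just a sanity check):

```python
import numpy as np

# Lemma 1: for eps in (0, 0.5),  -eps - eps^2 <= ln(1 - eps) < -eps
eps = np.linspace(0.01, 0.49, 100)
assert np.all(-eps - eps**2 <= np.log(1 - eps))
assert np.all(np.log(1 - eps) < -eps)

# Lemma 2: for eps in [0, 1],  1 - eps*x >= (1 - eps)^x    on x in [0, 1]
#                         and  1 - eps*x >= (1 + eps)^(-x) on x in [-1, 0]
for e in (0.0, 0.25, 0.5, 0.75, 1.0):
    x = np.linspace(0.0, 1.0, 100)
    assert np.all(1 - e * x >= (1 - e) ** x)
    x = np.linspace(-1.0, 0.0, 100)
    assert np.all(1 - e * x >= (1 + e) ** (-x))

print("all inequalities verified numerically")
```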
Weighted Majority Vote Algorithm
1. Initialise weights $w_i = 1$ for each expert $E_i$
2. Repeat per round:
a) Observe predictions of $E_i$ for all $i \in \{1, \dots, n\}$
b) Make the weighted majority prediction $\arg\max_{y \in \{-1,1\}} \sum_{i=1}^{n} w_i \mathbf{1}[E_i = y]$
c) Observe the correct outcome
d) Downweigh each mistaken expert: $w_i \leftarrow (1 - \varepsilon)\,w_i$
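The halving sketch changes only in its update step; a parameterised version (again a hypothetical sketch, not course-provided code) might look like:

```python
import numpy as np

def weighted_majority(expert_preds, outcomes, eps=0.3):
    """Same loop as the halving sketch, but mistaken experts are scaled by (1 - eps).
    eps = 0.5 recovers the halving algorithm."""
    T, n = expert_preds.shape
    w = np.ones(n)
    mistakes = 0
    for t in range(T):
        y_hat = 1 if np.sum(w * expert_preds[t]) >= 0 else -1
        mistakes += int(y_hat != outcomes[t])
        w[expert_preds[t] != outcomes[t]] *= (1.0 - eps)   # multiplicative downweighting
    return mistakes        # bounded by 2(1 + eps) m + 2 ln(n) / eps
```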
Mistake Bound
Proposition: If the best expert makes $m$ mistakes, then $(1-\varepsilon)$-weighted majority makes $M \le 2(1 + \varepsilon)\,m + (2\log_e n)/\varepsilon$ mistakes.
Proof
• Whenever the learner makes a mistake, at least half of the total weight is reduced by a factor of $(1 - \varepsilon)$. So after $M$ mistakes, $W \le n\,(1 - \varepsilon/2)^M$
• The best expert $E_i$ has $w_i = (1 - \varepsilon)^m$
• Combine to $(1 - \varepsilon)^m = w_i \le W \le n\,(1 - \varepsilon/2)^M$
• Taking logs: $m\log_e(1 - \varepsilon) \le \log_e n + M\log_e(1 - \varepsilon/2)$
• Lemma 1 bounds both logarithms: $-m(\varepsilon + \varepsilon^2) \le \log_e n - M\varepsilon/2$
• Solving for $M$ proves the bound.
The bound improves the dependence on $m$ compared to halving. Why?
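Solving the proof’s last inequality for $M$ gives exactly the claimed bound:

```latex
-m(\varepsilon + \varepsilon^2) \le \log_e n - \frac{M\varepsilon}{2}
\;\Longrightarrow\; \frac{M\varepsilon}{2} \le \log_e n + m\varepsilon(1 + \varepsilon)
\;\Longrightarrow\; M \le 2(1 + \varepsilon)\,m + \frac{2\log_e n}{\varepsilon}.
```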
Dependence on $m$ provably near optimal!
• New to lower bounds? An example shows that an analysis, or even an algorithm, cannot do better than some limit
• Weighted majority almost achieves the $2m$ dependence, with $2(1 + \varepsilon)m$ (taking the number of experts as fixed)
• Example with $M = 2m$:
∗ Consider $n = 2$ with $E_1$ ($E_2$) correct on odd (even) days
∗ Then the best expert makes mistakes half the time
∗ But after the 1st round, for any $\varepsilon$, the mistaken expert holds at least half the weight, so (breaking ties against the learner) the weighted majority vote is wrong every time
• Consequence? Can’t improve the constant factor of 2 on $m$
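A small simulation of this two-expert construction (a hypothetical sketch; ties in the vote are broken against the learner, as an adversary would choose):

```python
import numpy as np

def alternating_lower_bound(eps=0.3, T=1000):
    """Two experts: E1 always predicts +1, E2 always predicts -1; the true outcome
    alternates +1, -1, ..., so each expert is wrong on exactly half the rounds.
    Ties in the weighted vote are broken against the learner (adversarially)."""
    preds = np.array([+1, -1])                   # fixed expert predictions
    w = np.ones(2)
    learner_mistakes = 0
    for t in range(1, T + 1):
        y = +1 if t % 2 == 1 else -1             # truth alternates each day
        vote = float(np.sum(w * preds))
        y_hat = (1 if vote > 0 else -1) if vote != 0 else -y   # adversarial tie-break
        learner_mistakes += int(y_hat != y)
        w[preds != y] *= (1.0 - eps)             # downweight the mistaken expert
    best_expert_mistakes = T // 2                # each expert errs on half the rounds
    return learner_mistakes, best_expert_mistakes

M, m = alternating_lower_bound()
print(M, m, M / m)    # M is (essentially) 2m: the factor of 2 cannot be improved
```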
Mini Summary
• Imperfect expert setting continued…
• From halving to general multiplicative weights!
∗ Mistake bound proved as usual via the “potential function” trick
∗ The bound’s dependence on the best expert improved to a $2(1 + \varepsilon)$ factor
• Lower bound / impossibility result
∗ The factor of 2 is optimal for (deterministic) multiplicative weights!
Next: Imperfect experts continued. Randomise!!
The probabilistic experts algorithm
wherein randomisation helps us do better!
Probabilistic experts algorithm
• Change 1 from mistakes: loss $\ell_i^{(t)} \in [0,1]$ of expert $E_i$ in round $t$
• Change 2: a randomised algorithm means bounding expected losses (sound familiar? It should… a risk!)
1. Initialise weights $w_i = 1$ for each expert $E_i$
2. Repeat per round:
a) Observe predictions of $E_i$ for all $i \in \{1, \dots, n\}$
b) Predict as expert $E_i$ with probability $w_i / W$, where $W = \sum_{i=1}^{n} w_i$
c) Observe losses
d) Update each weight: $w_i \leftarrow (1 - \varepsilon)^{\ell_i^{(t)}} w_i$
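A minimal Python sketch of this algorithm (hypothetical names; the synthetic losses at the end are made up purely to exercise the bound):

```python
import numpy as np

def probabilistic_experts(losses, eps=0.1):
    """losses: (T, n) array of per-round expert losses in [0, 1].
    Returns the algorithm's total expected loss (expectation over its random choices)."""
    T, n = losses.shape
    w = np.ones(n)
    expected_loss = 0.0
    for t in range(T):
        p = w / w.sum()                          # weights normalised into a distribution
        expected_loss += float(p @ losses[t])    # expected loss of the randomly chosen expert
        w *= (1.0 - eps) ** losses[t]            # multiplicative update, exponent = loss
    return expected_loss

# Synthetic check of the bound L <= ln(n)/eps + (1 + eps) L*
rng = np.random.default_rng(0)
T, n, eps = 2000, 10, 0.1
losses = rng.random((T, n))
L = probabilistic_experts(losses, eps)
L_star = losses.sum(axis=0).min()                # best single expert's total loss in hindsight
print(L <= np.log(n) / eps + (1 + eps) * L_star)
```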
Probabilistic experts: Expected loss bound
• Proposition: The expected loss of the probabilistic experts algorithm is $L \le \frac{\log_e n}{\varepsilon} + (1 + \varepsilon)L^*$, where $L^*$ is the minimum total loss over the experts.
• Proof: next; follows a similar “potential” pattern
• Beats deterministic! Shaves off the constant 2 that deterministic algorithms cannot avoid
• Generalises in many directions. An active area of research in ML, control and economics, in top labs.
http://cesa-bianchi.di.unimi.it/predbook/
Proof: Upper bounding potential function
• Learner’s round-$t$ expected loss: $L_t = \frac{\sum_{i=1}^{n} w_i^{(t)} \ell_i^{(t)}}{W^{(t)}}$
• By Lemma 2, since losses are in $[0,1]$, the updated weight satisfies $w_i^{(t+1)} \leftarrow (1 - \varepsilon)^{\ell_i^{(t)}} w_i^{(t)} \le \left(1 - \varepsilon \ell_i^{(t)}\right) w_i^{(t)}$
• Rearrange to obtain a recurrence relation:
$W^{(t+1)} \le \sum_{i=1}^{n} \left(1 - \varepsilon \ell_i^{(t)}\right) w_i^{(t)} = \left(\sum_{i=1}^{n} w_i^{(t)}\right)\left(1 - \varepsilon \frac{\sum_{i=1}^{n} w_i^{(t)} \ell_i^{(t)}}{\sum_{i=1}^{n} w_i^{(t)}}\right) = W^{(t)}\left(1 - \varepsilon L_t\right)$
• Initialisation gave $W^{(0)} = n$, so telescoping we get $W^{(T)} \le n \prod_{t=1}^{T} \left(1 - \varepsilon L_t\right)$
Proof: Lower bounding potential, Wrap up
• Proved upper bound: $W^{(T)} \le n \prod_{t=1}^{T} (1 - \varepsilon L_t)$
• Lower bound from the best expert’s total loss $L^*$: $W^{(T)} \ge (1 - \varepsilon)^{L^*}$
• Combining the bounds and taking logs: $L^* \log_e(1 - \varepsilon) \le \log_e n + \sum_{t=1}^{T} \log_e(1 - \varepsilon L_t)$
• By Lemma 1: $-L^*(\varepsilon + \varepsilon^2) \le \log_e n - \varepsilon \sum_{t=1}^{T} L_t$
• By linearity of expectation, $L = \sum_{t=1}^{T} L_t$; rearranging: $L \le \frac{\log_e n}{\varepsilon} + (1 + \varepsilon)L^*$
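Filling in the intermediate step of that rearrangement:

```latex
-L^*(\varepsilon + \varepsilon^2) \le \log_e n - \varepsilon L
\;\Longrightarrow\; \varepsilon L \le \log_e n + L^*\varepsilon(1 + \varepsilon)
\;\Longrightarrow\; L \le \frac{\log_e n}{\varepsilon} + (1 + \varepsilon)\,L^*.
```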
Applications of multiplicative weights [Kale thesis 2007]
• Learning quantum states from noisy measurements
• Derandomising algorithms
• Solving certain zero-sum games
• Fast graph partitioning
• Fast solving of semidefinite programming problems
• Portfolio optimisation
• A basis for boosting
• Sparse vector technique in differential privacy
Mini Summary
• Introducing randomisation to learning with experts
∗ The algorithm chooses a random expert to follow
∗ Weights become probabilities
∗ Mistakes generalise to losses
• Loss bound
∗ Have to bound the expected loss (hey, risk!!)
∗ Shaves off that factor of 2. Proves that randomisation really helps!
Next: Bandits, where we only observe the reward of the chosen expert!