
Lecturer: Ben Rubinstein

Lecture 16. Learning with expert advice

COMP90051 Statistical Machine Learning

Copyright: University of Melbourne


This lecture

• Learning from expert advice / multiplicative weights
∗ Learner listens to some/all experts making predictions
∗ True outcomes are ADVERSARIAL!
∗ Learner updates weights over experts based on their losses
∗ Algorithms are all forms of “multiplicative weights”
∗ Nice clean bounds on total mistakes/loss: by “potential function” technique

• Infallible expert (one always perfect)
∗ Majority Vote Algorithm

• Imperfect experts (none guaranteed perfect) – increasingly better
∗ Weighted Majority Vote Algorithm by Halving
∗ Weighted Majority Voting by General Multiplicative Weights
∗ Probabilistic Experts Algorithm


An infallible expert and the Majority Algorithm

A warm-up example


Warm-up: Case of the infallible expert

• Experts E_1, …, E_n predict the stock market daily
∗ Each expert prediction is binary: stocks will go up/down

• Learner’s game, daily:
∗ Observe predictions of all experts
∗ Make a prediction of its own
∗ Observe outcome (could be anything!)
∗ Goal: minimise total number of mistakes

• Infallible expert assumption:
∗ At least one expert makes no mistakes


Infallible Expert Algorithm: Majority Vote

1. Initialise the set of experts who haven’t made mistakes, E = {1, …, n}

2. Repeat per round
   a) Observe predictions E_i for all i ∈ {1, …, n}
   b) Make the majority prediction arg max_{y ∈ {−1,1}} ∑_{i∈E} 1[E_i = y]
   c) Observe the correct outcome
   d) Remove mistaken experts from E
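A minimal runnable sketch of this algorithm (illustrative only; the function name, the (predictions, outcome) data format and the ±1 tie-breaking are assumptions, not from the slides):

# Sketch of the Majority Vote algorithm under the infallible expert assumption.
def majority_vote(rounds, n):
    """rounds: iterable of (predictions, outcome) pairs; predictions is a length-n list of ±1."""
    E = set(range(n))                 # experts with no mistakes so far
    mistakes = 0
    for predictions, outcome in rounds:
        votes = sum(predictions[i] for i in E)
        guess = 1 if votes >= 0 else -1          # majority prediction (ties broken towards +1)
        if guess != outcome:
            mistakes += 1
        E = {i for i in E if predictions[i] == outcome}   # drop mistaken experts
    return mistakes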


Mistake Bound for Majority Vote

Proposition: Under the infallible expert assumption, majority vote makes at most M ≤ log₂ n total mistakes.

Proof

• Loop invariant: if the algorithm makes a mistake, then at least |E|/2 of the experts in E must have been wrong

• I.e. every incorrect prediction reduces |E| by at least half; so after M mistakes, |E| ≤ n/2^M

• By infallibility, at all times 1 ≤ |E|

• Combine to 1 ≤ |E| ≤ n/2^M, then solve for M.
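Writing out that final solve step explicitly (a small completion, shown in LaTeX):

1 \le |E| \le \frac{n}{2^M}
\;\Longrightarrow\; 2^M \le n
\;\Longrightarrow\; M \le \log_2 n .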

Intuition: halving (as in e.g. tree data structures) means we should expect to see a log.

|E| is the potential function.


How is this “online learning”?

Learning

• Weights on which experts are worth listening to

• (Infallible case: 0/1 weights)

• Making predictions / taking actions

• Incurring loss (so far 0/1)

• IID “distribution” replaced by adversarial outcomes

Online

• A repeated game

http://cesa-bianchi.di.unimi.it/predbook/


Mini Summary

• Learning with expert advice paradigm
∗ Abstraction of online learning problem
∗ Adversarial feedback
∗ Later: Applications abound

• Bounds on mistakes (later losses) “easy”
∗ Involve “potential function” technique
∗ Later: interested in scaling with best expert performance

Next: Imperfect experts. Downweight, don’t drop, bad experts.


Imperfect experts and the Halving Algorithm

Similar proof technique; similar algorithm;
much more interesting setting


No one’s perfect

• No more guarantee of an infallible expert

• What breaks?
∗ We could end up with E = ∅; how would we predict then?
∗ “Zero tolerance” dropping of experts after a single mistake makes no sense

• Very general setting / very few assumptions
∗ Not assuming anything about expert error rates
∗ Not assuming anything about correlation of expert errors
∗ Not assuming anything about outcome observations. Not even stochastic (could be adversarial!)


Imperfect experts: Halving Algorithm

1. Initialise weights w_i = 1 for each expert E_i

2. Repeat per round
   a) Observe predictions E_i for all i ∈ {1, …, n}
   b) Make the weighted majority prediction arg max_{y ∈ {−1,1}} ∑_{i=1}^{n} w_i 1[E_i = y]
   c) Observe the correct outcome
   d) Downweight each mistaken expert E_i: w_i ⟵ w_i / 2
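A minimal sketch of weighted majority voting (illustrative; the helper name and data format are assumptions). Halving corresponds to eps = 0.5, and the general multiplicative-weights algorithm later in the lecture is the same code with a smaller eps:

# Sketch of (1 - eps)-weighted majority voting; eps = 0.5 recovers the Halving Algorithm.
def weighted_majority(rounds, n, eps=0.5):
    """rounds: iterable of (predictions, outcome) pairs; predictions is a length-n list of ±1."""
    w = [1.0] * n                          # one weight per expert
    mistakes = 0
    for predictions, outcome in rounds:
        score = sum(w[i] * predictions[i] for i in range(n))
        guess = 1 if score >= 0 else -1    # weighted majority vote (ties broken towards +1)
        if guess != outcome:
            mistakes += 1
        for i in range(n):
            if predictions[i] != outcome:
                w[i] *= (1.0 - eps)        # downweight mistaken experts
    return mistakes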



Mistake Bound for Halving

Proposition: If the best expert makes m mistakes, then weighted majority vote makes M ≤ 2.4(m + log₂ n) mistakes.

Proof
• Invariant: If the algorithm makes a mistake, then the weight of the wrong experts is at least half the total weight W = ∑_{i=1}^{n} w_i

• The weight of the wrong experts is halved, so the total weight is reduced to at most 3/4 of its previous value. I.e. after M mistakes, W ≤ n(3/4)^M

• The best expert E_i has w_i = (1/2)^m

• Combine to (1/2)^m = w_i ≤ W ≤ n(3/4)^M

• Taking logs: −m ≤ log₂ n + M log₂(3/4); solving, M ≤ (m + log₂ n)/log₂(4/3)
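To connect the last line to the constant 2.4 in the proposition (a small worked completion in LaTeX, not spelled out on the slide):

M \;\le\; \frac{m + \log_2 n}{\log_2(4/3)} \;\approx\; 2.41\,(m + \log_2 n),
\qquad\text{since } \log_2(4/3) \approx 0.415 .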


Compare, compare: What’s going on?
• Price of imperfection (vs. infallibility) is 𝒪(m)

∗ Infallible case: M ∈ 𝒪(log n)
∗ Imperfect case: M ∈ 𝒪(m + log n)

• Scaling to many experts is no problem

• Online learning vs. PAC frameworks

                   Modelling of losses              Ultimate goal

PAC                i.i.d. losses                    (For ERM; L6c) Small estimation error
                   (due to e.g. Hoeffding)          R(f_m) − R(f*). Bounded in terms of the
                                                    family’s VC dimension

Online learning    Adversarial/arbitrary losses     Small M − m. Bounded in terms of the
                                                    number of experts.


Mini Summary

• Imperfect expert setting
∗ Don’t drop bad experts, just halve their weight
∗ Predict by weighted majority, not simply majority
∗ Mistake bound follows similar “potential function” pattern!

• Learning with expert advice paradigm
∗ Key difference to PAC is adversarial feedback
∗ Similarity: Also concerned with performance relative to “best in class”

Next: Imperfect experts continued. Generalising halving.


From Halving to Multiplying Weights by 1 − ε

Generalising weighted majority.


Useful (but otherwise boring) inequalities

• Lemma 1: For any ε ∈ (0, 0.5), we have −ε − ε² ≤ log_e(1 − ε) < −ε

• Proof:
∗ Upper bound by Taylor expansion, dropping all but the first term (the rest are negative)
∗ Lower bound by convexity of exp(−ε − ε²)

• Lemma 2: For all ε ∈ [0,1] we have
1 − εx ≥ (1 − ε)^x, if x ∈ [0,1]
1 − εx ≥ (1 + ε)^(−x), if x ∈ [−1,0]

• Proof: by convexity of the RHS functions
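A quick numerical spot-check of both lemmas (an illustrative sketch, not part of the lecture material):

# Numerically spot-check Lemma 1 and Lemma 2 on a grid of (eps, x) values.
import math

for eps in [0.05, 0.1, 0.25, 0.4, 0.49]:
    # Lemma 1: -eps - eps^2 <= ln(1 - eps) < -eps
    assert -eps - eps**2 <= math.log(1 - eps) < -eps
    # Lemma 2: 1 - eps*x >= (1 - eps)^x on [0,1], and >= (1 + eps)^(-x) on [-1,0]
    for k in range(101):
        x = k / 100.0
        assert 1 - eps * x >= (1 - eps) ** x
        assert 1 + eps * x >= (1 + eps) ** x   # second case with x replaced by -x in [-1, 0]

print("Lemmas 1 and 2 hold on the sampled grid")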


Weighted Majority Vote Algorithm

1. Initialise weights w_i = 1 for each expert E_i

2. Repeat per round
   a) Observe predictions E_i for all i ∈ {1, …, n}
   b) Make the weighted majority prediction arg max_{y ∈ {−1,1}} ∑_{i=1}^{n} w_i 1[E_i = y]
   c) Observe the correct outcome
   d) Downweight each mistaken expert E_i: w_i ⟵ (1 − ε) w_i
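This is exactly the halving sketch above with a general downweighting factor. A toy usage (illustrative; the data-generating setup and the weighted_majority helper are the hypothetical sketch defined earlier, not course code):

# Toy usage of the earlier weighted_majority sketch with a general eps.
import random
rng = random.Random(0)
n = 3
rounds = []
for t in range(100):
    outcome = rng.choice([-1, 1])
    good = outcome if rng.random() < 0.9 else -outcome    # expert 0 is right ~90% of the time
    rounds.append(([good, rng.choice([-1, 1]), rng.choice([-1, 1])], outcome))

print(weighted_majority(rounds, n, eps=0.1))   # mistakes of the (1 - eps)-weighted majority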


Mistake Bound

Proposition: If the best expert makes m mistakes, then (1 − ε)-weighted majority makes M ≤ 2(1 + ε)m + (2 log_e n)/ε mistakes.

Proof
• Whenever the learner makes a mistake, at least half of the total weight is reduced by a factor of (1 − ε). So after M mistakes, W ≤ n(1 − ε/2)^M

• The best expert E_i has w_i = (1 − ε)^m

• Combine to (1 − ε)^m = w_i ≤ W ≤ n(1 − ε/2)^M

• Taking logs: m log_e(1 − ε) ≤ log_e n + M log_e(1 − ε/2)

• Lemma 1 replaces both log_e(1 − ⋅) terms: −m(ε + ε²) ≤ log_e n − Mε/2

• Solving for M proves the bound.
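Writing out that final solve (in LaTeX; a small completion of the slide’s last step):

-m(\varepsilon + \varepsilon^2) \le \log_e n - \tfrac{M\varepsilon}{2}
\;\Longrightarrow\;
\tfrac{M\varepsilon}{2} \le \log_e n + m(\varepsilon + \varepsilon^2)
\;\Longrightarrow\;
M \le \frac{2\log_e n}{\varepsilon} + 2(1+\varepsilon)\,m .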

Bound improves dependence on m
compared to halving. Why?


Dependence on m provably near optimal!

• New to lower bounds? An example shows that an analysis, or even an algorithm, can’t do better than some limit

• Weighted majority almost achieves 2m dependence, with 2(1 + ε)m (considering the number of experts fixed)

• Example with M = 2m (see the simulation sketch below)
∗ Consider n = 2 with E_1 (E_2) correct on odd (even) days
∗ Then the best expert makes mistakes half the time
∗ But after the 1st round, for any ε, the weighted majority vote is wrong every round, as the incorrect expert gets more than half the weight

• Consequence? Can’t improve the constant factor of 2 on m
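To see the example concretely, a small self-contained simulation (illustrative only; it additionally assumes the learner breaks weight ties towards +1, which the adversarial outcome sequence exploits):

# Lower-bound example: two constant experts, alternating adversarial outcomes.
# The (1 - eps)-weighted majority is wrong every round, while each expert is
# wrong only half the time, so M = 2m.
eps = 0.1
T = 1000
w = [1.0, 1.0]
predictions = [+1, -1]            # expert 0 always says +1, expert 1 always says -1
M = 0                             # learner mistakes
expert_mistakes = [0, 0]
for t in range(1, T + 1):
    outcome = -1 if t % 2 == 1 else +1        # adversarial alternating outcomes
    score = w[0] * predictions[0] + w[1] * predictions[1]
    guess = 1 if score >= 0 else -1           # ties broken towards +1
    if guess != outcome:
        M += 1
    for i in range(2):
        if predictions[i] != outcome:
            expert_mistakes[i] += 1
            w[i] *= (1 - eps)
print(M, min(expert_mistakes))    # prints 1000 500, i.e. M = 2m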


Mini Summary

• Imperfect expert setting continued…

• From halving to multiplicative weights!
∗ Mistake bound proved as usual via “potential function” trick
∗ Bound’s dependence on the best expert improved to a 2(1 + ε) factor

• Lower bound / impossibility result
∗ Factor of 2 is optimal for (deterministic) multiplicative weights!

Next: Imperfect experts continued. Randomise!!


The probabilistic experts algorithm

wherein randomisation helps us do better!


Probabilistic experts algorithm

• Change 1 from mistakes: loss ℓ_i^(t) ∈ [0,1] of E_i in round t

• Change 2: a randomised algorithm means bounding expected losses (sound familiar? It should… it’s a risk!)

1. Initialise weights w_i = 1 for each expert E_i

2. Repeat per round
   a) Observe predictions E_i for all i ∈ {1, …, n}
   b) Follow expert i (predict E_i) with probability w_i / W, where W = ∑_{i=1}^{n} w_i
   c) Observe losses ℓ_i^(t)
   d) Update each weight w_i ⟵ (1 − ε)^{ℓ_i^(t)} w_i
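A minimal sketch of the probabilistic experts algorithm with losses in [0,1] (illustrative; the function name, data format, and seeding are assumptions, not from the subject’s materials):

# Sketch of the probabilistic experts (multiplicative weights) algorithm.
import random

def probabilistic_experts(loss_rounds, n, eps=0.1, seed=0):
    """loss_rounds: iterable of length-n lists of per-expert losses in [0, 1]."""
    rng = random.Random(seed)
    w = [1.0] * n
    total_loss = 0.0
    for losses in loss_rounds:
        i = rng.choices(range(n), weights=w)[0]   # follow expert i with probability w_i / W
        total_loss += losses[i]                   # learner suffers the chosen expert's loss
        for j in range(n):
            w[j] *= (1.0 - eps) ** losses[j]      # multiplicative weight update
    return total_loss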


Probabilistic experts: Expected loss bound

• Proposition: The expected loss of the probabilistic experts algorithm is L ≤ (log_e n)/ε + (1 + ε)L*, where L* is the minimum total loss over experts.

• Proof: next, follows a similar “potential” pattern

• Beats deterministic! Shaves off the constant 2 that was optimal for deterministic algorithms

• Generalises in many directions. Active area of research in ML, control, economics, in top labs.


Proof: Upper bounding potential function

• Learner’s expected loss in round t: L_t = (∑_{i=1}^{n} w_i^(t) ℓ_i^(t)) / W^(t)

• By Lemma 2, since losses are in [0,1], the updated weights satisfy
w_i^(t+1) ⟵ (1 − ε)^{ℓ_i^(t)} w_i^(t) ≤ (1 − ε ℓ_i^(t)) w_i^(t)

• Rearrange to obtain the recurrence relation:
W^(t+1) ≤ ∑_{i=1}^{n} (1 − ε ℓ_i^(t)) w_i^(t) = (∑_{i=1}^{n} w_i^(t)) (1 − ε (∑_{i=1}^{n} w_i^(t) ℓ_i^(t)) / (∑_{i=1}^{n} w_i^(t))) = W^(t) (1 − ε L_t)

• Initialisation gave W^(0) = n, so telescoping we get:
W^(T) ≤ n ∏_{t=1}^{T} (1 − ε L_t)

Proof: Lower bounding potential, Wrap up

• Proved upper bound: W^(T) ≤ n ∏_{t=1}^{T} (1 − ε L_t)

• Lower bound from the best expert’s total loss L*: W^(T) ≥ (1 − ε)^{L*}

• Combining bounds and taking logs:
L* log_e(1 − ε) ≤ log_e n + ∑_{t=1}^{T} log_e(1 − ε L_t)

• By Lemma 1: −L*(ε + ε²) ≤ log_e n − ε ∑_{t=1}^{T} L_t

• By linearity of expectation L = ∑_{t=1}^{T} L_t, so rearranging:
L ≤ (log_e n)/ε + (1 + ε)L*
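The rearrangement in that last step, written out (in LaTeX; a small completion of the slide’s algebra):

\varepsilon L \le \log_e n + L^*(\varepsilon + \varepsilon^2)
\;\Longrightarrow\;
L \le \frac{\log_e n}{\varepsilon} + (1+\varepsilon)\,L^* .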


Applications of multiplicative weights [Kale thesis 2007]

• Learning quantum states from noisy measurements

• Derandomising algorithms

• Solving certain zero-sum games

• Fast graph partitioning

• Fast solving of semidefinite programming problems

• Portfolio optimisation

• A basis for boosting

• Sparse vector technique in differential privacy


Mini Summary

• Introducing randomisation to learning with experts
∗ Algorithm choosing a random expert to follow
∗ Weights become probabilities
∗ Mistakes generalise to losses

• Loss bound
∗ Have to bound expected loss (hey, risk!!)
∗ Shaves off that 2 factor. Proves that randomisation really helps!

Next: Only observe reward of chosen expert → bandits!
