Analysis of Algorithms, I
CSOR W4231
Computer Science Department
Copyright By PowCoder代写 加微信 powcoder
Columbia University
The greedy principle: cache maintenance
Cache maintenance (the offline problem)
An optimal greedy algorithm for the offline problem:
Farthest-into-Future (FF) Proof of optimality of FF The online problem
Cache maintenance (the offline problem)
An optimal greedy algorithm for the offline problem: Farthest-into-Future (FF)
Proof of optimality of FF The online problem
Caching: the process of storing a small amount of data
in a fast memory, so as to reduce the amount of time spent interacting with slow memory.
Goal: when attempting to download a page, it should be in cache.
⇒ Need a cache maintenance algorithm to determine what to keep and what to evict from the cache.
n, the number of pages in the main memory
k, the size of the cache memory
a sequence of m requests r1, r2, . . . , rm for memory pages
#main memory pages n = 3
setofmainmemorypages={a,b,c} cachesizek=2
#requestsm=7
sequenceofrequests:a,b,c,b,c,a,b
To service a request, the corresponding page must be in the cache.
⇒ After the first k requests for distinct pages the cache is full. A request is received and serviced within the same time
Cache miss: a request for a page that is not in the cache.
⇒ If there is a cache miss, we must evict a page from the cache to bring in the requested page.
The expensive operation is the eviction; every cache miss incurs an eviction.
Our objective
At each time step 1 ≤ t ≤ m, we must decide which page (if any) to evict from the cache.
Definition 1 (Scheduling algorithm).
A schedule is a sequence of eviction decisions so that all m requests are serviced at time m. An algorithm that provides such a schedule is a scheduling algorithm.
Goal: find the schedule that minimizes the total number of cache misses.
Example 2.
#pagesinmainmemory: n=3
cache size: k = 2
sequence of m = 7 requests: a,b,c,b,c,a,b
timet: 1 2 3 4 5 6 7 requests: a, b, c, b, c, a, b
Example 2.
#pagesinmainmemory: n=3
cache size: k = 2
sequence of m = 7 requests: a,b,c,b,c,a,b
timet: 1 2 3 4 5 6 7
requests: eviction schedule S: cachecontents:
a, b, c, b, c, a, b
−, −, a, −, −, c, − {a}{a,b} {b,c} {b,c} {b,c} {b,a} {b,a}
− stands for “no eviction”
S = {−,−,a,−,−,c,−} evicts a at time 3, c at time 6 S incurs 2 cache misses (can’t do better)
Offline and online problems
Offline problem: the entire sequence of requests {r1,r2,…,rm} is part of the input (known at time t = 0).
Online problem (more natural): requests arrive one at a time; rt must be serviced at time t, before future requests rt+1,…,rm are seen.
A scheduling algorithm for the online problem can only base its eviction decision at time t on
1. the requests it has seen so far,
2. the eviction decisions it has made so far.
The optimal offline algorithm provides a lower bound on the performance of any online algorithm.
Cache maintenance (the offline problem)
An optimal greedy algorithm for the offline problem: Farthest-into-Future (FF)
Proof of optimality of FF The online problem
The Farthest-into-Future (FF) rule
Definition 3 (Farthest-into-Future (FF) rule).
When the page requested at time i is not in the cache, evict from the cache the page that is needed the farthest into the future and bring in the requested page.
Notation: we will denote the schedule produced by this algorithm SF F .
Example: the schedule S in Example 2 is the schedule produced by FF.
Reduced schedules
Definition 4 (Reduced schedule).
A reduced schedule brings a page in the cache at time t only if 1. the page is requested at time t; and
2. the page is not already in the cache.
1. In a sense, a reduced schedule performs the least amount of work at every time step.
2. FF is a reduced schedule.
There is an optimal reduced schedule
We can transform a non-reduced schedule into a reduced one that is at least as good, that is, incurs at most the same number of evictions.
The expensive memory operation is the eviction: so we should be minimizing #evictions and not #cache misses.
By Definition 4, #cache misses = #evictions in reduced schedules.
By Fact 5, we can focus solely on reduced schedules to find the optimal schedule.
⇒ Thus our original goal of minimizing #cache misses (rather than #evictions) is justified.
Proof of Fact 5
Let S′ be a schedule that is not reduced and solves an instance of cache maintenance.
We will construct a reduced schedule S that incurs at most as many evictions as S′.
Suppose that at time i, there is a request ri ̸= a but S′ evicts a page from the cache to bring in page a, although a is not requested at time i.
Then S pretends to bring in a but in fact does nothing: at thefirsttimestepj>isuchthatrj =a,Sbringsina.
⇒ We can charge the cache miss of S at time j to the eviction of S′ at the earlier time i.
Thus S performs at most as many evictions as S′.
Example 6.
#pagesinmainmemory: n=4
cache size: k = 3
sequence of m = 9 requests: a,b,c,d,b,c,a,d,b
rt = request at time t
CS(t) = contents of the cache of schedule S at end of time t
Non-reduced schedule S
rt a b c d b c a d
bring a b c d c d b –
evict – – – c d b c –
CS(t) a {a,b} {a,b,c} {a,b,d} {a,b,c} {a,c,d} {a,b,d} {a,b,d} {a,b,d}
Example 6.
#pagesinmainmemory: n=4
cache size: k = 3
sequence of m = 9 requests: a,b,c,d,b,c,a,d,b
rt = request at time t
CS(t) = contents of the cache of schedule S at end of time t
Non-reduced schedule S
rt a b c d b c a d
bring a b c d c d b –
evict – – – c d b c –
Blue entries denote cache misses.
Red entries denote evictions when no cache miss occurred.
S evicts pages d, b, c at times 5, 6, 7, even though these pages are not requested at these times (no cache misses). S′ performs the exact same evictions that S performed at times 5, 6, 7 at the later times 6, 8, 9 respectively, when these pages were actually requested (hence cache misses were incurred).
CS(t) a {a,b} {a,b,c} {a,b,d} {a,b,c} {a,c,d} {a,b,d} {a,b,d} {a,b,d}
Reduced schedule S′
rt a b c d b c a d
bring a b c d – c – d
evict – – – c – d – b
CS′(t) a {a,b} {a,b,c} {a,b,d} {a,b,d} {a,b,c} {a,b,c} {a,c,d} {a,b,d}
Cache maintenance (the offline problem)
An optimal greedy algorithm for the offline problem:
Farthest-into-Future (FF) Proof of optimality of FF The online problem
Optimality of Farthest-into-Future
Let Si be a reduced schedule that makes the same eviction decisions as SF F up to time i, that is, up to request i. Then there is a reduced schedule Si+1 that
1. makes the same eviction decisions as SF F up to time t=i+1, that is, up to request i+1;
2. Si+1 incurs no more total cache misses than Si. Proposition 1.
The schedule SF F provided by the Farthest-into-Future algorithm is optimal.
Proof of Proposition 1: case i = 0
cm(S) = total #cache misses of schedule S
S∗ is an optimal reduced schedule
Schedule S follows schedule S′ up to request i if S makes the same eviction decisions as S′ up to the i-th request
i = 0: trivially, S∗ follows SFF up to request i = 0. By Claim 1, we can construct a reduced schedule S1 such that
1. S1 follows SFF up to request i = 1, 2. cm(S1) ≤ cm(S∗).
Proof of Proposition 1: case i > 0
Notation: cm(S) = total #cache misses of schedule S
i = 1: now S1 is a reduced schedule that follows SF F up to request i = 1. By Claim 1, we can construct a reduced schedule S2 such that
1. S2 follows SFF up to request i = 2, 2. cm(S2) ≤ cm(S1).
i = 2: now S2 is a reduced schedule that follows SF F up to request i = 2. By Claim 1, we can construct a reduced schedule S3 such that
1. S3 follows SFF up to request i = 3, 2. cm(S3) ≤ cm(S2).
Proof of Proposition 1: Sm = SF F
Notation: cm(S) = total #cache misses of schedule S
Applyingtheclaimforevery3≤i≤m−1,weobtaina
reduced schedule Sm that
1. follows SF F up to time m,
2. cm(Sm) ≤ cm(Sm−1).
Tracing back all the inequalities, we obtain
cm(Sm) ≤ cm(S∗).
Finally, since Sm follows SFF up to time m, SFF = Sm. Hence cm(SF F ) = cm(Sm) ≤ cm(S∗).
Thus SFF is optimal.
Optimality of Farthest-into-Future
Before we prove Claim 1, we slightly simplify notation to obtain the following claim.
Let S be a reduced schedule that makes the same eviction decisions as SF F up to time t = i, that is, up to request i. Then there is a reduced schedule S′ such that
1. S′ makes the same eviction decisions as SFF up to time t=i+1, that is, up to request i+1;
2. S′ incurs no more total cache misses than S.
Proof of Claim 2: a case-by-case analysis
cm(S) = total #cache misses of schedule S
Ci(S) = contents of the cache of schedule S at the end of
time step i
Since S and SF F have made the same scheduling decisions up to
time i, the following statements hold at the end of time step i. 1. The contents of their caches are identical, that is,
Ci(S) = Ci(SF F ).
2. So far, S has the same number of cache misses as SF F .
Case 1: request ri+1 = x ∈ Ci(S)
1. If page x requested at time i + 1 is in Ci(S), then x∈Ci(SFF)(recallthatCi(S)=Ci(SFF));
no cache miss for either schedule.
beginning of time step i+1
Ci(S) Ci(SFF)
end of time step i+1
SetS′=S;then
1. S′ follows SFF up to time i + 1 (S does!); 2. cm(S′) ≤ cm(S).
ri+1 = x ∈ Ci(S), no evictions
Case 2: ri+1 = x ̸∈ Ci(S) (cont’d)
2. If page x requested at time i + 1 is not in Ci(S), then x̸∈Ci(SFF)(recallthatCi(S)=Ci(SFF));
both schedules must bring x in, hence incur a cache miss.
2.1: If S and SFF both evict the same page p, set S′ = S:
1. S′ follows SFF up to time i + 1 (S does!), 2. cm(S′) ≤ cm(S).
beginning of time step i+1
end of time step i+1
ri+1 =x∉Ci(S)andS,SFF evictp
Ci(S) Ci(SFF)
Case 2.2: ri+1 = x ̸∈ Ci(S)
2.2: If S evicts p but SFF evicts q:
ByconstructionofSFF,qmustberequestedlaterinthe
future than p (recall the Farthest-into-Future rule).
At the end of time step i + 1, the cache contents for the two schedules will differ in exactly one item.
beginning of time step i+1
Ci(SFF) . .q . .p
end of time step i+1
Ci+1(SFF) Ci+1(S’)
ri+1 = x ∉ Ci(S) and S evicts p, SFF evicts q
Case 2.2: S evicts p, SF F evicts q
At the end of time step i+1
the cache of S contains q;
the cache of SF F contains p;
the remaining k − 1 items in both caches are the same;
Ci+1(SF F ) = Ci+1(S) − {q} + {p}.
Since we want S′ to follow SFF up to time i+1, S′ evicts q
from its cache as well. Hence
Ci+1(S′) = Ci+1(SF F ) = Ci+1(S) − {q} + {p}.
Roadmap for case 2.2: S evicts p, SF F evicts q
Notation: cm(S) = total #cache misses of schedule S
At the end of time step i+1,
the cache contents of S,S′ differ in exactly one item;
S′ follows SFF up to time i+1;
#cache misses of S = #cache misses of S ′ .
Goal: Ensure that S′ does not incur more misses than S for i + 1 < t ≤ m, so that cm(S′) ≤ cm(S).
Idea: Set S′ = S as soon as the cache contents of S, S′ are the same again.
1. Make Ct(S′) equal Ct(S) at the earliest t > i + 1 possible, while not incurring unnecessary misses.
2. Once Ct(S′) = Ct(S), set S′ = S.
⇒ If S′ has not incurred more misses than S between steps
i + 2 and t, then cm(S′) ≤ cm(S) and the claim holds.
Case 2.2.1: rt = x ̸∈ {p,q}, x ̸∈ Ct(S), S evicts q
For all t > i + 1, S′ follows S until one of the following happens for the first time:
2.2.1: rt = y ̸∈ {p,q}, and y ̸∈ Ct(S), and S evicts q. Since Ct(S) and Ct(S′) only differ in p, q, then y ̸∈ Ct(S′).
Set S′ to evict p and bring in y. Then Ct(S′) = Ct(S)!
beginning of time step t > i+1
Ct(S) ..q… Ct(S’)
end of time step t
rt =y∉Ct(S)andy≠p,qandSevictsq
Set S′ = S henceforth: S′ follows SFF up to time i + 1 and cm(S′) ≤ cm(S).
Case 2.2.2.1: rt = p, S evicts q
2.2.2: rt = p
2.2.2.1: If S evicts q, Ct(S) = Ct(S′)!
beginning of time step t > i+1
end of time step t
Ct(S) ..q… Ct(S’)
rt =p,andSevictsq
Set S′ = S henceforth: S′ follows SFF up to time i + 1 and cm(S′) < cm(S).
Case 2.2.2.2: rt = p, S evicts y ̸= q
2.2.2: rt = p
2.2.2.2: If S evicts y ̸= q from its cache, then S′ evicts y as
well and brings in q. Then Ct(S′) = Ct(S).
beginning of time step t > i+1
Ct(S) ..q..y
Ct(S’) . . y. . p q
end of time step t
rt = p and S evicts y≠q
Set S′ = S henceforth: S′ follows SFF up to time i + 1 and cm(S′) ≤ cm(S).
2.2.2.2: S′ is no longer reduced
S′ is no longer reduced: q was brought in when there was no request for q at time t (recall that rt = p).
Fortunately, we can use Fact 1 to transform S′ into a reduced schedule S that
incurs at most the same total #evictions as S′;
still follows SF F up to time i + 1: all the real evictions of
the reduced S will happen after time i + 1.
Hence we return S as the schedule that satisfies Claim 2.
2.2.3: rt = q
Can’t happen!
SFF evicted q and not p, hence q appears farther in the future
Hence one of the cases 2.2.1, 2.2.2 will happen first.
Complete roadmap
1. x ∈ Ci(S) set S′=S
2. x!Ci(S)
request ri+1 = x
2.1: both S,SFF evict p
2.2: S evicts p, SFF evicts q
rt = y ∉{p,q} y ∈ Ct(S)?
rt ∈ “#$%&
2.2.2: rt =p
2.2.2.1: S evicts q
2.2.3: rt =q Impossible
(contradicts FF)
2.2.2.2: S evicts y≠q
S evicts r≠q
2.2.1: S evicts q
do S, SFF evict the same page?
does S evict q?
request rt, t > i+1
does S evict q?
continue at 2.2
S′ evicts r, continue at 2.2
S′ evicts p, set S′=S
S′ evicts nothing, set S′=S
S′ evicts y, brings q; set S′=S
Cache maintenance (the offline problem)
An optimal greedy algorithm for the offline problem:
Farthest-into-Future (FF) Proof of optimality of FF The online problem
The online problem
Offline problem: the entire sequence of requests {r1,r2,…,rm} is part of the input (known at time t = 0).
Online problem (more natural): requests arrive one at a time; rt must be serviced at time t, before rt+1, . . . , rm are seen.
An online scheduling algorithm can only base its eviction decision at time t on
1. the requests it has seen so far;
2. the eviction decisions it has made so far.
The optimal offline algorithm provides a lower bound on the performance of any online algorithm.
The Least Recently Used principle
The Least Recently Used (LRU) principle: evict the page that was requested the longest ago.
Intuition: a running program will generally keep accessing the things it’s just been accessing (locality of reference).
Essentially Farthest-into-Future (FF) reversed in time.
LRU behaves well on average inputs.
However an adversary can devise a specific sequence of online requests that will cause LRU to perform very badly compared to the optimal offline algorithm (how?).
Worst-case input to LRU
#pages in main memory: n = 3
sizeofthecache: k=2
sequence of online requests
a,b,c,a,b,c,…,a,b,c
⇒ LRU: every request starting at time t = 3 is a miss, hence 3M − 2 misses.
⇒ FF: no more than 3M/2 misses.
Competitive ratio
More generally, if we have a sequence of nM online requests as above, for some integer M , LRU will incur nM − k misses, while FF will incur no more than ⌈nM/k⌉ misses.
Hence LRU may perform up to a factor of k times worse than FF.
(Online analysis) competitive ratio: the worst-case ratio between the performance of the online algorithm and the performance of the optimal offline algorithm.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com