
CSC373
Week 2: Greedy Algorithms
Nisarg Shah
373F20 – Nisarg Shah 1

Recap
• Divide & Conquer
➢ Master theorem
➢ Counting inversions in 𝑂(𝑛 log 𝑛)
➢ Finding the closest pair of points in ℝ² in 𝑂(𝑛 log 𝑛)
➢ Fast integer multiplication in 𝑂(𝑛^(log₂ 3))
➢ Fast matrix multiplication in 𝑂(𝑛^(log₂ 7))
➢ Finding the 𝑘-th smallest element (in particular, the median) in 𝑂(𝑛)

Greedy Algorithms
• Greedy (also known as myopic) algorithm outline
➢ We want to find a solution 𝑥 that maximizes some objective function 𝑓
➢ But the space of possible solutions 𝑥 is too large
➢ The solution 𝑥 is typically composed of several parts (e.g. 𝑥 may be a set, composed of its elements)
➢ Instead of directly computing 𝑥…
o Compute it one part at a time
o Select the next part “greedily” to get maximum immediate benefit (this needs to be defined carefully for each problem)
o May not be optimal because there is no foresight
o But sometimes this can be optimal too!

Interval Scheduling
• Problem
➢ Job 𝑗 starts at time 𝑠_𝑗 and finishes at time 𝑓_𝑗
➢ Two jobs are compatible if they don’t overlap
➢ Goal: find maximum-size subset of mutually compatible jobs

Interval Scheduling
• Greedy template
➢ Consider jobs in some “natural” order
➢ Take each job if it’s compatible with the ones already chosen
• What order?
➢ Earliest start time: ascending order of 𝑠_𝑗
➢ Earliest finish time: ascending order of 𝑓_𝑗
➢ Shortest interval: ascending order of 𝑓_𝑗 − 𝑠_𝑗
➢ Fewest conflicts: ascending order of 𝑐_𝑗, where 𝑐_𝑗 is the number of remaining jobs that conflict with 𝑗

Example
• Earliest start time: ascending order of 𝑠_𝑗
• Earliest finish time: ascending order of 𝑓_𝑗
• Shortest interval: ascending order of 𝑓_𝑗 − 𝑠_𝑗
• Fewest conflicts: ascending order of 𝑐_𝑗, where 𝑐_𝑗 is the number of remaining jobs that conflict with 𝑗

Interval Scheduling
• Does it work? There are counterexamples for earliest start time, shortest interval, and fewest conflicts

Interval Scheduling
• Implementing greedy with earliest finish time (EFT)
➢ Sort jobs by finish time. Say 𝑓_1 ≤ 𝑓_2 ≤ ⋯ ≤ 𝑓_𝑛
➢ When deciding on job 𝑗, we need to check whether it’s compatible with all previously added jobs
o We only need to check if 𝑠_𝑗 ≥ 𝑓_(𝑖*), where 𝑖* is the last added job
o This is because for any job 𝑖 added before 𝑖*, 𝑓_𝑖 ≤ 𝑓_(𝑖*)
o So we can simply keep track of the finish time of the last added job
➢ Running time: 𝑂(𝑛 log 𝑛)
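The EFT greedy above can be sketched in a few lines of Python (the job list in the example is illustrative):

```python
# Greedy interval scheduling: sort by finish time, keep each job
# whose start is no earlier than the finish of the last job taken.
def interval_scheduling(jobs):
    """jobs: list of (start, finish) pairs. Returns a max-size compatible subset."""
    selected = []
    last_finish = float("-inf")
    for s, f in sorted(jobs, key=lambda job: job[1]):  # earliest finish time first
        if s >= last_finish:          # compatible with the last selected job
            selected.append((s, f))
            last_finish = f
    return selected

# Example: the greedy keeps three mutually compatible jobs out of five
print(interval_scheduling([(0, 3), (2, 5), (4, 7), (6, 9), (8, 11)]))
# → [(0, 3), (4, 7), (8, 11)]
```

Note that jobs touching only at an endpoint are treated as compatible, matching the 𝑠_𝑗 ≥ 𝑓_(𝑖*) test above.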

Interval Scheduling
• Optimality of greedy with EFT
➢ Suppose for contradiction that greedy is not optimal
➢ Say greedy selects jobs 𝑖_1, 𝑖_2, …, 𝑖_𝑘 sorted by finish time
➢ Consider the optimal solution 𝑗_1, 𝑗_2, …, 𝑗_𝑚 (also sorted by finish time) which matches greedy for as long as possible
o That is, we want 𝑗_1 = 𝑖_1, …, 𝑗_𝑟 = 𝑖_𝑟 for the greatest possible 𝑟

Interval Scheduling
Another standard method is induction
• Optimality of greedy with EFT
➢ Both 𝑖_(𝑟+1) and 𝑗_(𝑟+1) were compatible with the previous selection (𝑖_1 = 𝑗_1, …, 𝑖_𝑟 = 𝑗_𝑟)
➢ Consider the solution 𝑖_1, 𝑖_2, …, 𝑖_𝑟, 𝑖_(𝑟+1), 𝑗_(𝑟+2), …, 𝑗_𝑚
o It is still feasible (since 𝑓_(𝑖_(𝑟+1)) ≤ 𝑓_(𝑗_(𝑟+1)))
o It is still optimal
o And it matches with greedy for one more step (contradiction!)

Interval Partitioning
• Problem
➢ Job 𝑗 starts at time 𝑠_𝑗 and finishes at time 𝑓_𝑗
➢ Two jobs are compatible if they don’t overlap
➢ Goal: group jobs into fewest partitions such that jobs in
the same partition are compatible
• One idea
➢ Find the maximum compatible set using the previous greedy EFT algorithm, call it one partition, recurse on the remaining jobs.
➢ Doesn’t work (check this yourself)

Interval Partitioning
• Think of scheduling lectures for various courses into as few classrooms as possible
• This schedule uses 4 classrooms for scheduling 10 lectures

Interval Partitioning
• Think of scheduling lectures for various courses into as few classrooms as possible
• This schedule uses 3 classrooms for scheduling 10 lectures

Interval Partitioning
• Let’s go back to the greedy template!
➢ Go through lectures in some “natural” order
➢ Assign each lecture to an (arbitrary?) compatible classroom, and create a new classroom if the lecture conflicts with every existing classroom
• Order of lectures?
➢ Earliest start time: ascending order of 𝑠_𝑗
➢ Earliest finish time: ascending order of 𝑓_𝑗
➢ Shortest interval: ascending order of 𝑓_𝑗 − 𝑠_𝑗
➢ Fewest conflicts: ascending order of 𝑐_𝑗, where 𝑐_𝑗 is the number of remaining jobs that conflict with 𝑗

Interval Partitioning
• When each lecture is assigned to an arbitrary compatible classroom, three of these heuristics do not work.
• One of them, earliest start time, works! (see the optimality proof below)

Interval Partitioning

Interval Partitioning
• Running time
➢ Key step: check if the next lecture can be scheduled in some classroom
➢ Store classrooms in a priority queue
o key = latest finish time of any lecture in the classroom
➢ Is lecture 𝑗 compatible with some classroom?
o Same as “Is 𝑠_𝑗 at least as large as the minimum key?”
o If yes: add lecture 𝑗 to the classroom 𝑘 with minimum key, and increase its key to 𝑓_𝑗
o Otherwise: create a new classroom, add lecture 𝑗, and set its key to 𝑓_𝑗
➢ 𝑂(𝑛) priority queue operations, 𝑂(𝑛 log 𝑛) time
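A sketch of this priority-queue implementation using Python’s standard `heapq` module (the lecture list in the example is illustrative; lectures sharing an endpoint are treated as compatible, matching the 𝑠_𝑗 ≥ key test above):

```python
import heapq

# Greedy interval partitioning: process lectures by start time; reuse the
# classroom whose last lecture finishes earliest (minimum heap key),
# otherwise open a new classroom.
def interval_partitioning(lectures):
    """lectures: list of (start, finish). Returns the number of classrooms used."""
    heap = []  # one finish-time key per classroom
    for s, f in sorted(lectures):        # earliest start time first
        if heap and heap[0] <= s:        # some classroom is free by time s
            heapq.heapreplace(heap, f)   # reuse it; update its key to f
        else:
            heapq.heappush(heap, f)      # open a new classroom
    return len(heap)

# Example with depth 3: three lectures overlap just after time 2
print(interval_partitioning([(0, 3), (1, 4), (2, 5), (3, 6)]))  # → 3
```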

Interval Partitioning
• Proof of optimality (lower bound)
➢ # classrooms needed ≥ maximum “depth” at any point
o depth = number of lectures running at that time
➢ We now show that our greedy algorithm uses only this many classrooms!

Interval Partitioning
• Proof of optimality (upper bound)
➢ Let 𝑑 = # classrooms used by greedy
➢ Classroom 𝑑 was opened because there was a lecture 𝑗 that was incompatible with some lecture already scheduled in each of the 𝑑 − 1 other classrooms
➢ All these 𝑑 − 1 lectures, as well as lecture 𝑗 itself, end after 𝑠_𝑗
➢ Since we sorted by start time, they all start at/before 𝑠_𝑗
➢ So just after time 𝑠_𝑗, we have 𝑑 mutually overlapping lectures
➢ Hence, depth ≥ 𝑑
➢ So all schedules use ≥ 𝑑 classrooms. ∎
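As a sanity check on this bound, the depth of a set of intervals can be computed with a simple endpoint sweep; this is a sketch (end events are processed before start events at equal times, matching the convention that touching intervals are compatible):

```python
# Depth = maximum number of intervals alive at any single time.
def depth(intervals):
    events = []
    for s, f in intervals:
        events.append((s, 1))    # lecture starts: +1
        events.append((f, -1))   # lecture ends: -1
    events.sort(key=lambda e: (e[0], e[1]))  # at ties, ends (-1) before starts (+1)
    best = cur = 0
    for _, delta in events:
        cur += delta
        best = max(best, cur)
    return best

print(depth([(0, 3), (1, 4), (2, 5), (3, 6)]))  # → 3
```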

Interval Graphs
• Interval scheduling and interval partitioning can be seen as graph problems
• Input
➢ Graph 𝐺 = (𝑉, 𝐸)
➢ Vertices 𝑉 = jobs/lectures
➢ Edge (𝑖, 𝑗) ∈ 𝐸 if jobs 𝑖 and 𝑗 are incompatible
• Interval scheduling = maximum independent set (MIS)
• Interval partitioning = graph colouring
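As an illustration, the incompatibility graph can be built by a pairwise overlap check (a quadratic-time sketch; the job list is illustrative):

```python
# Build the interval (incompatibility) graph: vertices are job indices,
# with an edge whenever two intervals overlap.
def interval_graph(jobs):
    edges = []
    for i, (s1, f1) in enumerate(jobs):
        for j, (s2, f2) in enumerate(jobs):
            if i < j and s1 < f2 and s2 < f1:  # the open intervals overlap
                edges.append((i, j))
    return edges

# Jobs 0 and 1 overlap, and jobs 1 and 2 overlap; 0 and 2 are compatible
print(interval_graph([(0, 3), (2, 5), (4, 7)]))  # → [(0, 1), (1, 2)]
```

A maximum independent set in this graph is exactly a maximum set of compatible jobs, and a proper colouring is exactly a partition into compatible groups.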

Interval Graphs
• MIS and graph colouring are NP-hard for general graphs
• But they’re efficiently solvable for “interval graphs”
➢ Graphs which can be obtained from the incompatibility of intervals
➢ In fact, this holds even when we are not given an interval representation of the graph
• Can we extend this result further?
➢ Yes! Chordal graphs
o Every cycle with 4 or more vertices has a chord

Minimizing Lateness
• Problem
➢ We have a single machine
➢ Each job 𝑗 requires 𝑡_𝑗 units of time and is due by time 𝑑_𝑗
➢ If it’s scheduled to start at 𝑠_𝑗, it will finish at 𝑓_𝑗 = 𝑠_𝑗 + 𝑡_𝑗
➢ Lateness: ℓ_𝑗 = max{0, 𝑓_𝑗 − 𝑑_𝑗}
➢ Goal: minimize the maximum lateness, 𝐿 = max_𝑗 ℓ_𝑗
• Contrast with interval scheduling
➢ We can decide the start time
➢ There are soft deadlines

Minimizing Lateness
• Example Input
An example schedule

Minimizing Lateness
• Let’s go back to greedy template
➢ Consider jobs one-by-one in some “natural” order
➢ Schedule jobs in this order (nothing special to do here, since we have to schedule all jobs and there is only one machine available)
• Natural orders?
➢ Shortest processing time first: ascending order of processing time 𝑡_𝑗
➢ Earliest deadline first: ascending order of due time 𝑑_𝑗
➢ Smallest slack first: ascending order of 𝑑_𝑗 − 𝑡_𝑗

Minimizing Lateness
• Counterexamples
➢ Shortest processing time first
o Ascending order of processing time 𝑡_𝑗
➢ Smallest slack first
o Ascending order of 𝑑_𝑗 − 𝑡_𝑗

Minimizing Lateness
• By now, you should know what’s coming…
• We’ll prove that earliest deadline first works!
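Before the proof, here is a sketch of the earliest-deadline-first schedule and its maximum lateness in Python (the job data in the example is illustrative):

```python
# Earliest deadline first: run jobs back-to-back in ascending order of due
# time, starting at time 0; return the maximum lateness of the schedule.
def edf_max_lateness(jobs):
    """jobs: list of (processing_time, due_time) pairs."""
    time = 0
    max_lateness = 0
    for t, d in sorted(jobs, key=lambda job: job[1]):  # earliest deadline first
        time += t                                      # this job finishes at `time`
        max_lateness = max(max_lateness, time - d)     # lateness = max(0, f - d)
    return max_lateness

# Example: job (4, 3) runs first and is 1 late; job (2, 4) finishes at 6, 2 late
print(edf_max_lateness([(4, 3), (2, 4)]))  # → 2
```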

Minimizing Lateness
• Observation 1
➢ There is an optimal schedule with no idle time

Minimizing Lateness
• Observation 2
➢ Earliest deadline first has no idle time
• Let us define an “inversion”
➢ (𝑖, 𝑗) such that 𝑑_𝑖 < 𝑑_𝑗 but 𝑗 is scheduled before 𝑖
• Observation 3
➢ By definition, earliest deadline first has no inversions
• Observation 4
➢ If a schedule with no idle time has an inversion, it has a pair of inverted jobs scheduled consecutively

Minimizing Lateness
• Observation 5
➢ Swapping adjacently scheduled inverted jobs doesn’t increase lateness but reduces #inversions by one
• Proof
➢ Let 𝐿 and 𝐿′ denote the maximum lateness before/after the swap
➢ Clearly, ℓ_𝑘 = ℓ′_𝑘 for all 𝑘 ≠ 𝑖, 𝑗
➢ Also, clearly, ℓ′_𝑖 ≤ ℓ_𝑖

Minimizing Lateness
• Observation 5 (continued)
• Proof
➢ ℓ′_𝑗 = 𝑓′_𝑗 − 𝑑_𝑗 = 𝑓_𝑖 − 𝑑_𝑗 ≤ 𝑓_𝑖 − 𝑑_𝑖 = ℓ_𝑖
➢ 𝐿′ = max{ℓ′_𝑖, ℓ′_𝑗, max_(𝑘≠𝑖,𝑗) ℓ′_𝑘} ≤ max{ℓ_𝑖, ℓ_𝑖, max_(𝑘≠𝑖,𝑗) ℓ_𝑘} ≤ 𝐿
➢ Finally, check that swapping an adjacent inverted pair reduces the total #inversions by one

Minimizing Lateness
• Proof of optimality of earliest deadline first
➢ Suppose for contradiction that it’s not optimal
➢ Consider an optimal schedule 𝑆* with the fewest inversions among all optimal schedules
o WLOG, suppose it has no idle time
➢ Because EDF is not optimal, 𝑆* has inversions
➢ By Observation 4, it has an adjacent inversion (𝑖, 𝑗)
➢ By Observation 5, swapping the adjacent pair keeps the schedule optimal but reduces the #inversions by 1
➢ Contradiction! ∎

Lossless Compression
• Problem
➢ We have a document that is written using 𝑛 distinct labels
➢ Naïve encoding: represent each label using log 𝑛 bits
➢ If the document has length 𝑚, this uses 𝑚 log 𝑛 bits
➢ Example: an English document with no punctuation, etc.
o 𝑛 = 26, so we can use 5 bits
o 𝑎 = 00000
o 𝑏 = 00001
o 𝑐 = 00010
o 𝑑 = 00011
o ...

Lossless Compression
• Is this optimal?
➢ What if 𝑎, 𝑒, 𝑟, 𝑠 are much more frequent in the document than 𝑥, 𝑞, 𝑧?
➢ Can we assign shorter codes to more frequent letters?
• Say we assign...
➢ 𝑎 = 0, 𝑏 = 1, 𝑐 = 01, ...
➢ See a problem?
o What if we observe the encoding ‘01’?
o Is it ‘ab’? Or is it ‘c’?

Lossless Compression
• To avoid conflicts, we need a prefix-free encoding
➢ Map each label 𝑥 to a bit-string 𝑐(𝑥) such that for all distinct labels 𝑥 and 𝑦, 𝑐(𝑥) is not a prefix of 𝑐(𝑦)
➢ Then a scenario where one codeword is an initial segment of another is impossible
➢ So we can read left to right
o Whenever the part to the left becomes a valid encoding, greedily decode it, and continue with the rest

Lossless Compression
• Formal problem
➢ Given 𝑛 symbols and their frequencies (𝑤_1, …, 𝑤_𝑛), find a prefix-free encoding with lengths (ℓ_1, …, ℓ_𝑛) assigned to the symbols which minimizes Σ_(𝑖=1)^𝑛 𝑤_𝑖 ⋅ ℓ_𝑖
o Note that Σ_(𝑖=1)^𝑛 𝑤_𝑖 ⋅ ℓ_𝑖 is the length of the compressed document
• Example
➢ (𝑤_𝑎, 𝑤_𝑏, 𝑤_𝑐, 𝑤_𝑑, 𝑤_𝑒, 𝑤_𝑓) = (42, 20, 5, 10, 11, 12)
➢ No need to remember the numbers ☺

Lossless Compression
• Observation: prefix-free encoding = tree
➢ 𝑎 → 0, 𝑒 → 100, 𝑓 → 101, 𝑐 → 1100, 𝑑 → 1101, 𝑏 → 111

Lossless Compression
• Huffman Coding
➢ Build a priority queue by adding (𝑥, 𝑤_𝑥) for each symbol 𝑥
➢ While |queue| ≥ 2
o Take the two symbols with the lowest weights, (𝑥, 𝑤_𝑥) and (𝑦, 𝑤_𝑦)
o Merge them into one symbol “𝑥𝑦” with weight 𝑤_𝑥 + 𝑤_𝑦
• Let’s see this on the previous example

Lossless Compression
• Final Outcome
➢ 𝑎 → 0, 𝑒 → 100, 𝑓 → 101, 𝑐 → 1100, 𝑑 → 1101, 𝑏 → 111

Lossless Compression
• Running time
➢ 𝑂(𝑛 log 𝑛)
➢ Can be made 𝑂(𝑛) if the labels are given to you
sorted by their frequencies
o Exercise! Think of using two queues...
• Proof of optimality
➢ Induction on the number of symbols 𝑛
➢ Base case: For 𝑛 = 2, both encodings which assign 1 bit to each symbol are optimal
➢ Hypothesis: Assume the algorithm returns an optimal encoding for 𝑛 − 1 symbols

Lossless Compression
• Proof of optimality
➢ Consider the case of 𝑛 symbols
➢ Lemma 1: In any optimal tree, if 𝑤_𝑥 < 𝑤_𝑦, then ℓ_𝑥 ≥ ℓ_𝑦
➢ Proof:
o Suppose for contradiction that in some optimal tree, 𝑤_𝑥 < 𝑤_𝑦 and ℓ_𝑥 < ℓ_𝑦
o Swapping 𝑥 and 𝑦 strictly reduces the overall length, as 𝑤_𝑥 ⋅ ℓ_𝑦 + 𝑤_𝑦 ⋅ ℓ_𝑥 < 𝑤_𝑥 ⋅ ℓ_𝑥 + 𝑤_𝑦 ⋅ ℓ_𝑦 (check!)
o QED!

Lossless Compression
• Proof of optimality
➢ Consider the two symbols 𝑥 and 𝑦 with the lowest frequencies, which Huffman combines in the first step
➢ Lemma 2: ∃ an optimal tree 𝑇 in which 𝑥 and 𝑦 are siblings (i.e. for some 𝑝, they are assigned encodings 𝑝0 and 𝑝1)
➢ Proof:
1. Take any optimal tree
2. Let 𝑥 be the label with the lowest frequency
3. If 𝑥 doesn’t have the longest encoding, swap it with one that does
4. Due to optimality, 𝑥 must have a sibling (check!)
5. If it’s not 𝑦, swap it with 𝑦
6. Check that Steps 3 and 5 do not increase the overall length. ∎

Lossless Compression
• Proof of optimality
➢ Let 𝑥 and 𝑦 be the two least frequent symbols that Huffman combines in the first step into “𝑥𝑦”
➢ Let 𝐻 be the Huffman tree produced
➢ Let 𝑇 be an optimal tree in which 𝑥 and 𝑦 are siblings
➢ Let 𝐻′ and 𝑇′ be obtained from 𝐻 and 𝑇 by treating 𝑥𝑦 as one symbol with frequency 𝑤_𝑥 + 𝑤_𝑦
➢ Induction hypothesis: 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻′) ≤ 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇′)
➢ 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻) = 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻′) + (𝑤_𝑥 + 𝑤_𝑦) ⋅ 1
➢ 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇) = 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇′) + (𝑤_𝑥 + 𝑤_𝑦) ⋅ 1
➢ So 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻) ≤ 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇) ∎

Other Greedy Algorithms
• If you aren’t familiar with the following algorithms, spend some time checking them out!
➢ Dijkstra’s shortest path algorithm
➢ Kruskal’s and Prim’s minimum spanning tree algorithms
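Returning to Huffman coding: the merge loop from the Lossless Compression slides can be sketched in Python. This version tracks only the code lengths ℓ (each merge adds one bit to every symbol below the merged node, which is all the proof of optimality needs); the tie-breaking counter is an implementation detail, not part of the algorithm:

```python
import heapq

# Huffman coding: repeatedly merge the two lowest-weight (groups of) symbols.
def huffman_lengths(weights):
    """weights: dict symbol -> frequency. Returns dict symbol -> code length."""
    # Heap entries are (weight, tie_breaker, symbols) so comparisons
    # never fall through to comparing the symbol lists.
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    counter = len(heap)
    lengths = {sym: 0 for sym in weights}
    while len(heap) >= 2:
        w1, _, syms1 = heapq.heappop(heap)   # lowest weight
        w2, _, syms2 = heapq.heappop(heap)   # second-lowest weight
        merged = syms1 + syms2
        for sym in merged:                   # every symbol under the merge gains one bit
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return lengths

# The slides' example frequencies; the resulting lengths match
# a -> 0, e -> 100, f -> 101, c -> 1100, d -> 1101, b -> 111
print(huffman_lengths({"a": 42, "b": 20, "c": 5, "d": 10, "e": 11, "f": 12}))
```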