
COMP2100/COMP6442
Algorithms Part I
Sid Chi-Kin Chau
[Lecture 5]
1

Why are Algorithms Important?
• Major functions of computers
  • Problem solving
  • Data processing
  • Computing, etc.
• Programming is more than correctness
  • We have to write “efficient” code that solves problems with fast running time and small computational resources
• Algorithms are systematic ways of solving specific problems
  • Example: how to sort, search, and process data
2

Recap from Data Structures
• Data structures are optimized for efficient data operations
  • Fast search, insertion, deletion, enumeration
• Consider searching for an item with a given key
  • Linked list
    • Search sequentially through the list
    • Takes at most n steps
  • Binary search tree
    • Search recursively in one of the sub-trees
    • Faster on average, but still takes at most n steps in an unbalanced tree
  • Red-black tree
    • Search recursively in one of the sub-trees of an always-balanced tree
    • Takes at most log(n) steps
3
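As a quick, concrete recap of the last point: Java's TreeMap is backed by a red-black tree, so lookups stay at about log(n) steps. A minimal sketch (the example keys and values are made up for illustration):

```java
import java.util.TreeMap;

public class RecapSearch {
    public static void main(String[] args) {
        // TreeMap is backed by a red-black tree: search, insertion and deletion
        // all take at most about log(n) steps because the tree stays balanced.
        TreeMap<Integer, String> records = new TreeMap<>();
        records.put(42, "Smith");
        records.put(7, "Abraham");
        records.put(19, "Peterson");

        System.out.println(records.get(19));         // Peterson, found in ~log(n) steps
        System.out.println(records.containsKey(99)); // false
    }
}
```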

Goals of This Lecture
• Learn a general pattern for efficient algorithms: “divide and conquer”
  • Reduce a problem to simpler sub-problems and solve the sub-problems recursively
  • Intuitive approach
  • Often quite efficient
• Applications
  • Binary search
  • Tower of Hanoi
  • Merge sort
  • Karatsuba multiplication
• Something else to learn
  • How to compare the efficiency of different algorithms
4

Divide and Conquer Paradigm
• “Nothing is particularly hard if you divide it into small jobs” (Henry Ford)
• Idea: break up a problem into smaller (easier) sub-problems
(Figure: a big, harder problem is split into smaller problems, each split again into yet smaller, easier problems)
5

General Search Algorithms
• Search for something in an unstructured space
  • Specifically, search in a maze
• Each step gives a result
  • Found or not
• Options for the next action
  • Walk North, South, East, West
  • Able to backtrack
• Breadth-first search
  • Explore all possible options recursively at each step
• Depth-first search
  • Explore only one option at each step until hitting a dead end, then backtrack and explore another option
6

• Let n be the number of steps of the longest path
• Breadth-first search
  • Explore all possible options recursively at each step; takes at most n steps
• Depth-first search
  • Explore only one option at each step until hitting a dead end, then backtrack and explore another option; takes at most n steps
(Figure: breadth-first search vs. depth-first search on a maze; a Java sketch of depth-first search follows this slide)
7
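To make the breadth-first/depth-first distinction concrete, here is a minimal Java sketch of depth-first search with backtracking in a tiny grid maze. The maze encoding and method names are this sketch's own inventions; a breadth-first search would instead keep a queue of positions and explore all options level by level.

```java
public class MazeSearch {
    // Depth-first search: explore one direction at a time, backtrack on dead ends.
    // '#' is a wall, '.' is open space, 'G' is the goal.
    static final char[][] MAZE = {
        {'.', '.', '#'},
        {'#', '.', '.'},
        {'#', '#', 'G'},
    };
    static boolean[][] visited = new boolean[MAZE.length][MAZE[0].length];

    static boolean dfs(int r, int c) {
        if (r < 0 || r >= MAZE.length || c < 0 || c >= MAZE[0].length) return false;
        if (MAZE[r][c] == '#' || visited[r][c]) return false;
        if (MAZE[r][c] == 'G') return true;       // found
        visited[r][c] = true;
        return dfs(r - 1, c) || dfs(r + 1, c)     // walk North, then South,
            || dfs(r, c + 1) || dfs(r, c - 1);    // then East, then West; backtrack on failure
    }

    public static void main(String[] args) {
        System.out.println(dfs(0, 0) ? "Found" : "Not found"); // Found
    }
}
```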

Demo
https://www.cs.usfca.edu/~galles/visualization/BFS.html
8

Demo
https://www.cs.usfca.edu/~galles/visualization/DFS.html
9

Search in “Structured” Space
• Consider a list ordered according to certain IDs
• Examples
  • Phone book (ID: name)
  • Dictionary (ID: vocabulary)
  • Database (ID: record ID)
• Search for an item in a linearly ordered list
  • Find a person's record in a database
(Figure: an ordered list of names: Abraham, Clark, Jackson, Peterson, Smith. Find Maclaren?)
10

Binary Search
• Let the n-th item in the list be list[n]
BinarySearch[x,(1,…,n)]
//Search x in a sorted list with list[1] < … < list[n]
If x = list[n/2], return found
If x > list[n/2], return BinarySearch[x,(n/2,…,n)]
If x < list[n/2], return BinarySearch[x,(1,…,n/2)]
(Figure: Find x? Probe list[n/2], then list[3n/4], then list[5n/8], …)
11

Binary Search
• Let the n-th item in the list be list[n]
• Binary search reduces the search range iteratively
  • Divide the search range in half at each step
  • Until only 1 item remains in the range
• It takes at most log(n) steps
  • The range shrinks n, n/2, n/4, n/8, n/16, …, 1 over log(n) halvings
  • 2^log(n) = n, where log is of base 2
• Binary search is only possible for a structured search space
  • Example: a linearly ordered list
12

Tower of Hanoi
• Consists of three rods and n disks of various sizes
• Goal: move all disks from one rod to another rod
• Rule: only smaller disks can be placed on top of a bigger disk
• How to accomplish the goal? How many steps are needed with n disks?
(Figure: left rod, center rod, right rod)
13

Tower of Hanoi
• Let Hanoi[n,right] be the algorithm that moves the smallest n disks to the right rod
• Hanoi[n-1,right] solves the subproblem that moves the smallest n-1 disks

Hanoi[n, r] //Move the smallest n disks to rod r
If n > 1 Then
  Call Hanoi[n-1, center] //Move the smallest n-1 disks to the center rod
  Move the remaining n-th disk to rod r
  Call Hanoi[n-1, r] //Move the smallest n-1 disks to rod r
Else
  Move the smallest disk to rod r
14
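Going back to binary search for a moment: below is a minimal Java sketch of the BinarySearch pseudocode, written iteratively over a sorted array. The array-based form and variable names are this sketch's choices, not the lecture's.

```java
public class BinarySearch {
    // Return the index of x in the sorted array list, or -1 if x is not present.
    // Each iteration halves the search range, so at most about log2(n) steps are taken.
    static int binarySearch(int[] list, int x) {
        int lo = 0, hi = list.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (x == list[mid]) return mid;        // found
            else if (x > list[mid]) lo = mid + 1;  // search the upper half
            else hi = mid - 1;                     // search the lower half
        }
        return -1;                                 // not found
    }

    public static void main(String[] args) {
        int[] sorted = {2, 3, 10, 11, 12, 13, 20};
        System.out.println(binarySearch(sorted, 12)); // 4
        System.out.println(binarySearch(sorted, 7));  // -1
    }
}
```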

(Figure: Hanoi[n,right] decomposes into Hanoi[n-1,center], one move of the largest disk, and Hanoi[n-1,right])
• How many steps are needed for Hanoi[n,right]?
  • At most 2^n steps (exactly 2^n - 1 moves, since T(n) = 2 T(n-1) + 1); see the Java sketch below
15
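The Hanoi[n, r] recursion above translates almost line for line into code. A minimal Java sketch (class and method names are this sketch's own; it prints the moves rather than manipulating actual rods):

```java
public class Hanoi {
    // Move the smallest n disks from rod 'from' to rod 'to', using rod 'spare'.
    static void hanoi(int n, char from, char to, char spare) {
        if (n == 1) {
            System.out.println("Move disk 1 from " + from + " to " + to);
            return;
        }
        hanoi(n - 1, from, spare, to);   // move the smallest n-1 disks out of the way
        System.out.println("Move disk " + n + " from " + from + " to " + to);
        hanoi(n - 1, spare, to, from);   // move them back on top of the n-th disk
    }

    public static void main(String[] args) {
        hanoi(3, 'L', 'R', 'C');         // prints 2^3 - 1 = 7 moves
    }
}
```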

How to Sort
• Sorting is one of the most important operations
  • Find the smallest, largest, k-th ordered item, median
  • Produces structured data spaces (for searching)
• Need efficient systematic methods for sorting
• Key Questions:
  • How do we sort efficiently?
  • What is the minimum running time for sorting?
  • How much memory does it need?
(Figure: the numbers 1 to 21, shown unsorted)
16

Insertion Sort
• Insertion sort (see the Java sketch after this slide):
  • Create an empty list
  • Insert the items one by one into the list
  • Each inserted item must be placed in sorted order in the list
• What is the running time?
  • The k-th inserted item makes at most (k-1) comparisons with the existing items in the list
  • Hence, the total number of comparisons is
  • T(n) = 1 + 2 + … + (n-1) = n(n-1)/2
(Figure: inserting 2 into 3 11 13 20 gives 2 3 11 13 20; inserting 12 gives 2 3 11 12 13 20; inserting 10 gives 2 3 10 11 12 13 20)
17
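A minimal Java sketch of insertion sort as described above. It sorts in place inside an array rather than building a separate list, which is this sketch's choice; the comparison count is the same.

```java
import java.util.Arrays;

public class InsertionSort {
    // Sort arr in place: the k-th element is shifted left until it sits in sorted
    // order, making at most k comparisons, so roughly n(n-1)/2 comparisons in total.
    static void insertionSort(int[] arr) {
        for (int k = 1; k < arr.length; k++) {
            int item = arr[k];
            int i = k - 1;
            while (i >= 0 && arr[i] > item) {  // shift larger items one slot to the right
                arr[i + 1] = arr[i];
                i--;
            }
            arr[i + 1] = item;                 // place item in its sorted position
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 11, 13, 20, 2, 12, 10};
        insertionSort(a);
        System.out.println(Arrays.toString(a)); // [2, 3, 10, 11, 12, 13, 20]
    }
}
```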

Merge Sort
• Sort a list of n items
  • Divide into two sublists of n/2 items
  • Sort each sublist of n/2 items
    • Divide into two sublists of n/4 items
    • Sort each sublist of n/4 items
      • Divide into two sublists of n/8 items
      • Sort each sublist of n/8 items
      • ……
      • Divide into sublists of 1 item
      • Merge two sublists of 1 item
      • ……
      • Merge two n/8-item sublists
    • Merge two n/4-item sublists
  • Merge two n/2-item sublists
18

Merge Sort
• Sort a list of n items by repeatedly dividing into halves and merging, as on the previous slide
(Figure: merge sort of the unsorted list 4 1 5 2 6 9 7 3 8
  after merging pairs:    1 4 | 2 5 | 6 9 | 3 7 | 8
  after the next merges:  1 2 4 5 | 3 6 7 9 | 8
  after the next merge:   1 2 3 4 5 6 7 9 | 8
  sorted list:            1 2 3 4 5 6 7 8 9)
19

MergeSort[1,…,n]
//Sort a list list[1],…,list[n]
If n = 1 Then Return the single-item list
//Sort each sublist
list1 ← MergeSort[1,…,n/2]
list2 ← MergeSort[n/2+1,…,n]
Return Merge[list1, list2]

Merge[list1, list2]
//Merge two sorted lists list1 and list2
i ← 1, j ← 1
Do
  If list1[i] < list2[j] Then
    Insert list1[i] into newlist
    i ← i+1
  Else
    Insert list2[j] into newlist
    j ← j+1
While i ≤ Length[list1] and j ≤ Length[list2]
If j > Length[list2] Then
  Insert the rest of list1 into newlist
Else
  Insert the rest of list2 into newlist
Return newlist

• Assumption
  • n is even at each step
  • Simple modification for odd n
20
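A minimal Java sketch of the MergeSort and Merge pseudocode above, using array index ranges instead of separate list objects (that formulation and the names are this sketch's own). It also handles odd-length ranges, which the pseudocode leaves as a "simple modification".

```java
import java.util.Arrays;

public class MergeSort {
    // Sort arr[lo..hi] (inclusive) by recursively sorting each half and merging.
    static void mergeSort(int[] arr, int lo, int hi) {
        if (lo >= hi) return;                      // 0 or 1 item: already sorted
        int mid = (lo + hi) / 2;
        mergeSort(arr, lo, mid);
        mergeSort(arr, mid + 1, hi);
        merge(arr, lo, mid, hi);
    }

    // Merge the two sorted halves arr[lo..mid] and arr[mid+1..hi] into sorted order.
    static void merge(int[] arr, int lo, int mid, int hi) {
        int[] newList = new int[hi - lo + 1];
        int i = lo, j = mid + 1, k = 0;
        while (i <= mid && j <= hi) {              // compare the smallest unmerged items
            newList[k++] = (arr[i] < arr[j]) ? arr[i++] : arr[j++];
        }
        while (i <= mid) newList[k++] = arr[i++];  // copy the rest of the left half
        while (j <= hi) newList[k++] = arr[j++];   // copy the rest of the right half
        System.arraycopy(newList, 0, arr, lo, newList.length);
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 5, 2, 6, 9, 7, 3, 8};
        mergeSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));    // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```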

Running Time
• Comparison of running time at each level of the recursion
(Figure: merge sort of 4 1 5 2 6 9 7 3, with each level annotated:
  T(n): running time for sorting a list of n items
  T(n/2): running time for sorting a list of n/2 items
  ?: running time for merging two lists of n/2 sorted items)
• T(n) = 2 T(n/2) + ?
21

Running Time
• How do we merge two sorted lists into a sorted list?
  1. Compare the smallest unmerged elements in the two lists
  2. Insert the smaller element into the new list
  3. Repeat until all elements are merged
(Figure: merging sorted list 1 (1 2 4 5) with sorted list 2 (3 6 7 9))
  T(n): running time for sorting a list of n items
  ?: running time for merging two lists of n/2 sorted items
  T(n/2): running time for sorting a list of n/2 items
22

Running Time
• What is the running time for merging two sorted lists?
  1. The running time is proportional to the number of comparisons and insertions
  2. Number of insertions = n; number of comparisons is between n/2 and n
  3. So the total running time is proportional to n
(Figure: merging sorted list 1 (1 2 4 5) with sorted list 2 (3 6 7 9))
  T(n): running time for sorting a list of n items
  n: running time for merging two lists of n/2 sorted items
  T(n/2): running time for sorting a list of n/2 items
23

Proof of Running Time

T(n) = 2 T(n/2) + n
     = 4 T(n/4) + 2(n/2) + n = 4 T(n/4) + 2n
     = 8 T(n/8) + 3n
     …
     = n T(1) + log(n) n
     ≤ C n log(n)

(Figure: the recursion tree has log(n) layers; one problem of size n at the top, then two of size n/2, four of size n/4, eight of size n/8, …, down to n problems of size 1. Each layer contributes about n work for merging.)

T(n): running time for sorting a list of n items
T(n/2): running time for sorting a list of n/2 items
n: running time for merging two lists of n/2 sorted items
24

Comparison of Running Time

Algorithm        Worst-case running time
Merge Sort       C n log(n)
Quick Sort       C n^2
Heap Sort        C n log(n)
Insertion Sort   C n^2
Bubble Sort      C n^2
Shell Sort       C n log^2(n)
25

Some Reflections
• Why Divide-and-Conquer works:
1. Recurrent sub-structures:
• Possible to break into smaller sub-problems
• E.g., sorting n items reduces to sorting n/2 items
2. Compact memory size:
• Only a small amount of memory is needed to store the results of sub-problems
• E.g., an n-element array suffices to store n items
3. Efficient merging:
• Merging the results of sub-problems is efficient
• E.g., merging 2 sub-lists of n/2 sorted items takes at most n steps
26

How About : Merge Sort for Intervals?
• Intervals
• Earlier starting times always appear first
• For the same starting times, earlier ending times appear first
• How do we sort a list of intervals?
(Figure: a list of intervals labelled a to g, shown unsorted and then sorted)
27

How About : Merge Sort for Intervals?
• Interval a ≤ Interval b ⟺
  (a.start-time < b.start-time) or
  (a.start-time = b.start-time and a.end-time ≤ b.end-time)
• Running time: T(n) = 2 T(n/2) + n ≤ C n log(n)
(Figure: merge sort applied to the intervals a to g using the ordering above)
28
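Merge sort only needs a way to compare two items, so sorting intervals amounts to supplying the ordering above as a comparator. A minimal Java sketch (the Interval record, its field names, and the sample data are this sketch's assumptions; Arrays.sort on objects uses a merge-sort variant, so the cost is still about C n log(n)):

```java
import java.util.Arrays;
import java.util.Comparator;

public class IntervalSort {
    // A simple interval with a name, a start time, and an end time (Java 16+ record).
    record Interval(String name, int start, int end) {}

    public static void main(String[] args) {
        Interval[] intervals = {
            new Interval("f", 5, 9), new Interval("a", 1, 4),
            new Interval("b", 1, 6), new Interval("d", 3, 5),
        };

        // Order by start time; break ties by end time, as defined on the slide.
        Comparator<Interval> byStartThenEnd =
            Comparator.comparingInt(Interval::start)
                      .thenComparingInt(Interval::end);

        Arrays.sort(intervals, byStartThenEnd);
        System.out.println(Arrays.toString(intervals)); // order: a, b, d, f
    }
}
```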

How to Multiply
• Multiplication is fundamental
• Traditional approach
  • Digit-wise multiplication and multiplication table look-up
  • Not efficient for large-number multiplication
• Efficient large-scale multiplication is needed
  • Computer graphics, e.g., 3D ray tracing
  • Cryptography, e.g., RSA cryptography
• Key Questions:
  • How do we multiply efficiently?
  • Is there any better way beyond simple multiplication table look-up?
  • Surprisingly, divide-and-conquer applies well
29

Integer Multiplication 101
• Assumption: we skip the simple operations, e.g., additions and 1-digit multiplications (by multiplication table look-up)
• Long multiplication algorithm
  Example: 45 × 97, worked by long multiplication

Integer Multiplication 101
• Consider multiplying a pair of large numbers
• Long multiplication algorithm
  Example: 1234567895931413 × 4563823520395533

Integer Multiplication 101
• How many 1-digit multiplications are needed?
  • There are about n^2 1-digit multiplications for two n-digit numbers
  Example: 1233925720752752384623764283568364918374523856298 × 4562323582342395285623467235019130750135350013753

Measure of Running Time
• How do we measure the running time of an algorithm?
  • Do we need to test on different platforms? Java? Python? C++?
  • How about future platforms?
  • We can't depend on the current platforms
• We should measure how the running time scales with the size of the input
  • Big-O notation
33

Measure of Running Time
• Rough approximation of running time of long multiplication algorithm by hand
(Figure: plot of the running time by hand against the number of digits n)
34

Measure of Running Time
• Either way, the running time scales like n^2
(Figure: running-time plots for long multiplication by hand and by computer)
35

Asymptotic Analysis
• How does the runtime scale with the size of the input?
  • The running time of the long multiplication algorithm scales like n^2
• You will learn how to use a formal notation (Big-O notation)
36

A New Multiplication Algorithm
• Hypothetically… a magic multiplication algorithm that scales like n^1.6
(Figure: running-time curves for long multiplication by hand, long multiplication by computer, and the magic n^1.6 algorithm)
37

A New Multiplication Algorithm
• For large enough n, it is faster to run the magic algorithm by hand than the long multiplication algorithm on a computer
(Figure: the n^1.6 curve eventually drops below both n^2 curves)
38


Divide and Conquer Multiplication

• Break up an integer: 1234 = 12 × 100 + 34
• One 4-digit multiplication = four 2-digit multiplications

1234 × 5678
= (12 × 100 + 34)(56 × 100 + 78)
= (12 × 56) 10000 + (34 × 56 + 12 × 78) 100 + (34 × 78)
39

More Generally
• Break up an n-digit integer: [x1 x2 … xn] = [x1 x2 … x(n/2)] × 10^(n/2) + [x(n/2+1) x(n/2+2) … xn]
• One n-digit multiplication = four n/2-digit multiplications

x × y
= (a × 10^(n/2) + b)(c × 10^(n/2) + d)
= (a × c) 10^n + (a × d + c × b) 10^(n/2) + (b × d)
40

Divide and Conquer Multiplication Algorithm

• x, y are n-digit numbers
• Assume n is even
• a, b, c, d are n/2-digit numbers
• The algorithm recursively computes 4 multiplications with half the number of digits

DnCMultiply[x, y] //Divide-and-Conquer Multiplication
If n = 1 Then Return x × y
Write x = a × 10^(n/2) + b
Write y = c × 10^(n/2) + d
//Recursively compute a × c, a × d, c × b, b × d
a × c ← DnCMultiply[a, c], a × d ← DnCMultiply[a, d]
c × b ← DnCMultiply[c, b], b × d ← DnCMultiply[b, d]
//Add them up to get x × y
Return (a × c) 10^n + (a × d + c × b) 10^(n/2) + (b × d)
41
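A runnable Java sketch of DnCMultiply. It uses BigInteger so that arbitrarily long numbers fit, and splits on 10^(n/2); these choices, and the cross-check against the built-in multiply, are this sketch's own. The 1-digit base case plays the role of the multiplication-table look-up.

```java
import java.math.BigInteger;

public class DnCMultiply {
    // Multiply x and y by splitting each into a high half and a low half and
    // recursing on the four half-size products a*c, a*d, c*b, b*d.
    static BigInteger multiply(BigInteger x, BigInteger y) {
        int n = Math.max(x.toString().length(), y.toString().length());
        if (n == 1) return x.multiply(y);               // 1-digit base case

        BigInteger s = BigInteger.TEN.pow(n / 2);       // s = 10^(n/2)
        BigInteger a = x.divide(s), b = x.mod(s);       // x = a*s + b
        BigInteger c = y.divide(s), d = y.mod(s);       // y = c*s + d

        BigInteger ac = multiply(a, c);
        BigInteger ad = multiply(a, d);
        BigInteger cb = multiply(c, b);
        BigInteger bd = multiply(b, d);

        // x*y = (a*c)*s^2 + (a*d + c*b)*s + b*d
        return ac.multiply(s).multiply(s).add(ad.add(cb).multiply(s)).add(bd);
    }

    public static void main(String[] args) {
        BigInteger x = new BigInteger("1234"), y = new BigInteger("5678");
        System.out.println(multiply(x, y));             // 7006652
        System.out.println(x.multiply(y));              // cross-check: 7006652
    }
}
```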

Running Time?
• Better or worse than the long multiplication algorithm?
  • That is, does the number of multiplications grow like n^2? More or less than that?
• How do we answer this question?
  • Run it
  • Try to understand it analytically
• Claim:
  • The running time of DnCMultiply still involves at least n^2 1-digit multiplications
42

Divide and Conquer Paradigm
12345678 x 87654321
1234 x 8765
12×87 12×65 34×87 34×65
5678 x 8765
1234 x 4321
5678 x 4321
56 x 87
56 x 65 78 x 87 78 x 65 12 x 43 12 x 21 34 x 43 34 x 21 56 x 43 56 x 21 78 x 43 78 x 21
1×8 2×8 1×6 2×6 3×8 4×8 3×6 4×6 5×8 6×8 5×6 6×6 7×8 8×8 7×6 8×6 1×4 2×4 1×2 2×2 3×4 4×4 3×2 4×2 5×4 6×4 5×2 6×2 7×4 8×4 7×2 8×2 1×7 2×7 1×5 2×5 3×7 4×7 3×5 4×5 5×7 6×7 5×5 6×5 7×7 8×7 7×5 8×5 1×3 2×3 1×1 2×1 3×3 4×3 3×1 4×1 5×3 6×3 5×1 6×1 7×3 8×3 7×1 8×1
Every pair of digits still gets multiplied together separately
So the running time is still at least n^2
43

Another Way to See This
• If you cut n in half log(n) times, you get down to 1
  • 2^log(n) = n
• So we recurse log(n) times and get
  • 4^log(n) = n^2 problems of size 1

1 problem of size n
4 problems of size n/2
…………
4^t problems of size n/2^t
…………
n^2 problems of size 1
44

Proof
• Let T(n) be the running time to multiply two n-digit numbers
• Recurrence relation:
  T(n) = 4 T(n/2) + (about n for additions)

T(n) ≥ 4 T(n/2)
     = 4 (4 T(n/4)) = 4^2 T(n/2^2)
     = 4 (4 (4 T(n/8))) = 4^3 T(n/2^3)
     …
     = 4^t T(n/2^t)
     …
     = 4^log(n) T(n/2^log(n)) = n^2 T(1)
45

Karatsuba Multiplication
• Karatsuba figured out how to improve this!
• If only we recurse three times instead of four…
xy
=(a10n/2 +b)(c10n/2 +d)
=(ac)10n +(ad + cb)10n/2 +(bd)
123
46

Karatsuba Multiplication
• Recursively compute these THREE things
  • a × c
  • b × d
  • (a + b) × (c + d)
• Hence,
  • a × d + c × b = (a + b) × (c + d) - a × c - b × d
• x × y
  = (a × c) 10^n + ((a + b) × (c + d) - a × c - b × d) 10^(n/2) + (b × d)
47

Running Time
• If you cut n in half log(n) times, you get down to 1
  • 2^log(n) = n
• So we recurse log(n) times and get
  • 3^log(n) = n^log(3) ≈ n^1.6 problems of size 1

1 problem of size n
3 problems of size n/2
…………
3^t problems of size n/2^t
…………
n^1.6 problems of size 1
48
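For completeness, the identity behind the n^1.6 figure, written out as a short derivation; the exact exponent is log_2(3) ≈ 1.585, which the slides round to 1.6.

```latex
3^{\log_2 n}
  = \left(2^{\log_2 3}\right)^{\log_2 n}
  = \left(2^{\log_2 n}\right)^{\log_2 3}
  = n^{\log_2 3} \approx n^{1.585}
```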

Karatsuba Multiplication
• x, y are n-digit numbers
• Assume n is even
• a, b, c, d are n/2-digit numbers
• The algorithm recursively computes 3 multiplications with half the number of digits

KMultiply[x, y] //Divide-and-Conquer Multiplication
If n = 1 Then Return x × y
Write x = a × 10^(n/2) + b
Write y = c × 10^(n/2) + d
//Recursively compute a × c, b × d, (a+b) × (c+d)
a × c ← KMultiply[a, c], b × d ← KMultiply[b, d]
(a+b) × (c+d) ← KMultiply[a+b, c+d]
a × d + c × b ← (a + b) × (c + d) - a × c - b × d
//Add them up to get x × y
Return (a × c) 10^n + (a × d + c × b) 10^(n/2) + (b × d)
49
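A runnable Java sketch of KMultiply, again using BigInteger and splitting on 10^(n/2) (these choices and the cross-check are this sketch's own). Only three recursive calls are made per level, which is the whole point of Karatsuba's trick.

```java
import java.math.BigInteger;

public class Karatsuba {
    // Karatsuba multiplication: three half-size recursive products instead of four.
    static BigInteger multiply(BigInteger x, BigInteger y) {
        int n = Math.max(x.toString().length(), y.toString().length());
        if (n == 1) return x.multiply(y);                // 1-digit base case

        BigInteger s = BigInteger.TEN.pow(n / 2);        // s = 10^(n/2)
        BigInteger a = x.divide(s), b = x.mod(s);        // x = a*s + b
        BigInteger c = y.divide(s), d = y.mod(s);        // y = c*s + d

        BigInteger ac = multiply(a, c);
        BigInteger bd = multiply(b, d);
        BigInteger abcd = multiply(a.add(b), c.add(d));  // (a+b)*(c+d)
        BigInteger adPlusCb = abcd.subtract(ac).subtract(bd);

        // x*y = (a*c)*s^2 + (a*d + c*b)*s + b*d
        return ac.multiply(s).multiply(s).add(adPlusCb.multiply(s)).add(bd);
    }

    public static void main(String[] args) {
        BigInteger x = new BigInteger("12345678"), y = new BigInteger("87654321");
        System.out.println(multiply(x, y));              // 1082152022374638
        System.out.println(x.multiply(y));               // cross-check: 1082152022374638
    }
}
```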

Karatsuba Multiplication is Faster
(Figure: running-time curves for long multiplication and Karatsuba multiplication; Karatsuba's curve is lower for large n)
50

Can We Multiply Faster?
• Toom-Cook (1963)
  • Scales like n^1.465
• Schönhage–Strassen (1971)
  • Scales like n log(n) loglog(n)
• Fürer (2007)
  • Scales like n log(n) 2^(log*(n))
• Harvey-van der Hoeven (2019)
  • Scales like n log(n)
  • Impractical; requires gigantic n
51

Summary
• Divide and conquer
  • Binary search
  • Tower of Hanoi
  • Merge sort
  • Karatsuba multiplication
    • As fast as O(n^1.6)
• Some understanding of measuring runtime as a scaling law in the input size
• Applied the neat trick of divide and conquer
52

53