Quiz
1
Question 1
If 99.9% of your sequential program execution time is spent inside a loop that can be parallelized, what is the maximum speedup you can achieve via parallelizing this loop?
49
Quiz
1
Question 1
If 99.9% of your sequential program execution time is spent inside a loop that can be parallelized, what is the maximum speedup you can achieve via parallelizing this loop?
Solution: Maximum speedup = 1/(1-99.9%) = 1/0.1% = 1000
50
Quiz
1
Question 2
If a two-hour parallel program running on one thousand cores spends one hour in its parallel portion, what is the scaled speedup of the program on one thousand cores?
51
Quiz
1
Question 2
If a two-hour parallel program running on one thousand cores spends one hour in its parallel portion, what is the scaled speedup of the program on one thousand cores?
Solution: Scaled speedup = (1+1*1000)/2 = 500.5
52
Quiz
1
Question 3
A parallel program achieves a speedup of 2 on 4 cores. Is it likely to achieve a speedup of 4 on 8 cores?
53
Quiz
1
Question 3
A parallel program achieves a speedup of 2 on 4 cores. Is it likely to achieve a speedup of 4 on 8 cores?
Speedup = 1/[f+(1-f)/4]=2 Serial fraction f = 1/3 Speedup = 1/[1/3+(1-1/3)/8] = 2.4 < 4 Impossible to achieve a speedup of 4 on 8 cores
54
1
Quiz
Question 4
A computer animation program generates feature movies frame-by- frame. Each frame can be generated and written to its own file independently. If it takes 90 seconds to render a frame and 10 second to write a frame into a file, what is the maximum speedup you can achieve if you have 100 single-processor computers to use?
55
1
Quiz
Question 4
A computer animation program generates feature movies frame-by- frame. Each frame can be generated and written to its own file independently. If it takes 90 seconds to render a frame and 10 second to write a frame into a file, what is the maximum speedup you can achieve if you have 100 single-processor computers to use?
Solution: Maximum speedup = 100 because each processor can handle separate frames independently.
56
Quiz
1
Question 5
Let n>f(p) denote the Isoefficiency relation of a parallel system and M(n) denote the amount of memory for a problem of size n. Use the scalability function to rank the following parallel systems from most to least scalable:
(a). f(p)=p and M(n)=n;
(b). f(p)=p and M(n)=n^2;
(c). f(p)=p*log(p) and M(n)=n;
(d). f(p)=p*log(p) and M(n)=n^2.
57
Quiz
1
Question 5
Let n>f(p) denote the Isoefficiency relation of a parallel system and M(n) denote the amount of memory for a problem of size n. Use the scalability function to rank the following parallel systems from most to least scalable:
(a). f(p)=p and M(n)=n;
(b). f(p)=p and M(n)=n^2;
(c). f(p)=p*log(p) and M(n)=n;
(d). f(p)=p*log(p) and M(n)=n^2.
Solution:
a. Scalability function = M(f(p))/p = (Cp)/p = C
b. Scalability function = M(f(p))/p = (Cp)^2/p = C^2p
c. Scalability function = M(f(p))/p = (Cp*log(p))/p = C*log(p)
d. Scalability function = M(f(p))/p = (Cp*log(p))^2/p = C^2*p*log^2(p) From most scalable to least scalable is a > c > b > d.
58
1
Quiz
Question 6
Your program has an execution time of n^3 and uses n^2 bytes space when the problem size is n. You have worked very hard to make it fully parallel and reduced the communication time to 24𝑛2𝑙𝑜𝑔2𝑝 on p cores. If you want to achieve a speed up of 1000 with 1024 cores, what is the minimum memory each core must have? If you have 1024 cores with 1 GB (2^30 bytes) memory per core, what is the maximum speedup you can achieve?
59
1
Quiz
Question 6
Your program has an execution time of n^3 and uses n^2 bytes space when the problem size is n. You have worked very hard to make it fully parallel and reduced the communication time to 24𝑛2𝑙𝑜𝑔2𝑝 on p cores. If you want to achieve a speed up of 1000 with 1024 cores, what is the minimum memory each core must have? If you have 1024 cores with 1 GB (2^30 bytes) memory per core, what is the maximum speedup you can achieve?
𝑛3 1 1 𝑆𝑝𝑒𝑒𝑑𝑢𝑝 = 𝑛3/𝑝 + 24𝑛2𝑙𝑜𝑔𝑝 = 1 1024 = 1
2 1024 + 24𝑙𝑜𝑔2 /𝑛 1024 + 240/𝑛
= 1000 ⇒ 𝑛 = 10240000
So the minimum memory each core must have is n^2/p = 1024*10^8 bytes = 10^8 KB = 100GB
If each core only has 1GB, n^2 /p=n^2 /1024≤2^30 ⇒n≤2^20 ⇒
𝑠𝑝𝑒𝑒𝑑𝑢𝑝 ≤ 220 = 830 So the maximum speedup is 830.
220+240 210
60
1
Quiz
Question 7
Assume a parallel program takes 1000 seconds to finish when using 1 core and 600 seconds to finish when using 2 cores. Is it possible to finish the program within 400 seconds when using 4 cores? If your experiment actually takes 500 seconds when using 4 cores, what is the minimum execution time when using 6 cores?
61
1
Quiz
Question 7
Assume a parallel program takes 1000 seconds to finish when using 1 core and 600 seconds to finish when using 2 cores. Is it possible to finish the program within 400 seconds when using 4 cores? If your experiment actually takes 500 seconds when using 4 cores, what is the minimum execution time when using 6 cores?
Solution(1/2): The speedup is 5/3 when using 2 cores, so using Karp-
we can get serial fraction 𝑒 =
Ψ1 − 𝑝1 3 − 1 1 1 = 5 2 = 10 =
When
using 4 cores, the speedup = 1/[1/5+(1-1/5)/4] = 2.5 and execution
time = 1000/2.5 = 400 seconds. So, it’s possible to finish the program
within 400 seconds when using 4 cores.
1−1 1−1 1 5 𝑝22
62
1
Quiz
Question 7
Assume a parallel program takes 1000 seconds to finish when using 1 core and 600 seconds to finish when using 2 cores. Is it possible to finish the program within 400 seconds when using 4 cores? If your experiment actually takes 500 seconds when using 4 cores, what is the minimum execution time when using 6 cores?
Solution(2/2): Actually the speedup is 2 when using 4 cores, so using
Ψ1 − 𝑝1 1 − 1 1 1
= 2 4 = 4 =
Karp- we can get serial fraction 𝑒 =
1−1 1−1 3 3 𝑝44
p24
6
e 1/5 1/3 >=1/3
Therefore, when using 6 cores, serial fraction e =
13 ⇒ Ψ ≤ 49, so the minimum execution time is 444.4 seconds.
Ψ1 − 𝑝1
1 − 1 1 − 1
= Ψ 6 = Ψ 6 ≥
1−1 𝑝66
1−1 5
63
Pop Quiz
◼ A parallel program executing on 32 processors spends 5% of its time in sequential code. What is the scaled speedup of this program?
Example 1: Reduction
◼ Sequential algorithm complexity T(n,1) = (n)
◼ Parallel algorithm
◆Computational complexity = (n/p) ◆Communication complexity = (log p)
◼ Parallel overhead T0(n,p) = (p log p)
Reduction (continued)
◼ Isoefficiency relation: n C p log p
◼ We ask: To maintain same level of efficiency, how must
n increase when p increases? ◼ M(n) = n
M(Cplogp)/ p=Cplogp/ p=Clogp ◼ The system has good scalability
Example 2: Floyd’s Algorithm
◼ Sequential time complexity: T(n,1)=(n3) ◼ Parallel computation time: (n3/p)
◼ Parallel communication time: (n2log p) ◼ Parallel overhead: T0(n,p) = (pn2log p)
Floyd’s Algorithm (continued)
◼ Isoefficiency relation
n3 C(p n2 log p) n C p log p
◼ M(n) = n2
M(Cplogp)/p=C2p2log2 p/p=C2plog2 p
◼ The parallel system has poor scalability
Example 3: Finite Difference
◼ Sequential time complexity per iteration: T(N,1)=(n2)
◼ Parallel communication complexity per iteration: (n/p) ◼ Parallel overhead: T_o(n,p)=(n p)
◼ Space Complexity: M(n) = n2
◼ What is the scalability function?
Finite Difference (continued)
◼ Isoefficiency relation n2 Cnp n C p
◼ M(n) = n2
M(C p)/p=C2p/p=C2
◼ This algorithm is perfectly scalable