Quiz
49
1
Question 1
If 99.9% of your sequential program execution time is spent inside a
loop that can be parallelized, what is the maximum speedup you can
achieve via parallelizing this loop?
Quiz
50
1
Question 1
If 99.9% of your sequential program execution time is spent inside a
loop that can be parallelized, what is the maximum speedup you can
achieve via parallelizing this loop?
Solution: Maximum speedup = 1/(1-99.9%) = 1/0.1% = 1000
Quiz
51
1
Question 2
If a two-hour parallel program running on one thousand cores spends
one hour in its parallel portion, what is the scaled speedup of the
program on one thousand cores?
Quiz
52
1
Question 2
If a two-hour parallel program running on one thousand cores spends
one hour in its parallel portion, what is the scaled speedup of the
program on one thousand cores?
Solution: Scaled speedup = (1+1*1000)/2 = 500.5
Quiz
53
1
Question 3
A parallel program achieves a speedup of 2 on 4 cores. Is it likely to
achieve a speedup of 4 on 8 cores?
Quiz
54
1
Question 3
A parallel program achieves a speedup of 2 on 4 cores. Is it likely to
achieve a speedup of 4 on 8 cores?
Speedup = 1/[f+(1-f)/4]=2 Serial fraction f = 1/3 Speedup =
1/[1/3+(1-1/3)/8] = 2.4 < 4 Impossible to achieve a speedup of 4
on 8 cores
Quiz
55
1
Question 4
A computer animation program generates feature movies frame-by-
frame. Each frame can be generated and written to its own file
independently. If it takes 90 seconds to render a frame and 10 second
to write a frame into a file, what is the maximum speedup you can
achieve if you have 100 single-processor computers to use?
Quiz
56
1
Question 4
A computer animation program generates feature movies frame-by-
frame. Each frame can be generated and written to its own file
independently. If it takes 90 seconds to render a frame and 10 second
to write a frame into a file, what is the maximum speedup you can
achieve if you have 100 single-processor computers to use?
Solution: Maximum speedup = 100 because each processor can
handle separate frames independently.
Quiz
57
1
Question 5
Let n>f(p) denote the Isoefficiency relation of a parallel system and
M(n) denote the amount of memory for a problem of size n. Use the
scalability function to rank the following parallel systems from most to
least scalable:
(a). f(p)=p and M(n)=n;
(b). f(p)=p and M(n)=n^2;
(c). f(p)=p*log(p) and M(n)=n;
(d). f(p)=p*log(p) and M(n)=n^2.
Quiz
58
1
Question 5
Let n>f(p) denote the Isoefficiency relation of a parallel system and
M(n) denote the amount of memory for a problem of size n. Use the
scalability function to rank the following parallel systems from most to
least scalable:
(a). f(p)=p and M(n)=n;
(b). f(p)=p and M(n)=n^2;
(c). f(p)=p*log(p) and M(n)=n;
(d). f(p)=p*log(p) and M(n)=n^2.
Solution:
a. Scalability function = M(f(p))/p = (Cp)/p = C
b. Scalability function = M(f(p))/p = (Cp)^2/p = C^2p
c. Scalability function = M(f(p))/p = (Cp*log(p))/p = C*log(p)
d. Scalability function = M(f(p))/p = (Cp*log(p))^2/p = C^2*p*log^2(p)
From most scalable to least scalable is a > c > b > d.
Quiz
59
1
Question 6
Your program has an execution time of n^3 and uses n^2 bytes space
when the problem size is n. You have worked very hard to make it fully
parallel and reduced the communication time to 24𝑛2𝑙𝑜𝑔2
𝑝
on p cores.
If you want to achieve a speed up of 1000 with 1024 cores, what is the
minimum memory each core must have? If you have 1024 cores with 1
GB (2^30 bytes) memory per core, what is the maximum speedup you
can achieve?
Quiz
60
1
Question 6
Your program has an execution time of n^3 and uses n^2 bytes space
when the problem size is n. You have worked very hard to make it fully
parallel and reduced the communication time to 24𝑛2𝑙𝑜𝑔2
𝑝
on p cores.
If you want to achieve a speed up of 1000 with 1024 cores, what is the
minimum memory each core must have? If you have 1024 cores with 1
GB (2^30 bytes) memory per core, what is the maximum speedup you
can achieve?
𝑆𝑝𝑒𝑒𝑑𝑢𝑝 =
𝑛3
𝑛3/𝑝 + 24𝑛2𝑙𝑜𝑔2
𝑝 =
1
1
1024
+ 24𝑙𝑜𝑔2
1024/𝑛
=
1
1
1024
+ 240/𝑛
= 1000 ⇒ 𝑛 = 10240000
So the minimum memory each core must have is n^2/p = 1024*10^8
bytes = 10^8 KB = 100GB
If each core only has 1GB, n^2 /p=n^2 /1024≤2^30 ⇒n≤2^20 ⇒
𝑠𝑝𝑒𝑒𝑑𝑢𝑝 ≤
220
220
210
+240
= 830 So the maximum speedup is 830.
Quiz
61
1
Question 7
Assume a parallel program takes 1000 seconds to finish when using 1
core and 600 seconds to finish when using 2 cores. Is it possible to
finish the program within 400 seconds when using 4 cores? If your
experiment actually takes 500 seconds when using 4 cores, what is
the minimum execution time when using 6 cores?
Quiz
62
1
Question 7
Assume a parallel program takes 1000 seconds to finish when using 1
core and 600 seconds to finish when using 2 cores. Is it possible to
finish the program within 400 seconds when using 4 cores? If your
experiment actually takes 500 seconds when using 4 cores, what is
the minimum execution time when using 6 cores?
Solution(1/2): The speedup is 5/3 when using 2 cores, so using Karp-
Flatt Metric we can get serial fraction 𝑒 =
1
Ψ
−
1
𝑝
1−
1
𝑝
=
3
5
−
1
2
1−
1
2
=
1
10
1
2
=
1
5
When
using 4 cores, the speedup = 1/[1/5+(1-1/5)/4] = 2.5 and execution
time = 1000/2.5 = 400 seconds. So, it’s possible to finish the program
within 400 seconds when using 4 cores.
Quiz
63
1
Question 7
Assume a parallel program takes 1000 seconds to finish when using 1
core and 600 seconds to finish when using 2 cores. Is it possible to
finish the program within 400 seconds when using 4 cores? If your
experiment actually takes 500 seconds when using 4 cores, what is
the minimum execution time when using 6 cores?
Solution(2/2): Actually the speedup is 2 when using 4 cores, so using
Karp-Flatt Metric we can get serial fraction 𝑒 =
1
Ψ
−
1
𝑝
1−
1
𝑝
=
1
2
−
1
4
1−
1
4
=
1
4
3
4
=
1
3
Therefore, when using 6 cores, serial fraction e =
1
Ψ
−
1
𝑝
1−
1
𝑝
=
1
Ψ
−
1
6
1−
1
6
=
1
Ψ
−
1
6
5
6
≥
1
3
⇒ Ψ ≤
9
4
, so the minimum execution time is 444.4 seconds.
p 2 4 6
e 1/5 1/3 >=1/3
◼ A parallel program executing on 32 processors spends 5%
of its time in sequential code. What is the scaled speedup of
this program?
Pop Quiz
◼ Sequential algorithm complexity
T(n,1) = (n)
◼ Parallel algorithm
◆Computational complexity = (n/p)
◆Communication complexity = (log p)
◼ Parallel overhead
T0(n,p) = (p log p)
Example 1: Reduction
◼ The system has good scalability
◼ Isoefficiency relation: n C p log p
◼ We ask: To maintain same level of efficiency, how must
n increase when p increases?
◼ M(n) = n
M (Cplogp) / p = Cplogp / p = Clogp
Reduction (continued)
◼ Sequential time complexity: T(n,1)=(n3)
◼ Parallel computation time: (n3/p)
◼ Parallel communication time: (n2log p)
◼ Parallel overhead: T0(n,p) = (pn2log p)
Example 2: Floyd’s Algorithm
◼ Isoefficiency relation
n3 C(p n2 log p) n C p log p
◼ M(n) = n2
◼ The parallel system has poor scalability
M (Cplogp) / p =C2 p2 log2 p / p =C2 p log2 p
Floyd’s Algorithm (continued)
◼ Sequential time complexity per iteration: T(N,1)=(n2)
◼ Parallel communication complexity per iteration: (n/p)
◼ Parallel overhead: T_o(n,p)=(n p)
◼ Space Complexity: M(n) = n2
◼ What is the scalability function?
Example 3: Finite Difference
◼ Isoefficiency relation
n2 Cnp n C p
◼ M(n) = n2
◼ This algorithm is perfectly scalable
M (C p) / p =C2 p / p =C2
Finite Difference (continued)