
Dr Michael K Bane, G14, Computer Science, University of Liverpool
https://cgi.csc.liv.ac.uk/~mkbane/COMP528

COMP528: Multi-core and
Multi-Processor Programming

5 – HAL

Exploiting
Parallelism

Quadrature (integration)

• Problem Def: find integral of f(x) from x=a to x=b

[Figure: plot of the integrand f(x) for x from 0 to 3.5, y-axis 0 to 12]


“quad.c” from the first lab implements this approach


• The approximate integral is the
sum of the areas under the line

• Each area is approximated by a rectangle (all of width “h”)

• The calculation of each area is independent of the others,
so they can be done in any order

• ==> Calculate these in parallel…


[Figure annotation: each strip runs from x to y = x+h; its area is h × 0.5[f(x)+f(y)] – the trapezoidal rule]
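A minimal serial sketch of this scheme, in the spirit of the lab’s quad.c (the integrand f and the limits here are illustrative placeholders, not the lab’s actual values):

    #include <stdio.h>

    /* placeholder integrand -- the lab's quad.c will differ */
    double f(double x) { return x * x; }

    int main(void) {
        const double a = 0.0, b = 3.5;  /* integration limits */
        const int N = 7;                /* number of strips */
        const double h = (b - a) / N;   /* width of each strip */
        double total = 0.0;

        /* each strip's area is independent of the others */
        for (int i = 0; i < N; i++) {
            double x = a + i * h;
            double y = x + h;
            total += h * 0.5 * (f(x) + f(y));  /* trapezoid area */
        }

        printf("approximate integral = %f\n", total);
        return 0;
    }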

Specific example…

Each trapezoidal area is independent,
therefore we calculate them in parallel

And the TOTAL integral is the SUM of
the areas of the 6 trapezoids

This is one way we can distribute the
work between (e.g.) 3 cores

So this should go 3x faster than on 1
core

But there are some overheads – bringing
all the areas together to sum to get
the total integral

[Figure: the 6 trapezoids numbered 0–5, assigned in pairs: core 0 gets areas 0 and 1, core 1 gets areas 2 and 3, core 2 gets areas 4 and 5]


Many ways to distribute work between resources

You want to (see the MPI sketch below):
• Share the work as evenly as possible
• Minimise overheads
• Do it logically, so you understand it
• Comment your code!
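For instance, a minimal MPI sketch of the block distribution pictured above (an illustrative sketch, not the course’s reference solution; it assumes N divides evenly by the number of processes):

    #include <mpi.h>
    #include <stdio.h>

    double f(double x) { return x * x; }   /* placeholder integrand */

    int main(int argc, char **argv) {
        const double a = 0.0, b = 3.5;
        const int N = 6;                   /* 6 strips, as in the example */
        const double h = (b - a) / N;
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* contiguous block of strips per process, e.g. 2 each on 3 cores */
        int per = N / nprocs;
        double local = 0.0;
        for (int i = rank * per; i < (rank + 1) * per; i++) {
            double x = a + i * h;
            local += h * 0.5 * (f(x) + f(x + h));  /* trapezoid area */
        }

        /* the overhead: bring the partial sums together into the total */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("approximate integral = %f\n", total);

        MPI_Finalize();
        return 0;
    }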

Scalability and Speedup

• Speed-up is the ratio of the time it takes to run a program
without parallelism to the time it takes to run in parallel;

• Scalability is a measure of how much speed-up the program
gains as one adds more processor cores;

• The program stops scaling beyond the point at which adding
more processors no longer yields additional speed-up.

Speed Up & Efficiency

• Time on 1 core: t(1)
• Could be either the sequential implementation or the p=1 version of the parallel code

• Time on p cores: t(p)

• Speed-up, S(p) = t(1) / t(p)

• Ideal speed-up is when S(p) = p

• Super-linear speed-up: S(p) > p (occasionally seen in practice, e.g. when the smaller per-core working set fits in cache)

• Efficiency, E(p) = S(p)/p usually expressed as a percentage

• What is “good efficiency”?
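These definitions are trivial to compute; a minimal sketch (the timings here are made up for illustration):

    #include <stdio.h>

    int main(void) {
        double t1 = 120.0;          /* illustrative time on 1 core (secs) */
        double tp = 35.0;           /* illustrative time on p cores */
        int p = 4;

        double S = t1 / tp;         /* speed-up S(p) = t(1)/t(p) */
        double E = 100.0 * S / p;   /* efficiency E(p) = S(p)/p, as a % */

        printf("S(%d) = %.2f, E(%d) = %.1f%%\n", p, S, p, E);
        return 0;
    }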


Scaling

• Strong scaling
– Keep the problem size fixed

– Time to solution decreases proportionally as #cores increases

• Weak scaling
– As the problem size increases, increasing #cores keeps the
time to solution constant

How much parallelism is there?

• Amdahl’s law (1967)
– A computer program will never go faster than the sum of the
parts that do not run in parallel (the serial portions), no matter
how many processing elements we have

Amdahl’s Law

• Alpha (α): the serial proportion of the original code

• T(p) = α·T(1) + (1-α)·T(1)/p

• S(p) = T(1)/T(p)
• Thus S(p) = 1 / (α + (1-α)/p)

• The maximum speed-up depends only on the proportion of the
code that is serial

• Maximum speed-up (as p → ∞) is 1/α


• If 1% of a code is serial, what is the max speed-up?


• α = 0.01, so max S(p) = 1/α = 100


• So why do HPC codes use 1000s of cores?
– How many cores are needed for that 100x speed-up?

e.g.
p=100: S(p) = 1/[0.01 + 0.99/100] = 50.25
p=1000: S(p) = 1/[0.01 + 0.99/1000] = 91.0

– It is not all about a fixed problem size
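These diminishing returns are easy to check numerically; a minimal sketch of the S(p) formula above (the list of core counts is just for illustration):

    #include <stdio.h>

    int main(void) {
        const double alpha = 0.01;   /* serial fraction: 1% */
        const int ps[] = {10, 100, 1000, 10000};

        for (int i = 0; i < 4; i++) {
            int p = ps[i];
            /* Amdahl's law: S(p) = 1 / (alpha + (1-alpha)/p) */
            double S = 1.0 / (alpha + (1.0 - alpha) / p);
            printf("p = %5d  S(p) = %6.2f\n", p, S);
        }
        return 0;
    }

Even at p = 10000 the speed-up is only about 99, still below the 1/α = 100 ceiling.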

Examples using Amdahl’s Law

• Apply Amdahl’s Law to determine the theoretical
maximum speed-up for a code that takes 180 seconds on 1
core and 100 seconds on 2 cores.


• T(1) = 180, T(2) = 100

• T(p) = α·T(1) + (1-α)·T(1)/p



• Let S be the time taken by the serial portion and
let P be the time taken on 1 core by the parallel portion

• So…
T(1) = 180 = S + P/1 and T(2) = 100 = S + P/2
T(1) - T(2) = 180 - 100 = 80 = (S + P/1) - (S + P/2) = P/2

• Therefore P = 160 secs and S = 20 secs (substitute back in to check!)


• α = serial time as a proportion of the total = 20/180 = 1/9

• Max speed-up = 1/α = 9


• The fastest possible time is S
(i.e. with an infinite number of cores, 160/infinity = zero)

• Speed-up at the fastest = T(1)/T(∞) = 180/20 = 9

• The fastest this code will go, according to Amdahl, is 9
times faster than serial


– NB the answer is not “9 seconds”. It is “9 times faster than
serial”. Be careful of units etc. when applying Amdahl’s Law

• The answer is NOT derived like this:
“T(2) is 100 seconds, but T(1) is 180, so the serial part is 80 seconds,
so alpha is 80/180 = 4/9”; you can test and see this is wrong:
T(2) = 4/9·T(1) + 5/9·T(1)/2 = 4/9·180 + 5/9·180/2
= 80 + 50 = 130 (c.f. the correct value from the question of 100 secs)
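A quick numerical check of the two candidate values of α (a minimal sketch; t_amdahl is a made-up helper name):

    #include <stdio.h>

    /* Amdahl prediction: T(p) = alpha*T(1) + (1-alpha)*T(1)/p */
    double t_amdahl(double alpha, double t1, int p) {
        return alpha * t1 + (1.0 - alpha) * t1 / p;
    }

    int main(void) {
        const double t1 = 180.0;
        double good = 1.0 / 9.0;   /* alpha from S=20, P=160: 20/180 */
        double bad  = 4.0 / 9.0;   /* the tempting but wrong 80/180 */

        printf("correct alpha: T(2) = %.0f (matches the given 100)\n",
               t_amdahl(good, t1, 2));
        printf("wrong alpha:   T(2) = %.0f (not 100!)\n",
               t_amdahl(bad, t1, 2));
        return 0;
    }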

How much parallelism is there?

• J. Gustafson’s observations (around Amdahl’s law):
– For many problems, as the problem size grows, the work required
for the parallel part grows faster than the work required for the
serial part

– The serial fraction therefore decreases,
so by Amdahl’s law scalability improves

Keep the run time fixed, but use more cores to do larger examples


Amdahl vs Gustafson

• Both are right

• Both are theoretical

– Amdahl’s point is pessimistic: for a fixed problem/program there
is a limit to its acceleration by parallelisation;

– Gustafson’s point is optimistic: for growing problems there is a
chance to keep up using parallelisation

In both cases, minimising the amount of time spent in sequential code is paramount
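To see the two viewpoints side by side: Gustafson’s scaled speed-up is usually written S(p) = p - α·(p-1), where his α is the serial fraction measured on the parallel run rather than on the serial run as in Amdahl’s law. A minimal sketch comparing the two, with an illustrative α of 0.05 used for both:

    #include <stdio.h>

    int main(void) {
        const double alpha = 0.05;   /* illustrative serial fraction */

        for (int p = 1; p <= 1024; p *= 4) {
            /* Amdahl: fixed problem size */
            double s_amdahl = 1.0 / (alpha + (1.0 - alpha) / p);
            /* Gustafson: problem size grows with p (scaled speed-up) */
            double s_gustafson = p - alpha * (p - 1);
            printf("p = %4d  Amdahl S = %6.2f  Gustafson S = %7.2f\n",
                   p, s_amdahl, s_gustafson);
        }
        return 0;
    }

Amdahl’s curve flattens towards 1/α = 20, while Gustafson’s grows almost linearly with p.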

Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
https://cgi.csc.liv.ac.uk/~mkbane