Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
COMP528: Multi-core and
Multi-Processor Programming
5 – HAL
Exploiting
Parallelism
Quadrature (integration)
• Problem Def: find integral of f(x) from x=a to x=b
[Figure: plot of f(x) for x from 0 to 3.5, y from 0 to 12]
(“quad.c” from the first lab)
• The approximate integral is the sum of the areas under the line
• Each area is approximated by a rectangle (all of width “h”)
• The calculation of each area is independent of the others, so they can be done in any order
• ==> Calculate these in parallel…
[Figure: the same curve divided into strips; a strip from x to y = x + h has area h * 0.5*[f(x) + f(y)] (trapezium rule)]
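A minimal serial sketch of this quadrature, in the spirit of the lab's quad.c (the integrand f, the limits and the number of strips here are illustrative assumptions, not the lab's actual code):

#include <stdio.h>

/* example integrand; an assumption, not the lab's actual f(x) */
double f(double x) { return x * x + 1.0; }

int main(void) {
    const double a = 0.0, b = 3.0;   /* limits of integration (illustrative) */
    const int n = 1000;              /* number of strips */
    const double h = (b - a) / n;    /* every strip has the same width h */
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double x = a + i * h;
        /* trapezium rule: width h times the average of the two heights */
        sum += h * 0.5 * (f(x) + f(x + h));
    }
    printf("integral of f from %g to %g ~= %f\n", a, b, sum);
    return 0;
}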
Specific example…
Each trapezoidal area is independent, therefore we calculate them in parallel.
The TOTAL integral is the SUM of the areas of the 6 trapezoids.
This is one way we can distribute the work between (e.g.) 3 cores, so it should go 3x faster than on 1 core.
But there are some overheads: bringing all the areas together and summing them to get the total integral.
One possible distribution of the 6 areas:
core 0: areas 0, 1
core 1: areas 2, 3
core 2: areas 4, 5
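A minimal MPI sketch of this distribute-then-sum pattern (the integrand f, the limits A and B, and N = 6 strips are illustrative assumptions; it also assumes N divides evenly by the number of processes):

#include <mpi.h>
#include <stdio.h>

#define A 0.0
#define B 3.0
#define N 6                       /* number of trapezoids, as on the slide */

double f(double x) { return x * x + 1.0; }   /* example integrand */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double h = (B - A) / N;
    int per_proc = N / nprocs;    /* assumes N divides evenly by nprocs */
    double local = 0.0;
    /* contiguous block of trapezoids per process, as in the diagram */
    for (int i = rank * per_proc; i < (rank + 1) * per_proc; i++) {
        double x = A + i * h;
        local += h * 0.5 * (f(x) + f(x + h));
    }

    /* the overhead mentioned above: summing all partial areas on rank 0 */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("total integral ~= %f\n", total);

    MPI_Finalize();
    return 0;
}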
There are many ways to distribute work between resources.
You want to:
• Share work as evenly as possible
• Minimise overheads
• Do it logically so you understand it
• Comment your code!
Scalability and Speedup
• Speedup is the ratio of the time it takes to run a program without parallelism to the time it takes to run in parallel.
• Scalability is a measure of how much speedup the program gains as more processor cores are added.
• The program stops scaling beyond the point at which adding more processors gives no additional speedup.
Speed Up & Efficiency
• Time on 1 core: t1
• Could be either the sequential implementation or the p=1 version of the parallel code
• Time on p cores: t(p)
• Speed-up: S(p) = t1 / t(p)
• Ideal speed-up is when S(p) = p
• Super-linear speed-up: when S(p) > p
• Efficiency: E(p) = S(p)/p, usually expressed as a percentage
• What is “good efficiency”?
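A tiny C sketch of these two definitions (the 180 s and 100 s timings are taken from the worked example later in this lecture, purely for illustration):

#include <stdio.h>

double speedup(double t1, double tp)            { return t1 / tp; }
double efficiency(double t1, double tp, int p)  { return speedup(t1, tp) / p; }

int main(void) {
    double t1 = 180.0, t2 = 100.0;   /* timings from the worked example below */
    printf("S(2) = %.2f\n", speedup(t1, t2));                   /* 1.80 */
    printf("E(2) = %.0f%%\n", 100.0 * efficiency(t1, t2, 2));   /* 90%  */
    return 0;
}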
Scaling
• Strong scaling
– Keep the problem size fixed
– The time to solution decreases proportionally as the number of cores increases
• Weak scaling
– As the problem size increases, increase the number of cores so that the time to solution remains constant
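A sketch of how the two modes might set the problem size in a scaling study (N_FIXED, N_PER_CORE and problem_size are illustrative assumptions, not course code):

#include <stdio.h>

#define N_FIXED    100000000L   /* strong scaling: total work is fixed  */
#define N_PER_CORE   1000000L   /* weak scaling: work per core is fixed */

/* problem size n as a function of core count p */
long problem_size(int p, int weak_scaling) {
    return weak_scaling ? N_PER_CORE * (long)p   /* grows with p     */
                        : N_FIXED;               /* independent of p */
}

int main(void) {
    for (int p = 1; p <= 8; p *= 2)
        printf("p=%d  strong: n=%ld  weak: n=%ld\n",
               p, problem_size(p, 0), problem_size(p, 1));
    return 0;
}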
How much parallelism is there?
• Amdahl’s law (1967)
– A program will never go faster than the sum of the parts that do not run in parallel (the serial portions), no matter how many processing elements we have
Amdahl’s Law
• α (alpha): the serial proportion of the original code
• T(p) = α·T(1) + (1-α)·T(1)/p
• S(p) = T(1)/T(p)
• Thus S(p) = 1 / (α + (1-α)/p)
• The maximum speed-up depends only on the proportion of the code that is serial
• Maximum speed-up (p → ∞) is 1/α
• If 1% of a code is serial, what is the max speed-up?
• α = 0.01, so max S(p) = 1/α = 100
• So why do HPC codes use 1000s of cores?
– How many cores are needed for that 100x speed-up?
e.g.
p = 100: S(p) = 1/[0.01 + 0.99/100] = 50.25
p = 1000: S(p) = 1/[0.01 + 0.99/1000] = 91.0
– It is not all about a fixed problem size
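A short C sketch that reproduces these figures (nothing here is from the course code; it just evaluates the formula above):

#include <stdio.h>

/* Amdahl's law: S(p) = 1 / (alpha + (1 - alpha)/p) */
double amdahl_speedup(double alpha, double p) {
    return 1.0 / (alpha + (1.0 - alpha) / p);
}

int main(void) {
    double alpha = 0.01;   /* 1% of the code is serial, as on this slide */
    printf("p=100:   S = %.2f\n", amdahl_speedup(alpha, 100.0));   /* 50.25 */
    printf("p=1000:  S = %.2f\n", amdahl_speedup(alpha, 1000.0));  /* ~91.0 */
    printf("p->inf:  S -> %.0f\n", 1.0 / alpha);                   /* 100   */
    return 0;
}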
Examples using Amdahl’s Law
• Apply Amdahl’s Law to determine the theoretical
maximum speed-up for a code that takes 180 seconds on 1
core and 100 seconds on 2 cores.
• T(1) = 180, T(2) = 100
• T(p) = α·T(1) + (1-α)·T(1)/p
• Let S be the time taken by the serial proportion and
let P be the time taken on 1 core by the parallel proportion
• So…
T(1) = 180 = S + P/1 and T(2) = 100 = S + P/2
T(1) - T(2) = 180 - 100 = 80 = (S + P/1) - (S + P/2) = P/2
• Therefore P = 160 secs and S = 20 secs (substitute back in to check!)
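Doing that substitution check (just the arithmetic from the values above):
T(1) = S + P = 20 + 160 = 180 ✓
T(2) = S + P/2 = 20 + 160/2 = 100 ✓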
• α = proportion of serial to total = 20/180 = 1/9
• Max speed-up = 1/α = 9
• The fastest possible time is S
(i.e. with an infinite number of cores, 160/∞ → 0)
• Speed-up at the fastest: T(1)/T(∞) = 180/20 = 9
• The fastest this code will go, according to Amdahl, is 9 times faster than serial
– NB the answer is not “9 seconds”; it is “9 times faster than serial”. Be careful with units etc. when applying Amdahl’s Law
• The answer is NOT derived like this:
T(2) is 100 seconds but T(1) is 180, so serial is 80 seconds, so α = 80/180 = 4/9; you can test it and see it is wrong:
T(2) = (4/9)·T(1) + (5/9)·T(1)/2 = (4/9)·180 + (5/9)·180/2
= 80 + 50 = 130 (c.f. the correct value of 100 secs from the question)
How much parallelism is there?
• J. Gustafson’s observations (around Amdahl’s law):
– For many problems, as the problem size grows, the work required for the parallel part grows faster than the work required for the serial part
– The serial fraction decreases,
so by Amdahl’s law scalability improves
Keep the run time fixed, but use more cores to do larger examples
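For reference, Gustafson's observation is often expressed as a "scaled speedup". This standard formula is not on the slide; here α is the serial fraction of the time measured on the p-core run:

/* Gustafson's scaled speedup (standard formula; not from these slides):
   alpha = serial fraction of the wall-clock time of the p-core run */
double gustafson_speedup(double alpha, double p) {
    return alpha + (1.0 - alpha) * p;   /* = p - alpha*(p - 1) */
}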
Amdahl vs Gustafson
• Both are right
• Both are theoretical
– Amdahl’s point is pessimistic: for a fixed problem/program there is a limit to its acceleration by parallelization
– Gustafson’s point is optimistic: for growing problems there is a chance to keep up using parallelization
In both cases, minimising the amount of time spent in sequential code is paramount
Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane