CS计算机代考程序代写 compiler Microsoft PowerPoint – COMP528 HAL22 Data Dependencies.pptx

Microsoft PowerPoint – COMP528 HAL22 Data Dependencies.pptx

Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528

COMP528: Multi-core and
Multi-Processor Programming

22 – HAL

Data Dependencies

• Have considered “independency” from logical view point
– Light rays 

– Coloured ball sorting 

– Jigsaws?

– Prime numbers???

• How about in a sequential code,
when considering what to parallelise

COMP328/COMP528 (c) mkbane, university of
liverpool

Consider…

• Moving from sequential to parallel implementations…

COMP328/COMP528 (c) mkbane, university of
liverpool

Consider…

• Moving from sequential to parallel implementations…

x[0] = pi;

for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } COMP328/COMP528 (c) mkbane, university of liverpool Consider… • Sequential… what are the final values of x? x[0] = pi; for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } x[0] = pi x[1] = 3 * x[0] x[2] = 3 * x[1] x[3] = 3 * x[2] COMP328/COMP528 (c) mkbane, university of liverpool Consider… • So x[j] depends on x[j-1] for j>0

• i.e. DATA DEPENDENCY
– specifically “loop carried dependency”

• the order of the execution of the iterations matters

• ==> so what happens if we parallelise?

x[0] = pi;
for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } x[0] = pi x[1] = 3 * x[0] x[2] = 3 * x[1] x[3] = 3 * x[2] COMP328/COMP528 (c) mkbane, university of liverpool Consider 2 parallel processing elements… PE#0 • x[1] = • x[2] = PE#1 • x[3] = x[0] = pi; & then PE#0 does half and PE#1 does half x[0] = pi; for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } COMP328/COMP528 (c) mkbane, university of liverpool Consider 2 parallel processing elements… PE#0 • x[1] = 3.0 * x[0] • x[2] = 3.0 * x[1] PE#1 • x[3] = x[0] = pi; & then PE#0 does half and PE#1 does half x[0] = pi; for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } COMP328/COMP528 (c) mkbane, university of liverpool Consider 2 parallel processing elements… PE#0 • x[1] = 3.0 * x[0] • x[2] = 3.0 * x[1] PE#1 • x[3] = 3.0 * x[2] • Q: has x[2] been evaluated? – probably not: 2 opers on PE#0 to be completed before very first read on PE#1 • Q: is x[2] accessible on PE#1? x[0] = pi; & then PE#0 does half and PE#1 does half x[0] = pi; for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } Consider 2 parallel processing elements… • Q: is x[2] accessible on PE#1? – depends… – if shared memory (OpenMP) then quite likely (albeit probably wrong value) – if distributed memory (MPI) then not likely to be accessible (so def wrong value) PE#1 • x[3] = 3.0 * x[2] x[0] = pi; & then PE#0 does half and PE#1 does half x[0] = pi; for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } Consider… • Sequential… what are the final values of x? x[0] = pi; for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } x[0] = pi x[1] = 3 * x[0] = 3 * pi x[2] = 3 * x[1] = 9 * pi x[3] = 3 * x[2] = 27*pi COMP328/COMP528 (c) mkbane, university of liverpool Fibonnaci x[0] = 1; x[1] = 1; for (i=2; i2, x[i] depends on x[i-1] which may be being calculated on another thread
for j>3, x[j] depends on x[j-2] which may be being calculated on another thread

time… #0 #1 #2

i=2,3 i=4,5 i=6,7,8

x[2]=x[1]+x[0] x[4]=x[3]+x[2] x[6]=x[5]+x[4]

x[3]=x[2]+x[1] x[5]=x[4]+x[3] x[7]=x[6]+x[5]

x[8]=x[7]+x[6]

One has to think
whether a given loop
can be parallelised

• The compiler will merely obey without question OpenMP
directives / MPI logic, even if the logic is wrong

COMP328/COMP528 (c) mkbane, university of
liverpool

what about…

// previously set all x values

for (i=0; i<4; i++) { x[i] = 3.0 * x[i+1]; } • cannot parallelise… • still have a given array element on both left and right hand side, so order will matter for (i=0; i<4; i++) { x[i] = 3.0 * x[i+1]; } x[0] = 3 * x[1] x[1] = 3 * x[2] x[2] = 3 * x[3] x[3] = 3 * x[4] COMP328/COMP528 (c) mkbane, university of liverpool what about… // previously set all x values for (i=0; i<4; i++) { x[i] = 3.0 * y[i+1]; } • no data dependencies, diff array being read as being written • order doesn't matter • can be parallelised • (as long as no aliaising between 'x' and 'y' -- advanced C programming) for (i=0; i<4; i++) { x[i] = 3.0 * x[i+1]; } x[0] = 3 * y[1] x[1] = 3 * y[2] x[2] = 3 * y[3] x[3] = 3 * y[4] what about… // previously set all x,y values for (i=0; i not in COMP328/COMP528
– but for some forms of canonical loops, compiler can determine if

any dependency

– if there is dependency OR if compiler cannot prove there is no
dependency, then compiler will not vectorise (not auto-
parellelise) the loop

COMP328/COMP528 (c) mkbane, university of
liverpool

Data Dependency Analysis

• Theory ==> not in COMP328/COMP528
– but for some forms of canonical loops, compiler can determine if any dependency

– if there is dependency OR if compiler cannot prove there is no dependency, then compiler will not vectorise (not auto-parellelise) the loop

• BUT YOU DO
– need to be able to spot dependencies

– need to be able to consider options for removing dependencies

• Quick test: if run the loop in ‘opposite’ direction,
do you get same results?

• Y=> some loop invariance, possible parallelisable
• N=> definitely ordering issues, no straight forward parallelisable

Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane