Microsoft PowerPoint – COMP528 HAL22 Data Dependencies.pptx
Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
COMP528: Multi-core and
Multi-Processor Programming
22 – HAL
Data Dependencies
• Have considered “independency” from logical view point
– Light rays
– Coloured ball sorting
– Jigsaws?
– Prime numbers???
• How about in a sequential code,
when considering what to parallelise
COMP328/COMP528 (c) mkbane, university of
liverpool
Consider…
• Moving from sequential to parallel implementations…
COMP328/COMP528 (c) mkbane, university of
liverpool
Consider…
• Moving from sequential to parallel implementations…
x[0] = pi;
for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } COMP328/COMP528 (c) mkbane, university of liverpool Consider… • Sequential… what are the final values of x? x[0] = pi; for (i=1; i<4; i++) { x[i] = 3.0 * x[i-1]; } x[0] = pi x[1] = 3 * x[0] x[2] = 3 * x[1] x[3] = 3 * x[2] COMP328/COMP528 (c) mkbane, university of liverpool Consider… • So x[j] depends on x[j-1] for j>0
• i.e. DATA DEPENDENCY
– specifically “loop carried dependency”
• the order of the execution of the iterations matters
• ==> so what happens if we parallelise?
x[0] = pi;
for (i=1; i<4; i++) {
x[i] = 3.0 * x[i-1];
}
x[0] = pi
x[1] = 3 * x[0]
x[2] = 3 * x[1]
x[3] = 3 * x[2]
COMP328/COMP528 (c) mkbane, university of
liverpool
Consider 2 parallel
processing elements…
PE#0
• x[1] =
• x[2] =
PE#1
• x[3] =
x[0] = pi;
& then
PE#0 does half and PE#1 does half
x[0] = pi;
for (i=1; i<4; i++) {
x[i] = 3.0 * x[i-1];
}
COMP328/COMP528 (c) mkbane, university of
liverpool
Consider 2 parallel
processing elements…
PE#0
• x[1] = 3.0 * x[0]
• x[2] = 3.0 * x[1]
PE#1
• x[3] =
x[0] = pi;
& then
PE#0 does half and PE#1 does half
x[0] = pi;
for (i=1; i<4; i++) {
x[i] = 3.0 * x[i-1];
}
COMP328/COMP528 (c) mkbane, university of
liverpool
Consider 2 parallel
processing elements…
PE#0
• x[1] = 3.0 * x[0]
• x[2] = 3.0 * x[1]
PE#1
• x[3] = 3.0 * x[2]
• Q: has x[2] been evaluated?
– probably not: 2 opers on PE#0 to be
completed before very first read on PE#1
• Q: is x[2] accessible on PE#1?
x[0] = pi;
& then
PE#0 does half and PE#1 does half
x[0] = pi;
for (i=1; i<4; i++) {
x[i] = 3.0 * x[i-1];
}
Consider 2 parallel
processing elements…
• Q: is x[2] accessible on PE#1?
– depends…
– if shared memory (OpenMP) then quite likely (albeit probably wrong value)
– if distributed memory (MPI) then not likely to be accessible (so def wrong value)
PE#1
• x[3] = 3.0 * x[2]
x[0] = pi;
& then
PE#0 does half and PE#1 does half
x[0] = pi;
for (i=1; i<4; i++) {
x[i] = 3.0 * x[i-1];
}
Consider…
• Sequential… what are the final values of x?
x[0] = pi;
for (i=1; i<4; i++) {
x[i] = 3.0 * x[i-1];
}
x[0] = pi
x[1] = 3 * x[0] = 3 * pi
x[2] = 3 * x[1] = 9 * pi
x[3] = 3 * x[2] = 27*pi
COMP328/COMP528 (c) mkbane, university of
liverpool
Fibonnaci
x[0] = 1; x[1] = 1;
for (i=2; i
for j>3, x[j] depends on x[j-2] which may be being calculated on another thread
time… #0 #1 #2
i=2,3 i=4,5 i=6,7,8
x[2]=x[1]+x[0] x[4]=x[3]+x[2] x[6]=x[5]+x[4]
x[3]=x[2]+x[1] x[5]=x[4]+x[3] x[7]=x[6]+x[5]
x[8]=x[7]+x[6]
One has to think
whether a given loop
can be parallelised
• The compiler will merely obey without question OpenMP
directives / MPI logic, even if the logic is wrong
COMP328/COMP528 (c) mkbane, university of
liverpool
what about…
// previously set all x values
for (i=0; i<4; i++) {
x[i] = 3.0 * x[i+1];
}
• cannot parallelise…
• still have a given array element on both left and right hand
side, so order will matter
for (i=0; i<4; i++) {
x[i] = 3.0 * x[i+1];
}
x[0] = 3 * x[1]
x[1] = 3 * x[2]
x[2] = 3 * x[3]
x[3] = 3 * x[4]
COMP328/COMP528 (c) mkbane, university of
liverpool
what about…
// previously set all x values
for (i=0; i<4; i++) {
x[i] = 3.0 * y[i+1];
}
• no data dependencies, diff array being read as being written
• order doesn't matter
• can be parallelised
• (as long as no aliaising between 'x' and 'y' -- advanced C programming)
for (i=0; i<4; i++) {
x[i] = 3.0 * x[i+1];
}
x[0] = 3 * y[1]
x[1] = 3 * y[2]
x[2] = 3 * y[3]
x[3] = 3 * y[4]
what about…
// previously set all x,y values
for (i=0; i
– but for some forms of canonical loops, compiler can determine if
any dependency
– if there is dependency OR if compiler cannot prove there is no
dependency, then compiler will not vectorise (not auto-
parellelise) the loop
COMP328/COMP528 (c) mkbane, university of
liverpool
Data Dependency Analysis
• Theory ==> not in COMP328/COMP528
– but for some forms of canonical loops, compiler can determine if any dependency
– if there is dependency OR if compiler cannot prove there is no dependency, then compiler will not vectorise (not auto-parellelise) the loop
• BUT YOU DO
– need to be able to spot dependencies
– need to be able to consider options for removing dependencies
• Quick test: if run the loop in ‘opposite’ direction,
do you get same results?
• Y=> some loop invariance, possible parallelisable
• N=> definitely ordering issues, no straight forward parallelisable
Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane