L11-MemoryConsistency.ppt
HPC ARCHITECTURES
Memory consistency
Motivation
• Simple producer-consumer pattern with two threads
• If the two threads are running on different cores, does the
hardware guarantee that b ends up with the value 23 on
Thread 1?
• Let’s assume that the compiler doesn’t rearrange the memory
accesses or optimise anything.
Thread 0
a = 23;
flag = 1;
Thread 1
while (!flag){}
b = a;
Coherency and consistency
• Coherency rules concern the ordering of operations on
the same memory location.
• Coherency says nothing about ordering of operations on
different memory locations.
• This is determined by the memory consistency model.
• Coherence and consistency are different, and
complementary, but are often confused.
Consistency models
• A memory consistency model consists of rules
determining the order in which writes by one processor
must be observed by other processors (or other
devices).
• Example:
P1
a = 0;
…
a = 1;
x = b;
P2
b = 0;
…
b = 1;
y = a;
Should it be possible
for both x and y to be 0?
Sequential consistency
• Simplest model is sequential consistency
• The result must be the same as if the accesses (reads
and writes) of each processor happen in program order, and
the accesses of different processors are interleaved in
a single global order.
P1
a = 0;
…
a = 1;
x = b;
P2
b = 0;
…
b = 1;
y = a;
x and y cannot both be 0: if x = b reads 0 it ran before b = 1,
so a = 1 ran before y = a, and y is 1.
• Sequential consistency can be implemented by delaying
each memory access until all previous accesses by that
processor have completed.
• in a write-invalidate protocol, it is enough to wait until all the
invalidations associated with previous writes have completed.
• Simple, but overkill.
• for correctly synchronised programs the rules can be relaxed
without causing the program to give different results.
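The P1/P2 example can be tried directly in C++. This is a minimal sketch, assuming C++11 or later: the function names run_store_buffer_test and count_both_zero are made up for illustration. It relies on the fact that std::atomic loads and stores default to std::memory_order_seq_cst, so the language forbids the outcome x == 0 and y == 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs the P1/P2 example once and returns (x, y).
// Atomic operations default to std::memory_order_seq_cst, so the
// outcome x == 0 && y == 0 is not allowed.
std::pair<int, int> run_store_buffer_test() {
    std::atomic<int> a{0}, b{0};
    int x = -1, y = -1;  // each written by one thread, read after join()

    std::thread p1([&] {
        a.store(1);    // a = 1;
        x = b.load();  // x = b;
    });
    std::thread p2([&] {
        b.store(1);    // b = 1;
        y = a.load();  // y = a;
    });
    p1.join();
    p2.join();
    return {x, y};
}

// Hypothetical helper: repeats the test and counts how often both
// reads returned 0.
int count_both_zero(int trials) {
    assert(trials > 0);
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::pair<int, int> result = run_store_buffer_test();
        if (result.first == 0 && result.second == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(1000) should return 0. If the stores and loads are weakened to std::memory_order_relaxed, the count may become non-zero, even on hardware whose model only lets reads pass earlier writes.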
• Models talk about abstract synchronisation operations
• acquire and release
• lock and unlock is an example
• Program is synchronised if all accesses to shared
variables happen in the following order:
write (x);
…
release(s);
…
acquire(s);
…
access(x);
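The write/release/acquire/access pattern above can be sketched with lock and unlock. This is only an illustration, assuming C++11: std::mutex plays the role of s, and the function name run_locked_handoff and the variable written are invented for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: returns the value of x seen by the reader thread.
int run_locked_handoff() {
    std::mutex s;          // the synchronisation object "s"
    int x = 0;             // shared variable, only touched while holding s
    bool written = false;  // also protected by s
    int seen = -1;

    std::thread writer([&] {
        std::lock_guard<std::mutex> guard(s);  // acquire(s)
        x = 23;                                // write(x)
        written = true;
    });                                        // release(s) when guard is destroyed

    std::thread reader([&] {
        for (;;) {
            std::lock_guard<std::mutex> guard(s);  // acquire(s)
            if (written) {
                seen = x;                          // access(x)
                break;
            }
        }                                          // release(s) on each iteration
    });

    writer.join();
    reader.join();
    assert(seen != -1);  // the reader only exits after it has read x
    return seen;
}
```

Every access to x happens between an acquire and a release of the same mutex, so run_locked_handoff() returns 23 whatever the hardware consistency model is.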
Consistency models
• Consistency models are defined by which orderings
between operations on different addresses must be
preserved.
• Consider 4 operations: write (W), read (R), acquire (SA),
release (SR).
• Sequential consistency requires all orderings to be
preserved:
[Table: first operation (rows: W, R, SA, SR) against second
operation (columns: W, R, SA, SR). Every ordering is preserved.]
• Processor consistency or total store ordering
• Allows reads to occur before previous writes have
completed.
[Table: first operation (rows: W, R, SA, SR) against second
operation (columns: W, R, SA, SR). Relaxed: W followed by R.
All other orderings are preserved.]
• Partial store ordering
• Also allows writes to different addresses to occur out of
order.
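[Table: first operation (rows: W, R, SA, SR) against second
operation (columns: W, R, SA, SR). Relaxed: W followed by R,
and W followed by W. All other orderings are preserved.]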
• Weak ordering
• Allows reads and writes to be arbitrarily reordered
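[Table: first operation (rows: W, R, SA, SR) against second
operation (columns: W, R, SA, SR). Relaxed: every ordering
between W and R. Orderings involving SA or SR are preserved.]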
• Release consistency
• Allows some reordering between reads and writes and
acquires and releases.
• Weakest ordering which ensures synchronised
programs execute correctly
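[Table: first operation (rows: W, R, SA, SR) against second
operation (columns: W, R, SA, SR). Relaxed: every ordering
between W and R, W or R followed by SA, and SR followed by
W or R. W or R after SA, and W or R before SR, stay ordered.]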
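The five models can be compared side by side by encoding their ordering rules as a small function. This is a sketch under stated assumptions, not a definition of any real ISA: the names Op, Model, must_preserve and self_check are invented here, and the code assumes that every model keeps the order between two synchronisation operations.

```cpp
#include <cassert>

enum class Op { W, R, SA, SR };              // write, read, acquire, release
enum class Model { SC, TSO, PSO, WO, RC };   // the five models above

// Hypothetical helper: true if `model` requires `first` to complete
// before `second`, for operations on different addresses.
bool must_preserve(Model model, Op first, Op second) {
    const bool first_sync = (first == Op::SA || first == Op::SR);
    const bool second_sync = (second == Op::SA || second == Op::SR);

    // Assumption: sync-to-sync order is kept by every model.
    if (first_sync && second_sync) return true;

    if (!first_sync && !second_sync) {       // ordinary read/write pair
        if (model == Model::SC) return true;
        if (model == Model::TSO) return !(first == Op::W && second == Op::R);
        if (model == Model::PSO) return first == Op::R;  // W->R, W->W relaxed
        return false;                        // WO and RC relax all four
    }

    // Mixed pair: one ordinary access, one synchronisation operation.
    if (model != Model::RC) return true;     // SC, TSO, PSO and WO keep these
    if (first == Op::SA) return true;        // accesses stay after an acquire
    if (second == Op::SR) return true;       // accesses stay before a release
    return false;                            // R/W -> SA and SR -> R/W relaxed
}

// A few spot checks that double as a usage example.
void self_check() {
    assert(must_preserve(Model::SC, Op::W, Op::R));
    assert(!must_preserve(Model::TSO, Op::W, Op::R));
    assert(must_preserve(Model::TSO, Op::W, Op::W));
    assert(!must_preserve(Model::PSO, Op::W, Op::W));
    assert(must_preserve(Model::WO, Op::W, Op::SR));
    assert(!must_preserve(Model::RC, Op::SR, Op::R));
}
```

Reading the function top to bottom shows the progression: each model relaxes everything the previous one does, plus a little more, while release consistency still keeps the orderings that the write/release/acquire/access pattern depends on.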
Real processors
• In reality, models are more complicated – there can be
different types of loads/stores and multiple different sync
operations in an ISA
• (Almost) no real processors implement sequential
consistency
• too expensive
• x86 and SPARC processors implement TSO
• easy to program and reason about
• Power and ARM processors implement a version of
relaxed consistency
• harder to program correctly
• theoretical performance gains…
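On relaxed hardware, ordering has to be requested explicitly. The sketch below, assuming C++11, uses std::atomic_thread_fence to do so for a flag-based handoff; the function name run_with_fences is made up for the example. On x86 these two fences typically cost no extra instruction, whereas on ARM and Power the compiler typically emits a barrier instruction for each.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value the consumer copies into b.
int run_with_fences() {
    int a = 0;                 // ordinary shared data
    std::atomic<int> flag{0};  // accessed with relaxed operations only
    int b = -1;

    std::thread producer([&] {
        a = 23;
        // Release fence: the write to a is ordered before the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(1, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (flag.load(std::memory_order_relaxed) == 0) {
            // spin until the producer sets the flag
        }
        // Acquire fence: the read of a is ordered after the flag load.
        std::atomic_thread_fence(std::memory_order_acquire);
        b = a;
    });

    producer.join();
    consumer.join();
    assert(flag.load() == 1);
    return b;
}
```

Without the two fences the relaxed flag accesses give no ordering for a, and the read of a would be a data race.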
Language level models
• Languages/APIs which support multi-threading also need
a memory model
• e.g. Java, C++, OpenMP
• Allows programmers to reason about consistency at a
language level rather than assembly code level
• These are usually relaxed models, so they can be
implemented efficiently across a range of hardware
platforms
• Most consistency problems are avoided if programmers
only use the inbuilt language/API synchronisation
constructs
• problems can occur if you try to build your own…
Producer consumer pattern
• A straightforward assembly code implementation would be
correct on an x86 processor, but not on an ARM
processor.
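• This is not correct code in Java, C++, or OpenMP
• the unsynchronised accesses to a and flag are a data race, so
the language memory models do not guarantee that b ends up as 23
• need to add some synchronisation constructs to fix it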
Thread 0
a = 23;
flag = 1;
Thread 1
while (!flag){}
b = a;
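One possible fix in C++ is to make flag a std::atomic and use release and acquire ordering on it. This is a minimal sketch assuming C++11, with the invented function name run_producer_consumer; it is one of several valid fixes (a mutex or the default sequentially consistent atomics would also work).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the two threads and returns b.
int run_producer_consumer() {
    int a = 0;                 // ordinary data, published via flag
    std::atomic<int> flag{0};  // synchronisation variable
    int b = 0;

    std::thread thread0([&] {
        a = 23;
        flag.store(1, std::memory_order_release);  // release: publishes a
    });

    std::thread thread1([&] {
        while (!flag.load(std::memory_order_acquire)) {
            // spin until the release store becomes visible
        }
        b = a;  // ordered after the acquire load, so it reads 23
    });

    thread0.join();
    thread1.join();
    assert(flag.load() == 1);
    return b;
}
```

The store to flag acts as the release and the load that finally sees 1 acts as the acquire, so the write to a happens before the read of a and run_producer_consumer() returns 23 on x86, ARM and Power alike.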