L11-MemoryConsistency.ppt

HPC ARCHITECTURES
Memory consistency

Motivation
• Simple producer-consumer pattern with two threads

•  If the two threads are running on different cores, does the
hardware guarantee that b ends up with the value 23 on
Thread 1?
•  Let’s assume that the compiler doesn’t rearrange the memory
accesses or optimise anything.

Thread 0

a = 23;
flag = 1;

Thread 1

while (!flag) {}
b = a;

Coherency and consistency
• Coherency rules concern the ordering of operations on
the same memory location.

• Coherency says nothing about ordering of operations on
different memory locations.

• This is determined by the memory consistency model.

• Coherence and consistency are different, and
complementary, but are often confused.

Consistency models
• Memory consistency models consist of rules
determining the order in which writes by one processor
must be observed by other processors (or other
devices).

• Example:
P1

a = 0;

a = 1;
x = b;

P2

b = 0;

b = 1;
y = a;

Should it be possible for both x and y to be 0?

Sequential consistency
• Simplest model is sequential consistency
• The result must be the same as if the accesses (reads
and writes) of each processor were performed in program
order, and the accesses of different processors were
interleaved in a single global order.

P1

a = 0;

a = 1;
x = b;

P2

b = 0;

b = 1;
y = a;

x and y cannot both be 0: whichever read comes last in the
interleaving follows both writes, so it returns 1.
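The two-processor example can be sketched in C++ as a small litmus test. This is a minimal sketch, not part of the original slides: it assumes C++11 std::atomic, whose default ordering (std::memory_order_seq_cst) gives sequentially consistent behaviour for these operations. The function names p1, p2, both_zero_once, count_both_zero and demo are hypothetical helpers introduced for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Shared variables from the slide. std::atomic loads and stores default to
// std::memory_order_seq_cst, i.e. sequentially consistent ordering.
std::atomic<int> a{0}, b{0};
int x = -1, y = -1;  // each is written by one thread and read only after join()

void p1() {
    a.store(1);    // a = 1;
    x = b.load();  // x = b;
}

void p2() {
    b.store(1);    // b = 1;
    y = a.load();  // y = a;
}

// Runs the example once; returns true if x == 0 && y == 0 was observed.
bool both_zero_once() {
    a.store(0);    // a = 0;
    b.store(0);    // b = 0;
    std::thread t1(p1), t2(p2);
    t1.join();
    t2.join();
    return x == 0 && y == 0;
}

// Repeats the experiment and counts the runs that saw x == 0 && y == 0.
int count_both_zero(int runs) {
    int count = 0;
    for (int i = 0; i < runs; ++i)
        if (both_zero_once()) ++count;
    return count;
}

// Usage: under sequentially consistent atomics the count must be zero.
void demo() {
    assert(count_both_zero(1000) == 0);
}
```

Replacing the atomics with plain int variables would make the program a data race in C++, and on hardware that lets reads pass earlier writes the outcome x == 0 and y == 0 could then appear.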

• Sequential consistency can be implemented by delaying
any write until all previous writes have completed.
•  in a write-invalidate protocol, it is enough to wait until all the
invalidations associated with previous writes have completed.

• Simple, but overkill.
•  for correctly synchronised programs the rules can be relaxed
without causing the program to give different results.

• Models talk about abstract synchronisation operations
•  acquire and release
•  lock and unlock is an example

• A program is synchronised if all accesses to shared
variables happen in the following order:

write (x);

release(s);

acquire(s);

access(x);
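The write / release / acquire / access pattern above can be sketched with a lock, where unlocking plays the role of release(s) and locking plays the role of acquire(s). This is a minimal sketch assuming C++11 std::mutex; the variable written and the functions writer, reader and run_once are hypothetical additions, needed here only so that the reader can tell whether its acquire came after the writer's release.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex s;          // the synchronisation object "s" from the slide
int x = 0;             // shared variable, only touched while holding s
bool written = false;  // also protected by s: records that write(x) has happened

void writer() {
    std::lock_guard<std::mutex> guard(s);
    x = 42;            // write(x)
    written = true;
}                      // guard's destructor unlocks s: release(s)

int reader() {
    for (;;) {
        {
            std::lock_guard<std::mutex> guard(s);  // acquire(s)
            if (written) return x;                 // access(x)
        }                                          // s unlocked; try again
        std::this_thread::yield();                 // give the writer a chance to run
    }
}

// Runs one writer and one reader; returns the value the reader saw.
int run_once() {
    int seen = -1;
    std::thread t0(writer);
    std::thread t1([&seen] { seen = reader(); });
    t0.join();
    t1.join();
    return seen;
}
```

Calling run_once() should return 42 however the two threads are scheduled, because the reader only accesses x after an acquire that follows the writer's release.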

Consistency models
• Consistency models are defined by which orderings between
operations on different addresses must be preserved.
•  consider 4 operations: write (W), read (R), acquire (SA),
release (SR).

• Sequential consistency requires all orderings to be
preserved:

[Table: first operation (rows SR, SA, R, W) against second operation
(columns W, SA, R, SR); every ordering is preserved]

• Processor consistency or total store ordering
• Allows reads to occur before previous writes have
completed.

[Table: orderings preserved between a first operation (rows SR, SA, R, W)
and a second operation (columns W, SA, R, SR)]
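Since this model lets a read complete before an earlier write to a different address, the earlier x/y example may produce x == 0 and y == 0. One common remedy is a full fence between the write and the following read. The following is a minimal sketch, assuming C++11 atomics: relaxed loads and stores stand in for ordinary memory accesses, and std::atomic_thread_fence(std::memory_order_seq_cst) stands in for a hardware fence. The names p1, p2 and count_both_zero_fenced are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> a{0}, b{0};
int x = -1, y = -1;  // each is written by one thread and read only after join()

void p1() {
    a.store(1, std::memory_order_relaxed);                // a = 1;
    std::atomic_thread_fence(std::memory_order_seq_cst);  // write may not be passed by the read below
    x = b.load(std::memory_order_relaxed);                // x = b;
}

void p2() {
    b.store(1, std::memory_order_relaxed);                // b = 1;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    y = a.load(std::memory_order_relaxed);                // y = a;
}

// Repeats the experiment and counts the runs that saw x == 0 && y == 0.
int count_both_zero_fenced(int runs) {
    int count = 0;
    for (int i = 0; i < runs; ++i) {
        a.store(0);
        b.store(0);
        std::thread t1(p1), t2(p2);
        t1.join();
        t2.join();
        if (x == 0 && y == 0) ++count;
    }
    return count;
}
```

With both fences in place the C++ memory model forbids the outcome x == 0 and y == 0, so count_both_zero_fenced should always return 0. Removing the fences makes that outcome legal, although a given machine may still rarely show it.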

• Partial store ordering
• Also allows writes to different addresses to occur out of
order.

[Table: orderings preserved between a first operation (rows SR, SA, R, W)
and a second operation (columns W, SA, R, SR)]

• Weak ordering
• Allows reads and writes to be arbitrarily reordered between
synchronisation operations

[Table: orderings preserved between a first operation (rows SR, SA, R, W)
and a second operation (columns W, SA, R, SR)]

• Release consistency
• Allows some reordering between reads and writes and
acquires and releases.
• Weakest ordering which ensures synchronised
programs execute correctly

[Table: orderings preserved between a first operation (rows SR, SA, R, W)
and a second operation (columns W, SA, R, SR)]

Real processors
•  In reality, models are more complicated: there can be
different types of loads/stores and multiple different sync
operations in an ISA

•  (Almost) no real processors implement sequential
consistency
•  too expensive

•  x86 and SPARC processors implement TSO
•  easy to program and reason about

• Power and ARM processors implement a version of
relaxed consistency
•  harder to program correctly
•  theoretical performance gains…

Language level models
•  Languages/APIs which support multi-threading also need
a memory model
•  e.g. Java, C++, OpenMP

• Allows programmers to reason about consistency at a
language level rather than assembly code level

• These are usually relaxed models, so they can be
implemented efficiently across a range of hardware
platforms

• Most consistency problems are avoided if programmers
only use the inbuilt language/API synchronisation
constructs
•  problems can occur if you try to build your own…

Producer consumer pattern

• A straightforward assembly code implementation would be
correct on an x86 processor, but not on an ARM
processor.

• This is not correct code in Java, C++, or OpenMP
•  need to add some synchronisation constructs to fix it

Thread 0

a = 23;
flag = 1;

Thread 1

while (!flag) {}
b = a;
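One possible C++ fix, as a minimal sketch rather than the only option: make flag a std::atomic, store to it with release ordering and load it with acquire ordering. This assumes C++11 atomics. The function names thread0, thread1 and run_producer_consumer are hypothetical; the variables a, b and flag are those of the slide.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int a = 0;                 // ordinary shared data
int b = 0;
std::atomic<int> flag{0};  // synchronisation variable

void thread0() {
    a = 23;
    // Release: the write to a cannot be reordered after this store.
    flag.store(1, std::memory_order_release);
}

void thread1() {
    // Acquire: the read of a below cannot be reordered before this load.
    while (!flag.load(std::memory_order_acquire)) {}
    b = a;
}

// Runs both threads once and returns the value that ended up in b.
int run_producer_consumer() {
    a = 0;
    b = 0;
    flag.store(0);
    std::thread t1(thread1);
    std::thread t0(thread0);
    t0.join();
    t1.join();
    return b;
}
```

Once thread1 sees flag == 1, the release/acquire pair guarantees that it also sees a == 23, so run_producer_consumer() should return 23 on both x86 and ARM. Declaring flag as a plain int would leave a data race, and the compiler or hardware would be free to reorder the accesses.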