

KIT308/408 (Advanced) Multicore Architecture and Programming

Shared Memory and Synchronisation
Dr. Ian Lewis
Discipline of ICT, School of TED
University of Tasmania, Australia
1

Today we’ll look at some of the problems with sharing memory between cores (or just between threads in a multithreaded program)
See simple approaches for restricting access to avoid these potential pitfalls
Often referred to as synchronisation

2
Programming Multicores

Shared Memory

3

Threads share the address space of their parent process
Simple to access shared data
Any global variables are global across all threads
Care needs to be taken when doing so

4
Refresher: Processes and Threads

Threads within a Process

Thread functions can be informally divided into four major groups
Thread management
Routines that work directly on threads
Creating, exiting, waiting for other threads to finish, etc. (see the sketch after this slide)
Mutexes
Routines that deal with synchronization via a mutex, which is an abbreviation for “mutual exclusion”
Condition variables
Routines that address communications between threads that share a mutex
Synchronization
Routines that manage read/write locks and barriers
Today we’ll look at mutex and synchronization thread functions

Refresher: Windows Thread Functions
5
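As a minimal refresher sketch of the thread-management routines above (and of globals being shared across all threads), the following creates some Windows threads, lets each read a shared global array, and waits for them all to finish; the thread count, array size, and worker body are placeholders rather than lecture code

#include <windows.h>

#define NUM_THREADS 4           // placeholder thread count
#define LOTS_OF_VALUES 1000000  // placeholder array size

int array[LOTS_OF_VALUES];      // global, so visible to every thread in the process

// each thread runs this; the LPVOID parameter carries per-thread data
DWORD WINAPI thread_func(LPVOID param)
{
    int id = (int)(INT_PTR)param;   // unpack a simple thread id passed at creation

    // every thread sees the same array because threads share
    // the address space of their parent process
    int first_value = array[id];

    // ... do some real work here ...
    return (DWORD)first_value;      // exit code retrievable via GetExitCodeThread
}

int main(void)
{
    HANDLE threads[NUM_THREADS];

    // create the threads (default security attributes, stack size, and flags)
    for (int i = 0; i < NUM_THREADS; ++i)
    {
        threads[i] = CreateThread(NULL, 0, thread_func, (LPVOID)(INT_PTR)i, 0, NULL);
    }

    // wait for every thread to finish before continuing
    WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);

    // release the thread handles
    for (int i = 0; i < NUM_THREADS; ++i)
    {
        CloseHandle(threads[i]);
    }
    return 0;
}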

This is a different calculation from what we’ve been doing with Mandelbrot
One single result
Would this code work?
Unlikely, as all the threads are trying to read and write to sum at the same time
Lots and lots of reads and writes
Code once compiled is actually: read, add, write
What’s going on in main memory and cache?
This value would need to be shared between different cores
int array[LOTS_OF_VALUES];
int sum = 0;

void thread_func(int* array, int start, int end)
{
    for (int i = start; i < end; ++i)
    {
        sum += array[i];
    }
}

// initialise array with something
// make some threads that all run thread_func
// (with appropriate start and end indices)
// to calculate sum

Multithreaded Sum: Naïve Attempt 1
6

Would this code work?
Fewer reads/writes of the sum variable
But still potential for them to fight
Pretty unlucky, but not impossible
This is probably worse than the previous example as this might work most of the time
Quick testing wouldn’t reveal the problem

int array[LOTS_OF_VALUES];
int sum = 0;

void thread_func(int* array, int start, int end)
{
    int local_sum = 0;
    for (int i = start; i < end; ++i)
    {
        local_sum += array[i];
    }
    sum += local_sum;
}

// initialise array with something
// make some threads that all run thread_func
// (with appropriate start and end indices)
// to calculate sum

Multithreaded Sum: Naïve Attempt 2
7

The solution to this problem is to somehow lock the variable
So that only one thread can read/write from it at one time
Then unlock it once the calculation has been done

int array[LOTS_OF_VALUES];
int sum = 0;

void thread_func(int* array, int start, int end)
{
    int local_sum = 0;
    for (int i = start; i < end; ++i)
    {
        local_sum += array[i];
    }
    GET_LOCK_FOR_SUM; // (no other thread can touch it)
    sum += local_sum;
    RELEASE_LOCK_FOR_SUM;
}

// initialise array with something
// make some threads that all run thread_func
// (with appropriate start and end indices)
// to calculate sum

Multithreaded Sum: Locking
8

Mutexes
9

A mutex is used to protect a shared resource from multiple simultaneous accesses
It enforces sequential access to the resource
Resources can be as simple as a single memory location
Or as complex as you like
1. https://i.pinimg.com/564x/d9/92/ed/d992ed50b3efe8c33f125e5cd353c5b6--life-plan-venn-diagrams.jpg

Mutual Exclusion Locks (Mutexes)
10

To create a mutex, the CreateMutex function is used with the following parameters
LPSECURITY_ATTRIBUTES lpMutexAttributes
A pointer to a structure to define security features (NULL for the defaults)
BOOL bInitialOwner
A boolean specifying whether to create this mutex in its locked state or not
i.e. if this is TRUE, the thread calling CreateMutex will own the mutex
LPCTSTR lpName
The name of the mutex (NULL to default to no name)
This function returns a HANDLE
A valid (non-zero) HANDLE is returned for success; if NULL is returned, you can query GetLastError to get an error code

Creating Windows Mutexes
11

The lock on a mutex is obtained by use of the WaitForSingleObject function
(Or WaitForMultipleObjects to get multiple mutexes, or one from a set)
The lock is released by using the ReleaseMutex function with the following parameters
HANDLE hMutex
The HANDLE to the mutex
1. http://static.gulfnews.com/polopoly_fs/1.2177287!/image/2534844721.jpg_gen/derivatives/box_620347/2534844721.jpg

Locking/Releasing Windows Mutexes
12
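Putting the two previous slides together, a minimal sketch of creating, locking, and releasing a mutex with its return values checked might look like the following; the handle name and the error handling are illustrative additions, not lecture code (the full sum example is on the next slide)

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // create an unnamed, initially-unowned mutex with default security attributes
    HANDLE mutex = CreateMutex(NULL, FALSE, NULL);
    if (mutex == NULL)
    {
        // creation failed; GetLastError explains why
        printf("CreateMutex failed: %lu\n", GetLastError());
        return 1;
    }

    // obtain the lock, waiting as long as necessary
    DWORD wait_result = WaitForSingleObject(mutex, INFINITE);
    if (wait_result == WAIT_OBJECT_0)
    {
        // we own the mutex here: safe to touch the protected resource

        // give the lock back so other threads can proceed
        ReleaseMutex(mutex);
    }

    // destroy the mutex once it is no longer needed
    CloseHandle(mutex);
    return 0;
}

(WaitForSingleObject can also return WAIT_ABANDONED, WAIT_TIMEOUT, or WAIT_FAILED, which real code should handle)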
Using a single mutex allows us to do our sum calculation correctly
Why are we still bothering with the local_sum variable here?
To reduce the amount of locking that takes place
Locking is slow

int array[LOTS_OF_VALUES];
int sum = 0;
HANDLE sum_mutex;

void thread_func(int* array, int start, int end)
{
    int local_sum = 0;
    for (int i = start; i < end; ++i)
    {
        local_sum += array[i];
    }
    WaitForSingleObject(sum_mutex, INFINITE);
    sum += local_sum;
    ReleaseMutex(sum_mutex);
}

void main()
{
    sum_mutex = CreateMutex(NULL, FALSE, NULL);

    // initialise array with something
    // make some threads that all run thread_func
    // (with appropriate start and end indices)
    // to calculate sum
}

Multithreaded Sum: Mutex
13

Semaphores are a related concept to mutexes
Except that they allow for more than one thread to access a resource at once
They allow a fixed number of threads access
Basically they maintain a counter, and every time the semaphore is requested, the counter is increased
Increasing the counter is thread-safe (i.e. only one thread can do it at a time)
When the semaphore is released, the counter is decreased
1. https://upload.wikimedia.org/wikipedia/commons/b/b2/020118-N-6520M-011_Semaphore_Flags.jpg

Aside: Semaphores
14

Specific Windows Functions
15

Windows has an extensive set of custom functions/classes for helping write threaded programs
CriticalSection
Basically the same as a mutex
Can be more efficient when the lock is available
Interlocked API
A variety of functions for allowing simple thread-safe operations

Windows Thread Sharing Functions
16

InterlockedIncrement will thread-safely increase a 32-bit integer variable by 1
It’s equivalent to wrapping the increment of a variable in a mutex lock / unlock
There are also variants that work on 16- or 64-bit values
There are also similar functions for
Decrement
Xor, And, Or
Exchange (assignment, with the previous value of the variable returned)
Add (with the previous value returned)
The InterlockedIncrement function has the following parameters
LONG volatile *Addend
A pointer to the variable to be incremented
This function returns a LONG
The incremented value of the variable pointed to by Addend
The Addend pointer needs to be 32-bit aligned
We’ll discuss this concept more later on
And see how to force the compiler to do this

InterlockedIncrement
17

The interlocked Windows functions allow for shorthand versions of the simple uses of a mutex

int array[LOTS_OF_VALUES];
int sum = 0;

void thread_func(int* array, int start, int end)
{
    int local_sum = 0;
    for (int i = start; i < end; ++i)
    {
        local_sum += array[i];
    }
    // guaranteed to perform correct add
    InterlockedExchangeAdd(&sum, local_sum);
}

// initialise array with something
// make some threads that all run thread_func
// (with appropriate start and end indices)
// to calculate sum

Multithreaded Sum: InterlockedExchangeAdd
18
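The CriticalSection mentioned on the Windows Thread Sharing Functions slide was described but not shown in code; a hedged sketch of the same sum using a CRITICAL_SECTION might look like the following, with the names and the LOTS_OF_VALUES size mirroring the mutex version as placeholders

#include <windows.h>

#define LOTS_OF_VALUES 1000000  // placeholder size

int array[LOTS_OF_VALUES];
int sum = 0;
CRITICAL_SECTION sum_cs;

void thread_func(int* array, int start, int end)
{
    int local_sum = 0;
    for (int i = start; i < end; ++i)
    {
        local_sum += array[i];
    }
    EnterCriticalSection(&sum_cs);  // behaves like taking the mutex lock
    sum += local_sum;
    LeaveCriticalSection(&sum_cs);  // behaves like releasing it
}

int main(void)
{
    InitializeCriticalSection(&sum_cs);  // must be initialised before first use

    // initialise array with something
    // make some threads that all run thread_func
    // (with appropriate start and end indices)
    // to calculate sum

    DeleteCriticalSection(&sum_cs);      // clean up once all threads are done
    return 0;
}

Unlike a mutex HANDLE, a CRITICAL_SECTION can only synchronise threads within a single process, which is part of why it can be cheaper when the lock is uncontended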