KIT308/408 (Advanced) Multicore Architecture and Programming
Threading
Dr. Ian Lewis
Discipline of ICT, School of TED
University of Tasmania, Australia
We’ve looked at details of high performance CPU architectures
Pipelining, Superscalar
These advancements were transparent to the programmer
The CPU did all the work to make things run faster
The basic programming model remained the same
Now that individual cores aren’t getting much faster, how do we (as programmers) take advantage of extra cores?
i.e. what programming model do we need to use to access multiple cores at once?
Multicore programming requires the programmer to change the way their program is structured
Programming Multicores
Processes and Threads
Semester 2
2015
A process is broadly analogous to a single running program
A thread is the smallest unit of processing that can be scheduled by an operating system
A process can contain multiple threads
Multiple threads can share the same memory space (the memory space of the parent process)
For almost all the work we do in this unit the difference won’t matter
We’ll generally be running a single thread per core
Although we will deliberately run too many threads on occasion to see what happens
“Shared memory” still requires some work
(But we won’t be learning how to make a multi-process program)
Processes / Threads
Processes and Threads
Process
Threads within a Process
Having the capacity to run multiple threads is referred to as multithreading
The implementation of multithreading is hardware dependent and there are a variety of schemes in use
Interleaved
Time-slicing threads on a single core
Simultaneous
Sharing a single core amongst multiple threads
Chip-Level
Using multiple cores
Hardware Threads
AKA temporal multithreading
Running multiple threads on a traditional CPU requires threads to be time-sliced
Each thread is given a small amount of time to execute, then control is passed to another thread
Done quickly enough this makes it appear that all threads are executing simultaneously
Happens around 100 times per second typically
A single thread executes at one time on the CPU
Once a thread has used its allocated time or stalls (e.g. a page fault occurs), a context switch occurs
This requires a mechanism to store the current state of the thread so it can be restarted later (referred to as context switching)
All visible registers copied (usually to RAM)
A very costly operation
Interleaved Multithreading
Context Switch
Simultaneous multithreading allows the execution of multiple threads simultaneously on one CPU
i.e. instructions from multiple threads can be issued at the same time
This technique requires a superscalar architecture (for the multiple execution units) and also the duplication of many parts of the CPU front-end
The performance gain is very application dependent
Contention for bandwidth (instruction and data fetch), caches, TLB, register renaming buffers, etc.
In the worst case simultaneous multithreading can be slower than interleaved multithreading
Interleaving is still used to provide a larger number of possible threads
Simultaneous multithreading was first commercialised with the Pentium 4 in 2002
Duplicated: Instruction Pointer, register renaming logic, return stack predictor, and ITLB (this is to do with memory management)
Caches and such are shared
Achieved around 15–30% speed increase from a 5% chip area cost
Simultaneous Multithreading
Core/Chip-level multithreading runs multiple threads simultaneously by having multiple CPUs (or multiple cores on the same CPU)
In this case each running thread is executing on its own CPU, so resource contention is reduced
But there can still be memory contention (caches or RAM)
Chip-level multithreading can still obviously use interleaved multithreading or even be combined with simultaneous multithreading
Core-Level Multithreading
Fibers are even more efficient than threads
Can be seen as an implementation of coroutines (or at least one technique for implementing them)
Each fiber/coroutine has to relinquish control before the next can run
If a fiber/coroutine has an infinite loop, then the whole thread is stuck
From most expensive to least expensive
Process
OS-level construct (OS pre-emptively switches between processes)
Expensive to switch between
Single address space per process
Thread
OS-level construct (OS switches between threads)
Multiple threads per address space
Fiber
Programming construct (programmer must co-operatively switch between fibers)
Multiple fibers per thread
Aside: Fibers and Coroutines
Windows Threading Model
Windows has its own proprietary thread library (the standard Unix/Linux threading library is POSIX Threads – see appendix if you are interested)
Defined in an object-oriented fashion (although still purely C code)
We’ll look at the C versions here, but we will also see the C++ versions later
Defined to be modular and portable
All threads within a process share the same address space
Easy sharing of data
Can create coherency problems
Threads can also have individual private data
Windows Threads
Each thread is uniquely identified by a thread handle of type HANDLE
When threads are created they begin by executing a function
A single parameter may be passed to this function
Threads can either exit or be killed by other threads
Threads can wait for another thread and receive its return value
Threads have modifiable attributes
e.g. thread priority
Windows Threads
Thread functions can be informally grouped into four major groups
Thread management
Routines that work directly on threads
Creating, exiting, waiting for other threads to finish, etc.
Mutexes
Routines that deal with synchronization via a mutex (an abbreviation for “mutual exclusion”)
Condition variables
Routines that address communications between threads that share a mutex
Synchronization
Routines that manage read/write locks and barriers
For the moment we’ll only look at thread management functions
It’s unclear at this stage if there is room in the unit for dealing with high-level synchronisation
Windows Thread Functions
To create a thread the CreateThread function is used with the following parameters
LPSECURITY_ATTRIBUTES lpThreadAttributes
A pointer to a structure that defines security attributes (NULL for the defaults)
SIZE_T dwStackSize
Size of the stack (0 for default)
LPTHREAD_START_ROUTINE lpStartAddress
The function to run in the thread
LPVOID lpParameter
The parameter to pass to the function
This is effectively a void* pointer: it can refer to any object in memory and is generally cast into something more useful immediately
DWORD dwCreationFlags
Flags that control the creation of the thread (0 for the defaults)
LPDWORD lpThreadId
A pointer to a variable to receive the thread’s ID (NULL means the ID isn’t returned)
This function returns a HANDLE
A valid (non-NULL) HANDLE is returned on success; if NULL is returned, you can call GetLastError to get an error code
Creating Windows Threads
There are several ways in which a Thread may be terminated
The thread returns from its starting routine (i.e. the main function for the initial thread)
The thread is canceled by another thread via the TerminateThread function
The entire process is terminated due to a call to either the exec or exit function
The thread makes a call to the ExitThread function
Returns a single (DWORD) value
This is the graceful way to end a thread
Terminating a Windows Thread
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_THREADS 5
/* Thread start routine: the thread number arrives packed into the pointer */
DWORD __stdcall PrintHello(LPVOID threadid)
{
    long tid = (long)(LONG_PTR)threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    ExitThread(0);
}
int main(int argc, char *argv[])
{
    HANDLE threads[NUM_THREADS];
    for (unsigned int t = 0; t < NUM_THREADS; t++)
    {
        printf("In main: creating thread %u\n", t);
        /* Pass the value of t itself (not its address) as the parameter */
        threads[t] = CreateThread(NULL, 0,
            PrintHello, (LPVOID)(ULONG_PTR)t, 0, NULL);
        if (threads[t] == NULL)
        {
            printf("ERROR. Return code from "
                "CreateThread() is %lu\n",
                GetLastError());
            exit(-1);
        }
    }
    /* Ends only the main thread; the other threads keep running */
    ExitThread(0);
}
Windows Thread Example
struct thread_data
{
int thread_id;
int sum;
char *message;
};
struct thread_data thread_data_array[NUM_THREADS];
DWORD __stdcall PrintHello(LPVOID threadarg)
{
struct thread_data *my_data;
...
my_data = (struct thread_data *) threadarg;
taskid = my_data->thread_id;
sum = my_data->sum;
hello_msg = my_data->message;
…
}
int main(int argc, char *argv[])
{
…
thread_data_array[t].thread_id = t;
thread_data_array[t].sum = sum;
thread_data_array[t].message = messages[t];
threads[t] = CreateThread(NULL, 0, PrintHello,
(LPVOID) &thread_data_array[t], 0, NULL);
…
}
Passing Arguments to Threads
Passing Arguments to Threads
The following code fragment illustrates a common mistake when passing arguments to threads
A reference to the variable t is passed to each thread with the intention of t having a different value in each thread
But the value of t may change before the thread gets to access it
for(long t=0; t
It is recommended to include windows.h before any other header whenever you use it, since including it later can cause errors (weird #defines)
When compiling under Visual Studio, you shouldn’t need to add any libraries to the link path to get threading to work
Compiling with Windows Threads
Appendix: Pthreads
(just included for interest’s sake)
POSIX Threads (Pthreads)
The standard Unix/Linux threading library is POSIX Threads
Defined in an object-oriented fashion (although still purely C code)
Defined to be modular and portable
Pthreads are designed to be as fast as possible
All threads within a process share the same address space
Easy sharing of data
Can create coherency problems
Threads can also have individual private data
POSIX Threads
Each thread is uniquely identified by a thread ID of type pthread_t
When threads are created they begin by executing a function
A single parameter may be passed to this function
Threads can either exit or be killed by other threads
Threads can wait for another thread and receive its return value
Threads have modifiable attributes
e.g. thread priority
Pthread Functions
Pthreads functions can be informally grouped into four major groups
Thread management
Routines that work directly on threads – creating, detaching, joining, etc. They also include functions to set/query thread attributes (joinable, scheduling etc.)
Mutexes
Routines that deal with synchronization via a mutex (an abbreviation for “mutual exclusion”)
Condition variables
Routines that address communications between threads that share a mutex
Synchronization
Routines that manage read/write locks and barriers
For the moment we’ll only look at thread management functions
Creating Pthreads
To create a Pthread the pthread_create function is used with the following parameters
pthread_t *thread
A pointer to the location to store the unique thread ID
const pthread_attr_t *attr
NULL for the defaults
void *(*routine)(void *)
The function to run in the thread
void *arg
The parameter to pass to the function
This void* pointer can refer to any object in memory and is generally cast into something more useful immediately
This function returns an int
Zero is returned on success; a non-zero return value is an error code
Terminating a Pthread
There are several ways in which a Pthread may be terminated
The thread returns from its starting routine (i.e. the main function for the initial thread)
The thread is canceled by another thread via the pthread_cancel function
The entire process is terminated due to a call to either the exec or exit function
The thread makes a call to the pthread_exit function
Returns a single (void*) value
This is the graceful way to end a thread
Pthread Example
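#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_THREADS 5
/* Thread start routine: the thread number arrives packed into the pointer */
void *PrintHello(void *threadid)
{
    long tid = (long) threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
    pthread_t threads[NUM_THREADS];
    int rc;
    for(long t = 0; t < NUM_THREADS; t++)
    {
        printf("In main: creating thread %ld\n", t);
        /* Pass the value of t itself (not its address) as the parameter */
        rc = pthread_create(&threads[t], NULL, PrintHello, (void *) t);
        if (rc)
        {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }
    /* Ends only the main thread; the other threads keep running */
    pthread_exit(NULL);
}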
Passing Arguments to Pthreads
struct thread_data
{
int thread_id;
int sum;
char *message;
};
struct thread_data thread_data_array[NUM_THREADS];
void *PrintHello(void *threadarg)
{
struct thread_data *my_data;
...
my_data = (struct thread_data *) threadarg;
taskid = my_data->thread_id;
sum = my_data->sum;
hello_msg = my_data->message;
…
}
int main (int argc, char *argv[])
{
…
thread_data_array[t].thread_id = t;
thread_data_array[t].sum = sum;
thread_data_array[t].message = messages[t];
rc = pthread_create(&threads[t], NULL, PrintHello, (void *) &thread_data_array[t]);
…
}
Passing Arguments to Pthreads
The following code fragment illustrates a common mistake when passing arguments to Pthreads
A reference to the variable t is passed to each thread with the intention of t having a different value in each thread
But the value of t may change before the thread gets to access it
int rc;
for(long t=0; t
In some implementations it is recommended to include pthread.h first, since it defines macros for using thread-safe functions when these are available
When compiling (on most systems) you need to manually specify that you want to link the libpthread.a library
e.g. on the PS3:
ppu-gcc -pthread …remaining_args…