Thread Level Parallelism
aka “SIMD’s starting point”
CS402/922 High Performance Computing
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/
17/01/2022
They make fabrics, right?
• Thread → An execution context for a processor, including a stream of instructions
• Limited to a particular Non-Uniform Memory Access (NUMA) region
• We can have multiple threads per process
Potential Problems with Parallelism
Now you tell me…
• Splitting up the workload across multiple threads can decrease runtime
• Need to ensure operations are run in the right order
• Referred to as dependencies:
  • Control dependencies
  • Data dependencies
Control Dependencies
What’s up next?
• The actions of later instructions are dependent on the current state
• Easiest ways to fix control dependencies:
  • Design the loop/algorithm to remove branches
  • Masking → Run both sides of the branch at the same time, and only keep the results that actually occur at the end
// rand() and RAND_MAX come from <stdlib.h>
for (i = 0; i < N; i++) {
  // Random number between 0 and 1
  float a = (float) rand() / (float) RAND_MAX;
  if (a < 0.5f) {
    b[i] += 1.0f;
  }
}
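The masking idea can be sketched in C as below. This is a minimal illustration rather than the lecture's own code: the function names `add_branchy` and `add_masked` are hypothetical, and the random values are passed in as an array `a` so that both versions see the same input.

```c
#include <stddef.h>

/* Branching version: whether the update happens is a control decision. */
void add_branchy(float *b, const float *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] < 0.5f) {
            b[i] += 1.0f;
        }
    }
}

/* Masked version: every iteration performs the same operations.
 * The comparison evaluates to 1 or 0, and that mask decides whether
 * the increment has any effect on the stored result. */
void add_masked(float *b, const float *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        float mask = (float)(a[i] < 0.5f);  /* 1.0f if the branch would be taken, else 0.0f */
        b[i] += mask * 1.0f;
    }
}
```

Both functions should leave `b` in the same state for the same `a`. The masked loop has no branch in its body, which is the shape that vector hardware and compilers tend to handle well.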
Data Dependencies
What am I doing?
• Multiple instructions act on the same piece of data
Flow Dependency
• Variable depends on the result of a previous instruction
• b is dependent on the calculations for a
for (i = 0; i < N; i++) {
  a[i] = x[i] + y[i];
}
for (i = 0; i < N; i++) {
  b[i] = a[i] + c[i];
}
Anti-dependency
• Variable depends on a future instruction
• b is dependent on a not being updated
for (i = 0; i < N; i++) {
  b[i] = a[i] + c[i];
}
for (i = 0; i < N; i++) {
  a[i] = x[i] + y[i];
}
Output Dependency
• Ordering of instructions affects output
• a could be 2 or 5
for (i = 0; i < N; i++) {
  a[i] = 2;
}
for (i = 0; i < N; i++) {
  x[i] = a[i] + 1;
}
for (i = 0; i < N; i++) {
  a[i] = 5;
}
Data Dependencies
What am I doing?
• Some of these dependencies can be fixed!
• Often, multiple versions of the same variable are required
• Won’t work for flow dependencies, as they depend on both instruction order and data
Anti-dependency
for (i = 0; i < N; i++) {
  b[i] = a1[i] + c[i];
}
for (i = 0; i < N; i++) {
  a2[i] = x[i] + y[i];
}
Output Dependency
for (i = 0; i < N; i++) {
  a1[i] = 2;
}
for (i = 0; i < N; i++) {
  x[i] = a1[i] + 1;
}
for (i = 0; i < N; i++) {
  a2[i] = 5;
}
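To see why the renamed anti-dependency version is safe to reorder, here is a small C sketch. The function names `compute_b` and `compute_a2` are hypothetical; each wraps one of the two loops from the anti-dependency fix, with the old values in `a1` and the new values written to `a2`.

```c
#include <stddef.h>

/* First loop of the fixed example: only reads the old version (a1). */
void compute_b(float *b, const float *a1, const float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        b[i] = a1[i] + c[i];
    }
}

/* Second loop: writes the new version into separate storage (a2),
 * so it can no longer overwrite values the first loop still needs. */
void compute_a2(float *a2, const float *x, const float *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a2[i] = x[i] + y[i];
    }
}
```

Because neither function writes memory that the other reads, calling them in either order (or on two different threads) should give the same `b` and `a2`. The cost is the extra storage for the second version of `a`.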
Can someone else do it for me?
• Compilers can try to detect parallelisable code
• Can often detect simple cases
  • Example → Loops that set data to a static value
• May not always pick up on possible optimisations
  • Compiler will be overly cautious
  • Loop may be overly complex
• Many compilers will produce a report on the optimisations they have achieved
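As a rough illustration of these points, the sketch below shows one loop a compiler can usually handle on its own and one where it may be cautious. The function names are hypothetical. With GCC, building with something like `-O3 -fopt-info-vec` should print which loops were vectorised; other compilers have their own report options.

```c
#include <stddef.h>

/* Simple case: every element is set to the same static value. */
void fill_static(float *a, size_t n, float value) {
    for (size_t i = 0; i < n; i++) {
        a[i] = value;
    }
}

/* Without extra information the compiler must assume out, x and y
 * might overlap, which is one reason for it to be cautious. The C99
 * `restrict` qualifier is the programmer's promise that they do not. */
void add_no_alias(float *restrict out, const float *restrict x,
                  const float *restrict y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = x[i] + y[i];
    }
}
```

The `restrict` promise is not checked: calling `add_no_alias` with overlapping arrays is undefined behaviour, so it should only be added when the arrays are known to be distinct.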
Assisting the Compilers
Fine, let me help...
• Add flags to code
  • Example → __attribute__((unused))
• Libraries such as OpenMP
  • Example → #pragma omp parallel
  • More on this next time!
• Specific compiler flags, such as those for the Intel compilers
  • Example → -ipo
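As a small taste of the OpenMP route, here is a hedged sketch of a loop annotated for parallel execution. The function name `add_arrays_omp` is hypothetical. It assumes the iterations are independent, and that the code is built with OpenMP enabled (for example `-fopenmp` with GCC or Clang); without that option the pragma is ignored and the loop simply runs serially.

```c
/* Each iteration touches only element i, so there are no
 * dependencies between iterations and they can be shared
 * out across threads. */
void add_arrays_omp(float *a, const float *x, const float *y, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] = x[i] + y[i];
    }
}
```

The result should be the same whether or not OpenMP is enabled; only the runtime changes.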
Multithreading
Now we are thinking in parallels!
• If dependencies are dealt with, we can multithread it!
• Most languages have a way of adding parallelism
  • C/C++ → Pthreads, OpenMP
  • Java → Runnable interface, Thread class
  • Python → (Uses C disguised as Python...)
Multithreading Examples – C/C++ (PThreads)
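What does this look like in the real world?
#include <pthread.h>
#include <stdint.h>

int a = 1, b;  // shared data used by kernel (illustrative value)

// Create function signature
void * kernel(void *);

int main() {
  int result, param = 2;  // illustrative value
  pthread_t threadID;
  // Create the thread, running the function kernel and passing in the
  // parameter param (can only pass one parameter). threadID is the
  // handle used to refer to the thread
  result = pthread_create(&threadID, NULL, kernel, (void *)(intptr_t) param);
  if (result != 0) return 1;
  /* Continue to do other tasks */
  // Join thread back to the main thread
  result = pthread_join(threadID, NULL);
  return result;
}

void * kernel(void *param) {
  b = a + (int)(intptr_t) param;
  return NULL;
}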
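Since a thread function receives only one `void *` parameter, a common workaround is to pass a pointer to a struct. The sketch below sums an array across several threads in that way. It is an illustrative assumption rather than the lecture's code: `NUM_WORKERS`, `chunk_t`, `sum_chunk` and `parallel_sum` are all hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4  /* hypothetical thread count */

/* Everything one thread needs, bundled behind a single pointer. */
typedef struct {
    const int *data;
    size_t begin, end;  /* half-open range [begin, end) */
    long sum;           /* written only by the owning thread */
} chunk_t;

static void *sum_chunk(void *arg) {
    chunk_t *c = arg;
    c->sum = 0;
    for (size_t i = c->begin; i < c->end; i++) {
        c->sum += c->data[i];
    }
    return NULL;
}

long parallel_sum(const int *data, size_t n) {
    pthread_t threads[NUM_WORKERS];
    chunk_t chunks[NUM_WORKERS];
    int started[NUM_WORKERS];
    long total = 0;

    for (size_t t = 0; t < NUM_WORKERS; t++) {
        chunks[t] = (chunk_t){ data, n * t / NUM_WORKERS,
                               n * (t + 1) / NUM_WORKERS, 0 };
        started[t] = (pthread_create(&threads[t], NULL,
                                     sum_chunk, &chunks[t]) == 0);
        if (!started[t]) {
            sum_chunk(&chunks[t]);  /* fall back to the calling thread */
        }
    }
    for (size_t t = 0; t < NUM_WORKERS; t++) {
        if (started[t]) {
            pthread_join(threads[t], NULL);
        }
        total += chunks[t].sum;  /* safe: that thread has finished */
    }
    return total;
}
```

Each thread writes only to its own `chunk_t`, so no locking is needed here; the partial sums are combined only after the joins.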
Multithreading Examples – Java (Thread class)
What does this look like in the real world?
// Need to extend from java.lang.Thread
public class MyThread extends java.lang.Thread {
  int threadNum;

  public MyThread(int num) {
    threadNum = num;
  }

  // Function that runs when the thread is started
  public void run() {
    for (int i = 0; i < 20; i++) {
      System.out.println("Hello from thread " + threadNum);
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < 4; i++) {
      MyThread newThread = new MyThread(i);  // Create each thread object
      newThread.start();                     // Start each thread
    }
  }
}
Multithreading Examples – Java (Runnable interface)
What does this look like in the real world?
// Need to implement Runnable
public class RunBasicThread implements Runnable {
  char c;

  public RunBasicThread(char c) {
    this.c = c;
  }

  // Function that runs when the thread is started
  public void run() {
    for (int i = 0; i < 100; i++) {
      System.out.print(c);
    }
  }

  public static void main(String[] args) {
    // Create each runnable object
    RunBasicThread bt = new RunBasicThread('!');
    RunBasicThread bt1 = new RunBasicThread('*');
    // Run each runnable object in a new thread (by passing them to a
    // new Thread object and starting them)
    new Thread(bt).start();
    new Thread(bt1).start();
  }
}
Synchronisation
OK, press the button at the same time!
• Threads run asynchronously
• Sometimes, you need to access global data
  • Need to make sure the data isn’t altered by another thread while it is in use
  • Need to make sure all threads see the same data
• Mutual exclusion → stops other threads accessing the data
• Cooperation → wait and notify
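Cooperation can be sketched with a PThreads condition variable: one thread waits until another notifies it that shared data is ready. This is a minimal illustration under assumptions; the names `demo_lock`, `demo_ready_cond`, `producer` and `wait_for_value`, and the value 42, are all hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_ready_cond = PTHREAD_COND_INITIALIZER;
static int demo_ready = 0;   /* condition the waiter checks */
static int demo_value = 0;   /* shared data protected by demo_lock */

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    demo_value = 42;                       /* update the shared data */
    demo_ready = 1;
    pthread_cond_signal(&demo_ready_cond); /* notify the waiting thread */
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Starts the producer, waits until it signals, and returns the value.
 * Returns -1 if the thread could not be created. */
int wait_for_value(void) {
    pthread_t t;
    int value;

    demo_ready = 0;  /* no other thread exists yet, so no lock needed */
    if (pthread_create(&t, NULL, producer, NULL) != 0) {
        return -1;
    }
    pthread_mutex_lock(&demo_lock);
    while (!demo_ready) {
        /* Releases the mutex while waiting, re-acquires it on wake-up. */
        pthread_cond_wait(&demo_ready_cond, &demo_lock);
    }
    value = demo_value;
    pthread_mutex_unlock(&demo_lock);
    pthread_join(t, NULL);
    return value;
}
```

The `while` loop around `pthread_cond_wait` matters: a waiter can wake up without the condition being true, so it re-checks `demo_ready` each time. The mutex provides the mutual exclusion, and the condition variable provides the wait and notify.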
Synchronisation Examples - C/C++ (Mutex Locks using PThreads)
pthread_mutex_t my_mutex;
void * kernel(void *);
int main() {
int * threadID = new int[NUM_THREADS];
int param;
pthread_mutex_init(&my_mutex, NULL);
for (i=0; i