CS402/922 High Performance Computing

Thread Level Parallelism
aka “SIMD’s starting point”
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/
17/01/2022

They make fabrics, right?
• Thread → An execution context for a processor, including a stream of instructions
• Limited to a particular Non-Uniform Memory Access (NUMA) region
• We can have multiple threads per process

Potential Problems with Parallelism
Now you tell me…
• Splitting up the workload across multiple threads can decrease runtime
• Need to ensure operations are run in the right order
• Referred to as dependencies:
• Control dependencies
• Data dependencies

Control Dependencies
• The actions of later instructions depend on the current state
• Easiest ways to fix control dependencies:
• Design the loop/algorithm to remove branches
• Masking → Run both sides of the branch at the same time, and only keep the results that actually occur at the end
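
A minimal sketch of masking in C, assuming the condition values are already stored in an array `a` (the function names `add_if_small_branch` and `add_if_small_masked` are hypothetical, chosen only for illustration). The masked loop computes the update for every element and multiplies it by a 0/1 mask, so no branch remains in the loop body:

```c
#include <stddef.h>

/* Branching version: each iteration takes a data-dependent branch. */
void add_if_small_branch(float *b, const float *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] < 0.5f) {
            b[i] += 1.0f;
        }
    }
}

/* Masked version: compute the update for every element, then use a
 * 0/1 mask to keep only the results where the condition held. */
void add_if_small_masked(float *b, const float *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* A comparison in C yields 1 (true) or 0 (false). */
        float mask = (float)(a[i] < 0.5f);
        b[i] += mask * 1.0f;
    }
}
```

Both functions leave the same values in `b`. The masked form does some redundant arithmetic, but its straight-line loop body is typically easier for a compiler to vectorise.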
What’s up next?
    for (i = 0; i < N; i++) {
        // Random number between 0 and 1
        float a = (float) rand() / (float) RAND_MAX;
        if (a < 0.5f) {
            b[i] += 1.0f;
        }
    }

Data Dependencies
What am I doing?
• Multiple instructions act on the same piece of data

Flow Dependency
• Variable depends on the previous instruction
• b is dependent on the calculations for a
    for (i = 0; i < N; i++) {
        a[i] = x[i] + y[i];
    }
    for (i = 0; i < N; i++) {
        b[i] = a[i] + c[i];
    }

Anti-dependency
• Variable depends on a future instruction
• b is dependent on a not being updated
    for (i = 0; i < N; i++) {
        b[i] = a[i] + c[i];
    }
    for (i = 0; i < N; i++) {
        a[i] = x[i] + y[i];
    }

Output Dependency
• Ordering of instructions affects output
• a could be 2 or 5
    for (i = 0; i < N; i++) {
        a[i] = 2;
    }
    for (i = 0; i < N; i++) {
        x[i] = a[i] + 1;
    }
    for (i = 0; i < N; i++) {
        a[i] = 5;
    }

Data Dependencies
What am I doing?
• Some of these dependencies can be fixed!
• Often, multiple versions of the same variable are required
• Won’t work for flow dependencies, as they depend on both instruction order and data

Anti-dependency
    for (i = 0; i < N; i++) {
        b[i] = a1[i] + c[i];
    }
    for (i = 0; i < N; i++) {
        a2[i] = x[i] + y[i];
    }

Output Dependency
    for (i = 0; i < N; i++) {
        a1[i] = 2;
    }
    for (i = 0; i < N; i++) {
        x[i] = a1[i] + 1;
    }
    for (i = 0; i < N; i++) {
        a2[i] = 5;
    }

Can someone else do it for me?
• Compilers can try to detect parallelisable code
• Can often detect simple cases
• Example → Loops that set data to a static value
• May not always pick up on possible optimisations
• Compiler will be overly cautious
• Loop may be overly complex
• Many compilers will produce a report on the optimisations they have applied

Assisting the Compilers
Fine, let me help...
• Add flags to the code
• Example → __attribute__((unused))
• Libraries such as OpenMP
• Example → #pragma omp parallel
• More on this next time!
• Compiler-specific flags, such as those for the Intel compilers
• Example → -ipo

Multithreading
Now we are thinking in parallels!
• If dependencies are dealt with, we can multithread it!
• Most languages have a way of adding parallelism
• C/C++ → Pthreads, OpenMP
• Java → Runnable interface, Thread class
• Python → (Uses C disguised as Python...)

Multithreading Examples – C/C++ (PThreads)
What does this look like in the real world?
    #include <pthread.h>

    int a = 1, b;                  // Data used by kernel (illustrative value)

    void * kernel(void *);         // Create function signature

    int main() {
        int result, param = 2;     // Illustrative parameter value
        pthread_t threadID;
        // Create the thread, running the function kernel and passing in the
        // parameter param (can only pass one parameter).
        // Use &threadID as pointer to the thread handle
        result = pthread_create(&threadID, NULL, kernel, (void *) &param);
        /* Continue to do other tasks */
        // Join thread back to the main thread
        result = pthread_join(threadID, NULL);
        return result;
    }

    void * kernel(void *param) {
        b = a + *(int *) param;
        return NULL;
    }

Multithreading Examples – Java (Thread class)
What does this look like in the real world?
    // Need to extend from java.lang.Thread
    public class MyThread extends java.lang.Thread {
        int threadNum;

        public MyThread(int num) {
            threadNum = num;
        }

        // Function that runs when the thread is started
        public void run() {
            for (int i = 0; i < 20; i++) {
                System.out.println("Hello from thread " + threadNum);
            }
        }

        public static void main(String[] args) {
            for (int i = 0; i < 4; i++) {
                MyThread newThread = new MyThread(i);  // Create each thread object
                newThread.start();                     // Start each thread
            }
        }
    }

Multithreading Examples – Java (Runnable interface)
What does this look like in the real world?
    // Need to implement Runnable
    public class RunBasicThread implements Runnable {
        char c;

        public RunBasicThread(char c) {
            this.c = c;
        }

        // Function that runs when the thread is started
        public void run() {
            for (int i = 0; i < 100; i++) {
                System.out.print(c);
            }
        }

        public static void main(String[] args) {
            // Create each runnable object
            RunBasicThread bt = new RunBasicThread('!');
            RunBasicThread bt1 = new RunBasicThread('*');
            // Run each runnable object in a new thread (by passing them
            // to a new Thread object and starting them)
            new Thread(bt).start();
            new Thread(bt1).start();
        }
    }

Synchronisation
OK, press the button at the same time!
• Threads run asynchronously
• Sometimes, you need to access global data
• Need to make sure the data isn’t altered
• Need to make sure all threads have the same data
• Mutual exclusion → stops other threads accessing the data
• Cooperation → wait and notify

Synchronisation Examples - C/C++ (Mutex Locks using PThreads)
    pthread_mutex_t my_mutex;
    void * kernel(void *);
    int main() {
        pthread_t * threadID = new pthread_t[NUM_THREADS];
        int param;
        pthread_mutex_init(&my_mutex, NULL);
        for (i=0; i
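
The mutex example can be fleshed out along these lines. This is a minimal sketch in C, not the lecture's own code: the shared `counter`, the `INCREMENTS` constant, the value chosen for `NUM_THREADS`, and the helper `run_counter_demo` are all assumptions made for illustration. Each thread locks `my_mutex` before touching the shared counter and unlocks it afterwards, so updates from different threads cannot interleave:

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4       /* assumed value, for illustration */
#define INCREMENTS 100000   /* assumed value, for illustration */

static pthread_mutex_t my_mutex;
static long counter;        /* shared (global) data */

/* Thread entry point: add to the shared counter under the lock. */
static void *kernel(void *param) {
    (void) param;           /* unused in this sketch */
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&my_mutex);    /* other threads wait here */
        counter++;
        pthread_mutex_unlock(&my_mutex);  /* let the next thread in */
    }
    return NULL;
}

/* Hypothetical helper: start the threads, wait for them, return the total. */
long run_counter_demo(void) {
    pthread_t threadID[NUM_THREADS];
    counter = 0;
    pthread_mutex_init(&my_mutex, NULL);
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_create(&threadID[i], NULL, kernel, NULL);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threadID[i], NULL);
    }
    pthread_mutex_destroy(&my_mutex);
    return counter;
}
```

Calling `run_counter_demo()` should return `NUM_THREADS * INCREMENTS`. Without the lock/unlock pair the final count could come out lower, because increments from different threads may overwrite each other. With GCC or Clang, build with the `-pthread` option.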