FINAL EXAM OVERVIEW
Shuaiwen Leon Song
Final Exam Structure
▪ Three Sections:
▪ (1) Multiple choice (30 points): 15 questions, 2 points per question. Please select all the correct answers.
▪ (2) Fundamental understanding of the important concepts (40 points): 5 questions, 8 points per question. Please provide short and on-point answers.
▪ (3) Comprehensive understanding (30 points): 3 questions, 10 points per question. Please provide concise answers and/or simple pseudo code. If there are multiple sub-questions within a question, each counts for a certain percentage of the total 10 points.
Caveats
▪ Section 1 will have one relatively hard question at the very end (worth 2 points). The other questions cover concepts we have discussed in class. Try to answer as many questions as possible. The format is similar to Quiz 2.
▪ Section 2 tests your basic understanding of the concepts. All of these topics were covered in class at some point. This is the section where you should work hard to secure as many points as possible.
▪ Section 3: This section covers more comprehensive questions rather than just basic concepts. This is where you use what you have learned to answer some simple Q&As and demonstrate your ability to fix problematic code. No lengthy C coding is required; it is all about pseudo code and your logic for solving the problem. Try to answer as many questions as possible. None of the three questions is especially difficult, but they may require you to think a bit before you answer.
Week 1: Parallel Performance Models
▪ Review the typical performance models covered in this lecture.
▪ Pay attention to the in-class examples as well as your homework. There is one question in Section 2 from this lecture.
▪ Do warm-up exercises with your homework and in-class examples.
▪ Pay attention to the core concepts and models (e.g., the Karp-Flatt metric, the speedup formula, and the isoefficiency formulas and examples).
Week 2 ~ Week 3: C Essentials
▪ Be able to read and understand C programs in the context of pthreads, OpenMP, and MPI.
▪ NO questions test C essentials alone, but there will be C code illustrating synchronization concepts, OpenMP, and MPI.
Week 4: Memory Hierarchies and Optimizations
▪ Memory hierarchies
▪ Spatial and temporal locality
▪ False sharing/true sharing: causes and how to fix them
▪ Register reuse and cache blocking (will not be covered in the final)
Week 5: Processes (*)
▪ Concept and usage of fork():
▪ Who is the child process? How is fork() used?
▪ What are zombie processes? What situations cause them? What are the solutions (e.g., waitpid(), wait(), or terminating the parent process)? Pay attention to the exercises on this in the lecture slides.
▪ Signal communication between processes: signals are not queued
▪ What are the differences between processes and threads?
Week 6: Threads (*) and basic Synchronization
▪ Threads vs Processes: Mechanisms and Sharing
▪ Pthreads basics
▪ Variable spaces (shared? private? etc.)
▪ Unintended sharing
▪ Thread detached mode
▪ Understanding race conditions and data races, what may cause them to happen, and what the potential solutions are (e.g., the data race example in the slides).
▪ Enforcing mutual exclusion: semaphores, mutex locks, condition variables, atomic operations (e.g., spinlocks).
Week 7: Synchronization (*)
▪ Semaphores, mutex, condition variables and atomic operations: coding, mechanisms and when to use what.
▪ Questions on how to use semaphores: P (sem_post) and W (sem_wait) problems, binary semaphores, producer-consumer questions using semaphores, deadlocks caused by semaphores, etc.
▪ Questions on deadlocks: What causes deadlocks? Examples that cause deadlocks. The four necessary conditions for triggering deadlocks.
▪ Questions on condition variables: always go with mutex; condition variables usage example and the mechanism (how to wake up and then how they grab the lock).
▪ Questions on atomic operations: what are they? Why are they fast? When to use atomic operations vs mutex/semaphores/condition variables?
▪ Understanding barriers
▪ Understanding spinlocks and the conditions under which to use them (similar to atomic operations); how to use CAS (compare-and-swap) to implement spinlocks.
▪ ***: the cnt program used throughout Lecture 6 and Lecture 7
Week 8: Quiz2 review and other synchronization questions
▪ A very important lecture covering several key questions
▪ Don't ignore the video part about Q12 and Q13
▪ Understanding false sharing and true sharing mechanisms
▪ Get familiar with the type of questions that may appear in the final.
Week 9 and 10: OpenMP
▪ Basic OpenMP directives, runtime APIs, and environment variables
▪ OpenMP example with false sharing:
https://software.intel.com/content/www/us/en/develop/articles/avoiding-and-identifying-false-sharing-among-threads.html
Week 11: MPI
▪ MPI blocking send and receive vs. their non-blocking versions
▪ Example 1: Deadlock example for MPI_send and MPI_recv
▪ Example 2: MPI_Isend and MPI_Irecv: what do they do, and what do they normally pair with?
▪ Example 3: Why do we need special care for the MPI ring example? Why can't we just directly do an MPI_Send and then an MPI_Recv?
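The ring problem in Example 3 can be sketched as follows (a pseudocode-style MPI fragment, not compiled here; rank, size, in, and out are assumed to be set up by the usual MPI boilerplate):

```c
/* Sketch: ring exchange without deadlock. If every rank called blocking
 * MPI_Send first, all ranks could block waiting for a matching receive
 * once the message size exceeds the library's internal buffering. */
int next = (rank + 1) % size, prev = (rank - 1 + size) % size;
MPI_Request reqs[2];
MPI_Irecv(&in,  1, MPI_INT, prev, 0, MPI_COMM_WORLD, &reqs[0]); /* post receive first */
MPI_Isend(&out, 1, MPI_INT, next, 0, MPI_COMM_WORLD, &reqs[1]); /* non-blocking send */
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* Isend/Irecv must pair with a wait/test */
```

Alternatives with the same effect include MPI_Sendrecv or alternating the send/receive order between even and odd ranks.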
Some other examples
Mapping Variable Instances to Memory
char **ptr;                      /* global: 1 instance (ptr [data]) */

int main() {
    int i;                       /* local: 1 instance (i.m) */
    pthread_t tid;
    char *msgs[2] = {            /* local: 1 instance (msgs.m) */
        "Hello from foo",
        "Hello from bar"
    };
    ptr = msgs;
    for (i = 0; i < 2; i++)
        Pthread_create(&tid,
                       NULL, thread, (void *)i);
    Pthread_exit(NULL);
}

/* thread routine */
void *thread(void *vargp)
{
    int myid = (int)vargp;       /* local: 2 instances (myid.p0 [peer thread 0's stack],
                                    myid.p1 [peer thread 1's stack]) */
    static int cnt = 0;          /* local static: 1 instance (cnt [data]) */
    printf("[%d]: %s (svar=%d)\n",
           myid, ptr[myid], ++cnt);
}
Shared Variable Analysis
■ Which variables are shared?

    Variable    Referenced by    Referenced by     Referenced by
    instance    main thread?     peer thread 0?    peer thread 1?
    ptr         yes              yes               yes
    cnt         no               yes               yes
    i.m         yes              no                no
    msgs.m      yes              yes               yes
    myid.p0     no               yes               no
    myid.p1     no               no                yes

■ A variable x is shared iff multiple threads reference at least one instance of x. Thus:
◼ ptr, cnt, and msgs are shared
◼ i and myid are not shared
Semaphores: P & W
Consider the following program using semaphores. Which of the following execution orders of sem_wait and sem_post are possible (W represents sem_wait, P represents sem_post)?
(A) WPWWWP (B) WWPPWW (C) PPWWWW (D) WPWPWW (E) WWWWPP
Semaphores: deadlocks vs normal execution
sem A1Done = 0;
sem B1Done = 0;

// Thread A          // Thread B
A1                   B1
wait(B1Done)         wait(A1Done)
post(A1Done)         post(B1Done)
A2                   B2
Condition Variables Example: What is the mechanism?
False/True Sharing
True sharing: threads intentionally access the same data; this requires programmatic synchronization constructs to ensure ordered data access.
False sharing: The frequent coordination required between processors when cache lines are marked ‘Invalid’ requires cache lines to be written to memory and subsequently loaded. False sharing increases this coordination and can significantly degrade application performance.
Example
Deadlocks
Deadlock is when two or more tasks never make progress because each is waiting for some resource held by another process/thread.
Read(lock)
Write(lock)
Write_unlock(lock)
Read_unlock(lock)

A normal mutex deadlocks if a thread that already holds the lock tries to acquire it a second time.
What causes a race condition and requires mutual exclusion?
Please consider the following program. The program's original intention was to print out "hello from thread 0", "hello from thread 1", all the way to "hello from thread 9" (note that the order of the printout should be random). However, there are several bugs in the code below. Please list all the bugs that appear in this code and explain why they cause problems. Please provide a fix so the code runs correctly.