Final Exam Overview
FINAL EXAM
OVERVIEW
Shuaiwen Leon Song
Final Exam Structure
▪ Three Sections:
▪ (1) Multiple choice (30 points): 15 questions, 2 points per question. Please select
all the correct answers.
▪ (2) Fundamental understanding of the important concepts (40 points): 5 questions,
8 points per question. Please provide short, on-point answers.
▪ (3) Comprehensive understanding (30 points): 3 questions, 10 points per question.
Please provide concise answers and/or simple pseudo code. If there are multiple sub-
questions within a question, each counts for a certain percentage of the total 10
points.
Caveats
▪ Section 1 will have one relatively hard question at the very end (worth 2 points). The
other questions cover concepts we have discussed in class. Try to get as many
questions answered as possible. The format is similar to Quiz 2.
▪ Section 2 tests your basic understanding of the concepts. These topics were all
covered in class at some point. This is the section where you should work hard to
secure as many points as possible.
▪ Section 3: This section covers more comprehensive questions rather than just basic
concepts. This is where you use what you have learned to answer some simple
Q&As as well as demonstrate your ability to fix problematic code. No lengthy C
coding is required; it is all about pseudo code and your logic for solving the
problem. You should try to answer as many questions as possible. None of the three
questions is difficult, but they may require you to think a bit before you answer.
Week 1: Parallel Performance Models
▪ Review the typical performance models covered in this lecture
▪ Pay attention to in-class examples as well as your homework. One question in
Section 2 is from this lecture.
▪ Do warm-up exercises with your homework and in-class examples
▪ Pay attention to the core concepts and models (e.g., Karp-Flatt Metric, speedup
formula, isoefficiency formulas and examples).
Week 2 ~ Week 3: C essentials
▪ Be able to read and understand C programs in the context of pthread, openmp and
MPI.
▪ NO questions will just test C essentials, but C code will be used to illustrate
synchronization concepts, OpenMP, and MPI.
Week 4: Memory Hierarchies and Optimizations
▪ Memory hierarchies
▪ Spatial and temporal locality
▪ False sharing/true sharing: causes and how to fix them
▪ Register reuse and cache blocking (will not be covered in the final)
Week 5: Processes (*)
▪ Concept and usage of fork():
▪ Which process is the child? How is fork() used?
▪ What are zombie processes? What situations cause zombie processes? What are
the solutions to zombie processes? E.g., waitpid(), wait(), or terminating the parent
process. Pay attention to the exercises on this in the lecture slides.
▪ Signal communication between processes: signals are not queued
▪ What are the differences between processes and threads?
Week 6: Threads (*) and basic Synchronization
▪ Threads vs Processes: Mechanisms and Sharing
▪ Pthreads basics
▪ Variable space (shared? private? etc.)
▪ Unintended sharing
▪ Thread detached mode
▪ Understanding race conditions and data races, what may cause them to happen, and
the potential solutions (e.g., the data race example in the slides).
▪ Enforcing mutual exclusion solutions: semaphores, mutex locks, condition variables, atomic
operations (e.g., spinlocks).
Week 7: Synchronization (*)
▪ Semaphores, mutex, condition variables and atomic operations: coding, mechanisms and when to
use what.
▪ Questions on how to use semaphores: P (sem_post) and W (sem_wait) problems, binary
semaphores, producer-consumer questions using semaphores, deadlocks caused by semaphores, etc.
▪ Questions on deadlocks: what causes deadlocks? Examples that cause deadlocks. Four necessary
conditions for triggering deadlocks.
▪ Questions on condition variables: they always go with a mutex; condition variable usage
examples and the mechanism (how threads are woken up and how they then grab the lock).
▪ Questions on atomic operations: what are they? Why are they fast? When to use atomic operations
vs mutex/semaphores/condition variables?
▪ Understanding barrier
▪ Understanding spinlocks and the conditions under which to use them (similar to atomic
operations); how to use CAS (compare-and-swap) to implement spinlocks.
▪ ***: the cnt program used throughout Lectures 6 and 7
Week 8: Quiz2 review and other synchronization questions
▪ Very important lecture covering some important questions
▪ Don’t ignore the video part about Q12 and Q13
▪ Understanding false sharing and true sharing mechanisms
▪ Get familiar with the type of questions that may appear in the final.
Week 9 and 10: OpenMP
▪ Basic OpenMP directives, runtime APIs and environment variables
▪ OpenMP example with false sharing
https://software.intel.com/content/www/us/en/develop/articles/avoiding-and-identifying-false-sharing-among-threads.html
Week 11: MPI
▪ MPI blocking send and receive vs. their non-blocking versions
▪ Example 1: deadlock example for MPI_Send and MPI_Recv
▪ Example 2: MPI_Isend and MPI_Irecv: what do they do, and what do they normally
pair with?
▪ Example 3: why do we need special care for the ring MPI example? Why can't we just
directly do an MPI_Send and then an MPI_Recv?
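The ring issue in Example 3 can be sketched in pseudocode (a sketch of the standard reasoning, not the lecture's exact example):

```
each of p ranks sends to (rank+1) mod p and receives from (rank-1+p) mod p

naive version -- every rank calls the blocking send first:
    MPI_Send(to right)      -- may block until the receiver posts a recv
    MPI_Recv(from left)
if MPI_Send does not buffer the message, every rank blocks inside
MPI_Send and no rank ever reaches MPI_Recv: deadlock.

common fix -- post a non-blocking receive first, then wait:
    MPI_Irecv(from left, &req)    -- returns immediately
    MPI_Send(to right)            -- matching recv is already posted
    MPI_Wait(&req, &status)
```

Another standard fix is to break the symmetry: even ranks send then receive, odd ranks receive then send.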
Some other examples

Sharing example:

char **ptr; /* global */

int main()
{
    int i;
    pthread_t tid;
    char *msgs[2] = {
        "Hello from foo",
        "Hello from bar"
    };
    ptr = msgs;
    for (i = 0; i < 2; i++)
        Pthread_create(&tid, NULL, thread, (void *)i);
    Pthread_exit(NULL);
}

/* thread routine */
void *thread(void *vargp)
{
    int myid = (int)vargp;
    static int cnt = 0;
    printf("[%d]: %s (svar=%d)\n", myid, ptr[myid], ++cnt);
}

▪ Global var: 1 instance (ptr [data])
▪ Local vars: 1 instance (i.m, msgs.m)
▪ Local var: 2 instances (myid.p0 [peer thread 0's stack], myid.p1 [peer thread 1's stack])
▪ Local static var: 1 instance (cnt [data])

Mapping Variable Instances to Memory
▪ Which variables are shared?
▪ A variable x is shared iff multiple threads reference at least one instance of x. Thus:
▪ ptr, cnt, and msgs are shared
▪ i and myid are not shared

Shared Variable Analysis

Variable    Referenced by    Referenced by     Referenced by
instance    main thread?     peer thread 0?    peer thread 1?
ptr         yes              yes               yes
cnt         no               yes               yes
i.m         yes              no                no
msgs.m      yes              yes               yes
myid.p0     no               yes               no
myid.p1     no               no                yes

Semaphores: P & W
Consider the following program using semaphores. Which of the following execution orders of sem_wait and sem_post are possible (W represents sem_wait, P represents sem_post)?
(A) WPWWWP  (B) WWPPWW  (C) PPWWWW  (D) WPWPWW  (E) WWWWPP

Semaphores: deadlocks vs. normal execution

sem A1Done = 0;
sem B1Done = 0;

// Thread A        // Thread B
A1                 B1
wait(B1Done)       wait(A1Done)
post(A1Done)       post(B1Done)
A2                 B2

Condition Variables
▪ Example: what is the mechanism?

False/True Sharing
▪ True sharing would require programmatic synchronization constructs to ensure ordered data access.
▪ False sharing: the frequent coordination required between processors when cache lines are marked 'Invalid' requires cache lines to be written to memory and subsequently loaded. False sharing increases this coordination and can significantly degrade application performance.

Example: Deadlocks
▪ Deadlock is when two or more tasks never make progress because each is waiting for some resource held by another process/thread.
Read_lock(lock)
Write_lock(lock)
Write_unlock(lock)
Read_unlock(lock)

▪ A normal mutex deadlocks if a thread that already holds a lock tries to acquire it a second time.
▪ What causes a race condition and requires mutual exclusion?

Consider the following program. The program's original intention was to print "hello from thread 0", "hello from thread 1", all the way to "hello from thread 9" (note that the order of the printout can be random). However, there are several bugs in the code. Please list all the bugs that appear in this code and explain why they cause problems. Please provide a fix to the code so it can run correctly.