Midterm #2 – Topics and Format
The midterm exam will cover the following major topics. Please take the time to review the materials (especially starred slides). For the exam, you will need to be able to write code in C for both OpenMP and CUDA. Questions will cover:
OpenMP
CUDA
Midterm #2 will be similar to midterm #1 in structure, format and length.
The exam will consist of multiple choice, short answer and programming questions (completing aspects of code, converting serial to parallel, OpenMP details, CUDA details and structure, etc).
M/C and short answer questions will cover all materials (theory and applications). Questions will also include code analysis (what is happening / what the behaviour is and why, given a fragment of code), as well as writing C code focusing on OpenMP (all materials) and CUDA topics up to and including what was covered on Thursday, November 4th. Questions will focus on the application and understanding of the theory and concepts presented to date. In addition to long programming questions (OpenMP and CUDA), some questions will ask you to analyze code and describe its behaviour (why it works or isn’t working), or to complete code fragments (providing just the programming statements needed).
The exam will cover all materials since the last midterm (starting at topic 6) up to and including the materials presented on Thursday, November 4th on CUDA scheduling and warps. While the exam doesn’t specifically examine the early concepts on C and OpenMP basics, it is assumed that you are familiar with these concepts and know the materials. For the exam, you will need to write OpenMP code using advanced concepts (which also requires a fundamental understanding of the materials from the first part of the course) in addition to writing CUDA code. Please take the time to review the labs to date. The exam will be open book, meaning that you will be able to use the resources presented in the course to date.
The exam will be delivered as a Canvas quiz and is scheduled for 70 minutes. The first 10 minutes will be used to sign students in and is not included in the exam time. Students will be placed into breakout rooms where an invigilator will verify ID. Please be prepared with a form of official ID (UBC card). During the exam you will need to have a microphone (this should be muted once the exam starts) and webcam active so the invigilator will be able to monitor the progress of the exam. Ensure that you are in a quiet area to write the exam. Please arrive in the exam room
ahead of time to complete the sign-in process (starting at 9:25). Information regarding specific breakout rooms will be distributed later this week. You will be required to join a specific breakout room based on your last name; please ensure that you are in the appropriate room at 9:30. For a given breakout room, the exam start will be controlled by the invigilator, and the exam will conclude 70 minutes after you start. The quiz will close automatically at the end of the exam. Do not arrive late to the exam.
While the exam is open book, please do take the time to review and organize your materials (I would suggest creating a summary sheet of notes); if you need to look everything up, you will run out of time. Additionally, some questions will require an understanding of the concepts in order to answer (meaning that you won’t be able to look up an answer or run the code; you will need to recognize what is happening).
A high level overview of the topics covered follows (you will need to expand on the details in each section).
Topic 6: Barriers and Mutexes (Critical Sections, Atomics and Locks) (From last exam, but review these concepts)
Race conditions
Purpose/function of barriers (types/behaviours in OpenMP)
Illegal barriers
Barriers and nowait
Mutual Exclusion in OpenMP
Critical directives (named vs unnamed) – purpose and behaviour
atomic – purpose and behaviour, load-modify-store, allowed/illegal forms
issues with atomics
simple locks
Using atomics, critical sections and locks
Caveats with critical sections, atomics and locks (see the sketch below)
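For review, a minimal sketch (not from the slides) contrasting the three mutual-exclusion mechanisms on shared counters; it assumes an OpenMP-capable compiler (e.g. gcc -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int c_critical = 0, c_atomic = 0, c_lock = 0;
        omp_lock_t lock;
        omp_init_lock(&lock);

        #pragma omp parallel num_threads(4)
        {
            /* named critical section: one thread at a time in the block */
            #pragma omp critical(counter)
            c_critical++;

            /* atomic: protects a single load-modify-store update */
            #pragma omp atomic
            c_atomic++;

            /* simple lock: explicit acquire/release */
            omp_set_lock(&lock);
            c_lock++;
            omp_unset_lock(&lock);
        }
        omp_destroy_lock(&lock);

        printf("%d %d %d\n", c_critical, c_atomic, c_lock); /* 4 4 4 */
        return 0;
    }

Without one of these protections, each increment is an unprotected load-modify-store, i.e. a race condition.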
Topic 7: Variable Scope and Reduction
Variable scope in OpenMP (directives, behaviour, usage, limits, etc)
shared
firstprivate
Compare and contrast different scopes
default clause and use
Reduction (directive, behaviour, usage, limits, etc)
Advantages
Be familiar with how reduction works
Limits on Reduction (allowable operations, initial values)
Cautions with Reduction (+ vs -) and how to resolve
Review examples in topic 7 (area under curve, summation)
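In the spirit of the topic 7 summation example, a minimal reduction sketch (illustrative only): each thread gets a private copy of sum initialized to the identity value for +, and the partial sums are combined into the shared variable at the end of the loop.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double sum = 0.0;
        /* reduction(+:sum): private copies start at 0, combined on exit */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000; i++)
            sum += 1.0 / i;
        printf("sum = %f\n", sum);
        return 0;
    }

Note the caution with - listed above: the combination order is unspecified, so subtraction reductions are usually re-expressed in terms of +.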
Topic 8: Work Sharing – Parallel For & Single
Parallel For construct (directive, usage, behaviour, limitations, order (lack of) etc)
barrier behaviour with Parallel For
Compare/contrast how loop iterations are distributed across team, limitations/restrictions with variables
Syntax options and placement in parallel regions
Variable scope
Examples with Parallel For
Nested loops with Parallel For (what is parallelized/shared/private)
Limitations on How and When to use Parallel For
Loop-carried dependencies/Data dependencies – description, impacts, and methods for resolving
Issues surrounding interdependency among iterations and limitations with OpenMP
Review Summary of Working with Loops (techniques, limitations and use)
Assigning work to Single Thread (Single directive)
Assigning work to Master Thread (Master directive)
Compare and contrast Single and Master and describe thread team behaviour
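A minimal sketch (illustrative, not from the slides) combining Parallel For, its implicit barrier, and the Single and Master directives:

    #include <stdio.h>
    #include <omp.h>

    #define N 8

    int main(void) {
        int a[N];
        #pragma omp parallel shared(a)
        {
            /* iterations divided across the team; i is automatically private */
            #pragma omp for
            for (int i = 0; i < N; i++)
                a[i] = i * i;
            /* implicit barrier here: a[] is complete for every thread */

            #pragma omp single
            printf("single: executed by exactly one thread (barrier after)\n");

            #pragma omp master
            printf("master: executed by thread 0 only (no implied barrier)\n");
        }
        return 0;
    }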
Topic 9: Work Sharing – Sections, Scheduling and Ordered Iterations
Sections construct (directive, usage, behaviour, limitations, etc)
Barrier behaviour with Sections
Parallel sections and execution behaviour/order/limitations
Syntax
Thread team behaviour with Sections
Schedule construct (directive, usage, behaviour, limitations, etc)
purpose/behaviour of chunksize
Compare and contrast scheduling
When to use each type of scheduling
Ordered construct (directive, usage, behaviour, limitations, etc)
Rules and cost for ordered
Review usage examples
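A minimal sketch (illustrative only) of the three constructs in this topic: schedule with a chunksize, sections, and ordered.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* schedule(static, 2): iterations handed out round-robin in chunks of 2 */
        #pragma omp parallel for schedule(static, 2)
        for (int i = 0; i < 8; i++)
            printf("iteration %d on thread %d\n", i, omp_get_thread_num());

        /* sections: each section executes once, on some thread of the team */
        #pragma omp parallel sections
        {
            #pragma omp section
            printf("section A\n");
            #pragma omp section
            printf("section B\n");
        }

        /* ordered: the marked part of the body runs in iteration order */
        #pragma omp parallel for ordered
        for (int i = 0; i < 4; i++) {
            #pragma omp ordered
            printf("ordered iteration %d\n", i);
        }
        return 0;
    }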
Topic 10: OpenMP Examples, Functions and SIMD
Matrix multiplication example
Finding the max value
Min/Max with Reduction
Finding Min, Max and Sum together example
Collapse with Nested Loops (directive, usage, behaviour, limitations, etc)
Sort examples
Producer/Consumer models/condition synchronization
Applications and challenges
Message passing
Nested Parallelism (enabling/disabling and synchronization)
Nested Parallel For
Flynn’s Taxonomy and the Von Neumann Model
Scalar vs Vector architectures
The Von Neumann Model with SIMD Units
SIMD Parallelism
SIMD construct in OpenMP
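A minimal sketch (not from the slides) of two constructs above: collapse to flatten a nested loop’s iteration space, and the OpenMP simd construct to vectorize a loop.

    #include <stdio.h>
    #include <omp.h>

    #define N 4
    #define M 4

    int main(void) {
        int a[N][M];
        /* collapse(2): the N*M iterations are flattened and divided as one space */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < M; j++)
                a[i][j] = i * M + j;

        float x[16], y[16];
        for (int i = 0; i < 16; i++) { x[i] = i; y[i] = 2.0f * i; }
        /* simd: vectorize using SIMD lanes on one thread (not multiple threads) */
        #pragma omp simd
        for (int i = 0; i < 16; i++)
            x[i] += y[i];

        printf("%d %f\n", a[N-1][M-1], x[15]); /* 15 45.000000 */
        return 0;
    }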
Topic 11: Speed and Efficiency
Compare and contrast speed and efficiency
Computer performance – throughput vs response time
CPU performance (user/CPU/wait time)
Metrics (IPS, MIPS, FLOPS, CPI)
Benchmarks
Overhead due to parallelism
Speedup and Efficiency of parallel programs (theoretical vs actual, calculations)
Speedup calculations
Efficiency calculations
Amdahl’s Law (usage, assumptions, limitations)
Gustafson’s Law (usage, assumptions, limitations)
Estimating speedup
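A quick worked example (illustrative numbers, not from the slides): if the parallelizable fraction of a program is p = 0.9 and it runs on n = 8 cores, Amdahl’s Law gives

    speedup    = 1 / ((1 - p) + p/n) = 1 / (0.1 + 0.9/8) ≈ 4.7
    efficiency = speedup / n ≈ 4.7 / 8 ≈ 59%

Gustafson’s Law assumes the problem size scales with n and measures the serial fraction s on the parallel run; for s = 0.1 it estimates speedup = n - s(n - 1) = 8 - 0.1 × 7 = 7.3.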
Topic 12: Intro to CUDA
CPU vs GPU programming
Latency vs throughput
CPU computing
Differences between CPU and GPU threads
GPU program structure (parts and flow)
Threads, blocks and grids
Arrays of parallel threads
Writing a CUDA program
Host code
Device code (kernel)
memory allocation on GPU and data movement
data passing to card
Processing flow on CUDA
__global__
Launching a kernel (parameters, block size, grid size, etc)
Built-in CUDA functions
Error handling
Function declaration in CUDA (know what gets executed where)
Limitations on GPU code
Examples – addition, vector addition, data movement
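Tying the items above together, a minimal vector addition sketch (illustrative; the name vecAdd is my own) showing host and device code, GPU allocation and data movement, the kernel launch, and basic error handling:

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* device code: each thread adds one element */
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                        /* guard: grid may overshoot n */
            c[i] = a[i] + b[i];
    }

    int main(void) {
        const int n = 1000;
        size_t bytes = n * sizeof(float);
        float h_a[n], h_b[n], h_c[n];
        for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 2.0f * i; }

        /* host code: allocate device memory and copy input data to the card */
        float *d_a, *d_b, *d_c;
        cudaMalloc((void**)&d_a, bytes);
        cudaMalloc((void**)&d_b, bytes);
        cudaMalloc((void**)&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        /* launch: enough 256-thread blocks to cover n elements */
        int block = 256;
        int grid = (n + block - 1) / block;   /* = 4 blocks here */
        vecAdd<<<grid, block>>>(d_a, d_b, d_c, n);

        /* error handling: check the launch, then copy the result back */
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess)
            printf("launch failed: %s\n", cudaGetErrorString(err));
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[999] = %f\n", h_c[999]);    /* 999 + 1998 = 2997 */

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        return 0;
    }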
Topic 13: CUDA Threads
Memory and pointers
Error handling
CUDA API errors and error codes
CUDA kernel errors and error codes
Device synchronization – cudaDeviceSynchronize()
Timing GPU code and a typical GPU program
GPU architecture design (SM, SP)
Thread organization (software and hardware)
Device properties
Threads in blocks
Blocks in grids
Dimension variables
Thread lifecycle
Kernel launch configurations
Examples – block/grid sizes
Computing array indexes for each thread
2D processing
Calculating launch configurations (grid size, block size based on data)
Vector addition example
1D, 2D configurations
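A minimal 2D sketch (illustrative; fill2D is a made-up name) showing how each thread computes its array index and how the launch configuration is rounded up so the grid covers all of the data:

    #include <cuda_runtime.h>

    /* each thread computes its (row, col) position in the 2D data set */
    __global__ void fill2D(int *out, int width, int height) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        if (row < height && col < width)            /* guard partial blocks */
            out[row * width + col] = row * width + col;
    }

    int main(void) {
        int width = 100, height = 70;
        int *d_out;
        cudaMalloc((void**)&d_out, width * height * sizeof(int));

        /* 16x16 threads per block; round the grid up so sizes that don't
           divide evenly are still fully covered (7 x 5 blocks here) */
        dim3 block(16, 16);
        dim3 grid((width  + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        fill2D<<<grid, block>>>(d_out, width, height);
        cudaDeviceSynchronize();                    /* wait for the kernel */

        cudaFree(d_out);
        return 0;
    }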
Topic 13: CUDA Threads Part 2
Higher dimension grid/blocks
Limitations
printf in kernel functions
Block calculations for 2D grids
CUDA limits (Important!)
CUDA vs OpenMP – compare/contrast
Thread cooperation
Matrix multiplication example
Row-major layout
Expanding to multiple blocks
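A minimal matrix multiplication sketch (illustrative fragment; it assumes the device buffers are already allocated and filled, and does no shared-memory tiling) showing row-major indexing and a grid of multiple 16x16 blocks:

    #include <cuda_runtime.h>

    /* one thread per output element C[row][col]; matrices stored row-major */
    __global__ void matMul(const float *A, const float *B, float *C, int n) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        if (row < n && col < n) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += A[row * n + k] * B[k * n + col];  /* row-major indexing */
            C[row * n + col] = sum;
        }
    }

    /* expanding to multiple blocks: a 2D grid of 16x16 blocks covering n x n */
    void launchMatMul(const float *dA, const float *dB, float *dC, int n) {
        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        matMul<<<grid, block>>>(dA, dB, dC, n);
    }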
Topic 14: Scheduling, Warps and Memory
Parallel GPU code with multiple blocks
Transparent Scalability (block assignment to SMs, thread execution and limitations)
Warps (structure, purpose, function, limitations)
Thread scheduling
Thread lifecycle on hardware
Zero-overhead and latency tolerance
GPU limits/benefits
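On warps: threads execute in groups of 32, and a branch that splits the threads within one warp forces the warp to execute both paths one after the other. A minimal kernel-fragment sketch (illustrative names) contrasting a divergent branch with a warp-uniform one:

    /* divergent: threads 0-15 and 16-31 of each warp take different paths,
       so the warp runs both branches serially */
    __global__ void divergent(float *x) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (threadIdx.x % 32 < 16)
            x[i] = x[i] * 2.0f;
        else
            x[i] = x[i] + 1.0f;
    }

    /* uniform: branching on blockIdx keeps every thread of a warp on the
       same path, so there is no divergence penalty */
    __global__ void uniform(float *x) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (blockIdx.x % 2 == 0)
            x[i] = x[i] * 2.0f;
        else
            x[i] = x[i] + 1.0f;
    }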