Midterm #2 – Topics and Format
The midterm exam will cover the following major topics. Please take the time to review the
materials (especially starred slides). For the exam, you will need to be able to write code in C for
both OpenMP and CUDA. Questions will cover:

OpenMP
CUDA

Midterm #2 will be similar to midterm #1 in terms of structure, format and length.

The exam will consist of multiple choice, short answer and programming questions (completing
aspects of code, converting serial to parallel, OpenMP details, CUDA details and structure, etc.).
M/C and short answer questions will cover all materials (theory and applications). Questions will
also include code analysis (what is happening / what is the behaviour and why, given a fragment of
code) as well as writing C code focusing on OpenMP (all materials) and CUDA topics up to and
including what was covered on Thursday, November 4th. Questions will focus on the
application and understanding of the theory and concepts presented to date. In addition to long
programming questions (OpenMP and CUDA), some questions will ask you to analyze code
and describe its behaviour (why it works or isn't working), while others will ask you to complete
code fragments (providing just the programming statements needed).

The exam will cover all materials from the last midterm (starting at topic 6) up to and including the
materials presented on Thursday, November 4th on CUDA scheduling and warps. While the
exam doesn't specifically examine early concepts on C and OpenMP basics, it is assumed that you
are familiar with these concepts and know the materials. For the exam, you will need to write
OpenMP code using advanced concepts (which requires a fundamental understanding of
materials from the first part of the course as well) in addition to writing CUDA code. Please take the
time to review the labs to date. The exam will be open book, meaning that you will be able to use
resources presented in the course to date.

The exam will be delivered as a Canvas quiz and is scheduled for 70 minutes. The first 10
minutes will be used to sign students in and are not included in the exam time. Students will be
placed into breakout rooms where an invigilator will verify ID. Please be prepared with a form of
official ID (UBC card). During the exam you will need to have a microphone (this should be muted
once the exam starts) and webcam active so the invigilator will be able to monitor the progress of
the exam. Ensure that you are in a quiet area to write the exam. Please arrive in the exam room
ahead of time to complete the sign in process (starting at 9:25). Information regarding specific
breakout rooms will be distributed later on this week. You will be required to join a specific
breakout room, based on your last name. Please ensure that you are in the appropriate breakout
room at 9:30. The exam will conclude 70 minutes after you start the exam. For a given
breakout room, the exam start will be controlled by the invigilator. The quiz will close
automatically at the end of the exam. Do not arrive late to the exam. Details will be
provided later this week.

While the exam is open book, please do take the time to review and organize your materials (I
would suggest creating a summary sheet of notes); if you need to look everything up, you will
run out of time. Additionally, some questions will require an understanding of concepts to
answer (meaning that you won't be able to look up an answer or run the code, but will need to
recognize what is happening).

A high-level overview of the topics covered follows (you will need to expand on the details in each
section).

Topic 6: Barriers and Mutexes (Critical Sections, Atomics and Locks) (From last exam, but
review these concepts)

Race conditions
Purpose/function of barriers (types/behaviours in OpenMP)
Illegal barriers
Barriers and nowait
Mutual Exclusion in OpenMP

Critical directives (named vs unnamed) – purpose and behaviour
atomic – purpose and behaviour, load-modify-store, allowed/illegal forms

issues with atomics
simple locks

Using atomics, critical sections and locks
Caveats with critical sections, atomics and locks (a short sketch follows below)
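To tie these items together, here is a minimal sketch (my own illustrative example, not code from the slides) contrasting the three mutual exclusion mechanisms in OpenMP. All three counters end up at the team size; the differences are in granularity and cost:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int count_critical = 0, count_atomic = 0, count_lock = 0;
        omp_lock_t lock;
        omp_init_lock(&lock);

        #pragma omp parallel num_threads(4)
        {
            /* critical: one thread at a time in the whole block */
            #pragma omp critical
            count_critical++;

            /* atomic: protects only this single load-modify-store */
            #pragma omp atomic
            count_atomic++;

            /* simple lock: explicit acquire/release */
            omp_set_lock(&lock);
            count_lock++;
            omp_unset_lock(&lock);
        }

        omp_destroy_lock(&lock);
        printf("%d %d %d\n", count_critical, count_atomic, count_lock);
        return 0;
    }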

Topic 7: Variable Scope and Reduction

Variable scope in OpenMP (directives, behaviour, usage, limits, etc)
shared
private
firstprivate

Compare and contrast different scopes
default clause and use

Reduction (directive, behaviour, usage, limits, etc)
Advantages
Be familiar with how reduction works
Limits on Reduction (allowable operations, initial values)
Cautions with Reduction (+ vs -) and how to resolve

Review examples in topic 7 (area under curve, summation); a minimal reduction sketch follows below
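As a reminder of the mechanics, here is a minimal reduction sketch (illustrative only; the course's area-under-curve and summation examples are the ones to study):

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        double sum = 0.0;
        /* reduction(+:sum): each thread gets a private copy of sum,
           initialized to 0 (the identity for +); the private partial
           sums are combined into the shared sum when the loop finishes */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += 1.0 / (i + 1);
        printf("sum = %f\n", sum);
        return 0;
    }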

Topic 8: Work Sharing – Parallel For & Single

Parallel For construct (directive, usage, behaviour, limitations, ordering (or lack thereof), etc.)
barrier behaviour with Parallel For
Compare/contrast how loop iterations are distributed across team, limitations/restrictions
with variables
Syntax options and placement in parallel regions
Variable scope
Examples with Parallel For
Nested loops with Parallel For (what is parallelized/shared/private)
Limitations on How and When to use Parallel For

Loop-carried dependencies/Data dependencies – description, impacts, and methods for
resolving
Issues surrounding interdependency among iterations and limitations with OpenMP
Review Summary of Working with Loops (techniques, limitations and use)
Assigning work to Single Thread (Single directive)
Assigning work to Master Thread (Master directive)
Compare and contrast Single and Master and describe thread team behaviour (see the sketch below)
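The following sketch (illustrative, not from the slides) shows the three work-sharing constructs from this topic side by side; the key contrasts are which thread executes the block and whether a barrier is implied:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int a[8];
        #pragma omp parallel
        {
            /* for: iterations divided across the team; implicit barrier at the end */
            #pragma omp for
            for (int i = 0; i < 8; i++)
                a[i] = i * i;

            /* single: exactly one (arbitrary) thread runs this; the others
               wait at the implicit barrier unless nowait is added */
            #pragma omp single
            printf("single ran on thread %d\n", omp_get_thread_num());

            /* master: only thread 0 runs this, and NO barrier is implied */
            #pragma omp master
            printf("master is thread %d\n", omp_get_thread_num());
        }
        return 0;
    }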

Topic 9: Work Sharing – Sections, Scheduling and Ordered Iterations

Sections construct (directive, usage, behaviour, limitations, etc)
Barrier behaviour with Sections
Parallel sections and execution behaviour/order/limitations
Syntax
Thread team behaviour with Sections

Schedule construct (directive, usage, behaviour, limitations, etc)
static
dynamic
guided
purpose/behaviour of chunksize
Compare and contrast scheduling

When to use each type of scheduling
Ordered construct (directive, usage, behaviour, limitations, etc)

Rules and cost for ordered
Review usage examples (a short sketch follows below)
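Here is a compact sketch (illustrative only) showing sections, scheduling and ordered together:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* sections: each section runs once, on some thread of the team;
           there is no guaranteed execution order between sections */
        #pragma omp parallel sections
        {
            #pragma omp section
            printf("section A on thread %d\n", omp_get_thread_num());
            #pragma omp section
            printf("section B on thread %d\n", omp_get_thread_num());
        }

        /* schedule(dynamic, 2): chunks of 2 iterations handed out on demand */
        #pragma omp parallel for schedule(dynamic, 2)
        for (int i = 0; i < 8; i++)
            printf("iteration %d on thread %d\n", i, omp_get_thread_num());

        /* ordered: iterations still run in parallel, but the ordered block
           executes in loop order (at a synchronization cost) */
        #pragma omp parallel for ordered
        for (int i = 0; i < 8; i++) {
            #pragma omp ordered
            printf("in order: %d\n", i);
        }
        return 0;
    }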

Topic 10: OpenMP Examples, Functions and SIMD

Matrix multiplication example
Finding the max value
Min/Max with Reduction
Finding Min, Max and Sum together example
Collapse with Nested Loops (directive, usage, behaviour, limitations, etc)
Sort examples
Producer/Consumer models/condition synchronization

Applications and challenges
Message passing
Locks
Nested Parallelism (enabling/disabling and synchronization)

Nested Parallel For
Flynn’s Taxonomy and the Von Neumann Model
Scalar vs Vector architectures

The Von Neumann Model with SIMD Units
SIMD Parallelism
SIMD construct in OpenMP (see the sketch below)
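A minimal sketch (illustrative only) of collapse and the simd construct:

    #include <stdio.h>

    #define N 4
    #define M 4

    int main(void) {
        int a[N][M];
        /* collapse(2) fuses the two loops into a single N*M iteration
           space, so both loop indices are distributed across the team */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < M; j++)
                a[i][j] = i * M + j;

        float x[64], y[64];
        for (int i = 0; i < 64; i++) { x[i] = i; y[i] = 1.0f; }
        /* simd: vectorize the loop across SIMD lanes, not threads */
        #pragma omp simd
        for (int i = 0; i < 64; i++)
            y[i] += 2.0f * x[i];

        printf("%d %f\n", a[N-1][M-1], y[63]);
        return 0;
    }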

Topic 11: Speed and Efficiency

Compare and contrast speed and efficiency
Computer performance – throughput vs response time
CPU performance (user/CPU/wait time)
Metrics (IPS, MIPS, FLOPS, CPI)
Benchmarks
Overhead due to parallelism
Speedup and Efficiency of parallel programs (theoretical vs actual, calculations)

Speedup calculations
Efficiency calculations

Amdahl’s Law (usage, assumptions, limitations)
Gustafson’s Law (usage, assumptions, limitations)
Estimating speedup (a worked example follows below)
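As a reminder of the style of calculation involved, here is one worked example (the numbers are illustrative, not from the slides). Amdahl's Law gives the theoretical speedup S(n) = 1 / ((1 - p) + p/n), where p is the parallelizable fraction of the program and n is the number of cores. If p = 0.9 and n = 8, then S = 1 / (0.1 + 0.9/8) = 1 / 0.2125 ≈ 4.7, and efficiency is E = S/n ≈ 4.7/8 ≈ 0.59 (59%). Note that even with unlimited cores, S can never exceed 1/(1 - p) = 10 for this program.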

Topic 12: Intro to CUDA

CPU vs GPU programming
Latency vs throughput
CPU computing

Differences between CPU and GPU threads
GPU program structure (parts and flow)
Threads, blocks and grids
Arrays of parallel threads
Writing a CUDA program

Host code
Device code (kernel)
memory allocation on GPU and data movement
data passing to card
Processing flow on CUDA
__global__
Launching a kernel (parameters, block size, grid size, etc)

Built-in CUDA functions
Error handling
Function declarations in CUDA (know where each kind of function gets executed)
Limitations on GPU code
Examples – addition, vector addition, data movement (see the sketch below)
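The vector addition example is worth being able to reproduce end to end. Here is a minimal sketch of the full host/device flow (my own version, assuming 1024 elements and 256-thread blocks; the lecture version is the authoritative one):

    #include <stdio.h>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
        if (i < n)                                      /* guard against surplus threads */
            c[i] = a[i] + b[i];
    }

    int main(void) {
        const int n = 1024;
        size_t bytes = n * sizeof(float);
        float h_a[1024], h_b[1024], h_c[1024];
        for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 2.0f * i; }

        /* 1) allocate device memory, 2) copy inputs host -> device */
        float *d_a, *d_b, *d_c;
        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        /* 3) launch the kernel: enough blocks to cover n elements */
        int blockSize = 256;
        int gridSize = (n + blockSize - 1) / blockSize;  /* round up */
        vecAdd<<<gridSize, blockSize>>>(d_a, d_b, d_c, n);

        cudaError_t err = cudaGetLastError();  /* check the launch itself */
        if (err != cudaSuccess)
            printf("launch failed: %s\n", cudaGetErrorString(err));

        /* 4) copy result device -> host, 5) free device memory */
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[3] = %f\n", h_c[3]);
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        return 0;
    }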

Topic 13: CUDA Threads

Memory and pointers
Error handling

CUDA API errors and error codes
CUDA kernel errors and error codes

Device synchronization – cudaDeviceSynchronize()
Timing GPU code and a typical GPU program
GPU architecture design (SM, SP)
Thread organization (software and hardware)
Device properties
Threads in blocks
Blocks in grids
Dimension variables
Thread lifecycle
Kernel launch configurations
Examples – block/grid sizes
Computing array indexes for each thread
2D processing
Calculating launch configurations (grid size, block size based on data)
Vector addition example
1D, 2D configurations (see the launch/timing sketch below)
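The launch-configuration arithmetic and event-based timing both come up repeatedly; here is a minimal sketch (the sizes are illustrative, not from the slides):

    #include <stdio.h>
    #include <cuda_runtime.h>

    __global__ void scale(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;   /* the 240 surplus threads (10240 - 10000) do nothing */
    }

    int main(void) {
        const int n = 10000;
        float *d_x;
        cudaMalloc((void **)&d_x, n * sizeof(float));
        cudaMemset(d_x, 0, n * sizeof(float));

        int blockSize = 256;
        int gridSize = (n + blockSize - 1) / blockSize;  /* ceil(10000/256) = 40 blocks */

        /* kernel launches are asynchronous: bracket them with events and
           synchronize before reading the elapsed time */
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        scale<<<gridSize, blockSize>>>(d_x, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("kernel: %f ms (%d blocks x %d threads)\n", ms, gridSize, blockSize);

        cudaEventDestroy(start); cudaEventDestroy(stop);
        cudaFree(d_x);
        return 0;
    }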

Topic 13: CUDA Threads Part 2

Higher dimension grid/blocks
Limitations
gridSize
blockSize

printf in kernel functions
Block calculations for 2D grids
CUDA limits (Important!)
CUDA vs OpenMP – compare/contrast
Thread cooperation
Matrix multiplication example

Row-major layout
Expanding to multiple blocks (see the sketch below)
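A structural sketch of the multi-block, row-major matrix multiplication (illustrative; host data initialization is omitted for brevity, so the device matrices are simply zeroed):

    #include <stdio.h>
    #include <cuda_runtime.h>

    #define N 64   /* square matrices; size chosen only for illustration */

    /* one thread per output element; row-major storage means element
       (r, c) lives at flat index r * N + c */
    __global__ void matMul(const float *A, const float *B, float *C) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < N && col < N) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++)
                sum += A[row * N + k] * B[k * N + col];
            C[row * N + col] = sum;
        }
    }

    int main(void) {
        size_t bytes = N * N * sizeof(float);
        float *d_A, *d_B, *d_C;
        cudaMalloc((void **)&d_A, bytes);
        cudaMalloc((void **)&d_B, bytes);
        cudaMalloc((void **)&d_C, bytes);
        cudaMemset(d_A, 0, bytes);
        cudaMemset(d_B, 0, bytes);

        /* 2D blocks of 16 x 16 = 256 threads; enough blocks in each
           dimension to cover the whole matrix */
        dim3 block(16, 16);
        dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
        matMul<<<grid, block>>>(d_A, d_B, d_C);
        cudaDeviceSynchronize();

        cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
        return 0;
    }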

Topic 14: Scheduling, Warps and Memory

Parallel GPU code with multiple blocks
Transparent Scalability (block assignment to SMs, thread execution and limitations)
Warps (structure, purpose, function, limitations)
Thread scheduling
Thread lifecycle on hardware
Zero-overhead and latency tolerance
GPU limits/benefits (a worked warp calculation follows below)
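One calculation worth being fluent with (my numbers are illustrative): NVIDIA GPUs schedule threads in warps of 32. A block of 100 threads is split into ceil(100/32) = 4 warps; warps 0–2 are full, while the last warp has only 4 active threads (100 - 96), so most of its SIMD lanes are wasted. This is one reason block sizes that are multiples of 32 are generally recommended.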

– Cutoff