5/2/2021 Exam 02 Short Answer: CMPSC 450 Spring 2021
https://psu.instructure.co
1/6
Exam 02 Short Answer
Due Mar 15 at 8:50am Points 15 Questions 4
Available Mar 15 at 8am – Mar 15 at 8:50am about 1 hour Time Limit None
This quiz was locked Mar 15 at 8:50am.
Attempt History
Attempt Time Score
LATEST Attempt 1 35 minutes 1.6 out of 15
Correct answers are hidden.
Score for this quiz: 1.6 out of 15 Submitted Mar 15 at 8:50am This attempt took 35 minutes.
Question 1
0.6 / 3 pts
The following code runs too slow and needs to be threaded using std::threads. In the answer, explain how you would modify the code to apply the five step process we covered in class on how to thread a piece of code. For clarity, break your answer into 5 parts. In each part, note the step and explain what changes you would make to the code to accomplish that step. Simply providing a new subroutine without any explanation will result in zero points.
10: void findPattern(int *img, int imgWidth, int imgHeight, int *kerne
l, int kernelWidth, int kernelHeight)
11: {
12:
13:
14:
15:
16:
17:
18:
19:
float *result = new float[imgHeight * imgWidth];
for (int ii = 0; ii < (imgHeight * imgWidth); ii++)
result[ii] = 0.0;
for (int y = 0; y < imgHeight-kernelHeight; y++)
for(int x = 0; x < imgWidth-kernelWidth; x++)
{
for (int ky = 0; ky < kernelHeight; ky++)
/courses/2109084/quizzes/4153101
m
5/2/2021
https://psu.instructure.com/courses/2109084/quizzes/4153101 2/6
Exam 02 Short Answer: CMPSC 450 Spring 2021
Your Answer:
1. Create a structure of parameters struct MYPARAM{
int *img,
int imgWidth,
int imgHeight,
int *kernel,
int kernelWidth,
int kernelHeight};
2. Isolate the loop
void findPattern( struct MYPARAM *p_param) {
float *result = new float[imgHeight * imgWidth];
}
3. Create a container array std::thread t[NUMTHREADS] 4.
5. Join the threads
for (int y = 0; y < imgHeight-kernelHeight; y++)
for(int x = 0; x < imgWidth-kernelWidth; x++)
{
for (int ky = 0; ky < kernelHeight; ky++)
for (int kx = 0; kx < kernelWidth; kx++)
result[y * imgWidth + x] += img[(y + ky) * imgWidth +
(x + kx)] * kernel[ky * kernelWidth + kx];
}
20: for (int kx = 0; kx < kernelWidth; kx++)
21: result[y * imgWidth + x] += img[(y + ky) * imgWidth
+ (x + kx)] * kernel[ky * kernelWidth + kx];
22: }
23: }
5/2/2021 Exam 02 Short Answer: CMPSC 450 Spring 2021
for(int i = 0; i < NUMTHREADS; i++) {
t[i].join();
}
How would you partition the work for Step 2? Missing fire off the threads.
Question 2
0 / 3 pts
What are the values of v1, v2 and v3 on the execution of the OpenMP region below?
int v1 = 0; int v2 = 0;
float v3 = 0.0;
int *A = new int[N];
#pragma omp parallel num_threads(16)
{
v1 = A[0]; v2 = A[0];
#pragma omp for
for (int i = 0; i < N; i++) {
if (A[i] < v1) {
#pragma omp critical
} }
v2 = A[i];
}
v1 = A[i];
if (A[i] > v2) {
#pragma omp master
#pragma omp for reduction (+:v3)
for (int i = 0; i < N; i++)
v3 += A[i];
v3 /= N;
}
Your Answer:
https://psu.instructure.com/courses/2109084/quizzes/4153101
3/6
5/2/2021 Exam 02 Short Answer: CMPSC 450 Spring 2021
v1:critical all thread cannot enter at the same time, v1 may be 0 to N-1 v2 = N/16 when it is for the first master thread
v3 = (n-1)/n
v1 = min value v2 = max of the region the master thread traverses v3 is the sum of the array, then converted to the mean.
Question 3
0 / 3 pts
Below is a PRAM algorithm for calculating prefix sums. How does it compare to the Balanced Binary tree based algorithm we learned before in terms of performance and resource utilization?
for i = 1:N pardo b[i] = a[i];
offset = 1;
for i = 1:log (N) {
for k = 1:N-offset pardo
b(k + offset) = b(k + offset) + b(k);
offset *= 2; }
Your Answer:
put for(k+1) to the outer side of fo
https://psu.instructure.com/courses/2109084/quizzes/4153101
4/6
5/2/2021 Exam 02 Short Answer: CMPSC 450 Spring 2021
https://psu.instructure.co
5/6
Balanced binary tree based parallel prefix sums algorithm W = O(N), T = O(log n) This algorithm calculates the prefix sum in one pass, vs the up and down process of the BBT based algorithm.
Question 4
1 / 6 pts
Part 1:
Name the "Five steps to parallelization" from class. For each step, identify where it is occurring in the following code. Be specific and use the line numbers in your answer.
Part 2:
Given the following code, how much of a speed up would you expect if OMP_NUM_THREADS was increased from 1 to 16 given the following parameters, explain your answer.
A 16 core, 3.3 GHz Intel Xeon processor with 32KB L1, 256KB L2, 30 MB L3 and 256 GB ram. A memory (RAM) bandwidth of 10 GB/s. Furthermore, each core can issue one add and one multiply per clock.
// A,B,C,D are all arrays of positive values;
// N = 1000000000;
8: void dummy(double *A)
9: {
10: A[0] = 0;
11: }
12:
13: void myFunc(double *A, double *B, double *C, double *D, int N)
14: {
15:
16:
17:
18:
19:
20:
#pragma omp parallel for
for (int n = 0; n < N; n++)
{
A[n] = B[n] + C[n] * D[n];
if (A[n] < 0)
/courses/2109084/quizzes/4153101
m
5/2/2021
https://psu.instructure.com/courses/2109084/quizzes/4153101 6/6
Exam 02 Short Answer: CMPSC 450 Spring 2021
Quiz Score: 1.6 out of 15
Your Answer:
Part1: Part2:
when OMP_NUM_THREADS is 1, time = 4*8*10^9 / (10*10^9) = 4s. It will not change if thread number is 16 since the memory is the bottle neck.
1) Isolate the subroutine is the structured block from lines 17-22. 2) Partition the work is performed in lines 15 and 16. 3) Launch the threads is handled by line 15 in the compiler. 4) Wait for the threads to complete. This is the implied barrier at line 22. 5) Combine the results, there are no results to combine here.
21: dummy(A);
22: }
23: }