CSCI-UA.0480-003 Parallel Computing Homework Assignment 3 [Total 20 points]
1. [8 points] For each of the following applications, state whether it would be beneficial to implement it on a GPU, and justify your answer.
a) Finding whether a number exists in an array of 10M integers
b) Calculating the first 1M Fibonacci numbers
c) Multiplying two 100×100 matrices
d) Adding two 1M×1M matrices
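To ground the justifications, the sizes involved can be estimated with a few lines of arithmetic. This is a minimal sketch that assumes 4-byte integer and float elements and a naive O(n³) matrix product. Neither assumption is stated in the assignment, and the helper names are hypothetical.

```python
# Back-of-the-envelope sizes for the workloads in question 1.
# Assumption (not from the assignment): every int/float element takes 4 bytes.
BYTES_PER_ELEMENT = 4


def search_bytes(n=10_000_000):
    """(a) Bytes read when scanning an array of n integers once."""
    return n * BYTES_PER_ELEMENT


def fib_terms_fitting_uint64():
    """(b) How many terms F(0), F(1), ... fit in an unsigned 64-bit integer.

    Each term needs the previous two, so the computation is a chain of
    loop-carried dependencies rather than independent pieces of work.
    """
    a, b, n = 0, 1, 0  # invariant: a == F(n), b == F(n + 1)
    while a < 2**64:
        a, b, n = b, a + b, n + 1
    return n  # F(n) is the first term that does not fit


def matmul_multiply_adds(n=100):
    """(c) Multiply-add operations in a naive n x n matrix product."""
    return n**3


def matrix_triplet_bytes(n):
    """(c), (d) Bytes for two n x n input matrices plus one n x n result."""
    return 3 * n * n * BYTES_PER_ELEMENT


if __name__ == "__main__":
    print(search_bytes())                     # 40000000 bytes (40 MB)
    print(fib_terms_fitting_uint64())         # 94 terms: F(0) through F(93)
    print(matmul_multiply_adds())             # 1000000 multiply-adds
    print(matrix_triplet_bytes(100))          # 120000 bytes
    print(matrix_triplet_bytes(1_000_000))    # 12000000000000 bytes (12 TB)
```

Comparing these numbers against the amount of independent work, the dependency structure, and the memory a GPU actually has is one way to structure each answer.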
2. [3] Consider a block with 8 threads executing a section of code before reaching a barrier. The threads require the following amounts of time (in microseconds) to execute their corresponding sections: 2.0, 2.3, 3.0, 2.8, 2.4, 1.9, 2.6, 2.9, respectively, and spend the rest of their time waiting at the barrier. What percentage of the threads' total execution time (i.e., the cumulative time across all threads) is spent waiting at the barrier?
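Because every thread is held at the barrier until the slowest one arrives, each thread is charged the slowest thread's time, and the waiting time is whatever is left after its own work. A minimal sketch of that accounting, assuming all threads start together; the helper name is hypothetical:

```python
def barrier_wait_fraction(times_us):
    """Fraction of the threads' cumulative time spent waiting at a barrier.

    Assumes all threads start together and are released when the slowest
    thread arrives, so each thread is charged max(times_us).
    """
    total = len(times_us) * max(times_us)  # cumulative time over all threads
    busy = sum(times_us)                   # time spent executing the section
    return (total - busy) / total


if __name__ == "__main__":
    times = [2.0, 2.3, 3.0, 2.8, 2.4, 1.9, 2.6, 2.9]  # values from the question
    print(f"{100 * barrier_wait_fraction(times):.2f}% of the time is spent waiting")
```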
3. Assume the following piece of code (next page) is running on a GPU with the following specs (assume lines 6 to 11 are part of the main() function):
Each SM can have up to:
- 8 blocks
- 768 threads
- 8192 registers
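The three per-SM limits interact: the number of blocks an SM can hold at once is set by whichever limit runs out first. The code for question 3 is not reproduced here, so the threads-per-block and registers-per-thread values in this sketch are hypothetical inputs. It also ignores the register-allocation granularity that real GPUs apply, so treat it as a first approximation.

```python
# Per-SM limits given in question 3.
MAX_BLOCKS_PER_SM = 8
MAX_THREADS_PER_SM = 768
MAX_REGISTERS_PER_SM = 8192


def resident_blocks_per_sm(threads_per_block, registers_per_thread):
    """Blocks that can be resident on one SM at the same time.

    Simplification: registers are treated as a flat pool with no
    per-warp or per-block allocation granularity.
    """
    by_block_limit = MAX_BLOCKS_PER_SM
    by_thread_limit = MAX_THREADS_PER_SM // threads_per_block
    by_register_limit = MAX_REGISTERS_PER_SM // (threads_per_block * registers_per_thread)
    return min(by_block_limit, by_thread_limit, by_register_limit)


if __name__ == "__main__":
    # Hypothetical configurations, not taken from the assignment's code.
    for tpb, regs in [(64, 4), (256, 8), (128, 16)]:
        print(tpb, regs, resident_blocks_per_sm(tpb, regs))
```

For example, 256 threads per block is limited by the thread count (768 // 256 = 3), while 128 threads using 16 registers each is limited by the register file (8192 // 2048 = 4).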
a. [1] How many threads are there in total?
b. [1] How many threads are there in a warp?
c. [1] How many threads are there in a block?
d. [1] How many global memory loads and stores are done for each thread?
e. [1] How many accesses to shared memory are done for each block?
f. [2] How many iterations of the for loop (Line 23) will have branch divergence? Show your derivation.
g. [2] Identify an opportunity to significantly reduce the bandwidth requirement on the global memory. How would you achieve this? How many accesses can you eliminate?
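For part f, an iteration has branch divergence when, within at least one warp, some threads take the branch and others do not. The loop from the assignment's code is not reproduced here, so this sketch takes the branch condition as a hypothetical function of the thread index and iteration number and assumes a 32-thread warp. The two example conditions are common reduction patterns chosen for illustration, not the assignment's code.

```python
WARP_SIZE = 32  # assumption: 32 threads per warp


def divergent_iterations(block_size, num_iterations, takes_branch):
    """Count loop iterations in which at least one warp diverges.

    takes_branch(tid, it) is a hypothetical stand-in for the branch
    condition evaluated by thread `tid` in iteration `it`.
    """
    count = 0
    for it in range(num_iterations):
        for warp_start in range(0, block_size, WARP_SIZE):
            warp = range(warp_start, min(warp_start + WARP_SIZE, block_size))
            outcomes = {takes_branch(tid, it) for tid in warp}
            if len(outcomes) > 1:  # mixed outcomes within one warp
                count += 1
                break  # one divergent warp is enough for this iteration
    return count


if __name__ == "__main__":
    # Hypothetical 256-thread reduction with a halving stride: tid < stride,
    # stride = 128, 64, ..., 1. Only strides below 32 split a warp.
    print(divergent_iterations(256, 8, lambda tid, it: tid < (128 >> it)))  # 5
    # Hypothetical interleaved reduction: tid % (2 * stride) == 0,
    # stride = 1, 2, ..., 128. Every iteration splits warp 0.
    print(divergent_iterations(256, 8, lambda tid, it: tid % (2 << it) == 0))  # 8
```

The same counting applies to any loop: write down which threads take the branch in each iteration, then check whether any 32-thread group contains both outcomes.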