程序代写代做 go graph cache CSCI-UA.0480-009 Parallel Computing Homework Assignment 2 [Total of 30 points]

CSCI-UA.0480-009 Parallel Computing Homework Assignment 2 [Total of 30 points]
1. [1] Suppose a single core running one thread has to wait 100 cycles for its memory system to provide a 64-bit word. If we have several loads coming from different cores to the memory these loads will be serialized and each one will take 100 cycles. Memory designers have come up with a nice idea. What did the designers do, assuming no faster technology exists, to reduce this delay. Ignore the existence of caches for now.
2. [2] Suppose we have a system with three level of caches: L1 is close to the processor, level 2 is below it, and level 3 is the last level before accessing the main memory. We know that the two main characteristics of a cache performance are: cache access latency (How long does the cache take before responding with hit or miss?) and cache hit rate (how many of the cache accesses are hits?). As we go from L1 to L2 to L3, which of the two characteristics become more important? and why?
3. [3] A sequential application with a 20% part that must be executed sequentially, is required to be accelerated three-fold. How many CPUs are required for this task? How about five-fold speedup?
4. In slide# 11 of the performance analysis lecture (lecture 6) we saw that as the number of processes increases, the speedup increases.
a. [3] Why the curve of ¡°double size¡± seems better than the other two?
b. [3] If we keep increasing the number of processes (i.e. the x-axis) what do you
think the rest of the curve will look like?
5. [3] In slide# 12 of the performance analysis lecture (lecture 6), efficiency decreases as the number of processes increases, why is that?
6. [2] What is the relationship between synchronization points in a parallel program and load balancing? Explain why do we have such a relationship?
7. [1] In slide 12 of lecture 5, we found that parallelism = 6.25, yet, we can see from the graph that there are 8 paths that can be executed in parallel. How can you explain this discrepancy?
8. [3] Suppose we have two threads doing the same operations but on different data. Also suppose these two threads execute on two different cores, and those cores are not executing anything else but the assigned threads. If the two threads start at the same time, can we assume that they will always finish at the same time? State two justifications to your answer.

9. Assume we have the following task graph. A task can be thought of as function/procedure. Every task is labeled with its run time on a core. An arrow from a task to another means that the first task generates data needed by the second one. Assume all data are of the same size.
a. [2] How will you parallelize that program (i.e. what is the smallest number of processes needed to execute in parallel to get the best performance)? Justify your choice.
b. [3] What will be the speedup if we have 2 processes? 4 processes? 8
processes?
c. [2] What is the span for the above graph? what is the work? d. [2] What is the parallelism in this DAG?