HIGH PERFORMANCE COMPUTER ARCHITECTURE

ASSIGNMENT 2 ― A Comparison of the Scoreboard & Tomasulo Approaches and Quick Revisions of Key Concepts

Total for Assignment 2: 15 marks

(Q1) (5 marks) [Quick Revisions on Key Concepts] Determine whether each of the following statements is (T)rue or (F)alse. (1 mark for each statement)

• Clock rate reduction may not help reduce power consumption. T F
• Recompilation of the source code is usually needed for superscalar architectures. T F
• Loop unrolling always requires more registers than executing the original loop. T F
• The time required to “fill” and “drain” a pipeline increases the speedup. T F
• Data hazards occur when an instruction depends on the result of the next instruction that is still in the pipeline. T F

Answer:

(Q2) (10 marks) [A Comparison of Scoreboard & Tomasulo’s Computers] Assume the following latencies for producer-consumer instruction pairs:

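| Instruction producing result | Instruction using result       | Latency (cycles) |
|------------------------------|--------------------------------|------------------|
| Floating-point ALU op.       | Another floating-point ALU op. | 6                |
| Floating-point ALU op.       | STORE floating-point           | 4                |
| LOAD floating-point          | Floating-point ALU op.         | 2                |
| LOAD floating-point          | STORE floating-point           | 1                |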
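For reference while working through the question, the latency table can be written as a small lookup. This is only a sketch: the category labels (`fp_alu`, `fp_load`, `fp_store`) and the helper name `latency` are made up for illustration and do not come from the course notes.

```python
# Latency table from the assignment, keyed by (producer, consumer).
# The category labels are hypothetical names chosen for this sketch.
LATENCY = {
    ("fp_alu", "fp_alu"): 6,     # FP ALU op. feeding another FP ALU op.
    ("fp_alu", "fp_store"): 4,   # FP ALU op. feeding a STORE floating-point
    ("fp_load", "fp_alu"): 2,    # LOAD floating-point feeding an FP ALU op.
    ("fp_load", "fp_store"): 1,  # LOAD floating-point feeding a STORE floating-point
}


def latency(producer: str, consumer: str) -> int:
    """Return the table latency, in cycles, for a producer-consumer pair."""
    return LATENCY[(producer, consumer)]


if __name__ == "__main__":
    # addd and multd are both floating-point ALU operations.
    print(latency("fp_alu", "fp_alu"))
```

Pairs that do not appear in the table (for example, a STORE as producer) raise a `KeyError` rather than returning a guessed value.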

Consider the following code sequence:

addd f7, f3, f4 ; f3 + f4 -> f7
multd f6, f7, f5 ; f7 * f5 -> f6
multd f8, f7, f5 ; 2nd multiply instruction

Compare the execution of this program fragment on the Scoreboard-based computer with its execution on the Tomasulo-based computer, and answer the following questions carefully. In all the questions, you may assume that both the Scoreboard and Tomasulo computers have a sufficient number of functional units, including at least one floating-point adder and two floating-point multipliers, to execute the above program fragment. You may also assume that the EXEC stage of “addd” or “multd” takes 6 cycles, as specified in the latency table above.
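Before answering, it may help to list the register dependences in the three-instruction fragment. The sketch below does a simple pairwise check for RAW, WAW and WAR relationships. The tuple encoding of the instructions and the name `find_hazards` are assumptions made for illustration, and the pairwise check ignores intervening redefinitions of a register, which does not matter for a fragment this short. It only reports dependences; it does not model how either machine handles them.

```python
# Each instruction is encoded as (opcode, destination, source1, source2).
# This encoding is a hypothetical one chosen for the sketch.
PROGRAM = [
    ("addd", "f7", "f3", "f4"),   # f3 + f4 -> f7
    ("multd", "f6", "f7", "f5"),  # f7 * f5 -> f6
    ("multd", "f8", "f7", "f5"),  # f7 * f5 -> f8
]


def find_hazards(program):
    """Return (kind, earlier_index, later_index, register) for each dependence."""
    hazards = []
    for j, (_, dst_j, *srcs_j) in enumerate(program):
        for i in range(j):
            _, dst_i, *srcs_i = program[i]
            if dst_i in srcs_j:   # later instruction reads what the earlier one writes
                hazards.append(("RAW", i, j, dst_i))
            if dst_i == dst_j:    # both instructions write the same register
                hazards.append(("WAW", i, j, dst_i))
            if dst_j in srcs_i:   # later instruction overwrites a source of the earlier one
                hazards.append(("WAR", i, j, dst_j))
    return hazards


if __name__ == "__main__":
    for kind, i, j, reg in find_hazards(PROGRAM):
        print(f"{kind}: instruction {j} depends on instruction {i} via {reg}")
```

For this fragment the check reports two RAW dependences on f7 (from the addd to each multd) and no WAW or WAR dependences.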

(a) (6 marks) Suppose the ‘addd’ instruction has already entered its EXEC stage and is computing the value of f7. State clearly whether the two subsequent ‘multd’ instructions can be issued on EACH of the Scoreboard and Tomasulo computers. For each “yes” or “no” answer, give a short explanation to justify it. For example, here is a sample format for your answer:
• for Scoreboard – No, the two instructions cannot be issued BECAUSE ……
• for Tomasulo – Yes, the two instructions can be issued BECAUSE ……..
Answer:

(b) (4 marks) For the Scoreboard computer, calculate the total number of stalls incurred in executing the above program fragment, counting from issue up to the ‘Exec. Completion’ stage (the write-result stage is excluded). Justify your answer with a clear explanation. [Hint: refer to p. 55 of the Mod-2 notes, or the 2nd paragraph of p. A-72 in our reference book “Computer Architecture – A Quantitative Approach”.]

Answer:

– END of Assignment 2 –