CS计算机代考程序代写 mips cache assembly SECTION A

SECTION A

SCHOOL OF ENGINEERING

DIGITAL SYSTEM DESIGN 4

ELEE10007

Exam Diet: May 2021 Duration: 24 hours Expected workload: 2 hours plus upload
Exam starts: 13:00 on 14/05/2021 Exam ends: 13:00 on 15/05/2021 All times are BST (UTC+1)

Before commencing work, please read the academic, formatting, scanning and uploading guidance.

Examination information

• This exam paper consists of TWO sections
• Candidates should attempt ALL THREE questions, as follows:
• Section A: ONE question. Attempt the whole section
• Section B: Two questions. Attempt the whole section

Specific instructions

• Students should assume reasonable values for any data not given in a question, or not available on a
datasheet, and should make any such assumption clear on their answer sheets.

• Students in any doubt as to the interpretation of the wording of a question, should make their own decision,
and should state it clearly on their answer sheet.

• Write concise, complete answers. If a length limit is given, stay within it. Produce equations and diagrams to a
good hand-drawn standard. The upper limit for each question is 4 pages, therefore, 12 pages for all the
three questions. For all questions show your working and discussion of your approach for full marks.

• This is an open book exam. This means you can freely access any printed or online materials to aid you in
your answers. Online materials can include text, images, videos and data. You may NOT engage in
interactions or discussions relating to the exam questions or examined subject matter in any form. Sharing
the answers to this exam in any way, by any means and in any form is STRICTLY NOT allowed.

• Use only a standard calculator. Do not use computer-based spreadsheets, mathematical solvers, simulation
tools, graphing calculators or any other tool which is interactive in nature, such as online mathematical
equation solvers.

Technical instructions (For full details see the formatting guidance and how to upload your exam to learn)

• Write in dark blue or black ink on white or light-coloured A4 paper, or the nearest equivalent size; unlined,
lined, and graph paper are all acceptable, as are pages with holes for binding.

• Write on one side of the paper only. Start every question on a new page. Use portrait orientation.
• On every page clearly write the QUESTION NUMBER (left side) and your EXAMINATION NUMBER (right

side).
• Arrange your pages so that your answers are given in the same order as the question paper.
• Take clear individual pictures of each page and combine in a single PDF document according to the scanning

instructions.
• Name your file with the course code and your examination number, e.g. ENGI00000-B123456.pdf

• Check your file carefully then upload it in the ONLINE EXAM area for this course on LEARN.
• If you require technical support, contact Exams.Eng@ed.ac.uk

Special Items

• None

Convenor of Board of Examiners: Professor R Cheung
External Examiner: Professor J Morrow & Dr T Tjahjadi

https://www.wiki.ed.ac.uk/x/nQCqGg
https://www.wiki.ed.ac.uk/x/DwCqGg
https://www.wiki.ed.ac.uk/x/KPplGg
https://www.wiki.ed.ac.uk/x/8v9lGg
https://edin.ac/3epLrBq
https://www.wiki.ed.ac.uk/x/DwCqGg
https://www.wiki.ed.ac.uk/x/8v9lGg
https://www.wiki.ed.ac.uk/x/KPplGg
mailto:Exams.Eng@ed.ac.uk

ELEE10007 Digital System Design 4 – May 2021

SECTION A
Question A1

For all questions show full working and discussion of your
approach for full marks.

a) Consider three different processors P1, P2, and P3 which implement
the same instruction set architecture (ISA). The clock rates for each
processor are: P1 = 3 GHz, P2 = 3.4 GHz, P3 = 2.5 GHz. The ISA
includes four different classes of instruction and the CPI (cycles per
instruction) for each class, for each processor are detailed in the
following table:

Processor
Instruction Classes

A: ALU
(immediate)

B: ALU
(registers)

C: Load
/Stores

D: Branch
/Jump

P1 1 2 4 2
P2 2 3 3 2
P3 1 2 3 2

(i) Given a program with a dynamic instruction count of 2.0 × 106

instructions divided into classes as follows: 15% class A, 40%
class B, 15% class C, and 30% class D, calculate the global
CPI, MIPS (millions of instructions per second) rating and the
total CPU time for each processor. Your answer should
include a discussion of the relevance of the MIPS rating as a
benchmark for comparing these processors.

(ii) You have been asked to redesign the slowest processor to

equal the fastest. Assume that you can make changes to the
CPI of one instruction class for that processor but that this
causes a 11% increase in the clock cycle time. Describe how
you would achieve this, with reference to any of the 8 great
ideas in computer design that have inspired your answer.

(iii) A new version of processor P2 is in development with multiple
cores where arithmetic and load/store instructions can be
parallelised, but branch instructions cannot. The CPI for each
instruction class does not change for the new multicore chip
but it has a reduced clock speed of 3 GHz. Assuming that the
number of instructions of instruction class A, B and C are
divided by 0.6 × P where P is the number of cores, determine
how many cores are required to double the overall
performance.

(4)

(2)

(4)

b) Write MIPS assembly language with the least number of instructions

to realise the following C code segment:

for (i = 0; i < 10; i ++) { C[i] = A[i] - (~(A[i + 1] | B[i + 2])); } Assume that i corresponds to register $t0, and the base of arrays A, B and C are in registers $s1, $s2 and $s3, respectively. (10) ELEE10007 Digital System Design 4 – May 2021 SECTION B Question B1 Considering the following MIPS code: SUM: slti $t0, $a0, 1 bne $t0, $zero, EXIT add $a1, $a1, $a0 addi $a0, $$a0, -1 j SUM Exit: 0x00A41022 jr $ra Assume the memory address for the first instruction is 0x00801000, and will increase sequentially for the subsequent instructions. a) (i) Use the MIPS green-card to decode the second and fifth instructions to machine code in hexadecimal. Indicate the addressing mode when implementing these two instructions. (ii) The sixth instruction is shown as machine code in hexadecimal instead of assembly language. Use the MIPS green-card to decode this instruction to the assembly language representation. (4) (2) b) Assume there are five pipeline stages, IF, ID, EX, MEM, WB. The time for running each stage is listed as follows: stages IF ID EX MEM WB time (ps) 200 100 200 200 100 (i) If the single-cycle datapath is used, calculate the time needed to run the first, second, and third instructions, and explain your answer. (ii) If the pipelined datapath with forwarding is used, calculate the total time used to run the first, second, and third instructions, and explain your answer. Here, we assume NO extra hardware is put in the ID stage to test registers and calculate the branch address. (iii) Compare the results from (i) and (ii), which datapath implementation is faster? Explain your answer and indicate potential method(s) to improve the efficiency of this pipelined implementation. (3) (5) (6) ELEE10007 Digital System Design 4 – May 2021 Question B2 a) Consider a directly mapped cache with a total size of 1 KiB where each block contains 4 words (assume a word is 32 bits). (i) Describe the breakdown of the address for the cache, assuming it is a byte address which is 32 bits long. Beginning from power on, the following, byte addressed, read accesses (in hexadecimal) are made: 0x04 0x10 0x0c 0xb0 0xE4 0x9c 0x408 0x1c 0xb4 0xc1c 0x90 0x888 Note: the highest address there has a binary value of 1100 0001 1100 so you don’t need to worry about the leading zeros. (ii) For each reference, list the tag, index, and offset; whether it is a hit or a miss; and if there are any blocks replaced, state the range of byte addresses that are overwritten. Your answer should include a list of the final contents of the cache. (iii) If the cache is changed to a 2-way set associative cache with the same block size and same total data size, repeat the process in part a)(ii) and determine if the overall hit rate is improved. Assume a Least Recently Used (LRU) replacement policy. Your answer should again include the contents of the cache at the end of the series of accesses. (2) (6) (4) b) Suppose that you have a dual core multiprocessor with the following cache arrangement. Each core has a private L1 data cache which is 2-way set associative with a 1 KiB block size and an access latency (hit time) of 20 ns per word. The two cores share a single, directly mapped L2 cache with a block size of 4 KiB and an access latency of 50 ns per word. Consider a process with the following two threads running on this multiprocessor: Thread 1: int A[1024]; for (i=0; i < 1024; i++) { A[i] = A[i] + 1; } Thread 2: int B[1024]; for (i=0; i < 1024; i++) { B[i] = B[i] + 1; } Assume that an int variable is the same size as a word, and that at the start of the process both arrays A and B are in the main memory, which has an access latency of 400 ns per word. CONTINUED OVER ELEE10007 Digital System Design 4 – May 2021 Furthermore, assume that A and B when mapped to L2 start at position 0 of the cache block, and that both caches use a write back policy. (i) If, at the start of the process the private L1 and shared L2 caches are empty, describe in detail what happens during the first two cycles of each loop (i.e., i=0 and i=1)? Assume that main memory blocks for arrays A and B map to different blocks in the L2 cache. (ii) Work out how much time is spent on cache and memory access for the full process to execute, stating any assumptions you have made. Assume that this is the only process running on the machine. (iii) If the main memory blocks for arrays A and B map to the same block in the L2 cache what is the effect on this process in the worst case? Suggest one change to the cache architecture which would improve this. (4) (2) (2) END OF PAPER ELEE10007 Digtal System Design 4.pdf ELEE10007_Exam_May_2021 (2).pdf