CS计算机代考程序代写 mips cache assembler computer architecture assembly SCHOOL OF ENGINEERING DIGITAL SYSTEM DESIGN 4 ELEE10007

SCHOOL OF ENGINEERING DIGITAL SYSTEM DESIGN 4 ELEE10007
Exam Date: 06/05/2019 From and To: 14:30-16:30 Exam Diet: May 2019 Please read full instructions before commencing writing
Exam paper information
• This paper consists of TWO sections.
• Candidates should attempt THREE questions, chosen as follows:
• Section A: ONE question. Attempt the whole section.
• Section B: Attempt TWO out of the THREE questions.
Special instructions
• Students should assume reasonable values for any data not given in a question nor available on a datasheet, and should make any such assumptions clear on their script.
• Students in any doubt as to the interpretation of the wording of a question, should make their own decision, and should state it clearly on their script.
• Please write your name in the space indicated at the right hand side on the front cover of the answer book. Also enter your examination number in the appropriate space on the front cover.
• Write ONLY your examination number on any extra sheets or worksheets used and firmly attach these to the answer book(s).
• This examination will be marked anonymously.
Special items
• Formula sheets (2 pages)
Convenor of Board of Examiners: Professor R Cheung External Examiner: Professor J Morrow and Dr Z Durrani

SECTION A Question A1
a) (i) With the aid of appropriate diagrams describe the key differences between the Von Neumann and Harvard computer architectures. (4)
(ii) Explain the problem with the Von Neumann architecture that the Harvard architecture is designed to solve. (2)
b) In a multiple issue processor, parts of the datapath are duplicated in order to provide instruction level parallelism.
(i) Describe the key differences between dynamic and static multiple- issue process architectures. (2)
(ii) Consider a programme with a loop structure that involves reading
data from successive memory locations, performing an arithmetic operation and writing the result back. Describe the process by
which such a loop is “unrolled” in order to improve performance on
a typical multiple issue processor. (4)
c) Suppose you need to perform two mathematical operations, the first is a sum of 20 scalar variables while the second is a matrix sum of a pair of 100×100 arrays. Assume for the following questions that the scalar sums cannot be parallelised and each sum takes t seconds.
(i) What are the execution time speedup factors if 10 processors or
100 processors are used rather than 1? (2)
(ii) What concept or law of computer design does the previous result illustrate? (1)
(iii) What would happen if the load wasn’t balanced when the
calculations are performed with 100 processors? (1)
d) The roofline model is a method of characterising the performance of different processor systems undertaking a range of different tasks (kernels) with different characteristics.
(i) Referring to Figure A1 overleaf, state which processor has the
highest performance for a kernel which requires one word of data
to be read in and then written out to memory for every four floating
point operations. (2)
(ii) What aspect of the processor with the lower performance should be improved for it to equal the processor with the higher performance, and by how much?
Continued overleaf
(2)
ELEE10007 Digital System Design 4 – May 2019

Figure A1
ELEE10007 Digital System Design 4 – May 2019

SECTION B Question B1
Consider the following MIPS code when answering this question. Refer to the MIPS Green Card which is appended to the exam paper.
sll $t0, $s3, 2 add $t0, $t0, $s6 lw $t0, 0($t0)
sll $t1, $s4, 2 add $t1, $t1, $s6 lw $t1, 0($t1)
add $t2, $t0 $t1 sw $t2, 32($s7)
a) For the FIRST THREE instructions in the MIPS code shown above:
(i) State which of these are R-format, and which are I-format
instructions? (2)
(ii) For the R-format instruction(s), show the decimal value of the
opcode (OP), source register (RS), target register (RT), destination register (RD), shift amount (shamt) and function bit (funct). For the I-format instruction(s), show the value of the opcode (OP), source register (RS), target register (RT), and the immediate field. (6)
(iii) Translate the three instructions into machine language expressed
in hexadecimal. (3)
b) For the LAST THREE instructions in the MIPS code shown above:
(i) What hardware elements are needed to implement these
instructions? (2)
(ii) Sketch the datapath with high-level abstractions of the hardware
elements that can run the add/lw/sw instructions. (4)
c) Translate the MIPS code to C. Assume that the variables f, g, h, i, and
j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers
$s6 and $s7, respectively. (3)
ELEE10007 Digital System Design 4 – May 2019

Question B2
a) Considering the following MIPS code:
0000 0000 0000 1000 0101 0000 0010 1010 bne $t2, $0, ELSE
j DONE
ELSE: addi $t2, $t2, 2 DONE:
(i) The first instruction is shown as machine code instead of assembly language. Use the MIPS green-card appended to the exam paper to decode this instruction to the assembly language representation.
(2)
(4)
the PC to the address as 0x4000 0000? Explain your answer. (3)
(ii) Is it possible to use the branch-not-equal (bne) MIPS assembly instruction to set the PC to 0x4000 0000? Explain your answer.
(iii) What are the addressing modes when using j and bne?
c) To implement the bne instruction in MIPS architecture:
(ii) Assume $t0 holds the value 0x00101000, what is the value of $t2 after running the above instructions? In your answer you should explain what this MIPS loop is doing.
b) Suppose that the program counter (PC) is set to 0x2000 0000.
(i) Is it possible to use the jump (j) MIPS assembly instruction to set
(i) What arithmetic functions does the Arithmetic Logic Unit (ALU) perform? (2)
(ii) Sketch the datapath with high-level abstractions of the hardware elements that can run the bne instructions. (4)
ELEE10007 Digital System Design 4 – May 2019
(3) (2)

Question B3
a) The following questions are concerned with the arrangement of
memory and cache in a modern computer system.
(i) With the aid of an appropriate diagram describe the hierarchy of memory and storage, with reference to the aim or purpose of this memory architecture. (3)
(ii) Explain the concept of temporal locality and suggest how this is exploited through the arrangement and operation of the cache. (2)
(iii) Explain the concept of spatial locality and suggest how this is
exploited through the arrangement and operation of the cache. (2)
(iv) Explain the key purpose of a virtual memory system. (2)
b) If a processor has a 4-way set associative instruction cache with a total size of 32 KiB and a block size of 16 words, define the following characteristics of the cache, assuming that the address is 32 bits wide:
(i) Number of blocks in the cache; (1)
(ii) Number of index bits in the address; (1)
(iii) Number of tag bits in the address; (1)
(iv) If the cache is rearranged to be directly mapped, describe the new address structure with a suitable diagram. (2)
c) Decisions made in the design of a memory architecture will have an impact (positive or negative) on the hit time, miss rate and miss penalty. Describe the effects on these parameters when:
(i) The level of associativity is decreased; (2) (ii) The block size is increased; (2) (iii) Write back is used instead of write through in the data cache. (2)
ELEE10007 Digital System Design 4 – May 2019
END OF PAPER

M I P S Reference Data CORE INSTRUCTION SET
1
ARITHMETIC CORE INSTRUCTION SET
2
OPCODE / FMT /FT / FUNCT (Hex) 11/8/1/– 11/8/0/– 0/–/–/1a 0/–/–/1b 11/10/–/0
11/11/–/0 11/10/–/y 11/11/–/y
11/10/–/3
11/11/–/3
11/10/–/2
11/11/–/2
11/10/–/1
11/11/–/1
31/–/–/–
35/–/–/–
0 /–/–/10 0 /–/–/12 10 /0/–/0 0/–/–/18 0/–/–/19 0/–/–/3 39/–/–/–
3d/–/–/–
NAME, MNEMONIC Add add Add Immediate addi Add Imm. Unsigned addiu Add Unsigned addu And and And Immediate andi
Branch On Equal beq Branch On Not Equal bne
divu add.s
Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] R Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt]
(6)
Jump
Jump And Link Jump Register
Load Byte Unsigned
Load Halfword Unsigned
Load Linked
Load Upper Imm.
Load Word
Nor
Or
Or Immediate
Set Less Than
Set Less Than Imm.
Set Less Than Imm. Unsigned
Set Less Than Unsig. Shift Left Logical Shift Right Logical
Store Byte
Store Conditional
Store Halfword
Store Word Subtract
Subtract Unsigned
j J jal J jr R
lbu I lhu I
ll I lui I lw I nor R or R ori I slt R slti I
sltiu I sltu R
sll R srl R
sb I sc I
sh I sw I
sub R subu R
PC=JumpAddr R[31]=PC+8;PC=JumpAddr PC=R[rs] R[rt]={24’b0,M[R[rs]
+SignExtImm](7:0)} R[rt]={16’b0,M[R[rs]
+SignExtImm](15:0)} R[rt] = M[R[rs]+SignExtImm] R[rt] = {imm, 16’b0}
R[rt] = M[R[rs]+SignExtImm] R[rd] = ~ (R[rs] | R[rt])
R[rd] = R[rs] | R[rt]
R[rt] = R[rs] | ZeroExtImm
R[rd] = (R[rs] < R[rt]) ? 1 : 0 R[rt] = (R[rs] < SignExtImm)? 1 : R[rt] = (R[rs] < SignExtImm) ? 1 :0 R[rd] = (R[rs] < R[rt]) ? 1 : 0 R[rd] = R[rt] << shamt R[rd] = R[rt] >> shamt M[R[rs]+SignExtImm](7:0) =
R[rt](7:0) M[R[rs]+SignExtImm] = R[rt]; R[rt] = (atomic) ? 1 : 0 M[R[rs]+SignExtImm](15:0) =
(4)
(5) 2hex (5) 3hex
F[fd]=F[fs] – F[ft]
{F[fd],F[fd+1]} = {F[fs],F[fs+1]} –
{F[ft],F[ft+1]} F[rt]=M[R[rs]+SignExtImm] F[rt]=M[R[rs]+SignExtImm];
F[rt+1]=M[R[rs]+SignExtImm+4] R[rd] = Hi
R[rd] = Lo
R[rd] = CR[rs]
{Hi,Lo} = R[rs] * R[rt]
{Hi,Lo} = R[rs] * R[rt]
R[rd] = R[rt] >>> shamt M[R[rs]+SignExtImm] = F[rt] M[R[rs]+SignExtImm] = F[rt]; M[R[rs]+SignExtImm+4] = F[rt+1]
FOR-
MAT OPERATION (in Verilog)
R R[rd] = R[rs] + R[rt]
I R[rt] = R[rs] + SignExtImm I R[rt] = R[rs] + SignExtImm
R R[rd] = R[rs] + R[rt] R R[rd] = R[rs] & R[rt]
I R[rt] = R[rs] & ZeroExtImm
I if(R[rs]==R[rt]) PC=PC+4+BranchAddr
I if(R[rs]!=R[rt]) PC=PC+4+BranchAddr
OPCODE / FUNCT (Hex)
(1) 0 / 20hex (1,2) 8hex
(2) 9hex
0 / 21hex
0 / 24hex
(3) chex
FOR- NAME, MNEMONIC MAT
Branch On FP True bc1t FI Branch On FP False bc1f FI
OPERATION if(FPcond)PC=PC+4+BranchAddr (4) if(!FPcond)PC=PC+4+BranchAddr (4)
4hex
0 / 08hex (2) 24hex
(2) 25hex (2,7) 30hex
fhex
(2) 23hex
0 / 27hex
0 / 25hex
(3) dhex
0 / 2ahex
0 (2) ahex
0 / 00hex 0 / 02hex
28hex 38hex 29hex
(2) 2bhex (1) 0 / 22hex 0 / 23hex
FP Divide Single FP Divide
Double
FP Multiply Single FP Multiply Double
FP Subtract Single FP Subtract Double
Load FP Single Load FP
Double
Move From Hi Move From Lo Move From Control Multiply
Multiply Unsigned Shift Right Arith. Store FP Single Store FP
Double
div.s FR div.d FR mul.s FR mul.d FR sub.s FR sub.d FR
F[fd] = F[fs] / F[ft] {F[fd],F[fd+1]} = {F[fs],F[fs+1]} /
{F[ft],F[ft+1]} F[fd] = F[fs] * F[ft]
{F[fd],F[fd+1]} = {F[fs],F[fs+1]} * {F[ft],F[ft+1]}
(4)
5hex
Divide
Divide Unsigned
FP Add Single
FP Add
Double
FP Compare Single c.x.s* FP Compare c.x.d* Double
div R
add.d
FR F[fd ]= F[fs] + F[ft]
FR {F[fd],F[fd+1]} = {F[fs],F[fs+1]} +
{F[ft],F[ft+1]} FR FPcond = (F[fs] op F[ft]) ? 1 : 0
FR FPcond = ({F[fs],F[fs+1]} op {F[ft],F[ft+1]}) ? 1 : 0
* (x is eq, lt, or le) (op is ==, <, or <=) ( y is 32, 3c, or 3e) (2,6) (6) 0 / 2bhex 0 0 bhex FLOATING-POINT INSTRUCTION FORMATS 31 26 25 21 20 16 15 11 10 31 26 25 21 20 16 15 lwc1 ldc1 mfhi mflo mfc0 mult multu sra swc1 sdc1 I I R R R R R R I I (2) (2) (6) (2) (2) 6 5 opcode fmt ft immediate M[R[rs]+SignExtImm] = R[rt] R[rd] = R[rs] - R[rt] R[rd] = R[rs] - R[rt] REGISTER NAME, NUMBER, USE, CALL CONVENTION } 31 26 25 21 20 16 15 11 10 6 5 0 BASIC INSTRUCTION FORMATS (2) (2,7) R[rt](15:0) (2) blt bgt ble bge li PSEUDOINSTRUCTION SET NAME Branch Less Than Branch Greater Than Branch Less Than or Equal Branch Greater Than or Equal Load Immediate Move move MNEMONIC OPERATION if(R[rs]R[rt]) PC = Label
if(R[rs]<=R[rt]) PC = Label if(R[rs]>=R[rt]) PC = Label R[rd] = immediate
R[rd] = R[rs]
NAME NUMBER USE
PRESERVED ACROSS A CALL?
$zero 0 The Constant Value 0 N.A.
$at 1 Assembler Temporary No
$v0-$v1 2-3 Values for Function Results No and Expression Evaluation
$a0-$a3 4-7 Arguments No
$t0-$t7 8-15 Temporaries No
$s0-$s7 16-23 Saved Temporaries Yes
$t8-$t9 24-25 Temporaries No
$k0-$k1 26-27 Reserved for OS Kernel No
$gp 28 Global Pointer Yes
$sp 29 Stack Pointer Yes
$fp 30 Frame Pointer Yes
$ra 31 Return Address No
(1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate
(3) ZeroExtImm = { 16{1b’0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate, 2’b0 }
(5) JumpAddr = { PC+4[31:28], address, 2’b0 }
(6) Operands considered unsigned numbers (vs. 2’s comp.)
(7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic
opcode
rs
rt
immediate
31 26 25 21 20 16 15
31 26 25
0
0
opcode
address
Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
FR FI
opcode
fmt
ft
fs
fd
funct
R I J
opcode
rs
rt
rd
shamt
funct
MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together

OPCODES, BASE CONVERSION, ASCII SYMBOLS
3
IEEE 754 FLOATING-POINT
STANDARD
(-1)S × (1 + Fraction) × 2(Exponent – Bias) where Single Precision Bias = 127,
Double Precision Bias = 1023.
IEEE Single Precision and Double Precision Formats:
31 30 2322
63 62 5251
MEMORY ALLOCATION
4
IEEE 754 Symbols
S.P. MAX = 255, D.P. MAX = 2047
MIPS opcode (31:26)
(1) MIPS funct (5:0)
(2) MIPS funct (5:0)
Deci- Hexa- ASCII Binary mal deci- Char- mal acter
Deci- Hexa- ASCII mal deci- Char- mal acter
(1) sll add.f sub.f j srl mul.f jal sra div.f
00 0000 0 0 NUL 00 0001 1 1 SOH 00 0010 2 2 STX 00 0011 3 3 ETX
64 40 @ 65 41 A 66 42 B 67 43 C
beq sllv sqrt.f bne abs.f blez srlv mov.f bgtz srav neg.f
00 0100 4 4 EOT 00 0101 5 5 ENQ 00 0110 6 6 ACK 00 0111 7 7 BEL
68 44 D 69 45 E 70 46 F 71 47 G
addi jr addiu jalr slti movz sltiu movn
00 1000 8 8 BS 00 1001 9 9 HT 00 1010 10 a LF 00 1011 11 b VT
72 48 H 73 49 I 74 4a J 75 4b K
andi syscall round.w.f ori break trunc.w.f xori ceil.w.f lui sync floor.w.f
00 1100 00 1101 00 1110 00 1111
12 c FF 13 d CR 14 e SO 15 f SI
76 4c L 77 4d M 78 4e N 79 4f O
mfhi (2) mthi
mflo movz.f mtlo movn.f
01 0000 01 0001 01 0010 01 0011
16 10 DLE 17 11 DC1 18 12 DC2 19 13 DC3
80 50 P 81 51 Q 82 52 R 83 53 S
01 0100 01 0101 01 0110 01 0111
20 14 DC4 21 15 NAK 22 16 SYN 23 17 ETB
84 54 T 85 55 U 86 56 V 87 57 W
mult multu div divu
01 1000
01 1001
01 1010
01 1011 27 1b ESC
24 18 CAN 25 19 EM 26 1a SUB
88 58 X 89 59 Y 90 5a Z 91 5b [
01 1100 01 1101 01 1110 01 1111
28 1c FS 29 1d GS 30 1e RS 31 1f US
92 5c \ 93 5d ] 94 5e ^ 95 5f _
lb add cvt.s.f lh addu cvt.d.f lwl sub
lw subu
10 0000 10 0001 10 0010 10 0011
32 20 Space 33 21 ! 34 22 ” 35 23 #
96 60 ‘ 97 61 a 98 62 b 99 63 c
lbu and cvt.w.f lhu or
lwr xor
nor
10 0100 10 0101 10 0110 10 0111
36 24 $ 37 25 % 38 26 & 39 27 ’
100 64 d 101 65 e 102 66 f 103 67 g
sb
sh
swl slt sw sltu
10 1000 10 1001 10 1010 10 1011
40 28 ( 41 29 ) 42 2a * 43 2b +
104 68 h 105 69 i 106 6a j 107 6b k
swr cache
10 1100 10 1101 10 1110 10 1111
44 2c , 45 2d – 46 2e . 47 2f /
108 6c l 109 6d m 110 6e n 111 6f o
ll tge c.f.f lwc1 tgeu c.un.f lwc2 tlt c.eq.f pref tltu c.ueq.f
11 0000 11 0001 11 0010 11 0011
48 30 0 49 31 1 50 32 2 51 33 3
112 70 p 113 71 q 114 72 r 115 73 s
teq c.olt.f c.ult.f ldc2 tne c.ole.f c.ule.f
ldc1
11 0100 11 0101 11 0110 11 0111
52 34 4 53 35 5 54 36 6 55 37 7
116 74 t 117 75 u 118 76 v 119 77 w
sc c.sf.f
swc1 c.ngle.f
swc2 c.seq.f
c.ngl.f
11 1000 11 1001 11 1010 11 1011
56 38 8 57 39 9 58 3a : 59 3b ;
120 78 x 121 79 y 122 7a z 123 7b {
c.lt.f
sdc1 c.nge.f
sdc2 c.le.f c.ngt.f
11 1100 11 1101 11 1110 11 1111
60 3c < 61 3d = 62 3e > 63 3f ?
124 7c | 125 7d } 126 7e ~ 127 7f DEL
Exponent
Fraction
Object
0
0
±0
0
≠0
± Denorm
1 to MAX – 1
anything
± Fl. Pt. Num.
MAX
0
±∞
MAX
≠0
NaN
S
Exponent
Fraction
S
Exponent
Fraction

Argument 6
Argument 5
Saved Registers
Local Variables
Stack
Dynamic Data
Static Data
Text
Reserved
$sp
$gp
pc
7fff fffchex
1000 8000hex 1000 0000hex 0040 0000hex
0hex
STACK FRAME
$fp
$sp
0 0
Higher Memory Addresses
Stack Grows
Lower Memory Addresses
DATA ALIGNMENT
Double Word
Word
Word
Halfword
Halfword
Halfword
Halfword
Byte
Byte
Byte
Byte
Byte
Byte
Byte
Byte
01234567 Value of three least significant bits of byte address (Big Endian)
EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS
31 15 86 2
B D
Interrupt Mask
Exception Code
Pending
Interrupt
15 8
U
M 4
E
L 1
I
E 0
BD = Branch Delay, UM = User Mode, EL = Exception Level, IE =Interrupt Enable
EXCEPTION CODES
SIZE PREFIXES (10 x for Disk, Communication; 2 x for Memory)
if fmt(25:21)==17ten (11hex) f = d (double)
Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
Number Name Cause of Exception
Number Name Cause of Exception
0 Int Interrupt (hardware)
9 Bp Breakpoint Exception
4 AdEL Address Error Exception (load or instruction fetch)
10 RI
Reserved Instruction Exception
5 AdES Address Error Exception (store)
11 CpU
Coprocessor Unimplemented
6 IBE
Bus Error on Instruction Fetch
12 Ov
Arithmetic Overflow Exception
7 DBE
Bus Error on Load or Store
13 Tr Trap
8 Sys Syscall Exception
15 FPE Floating Point Exception
PRE- SIZE FIX
PRE- SIZE FIX
PRE- SIZE FIX
PRE- SIZE FIX
103, 210 Kilo-
1015, 250 Peta-
10-3 milli-
10-15 femto-
106, 220 Mega-
1018, 260 Exa-
10-6 micro-
10-18 atto-
109, 230 Giga-
1021, 270 Zetta-
10-9 nano-
10-21 zepto-
1012, 240 Tera-
1024, 280 Yotta-
10-12 pico-
10-24 yocto-
(1) opcode(31:26) == 0
(2) opcode(31:26) == 17ten (11hex); if fmt(25:21)==16ten (10hex) f = s (single);
The symbol for each prefix is just its first letter, except μ is used for micro.
MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together