CS计算机代考程序代写 mips assembly assembler computer architecture cache SCHOOL OF ENGINEERING

SCHOOL OF ENGINEERING
DIGITAL SYSTEM DESIGN 4 ELEE10007
Exam Date: 17/05/2017 From and To: 09:30-11:30 Exam Diet: May 2017 Please read full instructions before commencing writing
Exam paper information
• Paper consists of TWO Sections
• Candidates to answer THREE questions
• SECTION A: (ONE question) Answer whole section
• SECTION B: Answer TWO out of THREE questions
Special instructions
• Students should assume reasonable values for any data not given in a question nor available on a datasheet, and should make any such assumptions clear on their script.
• Students in any doubt as to the interpretation of the wording of a question, should make their own decision, and should state it clearly on their script.
• Please write your name in the space indicated at the right hand side on the front cover of the answer book. Also enter you examination number in the appropriate space on the front cover.
• Write ONLY your examination number on any extra sheets or worksheets used and firmly attach these to the answer book(s).
• This examination will be marked anonymously.
Special items
• Data sheets (two pages)
Convenor of Board of Examiners: Professor R Cheung External Examiners: Professor P Mellor and Dr Z Durrani

SECTION A Question A1
a) The MIPS architecture is an example of a RISC machine.
(i) Define the term RISC and explain why there has been a recent
move away from complex instruction machines. (2)
(ii) With respect to a single-cycle processor MIPS implementation,
and the four MIPS design principles, describe the main factors
that motivated designers to use a pipelined architecture. (2)
b) Describe the Von-Neumann bottleneck, use a diagram to show its
physical origin and, again using diagrams, describe how the two
Harvard architectures solve this problem. (4)
c) Assume that registers s0, s1, and s2 contain the values -10, 5 and 3 respectively.
In the MIPS code below, the third instruction is shown as machine- code instead of assembly language.
main:add $s3, $s1, $s1 #5+5 add $s4, $s1, $s0 #5+-10
exit: 0000 0010 0011 0010 1010 1000 0010 0010
(i) Use the MIPS green-card to decode this instruction. In your
answer you should provide the assembly language
representation and an appropriate comment to explain what the code is doing. (3)
(ii) Write a series of MIPS assembly instructions that implement the c-code fragment below, assume that $s1 stores i, $s2 stores j, and $s3 stores k. You should fully annotate your MIPS code.
while ( i < j ) { k++ ; }i=i*2; (3) (2) (i) Calculate the maximum reverse address (in hexadecimal) that can be reached for this instruction. (2) (ii) Explain why MIPS architectures include a shift-left-by-two unit. What would be the implications if this unit were omitted? (2) (iii) In your answer to part c)(ii) you may have used a pseudo- instruction: explain how the assembler deals with these. d) Consider a 32-bit MIPS architecture and assume the programme counter is currently at a decimal address of 3248674260. Consider an i-format instruction which uses PC-relative addressing. ELEE10007 Digital System Design 4 - May 2017 SECTION B Question B1 a) b) (i) With the aid of a suitable diagram, describe the hierarchy of memory and storage in a modern computer system with reference to the purpose of the memory architecture. (5) (ii) Explain the two forms of locality in cache memory systems and how these are exploited to improve memory access times. (4) c) (ii) Define the address structure if it is a directly mapped cache. (iii) How does the address structure change if it is changed to an 8-way set-associative cache? (iv) If this was changed to a fully associative cache, would you expect the hit time to increase or decrease? Give a reason for your answer. Assume a directly mapped cache with the same characteristics as in part b). Consider a data streaming process that reads a 1024 KiB working set with the following byte address stream: A cache system has the following characteristics: • Cache size of 32 KiB • Block size of 16 words (64 bytes) • 32 bit address size (i) How many blocks are in the cache? (1) (2) (2) (2) (2) (2) 0 2 4 6 8 10 12 14 16 ... (i) Calculate the miss rate for this address stream. (ii) How does the miss rate change if the cache is changed to an 8-way set associative structure and what form of locality does this represent? ELEE10007 Digital System Design 4 - May 2017 Question B2 a) Amdahl’s Law can be embodied by the following equation: Execution time after improvement = Execution time affected by improvement + Execution time unaffected Amount of Improvement (i) Explain how this applies to parallel computing systems and the concept of weak scaling versus strong scaling. (ii) Consider a problem that involves the multiplication of two floating point numbers and the sum of two 10x10 arrays of integers. Assume that multiplication cannot take place at the same time as addition and that each takes roughly the same time. Calculate the relative speedup for running this problem on a 10 core multiprocessor and a 100 core multiprocessor system, over a single processor with the same characteristics. (2) (iii) Comment on the efficiency of the parallel computer systems used for the problem in part a)(ii). (2) b) Flynn’s taxonomy defines four different computer architectures based on the arrangement of data and instruction streams. For each type provide a diagram illustrating the architecture and suggest an example implementation or application. (i) SISD (2) (ii) SIMD (2) (iii) MISD (2) (iv) MIMD (2) c) Figure B2c shows a roofline model diagram for a multicore processor. (i) Which properties of the processor limit the performance for arithmetic intensities of 1/2 and 2 respectively? (2) (ii) Sketch the graph you would expect if the memory performance was improved to double the attainable speed. Comment on the performance gain. Continued overleaf (2) ELEE10007 Digital System Design 4 - May 2017 (4) 64 32 16 8 4 2 1 1/8 1/4 1/2 1 2 4 8 16 Arithmetic Intensity: FLOPs/Byte Ratio Figure B2c ELEE10007 Digital System Design 4 - May 2017 Attainable GFLOPs/second Question B3 a) Figure B3a shows part of a simple MIPS architecture, which is capable of implementing four instructions: load word, store word, branch-on-equal, and R-type arithmetic. The four R-type functions are shown within the ALU block. The two-bit AluOP is derived from the opcode by the main controller. (i) What arithmetic functions does the ALU perform for load/store- word and branch instructions? For full marks explain the features of the MIPS ISA that require the ALU to perform these functions. (6) (ii) Use the MIPS green card to determine the “funct” codes for each of the four ALU functions shown in Figure B3a. (2) (iii) Using information provided by Figure B3a and the MIPS green card, design a combinational logic block which implements the desired functionality of the ALU Controller. You should clearly show each step in your working and draw the full logic diagram. (12) Figure B3a END OF PAPER ELEE10007 Digital System Design 4 - May 2017 M I P S Reference Data CORE INSTRUCTION SET 1 ARITHMETIC CORE INSTRUCTION SET 2 OPCODE / FMT /FT / FUNCT (Hex) 11/8/1/-- 11/8/0/-- 0/--/--/1a 0/--/--/1b 11/10/--/0 11/11/--/0 11/10/--/y 11/11/--/y 11/10/--/3 11/11/--/3 11/10/--/2 11/11/--/2 11/10/--/1 11/11/--/1 31/--/--/-- 35/--/--/-- 0 /--/--/10 0 /--/--/12 10 /0/--/0 0/--/--/18 0/--/--/19 0/--/--/3 39/--/--/-- 3d/--/--/-- NAME, MNEMONIC Add add Add Immediate addi Add Imm. Unsigned addiu Add Unsigned addu And and And Immediate andi Branch On Equal beq Branch On Not Equal bne divu add.s Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] R Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] (6) Jump Jump And Link Jump Register Load Byte Unsigned Load Halfword Unsigned Load Linked Load Upper Imm. Load Word Nor Or Or Immediate Set Less Than Set Less Than Imm. Set Less Than Imm. Unsigned Set Less Than Unsig. Shift Left Logical Shift Right Logical Store Byte Store Conditional Store Halfword Store Word Subtract Subtract Unsigned j J jal J jr R lbu I lhu I ll I lui I lw I nor R or R ori I slt R slti I sltiu I sltu R sll R srl R sb I sc I sh I sw I sub R subu R PC=JumpAddr R[31]=PC+8;PC=JumpAddr PC=R[rs] R[rt]={24’b0,M[R[rs] +SignExtImm](7:0)} R[rt]={16’b0,M[R[rs] +SignExtImm](15:0)} R[rt] = M[R[rs]+SignExtImm] R[rt] = {imm, 16’b0} R[rt] = M[R[rs]+SignExtImm] R[rd] = ~ (R[rs] | R[rt]) R[rd] = R[rs] | R[rt] R[rt] = R[rs] | ZeroExtImm R[rd] = (R[rs] < R[rt]) ? 1 : 0 R[rt] = (R[rs] < SignExtImm)? 1 : R[rt] = (R[rs] < SignExtImm) ? 1 :0 R[rd] = (R[rs] < R[rt]) ? 1 : 0 R[rd] = R[rt] << shamt R[rd] = R[rt] >> shamt M[R[rs]+SignExtImm](7:0) =
R[rt](7:0) M[R[rs]+SignExtImm] = R[rt]; R[rt] = (atomic) ? 1 : 0 M[R[rs]+SignExtImm](15:0) =
(4)
(5) 2hex (5) 3hex
F[fd]=F[fs] – F[ft]
{F[fd],F[fd+1]} = {F[fs],F[fs+1]} –
{F[ft],F[ft+1]} F[rt]=M[R[rs]+SignExtImm] F[rt]=M[R[rs]+SignExtImm];
F[rt+1]=M[R[rs]+SignExtImm+4] R[rd] = Hi
R[rd] = Lo
R[rd] = CR[rs]
{Hi,Lo} = R[rs] * R[rt]
{Hi,Lo} = R[rs] * R[rt]
R[rd] = R[rt] >>> shamt M[R[rs]+SignExtImm] = F[rt] M[R[rs]+SignExtImm] = F[rt]; M[R[rs]+SignExtImm+4] = F[rt+1]
FOR-
MAT OPERATION (in Verilog)
R R[rd] = R[rs] + R[rt]
I R[rt] = R[rs] + SignExtImm I R[rt] = R[rs] + SignExtImm
R R[rd] = R[rs] + R[rt] R R[rd] = R[rs] & R[rt]
I R[rt] = R[rs] & ZeroExtImm
I if(R[rs]==R[rt]) PC=PC+4+BranchAddr
I if(R[rs]!=R[rt]) PC=PC+4+BranchAddr
OPCODE / FUNCT (Hex)
(1) 0 / 20hex (1,2) 8hex
(2) 9hex
0 / 21hex
0 / 24hex
(3) chex
FOR- NAME, MNEMONIC MAT
Branch On FP True bc1t FI Branch On FP False bc1f FI
OPERATION if(FPcond)PC=PC+4+BranchAddr (4) if(!FPcond)PC=PC+4+BranchAddr (4)
4hex
0 / 08hex (2) 24hex
(2) 25hex (2,7) 30hex
fhex
(2) 23hex
0 / 27hex
0 / 25hex
(3) dhex
0 / 2ahex
0 (2) ahex
0 / 00hex 0 / 02hex
28hex 38hex 29hex
(2) 2bhex (1) 0 / 22hex 0 / 23hex
FP Divide Single FP Divide
Double
FP Multiply Single FP Multiply Double
FP Subtract Single FP Subtract Double
Load FP Single Load FP
Double
Move From Hi Move From Lo Move From Control Multiply
Multiply Unsigned Shift Right Arith. Store FP Single Store FP
Double
div.s FR div.d FR mul.s FR mul.d FR sub.s FR sub.d FR
F[fd] = F[fs] / F[ft] {F[fd],F[fd+1]} = {F[fs],F[fs+1]} /
{F[ft],F[ft+1]} F[fd] = F[fs] * F[ft]
{F[fd],F[fd+1]} = {F[fs],F[fs+1]} * {F[ft],F[ft+1]}
(4)
5hex
Divide
Divide Unsigned
FP Add Single
FP Add
Double
FP Compare Single c.x.s* FP Compare c.x.d* Double
div R
add.d
FR F[fd ]= F[fs] + F[ft]
FR {F[fd],F[fd+1]} = {F[fs],F[fs+1]} +
{F[ft],F[ft+1]} FR FPcond = (F[fs] op F[ft]) ? 1 : 0
FR FPcond = ({F[fs],F[fs+1]} op {F[ft],F[ft+1]}) ? 1 : 0
* (x is eq, lt, or le) (op is ==, <, or <=) ( y is 32, 3c, or 3e) lwc1 ldc1 mfhi mflo mfc0 mult multu sra swc1 sdc1 I I R R R R R R I I (2) (2) (6) (2) (2) 6 5 (2,6) (6) 0 / 2bhex 0 0 bhex FLOATING-POINT INSTRUCTION FORMATS 31 26 25 21 20 16 15 11 10 31 26 25 21 20 16 15 opcode fmt ft immediate M[R[rs]+SignExtImm] = R[rt] R[rd] = R[rs] - R[rt] R[rd] = R[rs] - R[rt] REGISTER NAME, NUMBER, USE, CALL CONVENTION BASIC INSTRUCTION FORMATS 31 26 25 21 20 16 15 11 10 6 5 0 (2) (2,7) R[rt](15:0) (2) blt bgt ble bge li } 2’b0 } PSEUDOINSTRUCTION SET NAME Branch Less Than Branch Greater Than Branch Less Than or Equal Branch Greater Than or Equal Load Immediate Move move MNEMONIC OPERATION if(R[rs]R[rt]) PC = Label
if(R[rs]<=R[rt]) PC = Label if(R[rs]>=R[rt]) PC = Label R[rd] = immediate
R[rd] = R[rs]
NAME NUMBER USE
PRESERVED ACROSS A CALL?
$zero 0 The Constant Value 0 N.A.
$at 1 Assembler Temporary No
$v0-$v1 2-3 Values for Function Results No and Expression Evaluation
$a0-$a3 4-7 Arguments No
$t0-$t7 8-15 Temporaries No
$s0-$s7 16-23 Saved Temporaries Yes
$t8-$t9 24-25 Temporaries No
$k0-$k1 26-27 Reserved for OS Kernel No
$gp 28 Global Pointer Yes
$sp 29 Stack Pointer Yes
$fp 30 Frame Pointer Yes
$ra 31 Return Address No
(1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate
(3) ZeroExtImm = { 16{1b’0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate,
(5) JumpAddr = { PC+4[31:28], address, 2’b0 }
(6) Operands considered unsigned numbers (vs. 2’s comp.)
(7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic
opcode
rs
rt
immediate
31 26 25 21 20 16 15
0
opcode
address
31 26 25
Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
0
FR FI
opcode
fmt
ft
fs
fd
funct
R I J
opcode
rs
rt
rd
shamt
funct
MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together

OPCODES, BASE CONVERSION, ASCII SYMBOLS
3
IEEE 754 FLOATING-POINT STANDARD
(-1)S × (1 + Fraction) × 2(Exponent – Bias) where Single Precision Bias = 127,
Double Precision Bias = 1023.
IEEE Single Precision and Double Precision Formats:
31 30 2322
63 62 5251
MEMORY ALLOCATION
4
IEEE 754 Symbols
S.P. MAX = 255, D.P. MAX = 2047
MIPS opcode (31:26)
(1) MIPS funct (5:0)
(2) MIPS funct (5:0)
Deci- Hexa- ASCII Binary mal deci- Char- mal acter
Deci- Hexa- ASCII mal deci- Char- mal acter
(1) sll add.f sub.f j srl mul.f jal sra div.f
00 0000 0 0 NUL 00 0001 1 1 SOH 00 0010 2 2 STX 00 0011 3 3 ETX
64 40 @ 65 41 A 66 42 B 67 43 C
beq sllv sqrt.f bne abs.f blez srlv mov.f bgtz srav neg.f
00 0100 4 4 EOT 00 0101 5 5 ENQ 00 0110 6 6 ACK 00 0111 7 7 BEL
68 44 D 69 45 E 70 46 F 71 47 G
addi jr addiu jalr slti movz sltiu movn
00 1000 8 8 BS 00 1001 9 9 HT 00 1010 10 a LF 00 1011 11 b VT
72 48 H 73 49 I 74 4a J 75 4b K
andi syscall round.w.f ori break trunc.w.f xori ceil.w.f lui sync floor.w.f
00 1100 00 1101 00 1110 00 1111
12 c FF 13 d CR 14 e SO 15 f SI
76 4c L 77 4d M 78 4e N 79 4f O
mfhi (2) mthi
mflo movz.f mtlo movn.f
01 0000 01 0001 01 0010 01 0011
16 10 DLE 17 11 DC1 18 12 DC2 19 13 DC3
80 50 P 81 51 Q 82 52 R 83 53 S
01 0100 01 0101 01 0110 01 0111
20 14 DC4 21 15 NAK 22 16 SYN 23 17 ETB
84 54 T 85 55 U 86 56 V 87 57 W
mult multu div divu
01 1000
01 1001
01 1010
01 1011 27 1b ESC
24 18 CAN 25 19 EM 26 1a SUB
88 58 X 89 59 Y 90 5a Z 91 5b [
01 1100 01 1101 01 1110 01 1111
28 1c FS 29 1d GS 30 1e RS 31 1f US
92 5c \ 93 5d ] 94 5e ^ 95 5f _
lb add cvt.s.f lh addu cvt.d.f lwl sub
lw subu
10 0000 10 0001 10 0010 10 0011
32 20 Space 33 21 ! 34 22 ” 35 23 #
96 60 ‘ 97 61 a 98 62 b 99 63 c
lbu and cvt.w.f lhu or
lwr xor
nor
10 0100 10 0101 10 0110 10 0111
36 24 $ 37 25 % 38 26 & 39 27 ’
100 64 d 101 65 e 102 66 f 103 67 g
sb
sh
swl slt sw sltu
10 1000 10 1001 10 1010 10 1011
40 28 ( 41 29 ) 42 2a * 43 2b +
104 68 h 105 69 i 106 6a j 107 6b k
swr cache
10 1100 10 1101 10 1110 10 1111
44 2c , 45 2d – 46 2e . 47 2f /
108 6c l 109 6d m 110 6e n 111 6f o
ll tge c.f.f lwc1 tgeu c.un.f lwc2 tlt c.eq.f pref tltu c.ueq.f
11 0000 11 0001 11 0010 11 0011
48 30 0 49 31 1 50 32 2 51 33 3
112 70 p 113 71 q 114 72 r 115 73 s
teq c.olt.f c.ult.f ldc2 tne c.ole.f c.ule.f
ldc1
11 0100 11 0101 11 0110 11 0111
52 34 4 53 35 5 54 36 6 55 37 7
116 74 t 117 75 u 118 76 v 119 77 w
sc c.sf.f
swc1 c.ngle.f
swc2 c.seq.f
c.ngl.f
11 1000 11 1001 11 1010 11 1011
56 38 8 57 39 9 58 3a : 59 3b ;
120 78 x 121 79 y 122 7a z 123 7b {
c.lt.f
sdc1 c.nge.f
sdc2 c.le.f c.ngt.f
11 1100 11 1101 11 1110 11 1111
60 3c < 61 3d = 62 3e > 63 3f ?
124 7c | 125 7d } 126 7e ~ 127 7f DEL
Exponent
Fraction
Object
0
0
±0
0
≠0
± Denorm
1 to MAX – 1
anything
± Fl. Pt. Num.
MAX
0
±∞
MAX
≠0
NaN
S
Exponent
Fraction
S
Exponent
Fraction

Argument 6
Argument 5
Saved Registers
Local Variables
Stack
Dynamic Data
Static Data
Text
Reserved
$sp
$gp
pc
7fff fffchex
1000 8000hex 1000 0000hex 0040 0000hex
0hex
STACK FRAME
$fp
$sp
0
0
Higher Memory Addresses
Stack Grows
Lower Memory Addresses
DATA ALIGNMENT
Double Word
Word
Word
Halfword
Halfword
Halfword
Halfword
Byte
Byte
Byte
Byte
Byte
Byte
Byte
Byte
01234567 Value of three least significant bits of byte address (Big Endian)
EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS
31 15 86 2
B D
Interrupt Mask
Exception Code
Pending
Interrupt
15 8
U
M 4
E
L 1
I
E 0
BD = Branch Delay, UM = User Mode, EL = Exception Level, IE =Interrupt Enable
EXCEPTION CODES
SIZE PREFIXES (10x for Disk, Communication; 2x for Memory)
if fmt(25:21)==17ten (11hex) f = d (double)
Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
Number Name Cause of Exception
Number Name Cause of Exception
0 Int Interrupt (hardware)
9 Bp Breakpoint Exception
4 AdEL Address Error Exception (load or instruction fetch)
10 RI
Reserved Instruction Exception
5 AdES Address Error Exception (store)
11 CpU
Coprocessor Unimplemented
6 IBE
Bus Error on Instruction Fetch
12 Ov
Arithmetic Overflow Exception
7 DBE
Bus Error on Load or Store
13 Tr Trap
8 Sys Syscall Exception
15 FPE Floating Point Exception
PRE- SIZE FIX
PRE- SIZE FIX
PRE- SIZE FIX
PRE- SIZE FIX
103, 210 Kilo-
1015, 250 Peta-
10-3 milli-
10-15 femto-
106, 220 Mega-
1018, 260 Exa-
10-6 micro-
10-18 atto-
109, 230 Giga-
1021, 270 Zetta-
10-9 nano-
10-21 zepto-
1012, 240 Tera-
1024, 280 Yotta-
10-12 pico-
10-24 yocto-
(1) opcode(31:26) == 0
(2) opcode(31:26) == 17ten (11hex); if fmt(25:21)==16ten (10hex) f = s (single);
The symbol for each prefix is just its first letter, except μ is used for micro.
MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together