SCHOOL OF ENGINEERING
DIGITAL SYSTEM DESIGN 4 ELEE10007
Exam Date: 03/05/2016 From and To: 09.30-11.30 Exam Diet: Apr/May 2016 Please read full instructions before commencing writing
Exam paper information
Copyright By PowCoder代写 加微信 powcoder
• Paper consists of 2 Sections
• Candidates to answer THREE questions
• Section A: (One question) Answer whole Section
• Section B: Answer TWO out of THREE questions
Special instructions
Students should assume reasonable values for any data not given in a question nor available on a datasheet, and should make any such assumptions clear on their script.
Students in any doubt as to the interpretation of the wording of a question, should make their own decision, and should state it clearly on their script.
Please write your name in the space indicated at the right hand side on the front cover of the answer book. Also enter you examination number in the appropriate space on the front cover.
Write ONLY your examination number on any extra sheets or worksheets used and firmly attach these to the answer book(s).
This examination will be marked anonymously.
Special items
Convenor of Board of Examiners: Professor R Examiners: Professor P Mellor and Professor J Robinson
Question A1
Define the term RISC and explain, with examples, the four design
principles upon which it was constructed. (3)
Describe the Von-Neumann bottleneck. Use a diagram to show the
physical origin of the bottleneck and, again using diagrams, describe
how the two Harvard architectures solve this problem. (5)
How can the bit (branch if-less-than) pseudoinstruction be
implemented using the MIPS core instruction set? (2)
The roofline model was introduced in the lectures as a method of characterising the performance of different processor systems undertaking a range of tasks (kernels) with different characteristics.
(i) Describe what aspects of an application or processor task are defined by the arithmetic intensity, and give an example of a
task with low arithmetic intensity. (2)
(ii) Referring to Figure A 1e, state which processor has the highest performance for a kernel which requires one word of data to be accessed (read or write) for every two floating point operations. (1)
(CONTINUED OVER)
The latencies for each of the five stages in a MIPS processor are given below.
E X MEM WB
Latency (ps) 170
(i) Calculate the maximum clock frequency for a single-cycle processor that uses these stages.
(ii) Calculate the maximum clock frequency for a pipelined processor that uses these stages.
(iii) Calculate the relative performance of the pipelined processor as compared to the single-cycle processor, assuming that the number of instructions is much greater than the number of pipeline stages. Use diagrams to show your working.
(iv) If you could split one stage into two new stages (each with half the latency) which would you split, and what would be the new theoretical maximum clock frequency? Discuss any potential problems.
ELEE10007 Digital System Design 4 – May 2016
__…,.. Processor A – & – Processor B
(iii) What aspect of the processor with the lower performance
should be improved for it to equal the processor with the higher performance, and by how much? (2)
0 1/8 1/4 1/2 1 2 4 8 16 32
Arithmetic Intensity: FLOPs/Byte Ratio
Figure A1e
ELEE10007 Digital System Design 4 – May 2016
SECTION B Question 81
a) Give two reasons why modern computers use a virtual memory
addressing system. (2)
b) For each of the two situations described below, state which form of locality this represents and describe how a cache memory is set up to exploit it.
(i) A loop in a programme which reads and writes a particular
variable during each cycle around the loop. (3)
(ii) A programme which sequentially accesses items in a data array structure. (3)
c) If a 16 KiB cache with 4-way set associativity has a block size of 32 words, define the following characteristics of the cache if the address is 32 bits wide:
(i) Number of blocks; (1)
(ii) Number of sets; (1)
(iii) Number of index bits in each address; (1) (iv) Number of tag bits in each address. (1)
d) If the cache described in 81 (c) is fully associative instead of being set associative:
(i) Describe the structure of the address with a suitable diagram; (2)
(ii) State which cache you would expect to have the lower hit-time,
and give a reason for your answer; (2)
(iii) If the block size is doubled, describe the effects you would expect
to observe on the miss rate and the hit time of the cache; (2)
(iv) Describe an appropriate method for deciding which block of data
in this fully associative cache should be replaced if a memory
access results in a miss at this level of the cache. (2)
ELEE10007 Digital System Design 4 – May 2016
Question 82 a)
Explain the key concept in parallel computing which is described and quantified in “Amdahl’s Law”. (2)
Two main forms for a parallel computing system were defined in this course: “message passing” and “shared memory”.
(i) With the aid of an appropriate diagram describe the key
components of each architecture. (6)
(ii) Describe a possible application for each type of parallel system. (2)
Multiprocessor systems with shared memory can have either uniform
(UMA) or non-uniform memory access (NUMA).
(i) Describe one advantage and one disadvantage of each type. (4)
An application with two threads can be run on a dual-processor system where each thread needs to access an entire array of 4096 words. For each of the two systems described below calculate the minimum number of cycles taken to run the application:
(ii) UMA, two processors with 1 GiB shared memory and memory
access latency of 35 cycles per word; (1)
(iii) NUMA, with two processors, each with local memory of 512 MiB
where local memory access time is 20 cycles/word, and remote memory access latency is 60 cycles per word. (3)
Imagine that you have been asked to design a new 2-core NUMA system with local memory latency of 15 cycles per word.
(iv) What is the maximum remote memory access latency required to
run the same application in fewer cycles than required by the
fastest system above? (2)
ELEE10007 Digital System Design 4 – May 2016
Question 83 a)
Consider a 32-bit MIPS architecture and assume the programme counter is currently at OxOOOOOOOO. Consider two instructions: An i-format instruction which uses PC-relative addressing; and a j-format instruction which uses pseudo-direct addressing.
(i) Explain, with the aid of diagrams how the two addressing
modes are physically implemented. Label the width of all
busses. (6)
(ii) Calculate the maximum forward address (in hexadecimal) that
can be reached for each of the two instructions. (2)
Describe how to design an arithmetic logic unit (ALU) which
implements overflow and zero-detection. (2)
An attractive feature of the simplified MIPS single-cycle architecture, that we have studied in-class, is that the main-controller can be implemented using purely combinational logic.
(i) Using the data from Table B3c, show with the help of a
diagram, how you would implement a single-cycle main
controller on a programmable logic array PLA using a sum of (6) products representation.
(ii) Explain, using diagrams, how this controller could easily be extended to provide the correct signals for a pipelined (4) processor. Assume all hazards are eliminated by the compiler.
Instruction
Op5:0 )> ::a ::a )> CJ
s:s:s:)> CD CD CD c:
c: CD CD c: DJ
…, en ::r
;:::+ …..
…, CD ;:::+ I l l CD a .
…, :::J C’)
000000 0 1 1 0 0 0 0 0 10 100011 1 1 0 1 0 0 1 1 00 101011 1 0 x 1 0 1 0 x 00 000100 0 0 x 0 1 0 0 x 01
END OF PAPER
ELEE10007 Digital System Design 4 – May 2016
R[rt] = R[rs] + SignExtlmm
R[rt) = R[rs] + SignExtlmm
R[rd] = R[rs] + R[rt]
R[rd] = R[rs] & R[rt)
R[rt) = R[rs] & ZeroExtlmm
if(R[rs]=R[rt])
(!) “O ·;;;
•(xis eq, 1t, or le) (op is=,<, or<=) (y is 32, Jc, or Je)
MIPSReference Data CORE INSTRUCTION SET
FOR- NAME, MNEMONIC MAT
Branch On FP True b e ! t Fl Branch On FP False be! f Fl
OPERATION if(FPcond)PC=PC+4+BranchAddr (4) if(!FPcond)PC=PC+4+BranchAddr(4)
FOR- NAME, MNEMONIC MAT
Add add R Add Immediate addi
Add Imm. Unsigned addiu
OPERATION (in Verilog) = R[rs] + R[rt]
OPCODE /FUNCT (Hex)
9hcx 0/21hcx 0 I 24hcx
Divide Unsigned FP Add Single
FP Compare Single FP Compare Double
d i v R d i vu R add.s FR
a d d . d FR c.<.s • FR cs.ct• FR
Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] Lo=R[rs]/R[rt]; Hi=R[rs)%R(rt) F(fd ]= F[fs] + F[ft) {F[fd],F[fd+l]l = {F[fs],F[fs+I]}
{F[ft),F(ft+l)} FPcond = (F[fs] op F[ft]l? I : 0
FPcond=({F[fs],F[fs+l]l op {F[ft],F[ft+I]})? I : 0
Add Unsigned
And Immediate andi
Branch On Equal heq
Branch On Not Equal b n e Jump
Jump And Link Jump Register
F[fd]=F[fs] - F[ft]
{F(fd],F(fd+I]} = {F[fs],F[fs+I)} -
Load Byte Unsigned 1bu Load Halfword
{F[ft],F[ft+I)} F[rt]=M[R[rs]+SignExtlmm] F(rt]=M(R[rs]+SignExtlmm];
ell lhu u Unsigned
R[rd] = CR[rs]
{Hi,Lo} = R[rs] • R(rt)
{Hi,Lo} = R[rs] * R(rt]
R[rd] = R(rt) » > shamt M[R[rs]+SignExtlmm] = F(rt) M[R[rs]+SignExtlmm] = F(rt); M[R[rs]+SignExtlmm+4) = F(rt+I)
Load Linked 11
~ Load Upper Imm. lui 0.
(!) Load Word lw
R(rt) = M[R[rs]+SignExtlmm] R[rt) = {imm, 16’b0}
R(rt) = M[R[rs]+SignExtlmm] R[rd] = – (R[rs] IR[rt])
R[rd] = R[rs] IR(rt]
R(rt] = R[rs] IZeroExtlmm
R[rd] = (R[rs] < R[rt]l? I : 0
30hcx fhcx 23hcx 0 I 27hcx 0 / 25hcx dhcx
0 / 2ahcx ahcx
Multiply Unsigned Shift Right Arith. Store FP Single Store FP
m u l t R m u l t u R sra R swcl I
i::OrorR 0
.,E .... (!)
Or Immediate ori
SetLessThan sit R
Set Less Than Imm. sl ti
R[rt] = (R[rs] < SignExtlmm)? I : 0 (2)
FLOATING-POINT INSTRUCTION FORMATS
Shift Left Logical Shift Right Logical
Store Byte
Store Conditional
Store Halfword
Store Word Subtract
Subtract Unsigned
R[rd] = R[rt] « shamt
R[rd] = R[rt] » shamt M[R[rs]+SignExtlmm](7:0) =
R[rt)(7:0) M[R[rs]+SignExtlmm] - R(rt]; R[rt) - (atomic)? I : 0
BASIC INSTRUCTION FORMATS
opcode I 26 25
11 Ill 6 5 immediate
Ssp 29 Sfp 30 Sra 31
( I ) ( 1,2) (2)
(4) PC=PC+4+BranchAddr (4)
PC=JumpAddr (5)
R[3 l]=PC+S;PC=JumpAddr (5)
R[rt]={24'b0,M[R[rs] +SignExtlmm](7:0) l (2)
R[rt]={ 16'b0,M[R[rs]
+SignExtlmm]( 15:0) l (2)
PC=PC+4+BranchAddr
FP Divide Single FP Divide Double
FP Multiply
div.s FR div.d FR mul . s FR mul.d FR sub. s FR sub.ct FR
F[fd] = F[fs] I F[ft]
{F[fd),F[fd+I]} = {F[fs],F[fs+I)} /
{F[ft],F[ft+l]l
if(R[rs]!=R[rt]l
F[fd] = F[fs] * F[ft]
{F[fd),F[fd+I]} = {F[fs],F[fs+I]} *
R(rt] = (R[rs] < SignExtlmm)
FR I opcode I fmt
fun ct I 0
SetLessThanImm. sltiu
i:: Unsigned ?I:0 26 "
I fs 16 15
ca SetLessThanUnsig.sltu R R[rd]=(R[rs]
if(R[rs]<=R[rt]) P C = Label if(R[rs]>=R[rt]) P C = Label R[rd] =immediate
R[rd] = R[rs]
ARITHMETIC CORE INSTRUCTION SET
0 OPCODE / FMT/FT I FUNCT
(Hex) 11/8/1/- 11/8/0/-
0/–/–/la 0/–/–/lb 11110/–/0
11/11/–/0
11/10/–~I’ 11111/–~)’
11/10/–/3
11/11/–/3
11/10/–/2
11/11/–/2
11/10/–11
11/11/–/1
31/–/–/-
351–1–1-
01–1–110 01–1–112 10/0/–/0 0/–/–/18 0/–/–/19 01–1–13 391-1-1–
3d/–/–/–
0IOOhcx JI 2625 2120 1615 0
FP Multiply
FP Subtract Single
FP Subtract
Load FP Single
Double F(rt+I)=M(R[rs)+SignExtlmm+4] Move From Hi mfhi R
Move From Lo mflo R
Move From Control mfcO R
The Constant Value 0 N.A. Assembler Temporary No
Values for Function Results
and Expression Evaluation
Argtunents No Temporaries No Saved Temporaries Yes Temporaries No Reserved for OS Kernel No Global Pointer Yes Stack Pointer Yes Frame Pointer Yes Return Address No
{F[ft],F[ft+l]l
opcode (31 :26)
(5:0) s l l
exa- A Char-
Exponent 0
I to MAX-I
Fraction 0 ;tO
bne absf blez srlv movf bgtz srav negf addi jr
addiu jalr slti. movz sltiu movn andi syscall ori break xori
lui sync mfhi (2) mthi
mflo movzf mtlo movnf
anything ±Fl. Pt. Num MAX 0 ±~
MAX ;tO NaN S.P. MAX – 255, D.P. MAX – 2047
multu y div z divu [
lh addu a lwl sub b lw subu
!bu and cvt.w.
lhu or e lwr xor f
Halfword I Halfword
Byte IByte IByte IByte
I 2 .l 4 5 6 7
Value ofthree least significant bits ofbyte address (Big Endian)
EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS
sh JI15 sh
m swr n cache 0 11 tge p lwcl tgcu q
lwc2 tit pref tltu
teq c.olt.
Ide! c.ultf u ldc2 tne c.olef v
c . u l e f w SC c.sf. x
EXCEPTION CODES
Number Name Cause of Exception
Number Name
Cause of Exception Breakpoint Exception Reserved Instruction Exception Coprocessor Unimplemented Arithmetic Overflow Exception
Floating Point Exceptmn
swcl c. nglef swc2 c. seqf c. nglf
c. lt. sdcl c.ngef
(I) opcode(31:26) = 0
(2) opcode(31:26) = 171en (llhexl; if fmt(25:21)=1610n (IOhexlf= s (single);
iffmt(25:21)=1710n (I lhex)f= d (double)
13 Syscall Exception 15
sdc2 c.lef 111110 62 c.ngtf II 1111 63
PRE- FIX Kilo- Mega- Giga- Tera-
PRE- PRE- FIX SIZE FIX Peta- 10·3 milli-
Exa- 10·6 micro- Zetta- 10·9 nano- Votta- 10·12 pico-
PRE- FIX femto- atto- zepto- yocto-
SIZE lOl, 210
106, 220 109, 230 1012. 240
SIZE 1015, 250
1018, 260 1021. 210 1024. 2so
SIZE 10-15
10-18 10-21 10-24
c 125 7d > 126 7e
SIZE PREFIXES (10″ for Disk, Communication; 2• for Memory)
? 127 7f DEL
DynalcData Static Data
-Amument6 Argument 5
Higher Memory Addresses
Stack Grows
Lower Memory Addresses
The symbol for each prefix 1s JUSt us first letter, except μ 1s used for micro. Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, =ation and Design, 4th ed.
IEEE 754 FLOATING-POINT STANDARD
(-l)s x (I+ Fraction) x 2