ECE437: Introduction to Digital Computer Design
Chapter 2 – Instructions Spring 2021
Announcements
• Lab1 is on basic 270/337 material and is not covered in 437 lectures
• Homeworks given out every Friday, due the next Friday at the start of the lecture
• Slidesandhomeworkspostedonthecoursewebsite – Slides will be posted before each lecture so you
can print before the lecture for taking notes
• Please speak your questions/answers – Unmute, speak, mute (Chat is no good)
ECE437,S’21 © Vijaykumar and Thottethodi (2) 1/27/2021
Instructions
• Instructions are the “words” of a computer
• Instruction set architecture (ISA) is its “vocabulary”
• With a few other things, this defines the interface of computers
– But implementations vary
• We will study MIPSTM ISA
– but the next ISA is not too hard after the first
• We won’t write much assembly code – ECE362
ECE437,S’21 © Vijaykumar and Thottethodi (3) 1/27/2021
Computer • What does it do?
– Fetch instruction from address in Program Counter (PC)
– Execute instruction
– Update PC to point to next instruction
ECE437,S’21 © Vijaykumar and Thottethodi
Fetch [PC]
Execute
(4)
Update PC
1/27/2021
1
Outline
– Registers and ALU ops – Memory and load/store – Branches and jumps
– And more . . .
• Read Chapter 2 and skim through Appendix E (on CD) for MIPS ISA
ECE437,S’21 © Vijaykumar and Thottethodi (5) 1/27/2021
• Instructions – Basics
Basics
• C statement
• f=(g+h)-(i+j)
• MIPS instructions*
addt0,g,h //t0=g+h addt1,i,j // t1=i+j subf,t0,t1 // f =t0–t1
• Opcodes/mnemonic, operands, source/destination
ECE437,S’21 © Vijaykumar and Thottethodi (6) 1/27/2021
Basics
• Opcode: Specifies the kind of operation (mnemonic)
• Operands: input and output data (source/destination)
• Operands t0, t1 are temps
• One operation, two inputs, one output
• Multiple instructions for one C statement
ECE437,S’21 © Vijaykumar and Thottethodi (7) 1/27/2021
Why not have bigger instructions?
• Whynothave“f=(g+h)-(i+j)”asoneinstruction?
• Church-Turingthesis:Averyprimitivecomputercan
compute anything that a fancy computer can compute – need only logical functions, read and write memory and data-
dependent decisions
• Therefore,ISAselectedforpracticalreasons: performance and cost, not computability
• Simplicity/regularitytendtoimproveboth
– E.g.,H/Wtohandlearbitrarynumberofoperands
– complex, slow and rarely usable and NOT NECESSARY
ECE437,S’21 © Vijaykumar and Thottethodi (8) 1/27/2021
2
Registers and ALU ops
• Ok,Ilied!
• Operandsmustberegisters,notvariables
– add $8, $17, $18 // $8 <- $17 + $18 – destination LEFT MOST – if you confuse this, you won’t get anything right in 437
– add$9,$19,$20
– sub$16,$8,$9
• MIPShas32registers$0-$31(figurenextslide)
– $0always0(writingto$0hasnoeffect–anoop(NOP))
• Registers$8,$9aretemps,$16-f,$17-g,$18-h,
• MIPSalsoallowsoneconstantcalled“immediate” – laterwewillseeimmediateis16bits[-32768,32767]
ECE437,S'21 © Vijaykumar and Thottethodi (9) 1/27/2021
$19 - i, $20 – j
Registers and ALU
$0 $1
$31
ALU
ECE437,S'21 © Vijaykumar and Thottethodi (10) 1/27/2021
ALU ops
• Some ALU ops:
– add, addi, addu, addiu (immediate, unsigned)
• Caution: casting signed to unsigned in System Verilog may cause weirdness
– sub...
– mul, div
– and, andi – or, ori
– sll,srl,...
- weird!
• Why registers? Specifiers fit in instructions, smaller memory is faster
• Are registers enough?
ECE437,S'21 © Vijaykumar and Thottethodi (11) 1/27/2021
Memory and load/store
• But need more than 32 words of storage
• Memory - an array of locations M[addr], indexed by addr (figure next slide)
• Data movement (on words or integers) – load word for reg <-- memory
– lw $17, 1002 // get input g
– store word for reg --> memory
– sw $16, 1001 // save output f
• Note: destination LAST for stores!
ECE437,S’21 © Vijaykumar and Thottethodi (12) 1/27/2021
3
Memory and load/store
$0 $1
$31
ALU
ECE437,S’21 © Vijaykumar and Thottethodi
0 1 2 3
1001 f 1002 g
Maxmem
(13) 1/27/2021
Byte vs. Word addresses
• I lied again!
– we need to address bytes for characters
– words take 4 bytes
– therefore, word addresses must be multiples of 4 (i.e., end in 00)
– Addresses are ALWAYS byte addresses (next slide)
• NOT word addresses (previous slide) • figure next slide
ECE437,S’21 © Vijaykumar and Thottethodi (14) 1/27/2021
Memory and load/store
$0 $1
$31
ALU
ECE437,S’21 © Vijaykumar and Thottethodi
0 4 8 0xC
0x4004 f 0x4008 g
Maxmem
(15) 1/27/2021
Memory and load/store
• Important for arrays
– A[i] = A[i] + h (figure next slide)
– # $8 – temp, $18 – h, $21 – (i x 4)
– Astart is 0x8000, A is an array of words NOT bytes
lw $8, Astart($21) // == 0x8000($21)
// Astart or 0x8000 is offset
add $8, $18, $8
// $8 <- M[Astart+$21]
// or $8 <- M[0x8000+$21]
sw $8, Astart($21) //offset is signed, can be negative • MIPS has other load/store for byte/halfword
ECE437,S'21 © Vijaykumar and Thottethodi (16) 1/27/2021
4
Memory and load/store
$0 $1
$31
ALU
ECE437,S'21 © Vijaykumar and Thottethodi
0 4 8 0xC
0x4004 f 0x4008 g
0x8000 A[0] 0x8004 A[1] 0x8008 A[2]
Maxmem
(17) 1/27/2021
Endian
• Word storage
• Word = bytes (0x400,0x401,0x402,0x403)? • Word = bytes (0x403,0x402,0x401,0x400)?
– It depends...
• Big endian: MSB at address xxxxxx00
– e.g., IBM, SPARC
• Little endian: MSB at address xxxxxx11
– e.g., Intel x86
• Mode selectable
– e.g., PowerPC, MIPS
ECE437,S'21 © Vijaykumar and Thottethodi (18) 1/27/2021
Branches and Jumps
While ( i != j) { j= j + i;
i= i + 1; }
• HLL->Assembly,$8isi,$9isj
Loop: beq $8, $9, Exit
// note that the condition is reversed
add $9, $9, $8 addi $8, $8 , 1 j Loop
Exit:
ECE437,S’21 © Vijaykumar and Thottethodi (19) 1/27/2021
Branches and Jumps
• Better yet:
beq $8, $9, Exit
Loop: add $9, $9, $8 addi $8, $8 , 1
bne $8, $9, Loop Exit:
// not !=
• Let compilers worry about such optimizations
ECE437,S’21 © Vijaykumar and Thottethodi (20) 1/27/2021
5
Branches and Jumps
• What does bne do really?
– read $8; read $9, compare
– setPC=PC+4orPC=Target
• To do compares other than “equal” or “not equal”
– e.g., blt $8, $9, Target // assembler PSEUDO-
• expands to
– slt $1, $8, $9 // set less than $1 = ($8 < $9)?1:0 – bne $1, $0, Target // $0 is always 0
ECE437,S'21 © Vijaykumar and Thottethodi (21) 1/27/2021
instruction
Branches and Jumps
• Other MIPS branches/jumps
– beq $8, $9, imm
– //if($8==$9)PC=PC+imm<<2elsePC+=4
• bne ...
• slt, sle, sgt, sge
– with immediate, unsigned
• j addr •jr $12 • jal addr
// PC = addr //PC=$12
// $31 = PC+4; PC = addr; used for procedure calls
ECE437,S'21 © Vijaykumar and Thottethodi (22) 1/27/2021
Two-minute quiz
Fill in the blanks
• Abstraction separates __________ from ___________
True or False:
– A computer’s instruction set typically decides the subset of programs that the computer can run and the instruction set has to be expanded to run other programs
ECE437,S'21 © Vijaykumar and Thottethodi (23) 1/27/2021
Assembly Exercise
for(i=0;i
a = a+b else
b = a+b
ECE437,S’21 © Vijaykumar and Thottethodi
(25) 1/27/2021
Assembly Exercise
• HLL-> assembly if (a=b)
a = a+b else
lw$8,a //a->$8 lw$9,b //b->$9 beq $8,$9,L_IF
j L_ELSE
L_IF:
add $8,$8,$9 // a = a+b sw $8,a // write a to mem j EXIT
b = a+b
L_ELSE:
add $9,$8,$9 // b= a+b
sw $9, b // write b to mem EXIT:
ECE437,S’21 © Vijaykumar and Thottethodi (26) 1/27/2021
Procedure Calls • See section 2.7 for more details
– save registers
– set up parameters – call procedure
– get results
– restore registers
Proc: save more registers do function
set up results
restore more registers return
• jal (call) is the ONLY operation in hardware: the rest is in software
• jal addr // $31 = PC+4 (return addr); PC = addr (UNLIKE 362 where return addr is saved on stack – too complicated/slow to do in fast clock)
ECE437,S’21 © Vijaykumar and Thottethodi (27) 1/27/2021
Stack
• An important data structure for calls
• Stack grows from larger to smaller addresses • $29 is the stack pointer, points to top of stack
• push $2:
– addi $29, $29, -4 – sw $2, 4($29)
pop $2:
lw $2, 4($29) addi $29, $29, 4
• push, pop NOT real instructions
• Addi-sw/lw-addi order cannot change. Why?
– sw $2, 0($29)
– addi $29, $29, -4
ECE437,S’21 © Vijaykumar and Thottethodi (28) 1/27/2021
7
Procedure Example
• HLL code for swap
{ temp = v[k]; /* code for swap */
v[k] = v[k+1]; v[k+1] = temp; return (temp); }
• Corresponding assembly code
swap: addi $29, $29, -8 // save registers
sw $15, 4($29) sw $16, 8($29)
ECE437,S’21 © Vijaykumar and Thottethodi (29) 1/27/2021
Procedure Example
muli* add lw
lw sw sw mov* lw
ECE437,S’21 © Vijaykumar and Thottethodi
$2, $5, 4 $2, $4, $2 $15, 0($2) $16, 4($2) $16, 0($2) $15, 4($2)
$2, $15 $15, 4($29) $16, 8($29) $29, $29, 8
// procedure body; $2 <- k*4
// $2 <- v + k*4 == address of v[k]
// $15 <- v[k] // $16 <- v[k+1] // v[k] <- $16 // v[k+1] <- $15
// return value
// restore registers
// return
lw
addi
jr
• Allpush/popins/w
$31
(30) 1/27/2021
Procedure Call Example
• Now, calling the procedure – HLL code
return_value = swap(arr_A, i); /* call swap */
mov $4, $18 mov $5, $17 jal swap mov $20, $2
ECE437,S'21 © Vijaykumar and Thottethodi
// first parameter // second parameter
// copy return value
(31) 1/27/2021
Layers of Software
• Notation - program: input data -> output data – executable: input data -> output data
– loader: executable file -> executable in memory
– linker: object files -> executable file
– assembler: assembly file -> object file – compiler: HLL file -> assembly file
– editor: editor commands -> HLL file
• Possible only because programs can be manipulated as data
– “stored program computer”
ECE437,S’21 © Vijaykumar and Thottethodi (32) 1/27/2021
8
MIPS Machine Language
• All 32-bit instructions
• Assembly: add $1, $2, $3
– 14 ASCII (or Unicode) codes 6164642024312C202432 2C202433(hex)
• Machine code: 1 word
000000 00010 00011 00001 00000 010000
ECE437,S’21 © Vijaykumar and Thottethodi (33) 1/27/2021
alu-rr
2
3
1
zero
add/signed
Instruction Format
• R-format:
opcode rs rt rd shamt funct
655556
• Digression:
– Howdoyoustorethenumber4,392,976? – Sameasadd$1,$2,$3
• Storedprogram:instructionsarerepresentedas numbers
– programscanberead/writteninmemorylikenumbers
• Other R-format: addu, sub*, and, or ….. etc
ECE437,S’21 © Vijaykumar and Thottethodi (34) 1/27/2021
Instruction Format
• Assembly: lw $1, 100($2) • Machine:
100011 00010 00001 0000000001100100
lw 2 1 100 (in signed binary)
• I-format:
opcode rs rt address/immediate
– Addr/immediate is 16 bits signed 2’s complement
ECE437,S’21 © Vijaykumar and Thottethodi (35) 1/27/2021
6
5
5
16 (signed)
Instruction Format
• I-format also used for ALU ops with immediates
– addi $1, $2, -4
– 001000 00010 00001 1111111111111100
• What about constants larger than 16 bits = [-
– e.g., 0000 0000 0000 1100 0000 0000 0000 1111 or 0x000C000F?
lui $4, 0xC // $4 <- 0x000C0000
ori $4,$4, 0xF // $4 <- 0x000C000F
• All loads and stores use I-format
ECE437,S'21 © Vijaykumar and Thottethodi (36) 1/27/2021
32768, 32767]?
9
Instruction Format • I-format for branches: beq $1, $2, 7
• 000100 00001 00010 0000 0000 0000 0111 • PC=PC+(00000111<<2) //wordoffset Finally, J-format: opcode addr
6 26
• addr is weird in J-format:
– Target address = 4 MSB of PC:addr:00 (4+26+2 = 32) – Why MSB of PC? We have 26, we need 30
– Can we Pad with 0s? Restricts jumps to top
16th of address space
ECE437,S'21 © Vijaykumar and Thottethodi (37) 1/27/2021
MIPS: branch vs jump
• Branch: 16 bit displacement – Current word +/- 215
– Range is symmetric
• Range(forward) = Range(backward) = 215
0xE000 0000
• Jump: 26 lower-bit concatenation in word address
– Jump anywhere within a 226 segment – Range may be asymmetric
• Range(forward) + Range(backward) = 226
ECE437,S'21 © Vijaykumar and Thottethodi (38)
215
215
226-X 0xEFFF FFFF
1/27/2021
X
Summary: Instruction Formats • R-format: opcode rs rt rd shamt funct
• I-format: opcode rs rt address/immediate • J-format: opcode addr
• Instr decode
– Theory: Inst bits -> identify instrs -> control signals
– Practice: Instruction bits -> control signals
ECE437,S’21 © Vijaykumar and Thottethodi (39) 1/27/2021
6
5
5
5
5
6
6
5
5
16
6
26
Addressing modes
• There are many ways of accessing data • 1. Register addressing
– add $1, $2, $3
5
ECE437,S’21 © Vijaykumar and Thottethodi (40) 1/27/2021
opcode
rs
rt
rd
Register file
shamt
funct
10
Addressing Modes
• 2. Base addressing (aka displacement)
– lw $1, 100($2) // $2 == 400, M[500] == 42
ECE437,S’21 © Vijaykumar and Thottethodi (41) 1/27/2021
opcode
rs
rt
address (= 100)
5
Register file 400
Memory 42
Addressing Modes
• 3. Immediate addressing – addi $1, $2, 100
ECE437,S’21 © Vijaykumar and Thottethodi (42) 1/27/2021
opcode
rs
rt
immediate
Addressing Modes
• 4. PC relative addressing
– beq$1,$2,100//if($1==$2)PC=PC+100
ECE437,S’21 © Vijaykumar and Thottethodi (43) 1/27/2021
opcode
rs
rt
displacement (= 100)
Memory
PC
Addressing Modes
• Pseudodirect addressing
– j Loop // instruction contains address*
ECE437,S’21 © Vijaykumar and Thottethodi (44) 1/27/2021
opcode
PC
address
Memory
11
Addressing Modes
NOT found in MIPS
• 5. Indexed: add two registers – base + index
• 6. Indirect: M[M[addr]] – two memory references
• 7. Autoincrement/decrement: add data size
• 8. Autoupdate – found in IBM PowerPC, HP PA-RISC
– like displacement but update register
ECE437,S’21 © Vijaykumar and Thottethodi (45) 1/27/2021
Addressing Modes
• Autoupdate
lwupdate $1, 4[$2] // $1 == M[4+$2]; $2 == $2 + 4
ECE437,S’21 © Vijaykumar and Thottethodi (46) 1/27/2021
opcode
rs
5
rt
Register file
address
Memory
Delay
Addressing Modes
for (i = 0, I < N, i +=1) sum += A[i];
• $7-sum,$8-addressofa[i],$2–tmp
inner:
lw $2, 0($8) addi $8, $8, 4 add $7, $7, $2
new inner:
lwupdate $2, 4($8)
add $7, $7, $2 • any problems with new inner ?
ECE437,S'21 © Vijaykumar and Thottethodi (47) 1/27/2021
How to Choose ISA
• Minimize what?
– Two way constraint satisfaction
– What can be implemented in hardware? From below
– What ISA is better for optimized compilation? From above
• In 1985-1990 technology, simple modes like MIPS good – Easy to implement, not a real reason in the business world – Very compelling reason in the classroom
• As technology changes, computer design options change
• For small memory, dense instructions important and dense usually complex/irregular
• For high speed, pipelining important and for pipelining simple/regular important
ECE437,S'21 © Vijaykumar and Thottethodi (48) 1/27/2021
12
Intel 80386 ISA
• 8 32-bit registers
• Two register machine: src1/dst src2
– reg-reg,reg-immed,reg-mem,mem-reg,mem - imm
• seven addressing modes
• 16-bit and 32-bit ops on memory and registers
– 8-bit prefix to override default data size – condition codes
• part of normal operation, extends operation time, often not looked at
ECE437,S'21 © Vijaykumar and Thottethodi (49) 1/27/2021
Intel 80386 ISA
• Decoding nightmare
– instructions 1 to 17 bytes
– prefixes, postfixes
– crazy “formats” - e.g., register specifiers move around
– but key instructions not terrible
– yet have to make ALL work correctly
ECE437,S'21 © Vijaykumar and Thottethodi (50) 1/27/2021
One-minute quiz
• In stored-program computers, what is the difference between how code and data are handled (i.e., read from/ written to memory and manipulated)?
• Previous quiz answers
– Abstraction separates the interface (OR the what) from the implementation (OR the how)
– A computer’s instruction set typically decides the subset of programs that the computer can run and the instruction set has to be expanded to run other programs.
• False
ECE437,S'21 © Vijaykumar and Thottethodi (51) 1/27/2021
Current Approach
• Current Intel chips
– Instruction decode logic translates into “RISCy Ops”
– RISC – Reduced Instruction Set Computer
– Execution unit runs RISCy ops
– Backward compatibility
– Complex decoding
– Execution unit as fast as RISC
• We work with MIPS to keep it simple • Learn x86 on the job!
ECE437,S'21 © Vijaykumar and Thottethodi (52) 1/27/2021
13
Complex Instructions
• More powerful instructions not necessarily faster execution
• E.g. - string copy
– Option 1: move with repeat prefix for memory to memory move
– special-purpose
– Option 2: use loads into register and then stores
– generic instructions
– Option 2 faster on the same machine!
ECE437,S'21 © Vijaykumar and Thottethodi (53) 1/27/2021
•
–
–
Outline
So far, too MIPS-centric Rich design space:
Other possibilities: Stack and Accumulator architectures :Ch 2 (on CD)
ECE437,S'21 © Vijaykumar and Thottethodi (54) 1/27/2021
General Purpose Architecture
• General Purpose Register architecture – 3 operands: 2 source + 1 dest
– Depends on addressing modes
• Register-register only
– MIPS : all register operands except load/store
• Memory-memory (all memory operands) • Eg
– f = (g+h) *(i-j)
ECE437,S'21 © Vijaykumar and Thottethodi (55) 1/27/2021
Add t1, g, h Sub t2, i, j Mul f, t1, t2
Stack Machine
• Push, Pop with one address – Last in, first out
• Binary/Unary Op operates on
– Implicit operands
– Top one/two element(s) of stack
• Eg
– f = (g+h) *(i-j)
ECE437,S'21 © Vijaykumar and Thottethodi (56)
1/27/2021
Push g Push h Add Push i Push j Subtract Mul
14
Accumulator Machine
• One address
– 2nd implicit operand: accumulator
– accumulatoris2nd source operand as well as destination
• Eg
– f = (g+h) *(i-j)
ECE437,S'21 © Vijaykumar and Thottethodi
(57) 1/27/2021
Loadg //Acc=g Add h // Acc = Acc+h Storet //t=g+h Loadi //Acc=i
Subt j // Acc = Acc-j Mul t1 // Acc = Acc*t Store f
Concluding Remarks
• Simple and regular
– same length instructions, fields in same place
• Small and fast
– Small number of registers
• Compromises inevitable
– There is NO PERFECT ISA!
– Pipelining should not be hindered (more later)
• Common case fast
• Read Chapter 2
• Read Chapter 4a 4.1-4.4 (coming up next)
– Ch 4a+b is 1/3rds of this course, lab
ECE437,S'21 © Vijaykumar and Thottethodi (58) 1/27/2021
15