CS计算机代考程序代写 assembler x86 compiler mips assembly data structure ECE437: Introduction to Digital Computer Design

ECE437: Introduction to Digital Computer Design
Chapter 2 – Instructions Spring 2021
Announcements
• Lab1 is on basic 270/337 material and is not covered in 437 lectures
• Homeworks given out every Friday, due the next Friday at the start of the lecture
• Slidesandhomeworkspostedonthecoursewebsite – Slides will be posted before each lecture so you
can print before the lecture for taking notes
• Please speak your questions/answers – Unmute, speak, mute (Chat is no good)
ECE437,S’21 © Vijaykumar and Thottethodi (2) 1/27/2021
Instructions
• Instructions are the “words” of a computer
• Instruction set architecture (ISA) is its “vocabulary”
• With a few other things, this defines the interface of computers
– But implementations vary
• We will study MIPSTM ISA
– but the next ISA is not too hard after the first
• We won’t write much assembly code – ECE362
ECE437,S’21 © Vijaykumar and Thottethodi (3) 1/27/2021
Computer • What does it do?
– Fetch instruction from address in Program Counter (PC)
– Execute instruction
– Update PC to point to next instruction
ECE437,S’21 © Vijaykumar and Thottethodi
Fetch [PC]
Execute
(4)
Update PC
1/27/2021
1

Outline
– Registers and ALU ops – Memory and load/store – Branches and jumps
– And more . . .
• Read Chapter 2 and skim through Appendix E (on CD) for MIPS ISA
ECE437,S’21 © Vijaykumar and Thottethodi (5) 1/27/2021
• Instructions – Basics
Basics
• C statement
• f=(g+h)-(i+j)
• MIPS instructions*
addt0,g,h //t0=g+h addt1,i,j // t1=i+j subf,t0,t1 // f =t0–t1
• Opcodes/mnemonic, operands, source/destination
ECE437,S’21 © Vijaykumar and Thottethodi (6) 1/27/2021
Basics
• Opcode: Specifies the kind of operation (mnemonic)
• Operands: input and output data (source/destination)
• Operands t0, t1 are temps
• One operation, two inputs, one output
• Multiple instructions for one C statement
ECE437,S’21 © Vijaykumar and Thottethodi (7) 1/27/2021
Why not have bigger instructions?
• Whynothave“f=(g+h)-(i+j)”asoneinstruction?
• Church-Turingthesis:Averyprimitivecomputercan
compute anything that a fancy computer can compute – need only logical functions, read and write memory and data-
dependent decisions
• Therefore,ISAselectedforpracticalreasons: performance and cost, not computability
• Simplicity/regularitytendtoimproveboth
– E.g.,H/Wtohandlearbitrarynumberofoperands
– complex, slow and rarely usable and NOT NECESSARY
ECE437,S’21 © Vijaykumar and Thottethodi (8) 1/27/2021
2

Registers and ALU ops
• Ok,Ilied!
• Operandsmustberegisters,notvariables
– add $8, $17, $18 // $8 <- $17 + $18 – destination LEFT MOST – if you confuse this, you won’t get anything right in 437 – add$9,$19,$20 – sub$16,$8,$9 • MIPShas32registers$0-$31(figurenextslide) – $0always0(writingto$0hasnoeffect–anoop(NOP)) • Registers$8,$9aretemps,$16-f,$17-g,$18-h, • MIPSalsoallowsoneconstantcalled“immediate” – laterwewillseeimmediateis16bits[-32768,32767] ECE437,S'21 © Vijaykumar and Thottethodi (9) 1/27/2021 $19 - i, $20 – j Registers and ALU $0 $1 $31 ALU ECE437,S'21 © Vijaykumar and Thottethodi (10) 1/27/2021 ALU ops • Some ALU ops: – add, addi, addu, addiu (immediate, unsigned) • Caution: casting signed to unsigned in System Verilog may cause weirdness – sub... – mul, div – and, andi – or, ori – sll,srl,... - weird! • Why registers? Specifiers fit in instructions, smaller memory is faster • Are registers enough? ECE437,S'21 © Vijaykumar and Thottethodi (11) 1/27/2021 Memory and load/store • But need more than 32 words of storage • Memory - an array of locations M[addr], indexed by addr (figure next slide) • Data movement (on words or integers) – load word for reg <-- memory – lw $17, 1002 // get input g – store word for reg --> memory
– sw $16, 1001 // save output f
• Note: destination LAST for stores!
ECE437,S’21 © Vijaykumar and Thottethodi (12) 1/27/2021
3

Memory and load/store
$0 $1
$31
ALU
ECE437,S’21 © Vijaykumar and Thottethodi
0 1 2 3
1001 f 1002 g
Maxmem
(13) 1/27/2021
Byte vs. Word addresses
• I lied again!
– we need to address bytes for characters
– words take 4 bytes
– therefore, word addresses must be multiples of 4 (i.e., end in 00)
– Addresses are ALWAYS byte addresses (next slide)
• NOT word addresses (previous slide) • figure next slide
ECE437,S’21 © Vijaykumar and Thottethodi (14) 1/27/2021
Memory and load/store
$0 $1
$31
ALU
ECE437,S’21 © Vijaykumar and Thottethodi
0 4 8 0xC
0x4004 f 0x4008 g
Maxmem
(15) 1/27/2021
Memory and load/store
• Important for arrays
– A[i] = A[i] + h (figure next slide)
– # $8 – temp, $18 – h, $21 – (i x 4)
– Astart is 0x8000, A is an array of words NOT bytes
lw $8, Astart($21) // == 0x8000($21)
// Astart or 0x8000 is offset
add $8, $18, $8
// $8 <- M[Astart+$21] // or $8 <- M[0x8000+$21] sw $8, Astart($21) //offset is signed, can be negative • MIPS has other load/store for byte/halfword ECE437,S'21 © Vijaykumar and Thottethodi (16) 1/27/2021 4 Memory and load/store $0 $1 $31 ALU ECE437,S'21 © Vijaykumar and Thottethodi 0 4 8 0xC 0x4004 f 0x4008 g 0x8000 A[0] 0x8004 A[1] 0x8008 A[2] Maxmem (17) 1/27/2021 Endian • Word storage • Word = bytes (0x400,0x401,0x402,0x403)? • Word = bytes (0x403,0x402,0x401,0x400)? – It depends... • Big endian: MSB at address xxxxxx00 – e.g., IBM, SPARC • Little endian: MSB at address xxxxxx11 – e.g., Intel x86 • Mode selectable – e.g., PowerPC, MIPS ECE437,S'21 © Vijaykumar and Thottethodi (18) 1/27/2021 Branches and Jumps While ( i != j) { j= j + i; i= i + 1; } • HLL->Assembly,$8isi,$9isj
Loop: beq $8, $9, Exit
// note that the condition is reversed
add $9, $9, $8 addi $8, $8 , 1 j Loop
Exit:
ECE437,S’21 © Vijaykumar and Thottethodi (19) 1/27/2021
Branches and Jumps
• Better yet:
beq $8, $9, Exit
Loop: add $9, $9, $8 addi $8, $8 , 1
bne $8, $9, Loop Exit:
// not !=
• Let compilers worry about such optimizations
ECE437,S’21 © Vijaykumar and Thottethodi (20) 1/27/2021
5

Branches and Jumps
• What does bne do really?
– read $8; read $9, compare
– setPC=PC+4orPC=Target
• To do compares other than “equal” or “not equal”
– e.g., blt $8, $9, Target // assembler PSEUDO-
• expands to
– slt $1, $8, $9 // set less than $1 = ($8 < $9)?1:0 – bne $1, $0, Target // $0 is always 0 ECE437,S'21 © Vijaykumar and Thottethodi (21) 1/27/2021 instruction Branches and Jumps • Other MIPS branches/jumps – beq $8, $9, imm – //if($8==$9)PC=PC+imm<<2elsePC+=4 • bne ... • slt, sle, sgt, sge – with immediate, unsigned • j addr •jr $12 • jal addr // PC = addr //PC=$12 // $31 = PC+4; PC = addr; used for procedure calls ECE437,S'21 © Vijaykumar and Thottethodi (22) 1/27/2021 Two-minute quiz Fill in the blanks • Abstraction separates __________ from ___________ True or False: – A computer’s instruction set typically decides the subset of programs that the computer can run and the instruction set has to be expanded to run other programs ECE437,S'21 © Vijaykumar and Thottethodi (23) 1/27/2021 Assembly Exercise for(i=0;i assembly if (a=b)
a = a+b else
b = a+b
ECE437,S’21 © Vijaykumar and Thottethodi
(25) 1/27/2021
Assembly Exercise
• HLL-> assembly if (a=b)
a = a+b else
lw$8,a //a->$8 lw$9,b //b->$9 beq $8,$9,L_IF
j L_ELSE
L_IF:
add $8,$8,$9 // a = a+b sw $8,a // write a to mem j EXIT
b = a+b
L_ELSE:
add $9,$8,$9 // b= a+b
sw $9, b // write b to mem EXIT:
ECE437,S’21 © Vijaykumar and Thottethodi (26) 1/27/2021
Procedure Calls • See section 2.7 for more details
– save registers
– set up parameters – call procedure
– get results
– restore registers
Proc: save more registers do function
set up results
restore more registers return
• jal (call) is the ONLY operation in hardware: the rest is in software
• jal addr // $31 = PC+4 (return addr); PC = addr (UNLIKE 362 where return addr is saved on stack – too complicated/slow to do in fast clock)
ECE437,S’21 © Vijaykumar and Thottethodi (27) 1/27/2021
Stack
• An important data structure for calls
• Stack grows from larger to smaller addresses • $29 is the stack pointer, points to top of stack
• push $2:
– addi $29, $29, -4 – sw $2, 4($29)
pop $2:
lw $2, 4($29) addi $29, $29, 4
• push, pop NOT real instructions
• Addi-sw/lw-addi order cannot change. Why?
– sw $2, 0($29)
– addi $29, $29, -4
ECE437,S’21 © Vijaykumar and Thottethodi (28) 1/27/2021
7

Procedure Example
• HLL code for swap
{ temp = v[k]; /* code for swap */
v[k] = v[k+1]; v[k+1] = temp; return (temp); }
• Corresponding assembly code
swap: addi $29, $29, -8 // save registers
sw $15, 4($29) sw $16, 8($29)
ECE437,S’21 © Vijaykumar and Thottethodi (29) 1/27/2021
Procedure Example
muli* add lw
lw sw sw mov* lw
ECE437,S’21 © Vijaykumar and Thottethodi
$2, $5, 4 $2, $4, $2 $15, 0($2) $16, 4($2) $16, 0($2) $15, 4($2)
$2, $15 $15, 4($29) $16, 8($29) $29, $29, 8
// procedure body; $2 <- k*4 // $2 <- v + k*4 == address of v[k] // $15 <- v[k] // $16 <- v[k+1] // v[k] <- $16 // v[k+1] <- $15 // return value // restore registers // return lw addi jr • Allpush/popins/w $31 (30) 1/27/2021 Procedure Call Example • Now, calling the procedure – HLL code return_value = swap(arr_A, i); /* call swap */ mov $4, $18 mov $5, $17 jal swap mov $20, $2 ECE437,S'21 © Vijaykumar and Thottethodi // first parameter // second parameter // copy return value (31) 1/27/2021 Layers of Software • Notation - program: input data -> output data – executable: input data -> output data
– loader: executable file -> executable in memory
– linker: object files -> executable file
– assembler: assembly file -> object file – compiler: HLL file -> assembly file
– editor: editor commands -> HLL file
• Possible only because programs can be manipulated as data
– “stored program computer”
ECE437,S’21 © Vijaykumar and Thottethodi (32) 1/27/2021
8

MIPS Machine Language
• All 32-bit instructions
• Assembly: add $1, $2, $3
– 14 ASCII (or Unicode) codes 6164642024312C202432 2C202433(hex)
• Machine code: 1 word
000000 00010 00011 00001 00000 010000
ECE437,S’21 © Vijaykumar and Thottethodi (33) 1/27/2021
alu-rr
2
3
1
zero
add/signed
Instruction Format
• R-format:
opcode rs rt rd shamt funct
655556
• Digression:
– Howdoyoustorethenumber4,392,976? – Sameasadd$1,$2,$3
• Storedprogram:instructionsarerepresentedas numbers
– programscanberead/writteninmemorylikenumbers
• Other R-format: addu, sub*, and, or ….. etc
ECE437,S’21 © Vijaykumar and Thottethodi (34) 1/27/2021
Instruction Format
• Assembly: lw $1, 100($2) • Machine:
100011 00010 00001 0000000001100100
lw 2 1 100 (in signed binary)
• I-format:
opcode rs rt address/immediate
– Addr/immediate is 16 bits signed 2’s complement
ECE437,S’21 © Vijaykumar and Thottethodi (35) 1/27/2021
6
5
5
16 (signed)
Instruction Format
• I-format also used for ALU ops with immediates
– addi $1, $2, -4
– 001000 00010 00001 1111111111111100
• What about constants larger than 16 bits = [-
– e.g., 0000 0000 0000 1100 0000 0000 0000 1111 or 0x000C000F?
lui $4, 0xC // $4 <- 0x000C0000 ori $4,$4, 0xF // $4 <- 0x000C000F • All loads and stores use I-format ECE437,S'21 © Vijaykumar and Thottethodi (36) 1/27/2021 32768, 32767]? 9 Instruction Format • I-format for branches: beq $1, $2, 7 • 000100 00001 00010 0000 0000 0000 0111 • PC=PC+(00000111<<2) //wordoffset Finally, J-format: opcode addr 6 26 • addr is weird in J-format: – Target address = 4 MSB of PC:addr:00 (4+26+2 = 32) – Why MSB of PC? We have 26, we need 30 – Can we Pad with 0s? Restricts jumps to top 16th of address space ECE437,S'21 © Vijaykumar and Thottethodi (37) 1/27/2021 MIPS: branch vs jump • Branch: 16 bit displacement – Current word +/- 215 – Range is symmetric • Range(forward) = Range(backward) = 215 0xE000 0000 • Jump: 26 lower-bit concatenation in word address – Jump anywhere within a 226 segment – Range may be asymmetric • Range(forward) + Range(backward) = 226 ECE437,S'21 © Vijaykumar and Thottethodi (38) 215 215 226-X 0xEFFF FFFF 1/27/2021 X Summary: Instruction Formats • R-format: opcode rs rt rd shamt funct • I-format: opcode rs rt address/immediate • J-format: opcode addr • Instr decode – Theory: Inst bits -> identify instrs -> control signals
– Practice: Instruction bits -> control signals
ECE437,S’21 © Vijaykumar and Thottethodi (39) 1/27/2021
6
5
5
5
5
6
6
5
5
16
6
26
Addressing modes
• There are many ways of accessing data • 1. Register addressing
– add $1, $2, $3
5
ECE437,S’21 © Vijaykumar and Thottethodi (40) 1/27/2021
opcode
rs
rt
rd
Register file
shamt
funct
10

Addressing Modes
• 2. Base addressing (aka displacement)
– lw $1, 100($2) // $2 == 400, M[500] == 42
ECE437,S’21 © Vijaykumar and Thottethodi (41) 1/27/2021
opcode
rs
rt
address (= 100)
5
Register file 400
Memory 42
Addressing Modes
• 3. Immediate addressing – addi $1, $2, 100
ECE437,S’21 © Vijaykumar and Thottethodi (42) 1/27/2021
opcode
rs
rt
immediate
Addressing Modes
• 4. PC relative addressing
– beq$1,$2,100//if($1==$2)PC=PC+100
ECE437,S’21 © Vijaykumar and Thottethodi (43) 1/27/2021
opcode
rs
rt
displacement (= 100)
Memory
PC
Addressing Modes
• Pseudodirect addressing
– j Loop // instruction contains address*
ECE437,S’21 © Vijaykumar and Thottethodi (44) 1/27/2021
opcode
PC
address
Memory
11

Addressing Modes
NOT found in MIPS
• 5. Indexed: add two registers – base + index
• 6. Indirect: M[M[addr]] – two memory references
• 7. Autoincrement/decrement: add data size
• 8. Autoupdate – found in IBM PowerPC, HP PA-RISC
– like displacement but update register
ECE437,S’21 © Vijaykumar and Thottethodi (45) 1/27/2021
Addressing Modes
• Autoupdate
lwupdate $1, 4[$2] // $1 == M[4+$2]; $2 == $2 + 4
ECE437,S’21 © Vijaykumar and Thottethodi (46) 1/27/2021
opcode
rs
5
rt
Register file
address
Memory
Delay
Addressing Modes
for (i = 0, I < N, i +=1) sum += A[i]; • $7-sum,$8-addressofa[i],$2–tmp inner: lw $2, 0($8) addi $8, $8, 4 add $7, $7, $2 new inner: lwupdate $2, 4($8) add $7, $7, $2 • any problems with new inner ? ECE437,S'21 © Vijaykumar and Thottethodi (47) 1/27/2021 How to Choose ISA • Minimize what? – Two way constraint satisfaction – What can be implemented in hardware? From below – What ISA is better for optimized compilation? From above • In 1985-1990 technology, simple modes like MIPS good – Easy to implement, not a real reason in the business world – Very compelling reason in the classroom • As technology changes, computer design options change • For small memory, dense instructions important and dense usually complex/irregular • For high speed, pipelining important and for pipelining simple/regular important ECE437,S'21 © Vijaykumar and Thottethodi (48) 1/27/2021 12 Intel 80386 ISA • 8 32-bit registers • Two register machine: src1/dst src2 – reg-reg,reg-immed,reg-mem,mem-reg,mem - imm • seven addressing modes • 16-bit and 32-bit ops on memory and registers – 8-bit prefix to override default data size – condition codes • part of normal operation, extends operation time, often not looked at ECE437,S'21 © Vijaykumar and Thottethodi (49) 1/27/2021 Intel 80386 ISA • Decoding nightmare – instructions 1 to 17 bytes – prefixes, postfixes – crazy “formats” - e.g., register specifiers move around – but key instructions not terrible – yet have to make ALL work correctly ECE437,S'21 © Vijaykumar and Thottethodi (50) 1/27/2021 One-minute quiz • In stored-program computers, what is the difference between how code and data are handled (i.e., read from/ written to memory and manipulated)? • Previous quiz answers – Abstraction separates the interface (OR the what) from the implementation (OR the how) – A computer’s instruction set typically decides the subset of programs that the computer can run and the instruction set has to be expanded to run other programs. • False ECE437,S'21 © Vijaykumar and Thottethodi (51) 1/27/2021 Current Approach • Current Intel chips – Instruction decode logic translates into “RISCy Ops” – RISC – Reduced Instruction Set Computer – Execution unit runs RISCy ops – Backward compatibility – Complex decoding – Execution unit as fast as RISC • We work with MIPS to keep it simple • Learn x86 on the job! ECE437,S'21 © Vijaykumar and Thottethodi (52) 1/27/2021 13 Complex Instructions • More powerful instructions not necessarily faster execution • E.g. - string copy – Option 1: move with repeat prefix for memory to memory move – special-purpose – Option 2: use loads into register and then stores – generic instructions – Option 2 faster on the same machine! ECE437,S'21 © Vijaykumar and Thottethodi (53) 1/27/2021 • – – Outline So far, too MIPS-centric Rich design space: Other possibilities: Stack and Accumulator architectures :Ch 2 (on CD) ECE437,S'21 © Vijaykumar and Thottethodi (54) 1/27/2021 General Purpose Architecture • General Purpose Register architecture – 3 operands: 2 source + 1 dest – Depends on addressing modes • Register-register only – MIPS : all register operands except load/store • Memory-memory (all memory operands) • Eg – f = (g+h) *(i-j) ECE437,S'21 © Vijaykumar and Thottethodi (55) 1/27/2021 Add t1, g, h Sub t2, i, j Mul f, t1, t2 Stack Machine • Push, Pop with one address – Last in, first out • Binary/Unary Op operates on – Implicit operands – Top one/two element(s) of stack • Eg – f = (g+h) *(i-j) ECE437,S'21 © Vijaykumar and Thottethodi (56) 1/27/2021 Push g Push h Add Push i Push j Subtract Mul 14 Accumulator Machine • One address – 2nd implicit operand: accumulator – accumulatoris2nd source operand as well as destination • Eg – f = (g+h) *(i-j) ECE437,S'21 © Vijaykumar and Thottethodi (57) 1/27/2021 Loadg //Acc=g Add h // Acc = Acc+h Storet //t=g+h Loadi //Acc=i Subt j // Acc = Acc-j Mul t1 // Acc = Acc*t Store f Concluding Remarks • Simple and regular – same length instructions, fields in same place • Small and fast – Small number of registers • Compromises inevitable – There is NO PERFECT ISA! – Pipelining should not be hindered (more later) • Common case fast • Read Chapter 2 • Read Chapter 4a 4.1-4.4 (coming up next) – Ch 4a+b is 1/3rds of this course, lab ECE437,S'21 © Vijaykumar and Thottethodi (58) 1/27/2021 15