CIS 501 | Dr. | ISAs & Single Cycle

Computer Organization and Design
Unit 4: Single-Cycle Datapath
Based on slides by Profs. , C.J. Taylor, &

This Unit: Single-Cycle Datapath
• Overview of ISAs
• Datapath storage elements
• MIPS Datapath
• MIPS Control

• Sections 4.1 – 4.4

Recall from CIS240…

240 Review: ISA
• App/OS are software … execute on hardware
• HW/SW interface is ISA (instruction set architecture)
  • A “contract” between SW and HW
  • Encourages compatibility, allows SW/HW to evolve independently
  • Functional definition of HW storage locations & operations
    • Storage locations: registers, memory
    • Operations: add, multiply, branch, load, store, etc.
  • Precise description of how to invoke & access them
    • Instructions (bit-patterns hardware interprets as commands)

240 Review: LC4 ISA
array .BLKW #100
sum .FILL #0
  CONST R5, #0
  LEA R1, array
  LEA R2, sum
array_sum_loop
  LDR R3, R1, #0
  LDR R4, R2, #0
  ADD R4, R3, R4
  STR R4, R2, #0
  ADD R1, R1, #1
  ADD R5, R5, #1
  CMPI R5, #100
  BRn array_sum_loop
• LC4: a toy ISA you know
  • 16-bit ISA (what does this mean?)
  • 16-bit insns
  • 8 registers (integer)
  • ~30 different insns
  • Simple OS support
• Assembly language
  • Human-readable ISA representation

371/501 Preview: A Real ISA
array: .space 100
sum: .word 0
array_sum:
  li $5, 0
  la $1, array
  la $2, sum
array_sum_loop:
  lw $3, 0($1)
  lw $4, 0($2)
  add $4, $3, $4
  sw $4, 0($2)
  addi $1, $1, 1
  addi $5, $5, 1
  li $6, 100
  blt $5, $6, array_sum_loop
• MIPS: example of real ISA
  • 32/64-bit operations
  • 32-bit insns
  • 64 registers
    • 32 integer, 32 floating point
  • ~100 different insns
  • Full OS support
• Example code is MIPS, but all ISAs are similar at some level

240 Review: Assembly Language
Machine code   Assembly code
x9A00          CONST R5, #0
x9200          CONST R1, array
xD320          HICONST R1, array
x9464          CONST R2, sum
xD520          HICONST R2, sum
x6640          LDR R3, R1, #0
x6880          LDR R4, R2, #0
               ADD R4, R3, R4
x7880          STR R4, R2, #0
               ADD R1, R1, #1
x1BA1          ADD R5, R5, #1
x2B64          CMPI R5, #100
x03F8          BRn array_sum_loop
• Assembly language
  • Human-readable representation
• Machine language
  • Machine-readable representation
  • 1s and 0s (often displayed in “hex”)
• Assembler
  • Translates assembly to machine

240 Review: Insn Execution Model
• The computer is just a finite state machine
  • Registers (few of them, but fast)
  • Memory (lots of memory, but slower)
  • Program counter (next insn to execute)
    • Sometimes called “instruction pointer”
• A computer executes instructions
  • Fetches next instruction from memory
  • Decodes it (figure out what it does)
  • Reads its inputs (registers & memory)
  • Executes it (adds, multiply, etc.)
  • Writes its outputs (registers & memory)
  • Next insn (adjust the program counter)
• Program is just “data in memory”
  • Makes computers programmable (“universal”)
• Instruction → Insn
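The fetch/decode/read/execute/write/next-insn steps above can be sketched as a small interpreter loop in C. The machine below is invented purely for illustration: the opcodes (OP_HALT, OP_ADDI, OP_ADD), the 16-bit field layout, and the names machine_t and run are all assumptions, not LC4 or MIPS.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical toy machine, invented for illustration (not LC4 or MIPS).
 * Assumed encoding: [15:12] opcode, [11:8] rd, [7:4] rs, [3:0] rt or imm4. */
enum { OP_HALT = 0, OP_ADDI = 1, OP_ADD = 2 };
enum { MEM_WORDS = 256 };

typedef struct {
    uint16_t pc;              /* program counter: index of the next insn */
    uint16_t regs[16];        /* few registers, but fast */
    uint16_t mem[MEM_WORDS];  /* the program is just data in memory */
} machine_t;

/* Runs until HALT (or an unknown opcode) or until max_steps insns have
 * executed. Returns the number of insns executed. */
size_t run(machine_t *m, size_t max_steps) {
    size_t steps = 0;
    while (steps < max_steps) {
        uint16_t insn = m->mem[m->pc % MEM_WORDS];   /* fetch */
        unsigned op = (insn >> 12) & 0xF;            /* decode */
        unsigned rd = (insn >> 8) & 0xF;
        unsigned rs = (insn >> 4) & 0xF;
        unsigned rt = insn & 0xF;
        uint16_t a = m->regs[rs];                    /* read inputs */
        uint16_t result;
        if (op == OP_ADD) {                          /* execute */
            result = (uint16_t)(a + m->regs[rt]);
        } else if (op == OP_ADDI) {
            result = (uint16_t)(a + rt);
        } else {
            break;                                   /* HALT or unknown opcode */
        }
        m->regs[rd] = result;                        /* write output */
        m->pc = (uint16_t)(m->pc + 1);               /* next insn */
        steps++;
    }
    return steps;
}
```

Loading the words 0x1105, 0x1203, 0x2312, 0x0000 at addresses 0 to 3 of a zeroed machine and calling run would execute three insns under this assumed encoding and leave the sum of the two immediates in register 3.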

What is an ISA?

What Is An ISA?
• ISA (instruction set architecture)
  • A well-defined hardware/software interface
  • The “contract” between software and hardware
    • Functional definition of storage locations & operations
      • Storage locations: registers, memory
      • Operations: add, multiply, branch, load, store, etc.
    • Precise description of how to invoke & access them
  • Not in the “contract”: non-functional aspects
    • How operations are implemented
    • Which operations are fast and which are slow and when
    • Which operations take more power and which take less
• Instructions
  • Bit-patterns hardware interprets as commands
  • Instruction → Insn (instruction is too long to write in slides)

A Language Analogy for ISAs
• Communication
  • Person-to-person → software-to-hardware
• Similar structure
  • Narrative → program
  • Sentence → insn
  • Verb → operation (add, multiply, load, branch)
  • Noun → data item (immediate, register value, memory value)
  • Adjective → addressing mode
• Many different languages, many different ISAs
  • Similar basic structure, details differ (sometimes greatly)
• Key differences between languages and ISAs
  • Languages evolve organically, many ambiguities, inconsistencies
  • ISAs are explicitly engineered and extended, unambiguous

LC4 vs Real ISAs
+ LC4 has the basic features of a real-world ISA
– LC4 lacks a good bit of realism
  • Address size is only 16 bits
  • Only one data type (16-bit signed integer)
  • Little support for system software, none for multiprocessing (later)
• Many real-world ISAs to choose from:
  • Intel x86 (laptops, desktops, and servers)
  • MIPS (used throughout in book)
  • ARM (in all your mobile phones)
  • PowerPC (servers & game consoles)
  • SPARC (servers)
  • Intel’s Itanium
  • Historical: IBM 370, VAX, Alpha, PA-RISC, 68k, …

Some Key Attributes of ISAs
• Instruction encoding
  • Fixed length (16-bit for LC4, 32-bit for MIPS & ARM)
  • Variable length (x86: 1 byte to 15 bytes, average of ~3 bytes)
• Number and type of registers
  • LC4 has 8 registers
  • MIPS has 32 “integer” registers and 32 “floating point” registers
  • ARM & x86 both have 16 “integer” regs and 16 “floating point” regs
• Address space
  • LC4: 16-bit addresses at 16-bit granularity (128KB total)
  • ARM: 32-bit addresses at 8-bit granularity (4GB total)
  • Modern x86 and ARM64: 64-bit addresses (16 exabytes!)
• Memory addressing modes
  • MIPS & LC4: address calculated by “reg+offset”
  • x86 and others have much more complicated addressing modes
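The address-space totals above follow from one multiplication: the number of distinct addresses times the bytes each address names. A minimal sketch, assuming a helper called address_space_bytes (a name made up here):

```c
#include <assert.h>
#include <stdint.h>

/* Total addressable bytes = 2^(address bits) * (bytes per addressable unit).
 * Only valid for addr_bits < 64, since the result must fit in 64 bits. */
uint64_t address_space_bytes(unsigned addr_bits, unsigned unit_bytes) {
    assert(addr_bits < 64);
    return ((uint64_t)1 << addr_bits) * unit_bytes;
}
```

With 16-bit addresses and 2-byte units this gives 131072 bytes (128KB), and with 32-bit addresses and 1-byte units it gives 4294967296 bytes (4GB), matching the LC4 and ARM figures.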

Access Granularity & Alignment
• Byte addressability
  • An address points to a byte (8 bits) of data
  • The ISA’s minimum granularity to read or write memory
• ISAs also support wider load/stores
  • “Half” (2 bytes), “Longs” (4 bytes), “Quads” (8 bytes)
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  • Load.byte [6] -> r1
  • Load.long [12] -> r2
• However, physical memory systems operate on even larger chunks
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  • Load.long [4] -> r1
  • Load.long [11] -> r2 “unaligned”
• Access alignment: if address % size != 0, then it is “unaligned”
  • A single unaligned access may require multiple physical memory accesses
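The alignment rule above (address % size != 0 means unaligned) and its cost can be checked with a few lines of C. This is a sketch under one assumption: physical memory is accessed in naturally aligned chunks of a fixed size, passed in as the hypothetical parameter chunk. The function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* An access of `size` bytes at `addr` is aligned when addr % size == 0.
 * `size` must be nonzero. */
bool is_aligned(uint64_t addr, uint64_t size) {
    return addr % size == 0;
}

/* Number of physical chunks an access touches, assuming memory is read in
 * naturally aligned chunks of `chunk` bytes (chunk and size nonzero). */
uint64_t chunks_touched(uint64_t addr, uint64_t size, uint64_t chunk) {
    uint64_t first = addr / chunk;
    uint64_t last = (addr + size - 1) / chunk;
    return last - first + 1;
}
```

For the slide's examples with 4-byte chunks, Load.long [4] is aligned and touches one chunk, while Load.long [11] is unaligned and touches two.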

Handling Unaligned Accesses
• Access alignment: if address % size != 0, then it is “unaligned”
  • A single unaligned access may require multiple physical memory accesses
• How to handle such unaligned accesses?
  1. Disallow (unaligned operations are considered illegal)
    • MIPS, ARMv5 and earlier took this route
  2. Support in hardware? (allow such operations)
    • x86, ARMv6+ allow regular loads/stores to be unaligned
    • Unaligned access still slower, adds significant hardware complexity
  3. Trap to software routine?
    • Simpler hardware, but high penalty when unaligned
  4. In software (compiler can use regular instructions when possibly unaligned)
    • Load, shift, load, shift, and (slow, needs help from compiler)
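Option 4 above (do it in software) can be sketched in C. The version below combines the two loaded words with OR rather than the AND named on the slide. It assumes memory is modeled as an array of aligned 32-bit words, with byte k of a word held in bits 8k to 8k+7 (a little-endian numbering chosen for this example), and the name load32_unaligned is made up.

```c
#include <stdint.h>

/* Reads the 32-bit value starting at byte address `byte_addr` using only
 * aligned word loads. When the address is unaligned, the caller must make
 * sure words[byte_addr / 4 + 1] exists. */
uint32_t load32_unaligned(const uint32_t *words, uint32_t byte_addr) {
    uint32_t idx = byte_addr / 4;
    uint32_t off = byte_addr % 4;        /* 0 means the access is aligned */
    uint32_t lo = words[idx];            /* first aligned load */
    if (off == 0) {
        return lo;
    }
    uint32_t hi = words[idx + 1];        /* second aligned load */
    /* Shift each piece into place, then merge them. */
    return (lo >> (8 * off)) | (hi << (8 * (4 - off)));
}
```

The aligned case costs one load. The unaligned case costs two loads, two shifts, and a merge, which is why the slide calls this route slow.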

How big is this struct?
struct foo {
  char c;
  int i;
};
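One way to answer the question is to ask the compiler. The sketch below uses the standard offsetof macro and sizeof; the helper name foo_padding_after_c is made up for this example. The exact numbers depend on the ABI, so they are not asserted here.

```c
#include <stddef.h>

struct foo {
    char c;
    int i;
};

/* Padding bytes the compiler inserted between c and i so that i is aligned. */
size_t foo_padding_after_c(void) {
    return offsetof(struct foo, i) - sizeof(char);
}

/* Total size of the struct, including any padding. */
size_t foo_size(void) {
    return sizeof(struct foo);
}
```

On a typical ABI where int is 4 bytes and must be 4-byte aligned, the struct takes 8 bytes rather than 5, because 3 padding bytes follow c.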

Another Addressing Issue: Endian-ness
Endian-ness: arrangement of bytes in a multi-byte number
• Big-endian: sensible order, most significant byte first (e.g., MIPS, PowerPC; ARM supports both byte orders)
  • A 4-byte integer: “00000000 00000000 00000010 00000011” is 515
• Little-endian: reverse order, least significant byte first (e.g., x86)
  • A 4-byte integer: “00000011 00000010 00000000 00000000” is 515
• Why little endian?
  • The least significant byte “00000011” sits at the starting address
  • Integer casts are free on little-endian architectures
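The two byte orders above, and the reason narrowing casts are free on little-endian machines, can be shown with explicit byte stores. This sketch does not depend on the host's own byte order; the function names are made up for this example.

```c
#include <stdint.h>

/* Big-endian: most significant byte goes at the starting address. */
void store_be32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Little-endian: least significant byte goes at the starting address. */
void store_le32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* A 1-byte read at the starting address. In the little-endian layout this
 * equals the value cast to 8 bits, so no address adjustment is needed. */
uint8_t read_first_byte(const uint8_t bytes[4]) {
    return bytes[0];
}
```

Storing 515 yields the bytes 0, 0, 2, 3 in big-endian order and 3, 2, 0, 0 in little-endian order, matching the bit patterns on the slide.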

ISA Code Examples

Array Sum Loop: LC4
array .BLKW #100
sum .FILL #0
  CONST R5, #0
  LEA R1, array
  LEA R2, sum
array_sum_loop
  LDR R3, R1, #0
  LDR R4, R2, #0
  ADD R4, R3, R4
  STR R4, R2, #0
  ADD R1, R1, #1
  ADD R5, R5, #1
  CMPI R5, #100
  BRn array_sum_loop
int array[100];
int sum;
void array_sum() {
  for (int i=0; i<100;i++) sum += array[i];
}

Array Sum Loop: LC4 → MIPS
LC4 (left):
  array .BLKW #100
  sum .FILL #0
    CONST R5, #0
    LEA R1, array
    LEA R2, sum
  array_sum_loop
    LDR R3, R1, #0
    LDR R4, R2, #0
    ADD R4, R3, R4
    STR R4, R2, #0
    ADD R1, R1, #1
    ADD R5, R5, #1
    CMPI R5, #100
    BRn array_sum_loop
MIPS (right):
  array: .space 100
  sum: .word 0
  array_sum:
    li $5, 0
    la $1, array
    la $2, sum
  L1:
    lw $3, 0($1)
    lw $4, 0($2)
    add $4, $3, $4
    sw $4, 0($2)
    addi $1, $1, 1
    addi $5, $5, 1
    li $6, 100
    blt $5, $6, L1
• MIPS (right) similar to LC4
• Syntactic differences:
  • Register names begin with $
  • Immediates are un-prefixed
• Only simple addressing modes: displacement(reg), e.g., 0($1), 0($2)
• Left-most register is generally destination register

Array Sum Loop: LC4 → x86
LC4 (left):
  array .BLKW #100
  sum .FILL #0
    CONST R5, #0
    LEA R1, array
    LEA R2, sum
  array_sum_loop
    LDR R3, R1, #0
    LDR R4, R2, #0
    ADD R4, R3, R4
    STR R4, R2, #0
    ADD R1, R1, #1
    ADD R5, R5, #1
    CMPI R5, #100
    BRn array_sum_loop
x86 (right):
  .comm array,400,32
  .comm sum,4,4
  .globl array_sum
  array_sum:
    movl $0, -4(%rbp)
    movl -4(%rbp), %eax
    movl array(,%eax,4), %edx
    movl sum(%rip), %eax
    addl %edx, %eax
    movl %eax, sum(%rip)
    addl $1, -4(%rbp)
    cmpl $99,-4(%rbp)
• x86 (right) is different
• Many addressing modes
• Syntactic differences:
  • Register names begin with %
  • Immediates begin with $
• %rbp is base (frame) pointer

x86 Operand Model
• x86 uses explicit accumulators
  • Both register and memory
  • Distinguished by addressing mode
• Two operand insns (right-most is typically source & destination)
  • Register accumulator: %eax = %eax + %edx
  • Memory accumulator: Memory[%rbp-4] = Memory[%rbp-4] + 1
• “L” insn suffix and “%e...” reg. prefix mean “32-bit value”

Implementing an ISA

Implementing an ISA
[Figure: datapath with insn memory, register file, and data memory]
• Datapath: performs computation (registers, ALUs, etc.)
  • ISA specific: can implement every insn (single-cycle: in one pass!)
• Control: determines which computation is performed
  • Routes data through datapath (which regs, which ALU op)
• Fetch: get insn, translate opcode into control
• Fetch → Decode → Execute “cycle”

Two Types of Components
[Figure: datapath with insn memory, register file, and data memory]
• Purely combinational: stateless computation
  • ALUs, muxes, control
  • Arbitrary Boolean functions
• Combinational/sequential: storage
  • PC, insn/data memories, register file
  • Internally contain some combinational components

Example Datapath

LC4 Datapath
[Figure: LC4 datapath. Register file ports (r1sel, r2sel, r1data) selected from insn[2:0], insn[11:9], or the constant 3'b111; two 2^16 by 16 bit memories; branch logic]

MIPS Datapath

Unified vs Split Memory Architecture
[Figure: register file with a single insn/data memory]
• Unified architecture: unified insn/data memory
• “Harvard” architecture: split insn/data memories

Datapath for MIPS ISA
• MIPS: 32-bit instructions, registers are $0, $1 ... $31
• Consider only the following instructions
  add $1,$2,$3                   $1 = $2 + $3 (add)
  addi $1,$2,3                   $1 = $2 + 3 (add immed)
  lw $1,4($3)                    $1 = Memory[4+$3] (load)
  sw $1,4($3)                    Memory[4+$3] = $1 (store)
  beq $1,$2,PC_relative_target   (branch equal)
  j absolute_target              (unconditional jump)
• Why only these?
  • Most other instructions are the same from datapath viewpoint
  • The ones that aren’t are left for you to figure out :)

MIPS Instruction layout

Start With Fetch
• PC and instruction memory (split insn/data architecture, for now)
• A +4 incrementer computes default next instruction PC
• How would Verilog for this look given insn memory as interface?

First Instruction: add
• Add register file
• Add arithmetic/logical unit (ALU)
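The fetch step described under "Start With Fetch" can be modeled in a few lines. This is a behavioral sketch in C rather than the Verilog the slide asks about, and the names fetch_stage_t, fetch_insn, next_pc, and the memory size IMEM_WORDS are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define IMEM_WORDS 1024  /* hypothetical insn memory size, in 32-bit words */

typedef struct {
    uint32_t pc;                /* byte address of the current insn */
    uint32_t imem[IMEM_WORDS];  /* insn memory: one 32-bit insn per word */
} fetch_stage_t;

/* Insn memory read: the PC selects which word comes out. */
uint32_t fetch_insn(const fetch_stage_t *f) {
    assert(f->pc % 4 == 0);     /* insns are 4 bytes and word aligned */
    return f->imem[(f->pc / 4) % IMEM_WORDS];
}

/* The +4 incrementer: default next PC when no branch or jump is taken. */
uint32_t next_pc(const fetch_stage_t *f) {
    return f->pc + 4;
}
```

Later instructions (beq, j) add a mux in front of the PC that chooses between this default and a computed target.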

Wire Select in Verilog
• How to rip out individual fields of an insn? Wire select
wire [31:0] insn;
wire [5:0] op = insn[31:26];
wire [4:0] rs = insn[25:21];
wire [4:0] rt = insn[20:16];
wire [4:0] rd = insn[15:11];
wire [4:0] sh = insn[10:6];
wire [5:0] func = insn[5:0];

Second Instruction: addi
• Destination register can now be either Rd or Rt
• Add sign extension unit and mux into second ALU input

Verilog Wire Concatenation
• Recall two Verilog constructs
  • Wire concatenation: {bus0, bus1, ... , busn}
  • Wire repeat: {repeat_x_times{w0}}
• How do you specify sign extension? Wire concatenation
wire [31:0] insn;
wire [15:0] imm16 = insn[15:0];
wire [31:0] sximm16 = {{16{imm16[15]}}, imm16};

Third Instruction: lw
• Add data memory, address is ALU output
• Add register write data mux to select memory output or ALU output

Fourth Instruction: sw
• Add path from second input register to data memory data input

Fifth Instruction: beq
• Add left shift unit and adder to compute PC-relative branch target
• Add PC input mux to select PC+4 or branch target

Another Use of Wire Concatenation
• How do you do <<2? Wire concatenation
wire [31:0] insn;
wire [25:0] imm26 = insn[25:0];
wire [31:0] imm26_shifted_by_2 = {4'b0000, imm26, 2'b00};

Sixth Instruction: j
• Add shifter to compute left shift of 26-bit immediate
• Add additional PC input mux for jump target

MIPS Control

What Is Control?
• 8 signals control flow of data through this datapath
  • MUX selectors, or register/memory write enable signals
• A real datapath has 300-500 control signals

Example: Control for add
ALUop=0 DMwe=0 Rdst=1 ALUinB=0

Example: Control for sw
ALUop=0 DMwe=1 Rdst=X ALUinB=1
• Difference between sw and add is 5 signals
  • 3 if you don’t count the X (don’t care) signals

Example: Control for beq
ALUop=1 DMwe=0 Rdst=X ALUinB=0
• Difference between sw and beq is only 4 signals

How Is Control Implemented?

Implementing Control
• Each instruction has a unique set of control signals
  • Most are function of opcode
  • Some may be encoded in the instruction itself
    • E.g., the ALUop signal is some portion of the MIPS Func field
    + Simplifies controller implementation
    – Requires careful ISA design

Control Implementation: ROM
• ROM (read only memory): like a RAM but unwritable
  • Bits in data words are control signals
  • Lines indexed by opcode
• Example: ROM control for 6-insn MIPS datapath
  • X is “don’t care”
  • One line per opcode: add, addi, lw, sw, beq, j

Control Implementation: Logic
• Real machines have 100+ insns, 300+ control signals
  • 30,000+ control bits (~4KB)
  – Not huge, but hard to make faster than datapath (important!)
• Alternative: logic gates or “random logic” (unstructured)
  • Exploits the observation: many signals have few 1s or few 0s
• Example: random logic control for 6-insn MIPS datapath
  [Figure: logic gates driven by opcode lines (add, addi, lw, sw, beq, j) producing control signals such as BR, JP, ALUop, ALUinB]

Control Logic in Verilog
wire [31:0] insn;
wire [5:0] func = insn[5:0];
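The ROM approach described above can be modeled as a table indexed by opcode, sketched here in C rather than Verilog. The ALUop, DMwe, Rdst, and ALUinB values for add, sw, and beq come from the control examples above. Everything else is an assumption made for this sketch: the signal names Rwe and Rwd, the values for addi, lw, and j, storing X (don't care) as 0, the valid flag, and the names control_t and lookup_control. The opcode numbers are the standard MIPS encodings.

```c
#include <stdint.h>

/* One ROM line: the control signals for a single opcode.
 * X (don't care) entries are stored as 0 in this sketch. */
typedef struct {
    uint8_t valid;  /* 1 if this opcode is one of the six modeled insns */
    uint8_t br, jp, alu_in_b, alu_op, dm_we, r_we, r_dst, r_wd;
} control_t;

/* 64 lines, indexed by the 6-bit opcode field insn[31:26]. */
static const control_t CONTROL_ROM[64] = {
    /*          valid BR JP ALUinB ALUop DMwe Rwe Rdst Rwd */
    [0x00] = {  1,    0, 0, 0,     0,    0,   1,  1,   0 },  /* add (R-type) */
    [0x08] = {  1,    0, 0, 1,     0,    0,   1,  0,   0 },  /* addi */
    [0x23] = {  1,    0, 0, 1,     0,    0,   1,  0,   1 },  /* lw */
    [0x2B] = {  1,    0, 0, 1,     0,    1,   0,  0,   0 },  /* sw: Rdst, Rwd are X */
    [0x04] = {  1,    1, 0, 0,     1,    0,   0,  0,   0 },  /* beq: Rdst, Rwd are X */
    [0x02] = {  1,    0, 1, 0,     0,    0,   0,  0,   0 },  /* j */
};

/* Reads the ROM line selected by the insn's opcode field. */
control_t lookup_control(uint32_t insn) {
    return CONTROL_ROM[insn >> 26];
}
```

A ROM keeps the controller easy to read and change; the random-logic alternative trades that regularity for speed by exploiting how sparse each signal's column is.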