Computer Organization and Design
Unit 4: Single-Cycle Datapath
Based on slides by Profs. , C.J. Taylor, &
CIS 501 | Dr. | ISAs & Single Cycle 1
This Unit: Single-Cycle Datapath
• Overview of ISAs
• Datapath storage elements
• MIPS Datapath
• MIPS Control
• Sections 4.1 – 4.4
Recall from CIS240…
240 Review: ISA
• App/OS are software … execute on hardware
• HW/SW interface is ISA (instruction set architecture)
• A “contract” between SW and HW
• Encourages compatibility, allows SW/HW to evolve independently
• Functional definition of HW storage locations & operations
• Storage locations: registers, memory
• Operations: add, multiply, branch, load, store, etc.
• Precise description of how to invoke & access them
• Instructions (bit-patterns hardware interprets as commands)
240 Review: LC4 ISA
array .BLKW #100
sum .FILL #0
CONST R5, #0
LEA R1, array
LEA R2, sum
array_sum_loop
LDR R3, R1, #0
LDR R4, R2, #0
ADD R4, R3, R4
STR R4, R2, #0
ADD R1, R1, #1
ADD R5, R5, #1
CMPI R5, #100
BRn array_sum_loop
• LC4: a toy ISA you know
• 16-bit ISA (what does this mean?)
• 16-bit insns
• 8 registers (integer)
• ~30 different insns
• Simple OS support
• Assembly language
• Human-readable ISA representation
371/501 Preview: A Real ISA
array: .space 100
sum: .word 0
array_sum:
la $1, array
la $2, sum
array_sum_loop:
lw $3, 0($1)
lw $4, 0($2)
add $4, $3, $4
sw $4, 0($2)
addi $1, $1, 1
addi $5, $5, 1
li $6, 100
blt $5, $6, array_sum_loop
• MIPS: example of real ISA
• 32/64-bit operations
• 32-bit insns
• 64 registers
• 32 integer, 32 floating point
• ~100 different insns
• Full OS support
Example code is MIPS, but all ISAs are similar at some level
240 Review: Assembly Language
Machine code   Assembly code
x9A00          CONST R5, #0
x9200          CONST R1, array
xD320          HICONST R1, array
x9464          CONST R2, sum
xD520          HICONST R2, sum
x6640          LDR R3, R1, #0
x6880          LDR R4, R2, #0
               ADD R4, R3, R4
x7880          STR R4, R2, #0
               ADD R1, R1, #1
x1BA1          ADD R5, R5, #1
x2B64          CMPI R5, #100
x03F8          BRn array_sum_loop
• Assembly language
• Human-readable representation
• Machine language
• Machine-readable representation
• 1s and 0s (often displayed in “hex”)
• Assembler
• Translates assembly to machine
240 Review: Insn Execution Model
• The computer is just a finite state machine
• Registers (few of them, but fast)
• Memory (lots of memory, but slower)
• Program counter (next insn to execute)
• Sometimes called “instruction pointer”
• A computer executes instructions
• Fetches next instruction from memory
• Decodes it (figure out what it does)
• Reads its inputs (registers & memory)
• Executes it (adds, multiply, etc.)
• Writes its outputs (registers & memory)
• Next insn (adjust the program counter)
• Program is just “data in memory”
• Makes computers programmable (“universal”)
[Figure: execution loop of Fetch, Decode, Read Inputs, Execute, Write Output, Next Insn]
Instruction -> Insn
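The fetch/decode/read/execute/write/next-insn steps can be sketched as a loop in C. This is a minimal sketch for a hypothetical toy machine: the `toy_*` names, the 16-bit encoding, and the 4-register file are assumptions made for illustration, not LC4 or MIPS.

```c
#include <stdint.h>

/* Hypothetical toy ISA, for illustration only (not LC4 or MIPS).
 * Each insn is 16 bits: op[15:12] rd[11:10] rs[9:8] imm[7:0]. */
enum { TOY_HALT = 0, TOY_ADDI = 1, TOY_ADD = 2 };

typedef struct {
    uint16_t pc;        /* program counter: index of the next insn */
    int16_t  regs[4];   /* registers: few of them, but fast */
    uint16_t mem[256];  /* the program is just data in memory */
} toy_machine;

/* Executes insns until HALT (or an unknown opcode) or until max_steps
 * insns have run. Returns the number of insns executed. */
int toy_run(toy_machine *m, int max_steps)
{
    int steps = 0;
    while (steps < max_steps) {
        uint16_t insn = m->mem[m->pc & 0xFF];            /* fetch */
        unsigned op = insn >> 12;                        /* decode */
        unsigned rd = (insn >> 10) & 3;
        unsigned rs = (insn >> 8) & 3;
        int imm = (insn & 0x80) ? (int)(insn & 0xFF) - 256
                                : (int)(insn & 0xFF);    /* signed 8-bit immediate */
        int result;
        if (op == TOY_ADDI)
            result = m->regs[rs] + imm;                  /* read inputs, execute */
        else if (op == TOY_ADD)
            result = m->regs[rd] + m->regs[rs];
        else
            break;                                       /* HALT or unknown opcode */
        m->regs[rd] = (int16_t)result;                   /* write output */
        m->pc++;                                         /* next insn */
        steps++;
    }
    return steps;
}
```

Loading the three words x1005, x1403, x2400 followed by a zero word at address 0 and calling `toy_run` executes ADDI, ADDI, ADD and then stops at the HALT.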
What is an ISA?
What Is An ISA?
• ISA (instruction set architecture)
• A well-defined hardware/software interface
• The “contract” between software and hardware
• Functional definition of storage locations & operations
• Storage locations: registers, memory
• Operations: add, multiply, branch, load, store, etc.
• Precise description of how to invoke & access them
• Not in the “contract”: non-functional aspects
• How operations are implemented
• Which operations are fast and which are slow and when
• Which operations take more power and which take less
• Instructions
• Bit-patterns hardware interprets as commands
• Instruction -> Insn (instruction is too long to write in slides)
A Language Analogy for ISAs
• Communication
• Person-to-person -> software-to-hardware
• Similar structure
• Narrative -> program
• Sentence -> insn
• Verb -> operation (add, multiply, load, branch)
• Noun -> data item (immediate, register value, memory value)
• Adjective -> addressing mode
• Many different languages, many different ISAs
• Similar basic structure, details differ (sometimes greatly)
• Key differences between languages and ISAs
• Languages evolve organically, many ambiguities, inconsistencies
• ISAs are explicitly engineered and extended, unambiguous
LC4 vs Real ISAs
+ LC4 has the basic features of a real-world ISA
– LC4 lacks a good bit of realism
• Address size is only 16 bits
• Only one data type (16-bit signed integer)
• Little support for system software, none for multiprocessing (later)
• Many real-world ISAs to choose from:
• Intel x86 (laptops, desktops, and servers)
• MIPS (used throughout the book)
• ARM (in all your mobile phones)
• PowerPC (servers & game consoles)
• SPARC (servers)
• Intel’s Itanium
• Historical: IBM 370, VAX, Alpha, PA-RISC, 68k, …
Some Key Attributes of ISAs
• Instruction encoding
• Fixed length (16-bit for LC4, 32-bit for MIPS & ARM)
• Variable length (x86: 1 byte to 15 bytes, average of ~3 bytes)
• Number and type of registers
• LC4 has 8 registers
• MIPS has 32 “integer” registers and 32 “floating point” registers
• ARM & x86 both have 16 “integer” regs and 16 “floating point” regs
• Address space
• LC4: 16-bit addresses at 16-bit granularity (128KB total)
• ARM: 32-bit addresses at 8-bit granularity (4GB total)
• Modern x86 and ARM64: 64-bit addresses (16 exabytes!)
• Memory addressing modes
• MIPS & LC4: address calculated by “reg+offset”
• x86 and others have much more complicated addressing modes
Access Granularity & Alignment
• Byte addressability
• An address points to a byte (8 bits) of data
• The ISA’s minimum granularity to read or write memory
• ISAs also support wider load/stores
• “Half” (2 bytes), “Longs” (4 bytes), “Quads” (8 bytes)
[Figure: memory bytes at addresses 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
• Load.byte [6] -> r1
• Load.long [12] -> r2
However, physical memory systems operate on even larger chunks
[Figure: memory bytes at addresses 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
• Load.long [4] -> r1
• Load.long [11] -> r2 “unaligned”
• Access alignment: if address % size != 0, then it is “unaligned”
• A single unaligned access may require multiple physical memory accesses
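The alignment rule above can be checked with a few lines of C. This is a minimal sketch; the function names and the `chunk` parameter (the width of one physical memory access, in bytes) are assumptions introduced for illustration.

```c
#include <stdint.h>

/* An access of `size` bytes is unaligned when address % size != 0. */
int is_unaligned(uint32_t addr, uint32_t size)
{
    return addr % size != 0;
}

/* Number of physical accesses needed for `size` bytes starting at `addr`,
 * assuming memory is read in aligned chunks of `chunk` bytes. */
uint32_t chunks_touched(uint32_t addr, uint32_t size, uint32_t chunk)
{
    uint32_t first = addr / chunk;               /* chunk holding the first byte */
    uint32_t last  = (addr + size - 1) / chunk;  /* chunk holding the last byte */
    return last - first + 1;
}
```

With 4-byte chunks, Load.long [4] touches one chunk while Load.long [11] spans bytes 11 to 14 and touches two.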
Handling Unaligned Accesses
• Access alignment: if address % size != 0, then it is “unaligned”
• A single unaligned access may require multiple physical memory accesses
• How to handle such unaligned accesses?
1. Disallow (unaligned operations are considered illegal)
• MIPS, ARMv5 and earlier took this route
2. Support in hardware? (allow such operations)
• x86, ARMv6+ allow regular loads/stores to be unaligned
• Unaligned access still slower, adds significant hardware complexity
3. Trap to software routine?
• Simpler hardware, but high penalty when unaligned
4. In software (compiler can use regular instructions when possibly unaligned)
• Load, shift, load, shift, and (slow, needs help from compiler)
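Option 4 (two aligned loads, two shifts, then a combine) might look like the following C sketch. It assumes the hardware offers only aligned 4-byte loads and that bytes are laid out little-endian; the function name and the word-array view of memory are hypothetical.

```c
#include <stdint.h>

/* Software handling of a possibly unaligned 4-byte load.
 * Assumptions for this sketch: only aligned 4-byte loads are available,
 * byte order is little-endian, and `words` is memory viewed as aligned words. */
uint32_t load_unaligned32(const uint32_t *words, uint32_t byte_addr)
{
    uint32_t index = byte_addr / 4;        /* aligned word holding the first byte */
    uint32_t shift = (byte_addr % 4) * 8;  /* bit offset of that byte in the word */
    uint32_t lo = words[index];            /* first aligned load */
    if (shift == 0)
        return lo;                         /* already aligned: one load suffices */
    uint32_t hi = words[index + 1];        /* second aligned load */
    return (lo >> shift) | (hi << (32 - shift));  /* shift both pieces, combine */
}
```

The aligned case returns early, which also avoids an undefined shift by 32 bits.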
How big is this struct?
struct foo {
  char c;
  int i;
};
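One way to explore the question is to ask the compiler. The sketch below uses `sizeof` and `offsetof`; the helper names are made up for illustration. On typical platforms with a 4-byte, 4-byte-aligned `int`, the answer is 8 rather than 5 because the compiler pads after `c` so that `i` is aligned, but the exact numbers depend on the ABI.

```c
#include <stddef.h>

struct foo {
    char c;
    int  i;
};

/* Bytes of padding the compiler added to keep `i` aligned. */
size_t foo_padding(void)
{
    return sizeof(struct foo) - sizeof(char) - sizeof(int);
}

/* Offset of `i` from the start of the struct. */
size_t foo_offset_of_i(void)
{
    return offsetof(struct foo, i);
}
```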
Another Addressing Issue: Endian-ness
Endian-ness: arrangement of bytes in a multi-byte number
• Big-endian: sensible order (e.g., MIPS, PowerPC; ARM supports both but usually runs little-endian)
• A 4-byte integer: “00000000 00000000 00000010 00000011” is 515
• Little-endian: reverse order (e.g., x86)
• A 4-byte integer: “00000011 00000010 00000000 00000000” is 515
• Why little endian?
00000011 00000010 00000000 00000000
^ starting address
• Integer casts are free on little-endian architectures
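The two byte orders for the value 515 can be reproduced in C. This is a minimal sketch with illustrative function names; `host_is_little_endian` inspects the machine the code runs on.

```c
#include <stdint.h>
#include <string.h>

/* Store v as 4 bytes, least-significant byte first (little-endian). */
void store_le32(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Store v as 4 bytes, most-significant byte first (big-endian). */
void store_be32(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * (3 - i)));
}

/* Returns 1 if the byte at the starting address of a 32-bit 1 is the 1. */
int host_is_little_endian(void)
{
    uint32_t v = 1;
    uint8_t first;
    memcpy(&first, &v, 1);
    return first == 1;
}
```

In the little-endian layout the low-order byte sits at the starting address, so reading a narrower integer from the same address yields the low bits without any address adjustment. That is the sense in which integer casts are free.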
ISA Code Examples
Array Sum Loop: LC4
array .BLKW #100
sum .FILL #0
CONST R5, #0
LEA R1, array
LEA R2, sum
array_sum_loop
LDR R3, R1, #0
LDR R4, R2, #0
ADD R4, R3, R4
STR R4, R2, #0
ADD R1, R1, #1
ADD R5, R5, #1
CMPI R5, #100
BRn array_sum_loop

int array[100];
int sum;
void array_sum() {
  for (int i=0; i<100;i++)
    sum += array[i];
}
Array Sum Loop: LC4 -> MIPS
LC4 (left):
array .BLKW #100
sum .FILL #0
CONST R5, #0
LEA R1, array
LEA R2, sum
LDR R3, R1, #0
LDR R4, R2, #0
ADD R4, R3, R4
STR R4, R2, #0
ADD R1, R1, #1
ADD R5, R5, #1
CMPI R5, #100
MIPS (right):
array: .space 100
sum: .word 0
array_sum:
li $5, 0
la $1, array
la $2, sum
L1:
lw $3, 0($1)
lw $4, 0($2)
add $4, $3, $4
sw $4, 0($2)
addi $1, $1, 1
addi $5, $5, 1
li $6, 100
blt $5, $6, L1
MIPS (right) similar to LC4
Syntactic differences: register names begin with $, immediates are un-prefixed
Only simple addressing modes: displacement(reg), e.g. 0($1), 0($2)
Left-most register is generally destination register
Array Sum Loop: LC4 -> x86
LC4 (left):
array .BLKW #100
sum .FILL #0
CONST R5, #0
LEA R1, array
LEA R2, sum
LDR R3, R1, #0
LDR R4, R2, #0
ADD R4, R3, R4
STR R4, R2, #0
ADD R1, R1, #1
ADD R5, R5, #1
CMPI R5, #100
x86 (right):
.comm array,400,32
.comm sum,4,4
.globl array_sum
array_sum:
movl $0, -4(%rbp)
movl -4(%rbp), %eax
movl array(,%eax,4), %edx
movl sum(%rip), %eax
addl %edx, %eax
movl %eax, sum(%rip)
addl $1, -4(%rbp)
cmpl $99,-4(%rbp)
x86 (right) is different
Many addressing modes
Syntactic differences: register names begin with %, immediates begin with $
%rbp is base (frame) pointer
x86 Operand Model
• x86 uses explicit accumulators
• Both register and memory
• Distinguished by addressing mode
• Two operand insns (right-most is typically source & destination)
• Register accumulator: %eax = %eax + %edx
• Memory accumulator: Memory[%rbp-4] = Memory[%rbp-4] + 1
• “L” insn suffix and “%e...” reg. prefix mean “32-bit value”
Implementing an ISA
Implementing an ISA
[Figure: Insn memory, Register File, Data Memory]
• Datapath: performs computation (registers, ALUs, etc.)
• ISA specific: can implement every insn (single-cycle: in one pass!)
• Control: determines which computation is performed
• Routes data through datapath (which regs, which ALU op)
• Fetch: get insn, translate opcode into control
• Fetch -> Decode -> Execute “cycle”
Two Types of Components
[Figure: Insn memory, Register File, Data Memory]
• Purely combinational: stateless computation
• ALUs, muxes, control
• Arbitrary Boolean functions
• Combinational/sequential: storage
• PC, insn/data memories, register file
• Internally contain some combinational components
Example Datapath
LC4 Datapath
[Figure: LC4 datapath. Recoverable labels: register file ports r1sel, r2sel, r1data; register selects taken from insn[2:0] and insn[11:9] or the constant 3'b111; two memories of 2^16 by 16 bit; Branch Logic]
MIPS Datapath
Unified vs Split Memory Architecture
[Figure: Register File, Insn/Data Memory]
• Unified architecture: unified insn/data memory
• “Harvard” architecture: split insn/data memories
Datapath for MIPS ISA
• MIPS: 32-bit instructions, registers are $0, $1, ... $31
• Consider only the following instructions
add $1,$2,$3    $1 = $2 + $3 (add)
addi $1,$2,3    $1 = $2 + 3 (add immed)
lw $1,4($3)     $1 = Memory[4+$3] (load)
sw $1,4($3)     Memory[4+$3] = $1 (store)
beq $1,$2,PC_relative_target    (branch equal)
j absolute_target    (unconditional jump)
• Why only these?
• Most other instructions are the same from datapath viewpoint
• The ones that aren’t are left for you to figure out :)
MIPS Instruction layout
Start With Fetch
• PC and instruction memory (split insn/data architecture, for now)
• A +4 incrementer computes default next instruction PC
• How would Verilog for this look given insn memory as interface?
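As a software model of the fetch stage (a C sketch, not the Verilog the slide asks about), the insn memory can be treated as an array of 32-bit words indexed by PC. The `fetch_result` type and `fetch` function are hypothetical names used only here.

```c
#include <stdint.h>

/* Output of the fetch stage for one cycle. */
typedef struct {
    uint32_t insn;     /* instruction read at the current PC */
    uint32_t next_pc;  /* default next PC from the +4 incrementer */
} fetch_result;

/* Split insn memory modeled as an array of 32-bit words. */
fetch_result fetch(const uint32_t *insn_mem, uint32_t pc)
{
    fetch_result r;
    r.insn = insn_mem[pc / 4];  /* byte address -> word index */
    r.next_pc = pc + 4;         /* insns are 4 bytes wide */
    return r;
}
```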
First Instruction: add
• Add register file
• Add arithmetic/logical unit (ALU)
Wire Select in Verilog
• How to rip out individual fields of an insn? Wire select
wire [31:0] insn;
wire [5:0] op = insn[31:26];
wire [4:0] rs = insn[25:21];
wire [4:0] rt = insn[20:16];
wire [4:0] rd = insn[15:11];
wire [4:0] sh = insn[10:6];
wire [5:0] func = insn[5:0];
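The same field extraction can be modeled in C with shifts and masks, which is handy for checking a hand-encoded insn. The `mips_fields` struct and `decode_fields` function are illustrative names; the bit ranges mirror the wire selects above.

```c
#include <stdint.h>

/* Fields of a 32-bit MIPS insn, named as in the Verilog wire selects. */
typedef struct {
    uint32_t op, rs, rt, rd, sh, func;
} mips_fields;

mips_fields decode_fields(uint32_t insn)
{
    mips_fields f;
    f.op   = (insn >> 26) & 0x3F;  /* insn[31:26] */
    f.rs   = (insn >> 21) & 0x1F;  /* insn[25:21] */
    f.rt   = (insn >> 16) & 0x1F;  /* insn[20:16] */
    f.rd   = (insn >> 11) & 0x1F;  /* insn[15:11] */
    f.sh   = (insn >> 6)  & 0x1F;  /* insn[10:6]  */
    f.func = insn & 0x3F;          /* insn[5:0]   */
    return f;
}
```

For example, 0x00430820 is the encoding of add $1,$2,$3: rs is 2, rt is 3, rd is 1, and func is 0x20.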
Second Instruction: addi
• Destination register can now be either Rd or Rt
• Add sign extension unit and mux into second ALU input
Verilog Wire Concatenation
• Recall two Verilog constructs
• Wire concatenation: {bus0, bus1, ... , busn}
• Wire repeat: {repeat_x_times{w0}}
• How do you specify sign extension? Wire concatenation
wire [31:0] insn;
wire [15:0] imm16 = insn[15:0];
wire [31:0] sximm16 = {{16{imm16[15]}}, imm16};
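A C model of the same sign extension, written to follow the concatenation literally: replicate bit 15 sixteen times and place it above the 16-bit immediate. The function name is an assumption for this sketch.

```c
#include <stdint.h>

/* Models {{16{imm16[15]}}, imm16} for the low 16 bits of insn. */
uint32_t sign_extend16(uint32_t insn)
{
    uint32_t imm16 = insn & 0xFFFF;            /* insn[15:0] */
    uint32_t sign  = (imm16 >> 15) & 1;        /* imm16[15] */
    uint32_t upper = sign ? 0xFFFF0000u : 0u;  /* 16 copies of the sign bit */
    return upper | imm16;
}
```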
Third Instruction: lw
• Add data memory, address is ALU output
• Add register write data mux to select memory output or ALU output
Fourth Instruction: sw
• Add path from second input register to data memory data input
Fifth Instruction: beq
• Add left shift unit and adder to compute PC-relative branch target
• Add PC input mux to select PC+4 or branch target
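The shift, add, and mux for beq can be sketched in C as follows. This assumes the target is computed relative to PC+4, as in standard MIPS; the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Next PC for a beq insn.
 * Assumption: the branch target is PC+4 plus the sign-extended
 * 16-bit immediate shifted left by 2. */
uint32_t beq_next_pc(uint32_t pc, uint32_t insn,
                     uint32_t rs_val, uint32_t rt_val)
{
    uint32_t imm16 = insn & 0xFFFF;
    uint32_t sximm16 = (imm16 & 0x8000) ? (imm16 | 0xFFFF0000u) : imm16;
    uint32_t pc_plus_4 = pc + 4;
    uint32_t target = pc_plus_4 + (sximm16 << 2);    /* left shift unit + adder */
    return (rs_val == rt_val) ? target : pc_plus_4;  /* PC input mux */
}
```

Unsigned 32-bit arithmetic wraps around, so a negative offset such as 0xFFFF (that is, -1) correctly branches backward.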
Another Use of Wire Concatenation
• How do you do <<2? Wire concatenation
wire [31:0] insn;
wire [25:0] imm26 = insn[25:0];
wire [31:0] imm26_shifted_by_2 = {4'b0000, imm26, 2'b00};
Sixth Instruction: j
• Add shifter to compute left shift of 26-bit immediate
• Add additional PC input mux for jump target
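A C model of the jump target, following the concatenation {4'b0000, imm26, 2'b00} used in these slides. The function name is illustrative. Note that full MIPS takes the upper 4 bits from PC+4 rather than zero; this sketch keeps the zero upper bits.

```c
#include <stdint.h>

/* Models {4'b0000, imm26, 2'b00}: the 26-bit immediate shifted left
 * by 2, with the upper 4 bits of the target set to zero. */
uint32_t jump_target(uint32_t insn)
{
    uint32_t imm26 = insn & 0x03FFFFFFu;  /* insn[25:0] */
    return imm26 << 2;                    /* append 2'b00; top 4 bits stay 0 */
}
```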
MIPS Control
What Is Control?
• 8 signals control flow of data through this datapath
• MUX selectors, or register/memory write enable signals
• A real datapath has 300-500 control signals
[Figure: datapath annotated with control signals, including ALUop and ALUinB]
Example: Control for add
[Figure: datapath for add; recoverable control values: ALUop=0, DMwe=0, Rdst=1, ALUinB=0]
Example: Control for sw
• Difference between sw and add is 5 signals
• 3 if you don’t count the X (don’t care) signals
[Figure: datapath for sw; recoverable control values: ALUop=0, DMwe=1, Rdst=X, ALUinB=1]
Example: Control for beq
• Difference between sw and beq is only 4 signals
[Figure: datapath for beq; recoverable control values: ALUop=1, DMwe=0, Rdst=X, ALUinB=0]
How Is Control Implemented?
Implementing Control
• Each instruction has a unique set of control signals
• Most are function of opcode
• Some may be encoded in the instruction itself
• E.g., the ALUop signal is some portion of the MIPS Func field
+ Simplifies controller implementation
– Requires careful ISA design
Control Implementation: ROM
• ROM (read only memory): like a RAM but unwritable
• Bits in data words are control signals
• Lines indexed by opcode
• Example: ROM control for 6-insn MIPS datapath
• X is “don’t care”
[Figure: control ROM with one row per opcode: add, addi, lw, sw, beq, j]
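A ROM-style controller can be modeled in C as a constant table indexed by opcode. This is a sketch under stated assumptions: it covers only four of the eight signals (ALUop, DMwe, Rdst, ALUinB); the rows for add, sw, and beq use the values from the earlier control examples; the rows for addi, lw, and j are assumptions inferred from those; and every X (don't care) is stored as 0. The opcode numbers are the standard MIPS encodings.

```c
#include <stdint.h>

/* One ROM data word: each member is one control signal. */
typedef struct {
    uint8_t aluop, dmwe, rdst, aluinb;
} control_word;

/* Standard MIPS opcodes (add is an R-type insn with opcode 0). */
enum { OP_RTYPE = 0x00, OP_J = 0x02, OP_BEQ = 0x04,
       OP_ADDI = 0x08, OP_LW = 0x23, OP_SW = 0x2B };

/* Look up the control word for a 6-bit opcode.
 * X (don't care) entries are stored as 0. */
control_word control_rom_lookup(uint32_t opcode)
{
    static const control_word rom[64] = {
        /*             ALUop DMwe Rdst ALUinB */
        [OP_RTYPE] = { 0,    0,   1,   0 },  /* add: from the slides */
        [OP_ADDI]  = { 0,    0,   0,   1 },  /* addi: assumed */
        [OP_LW]    = { 0,    0,   0,   1 },  /* lw: assumed */
        [OP_SW]    = { 0,    1,   0,   1 },  /* sw: Rdst=X */
        [OP_BEQ]   = { 1,    0,   0,   0 },  /* beq: Rdst=X */
        [OP_J]     = { 0,    0,   0,   0 },  /* j: assumed, all X except DMwe */
    };
    return rom[opcode & 0x3F];  /* lines indexed by opcode */
}
```

Opcodes not listed read as all zeros, which at least keeps the data memory write enable off.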
Control Implementation: Logic
• Real machines have 100+ insns, 300+ control signals
• 30,000+ control bits (~4KB)
– Not huge, but hard to make faster than datapath (important!)
• Alternative: logic gates or “random logic” (unstructured)
• Exploits the observation: many signals have few 1s or few 0s
• Example: random logic control for 6-insn MIPS datapath
[Figure: random logic decoding add, addi, lw, sw, beq, j into control signals such as BR, JP, ALUop, ALUinB]
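The random-logic alternative can be sketched in C as one Boolean expression per control signal, each driven by opcode-match terms. As a hedged illustration, it covers only ALUop, DMwe, Rdst, and ALUinB; the values for add, sw, and beq follow the earlier control examples, the treatment of addi and lw is an assumption, and don't-care outputs are resolved to 0. Opcode numbers are the standard MIPS encodings.

```c
#include <stdint.h>

typedef struct {
    int aluop, dmwe, rdst, aluinb;
} control_signals;

/* Each signal is an OR of the few opcodes for which it is 1. */
control_signals control_logic(uint32_t opcode)
{
    int is_add  = (opcode == 0x00);  /* R-type */
    int is_addi = (opcode == 0x08);
    int is_lw   = (opcode == 0x23);
    int is_sw   = (opcode == 0x2B);
    int is_beq  = (opcode == 0x04);

    control_signals c;
    c.aluop  = is_beq;                  /* 1 only for beq */
    c.dmwe   = is_sw;                   /* only sw writes data memory */
    c.rdst   = is_add;                  /* only add writes Rd */
    c.aluinb = is_addi | is_lw | is_sw; /* immediate as second ALU input (addi, lw assumed) */
    return c;
}
```

Because most signals are 1 for only one or two insns, each expression needs only a few gates, which is the observation the slide points to.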
Control Logic in Verilog
wire [31:0] insn;
wire [5:0] func = insn[5:0];