CS代考 CSE 371 Computer Organization and Design

CSE 371 Computer Organization and Design

CIS 501 | Dr. | ISAs & Single Cycle
Computer Organization and Design

Copyright By PowCoder代写 加微信 powcoder

Unit 4: Single-Cycle Datapath

Based on slides by Profs. , C.J. Taylor, &

CIS 501 | Dr. | ISAs & Single Cycle
This Unit: Single-Cycle Datapath
Overview of ISAs
Datapath storage elements
MIPS Datapath
MIPS Control

System software

credential:
bring a computer

This can be an hidden slide. I just want to use this to do my own planning.
I have rearranged Culler’s lecture slides slightly and add more slides. This covers everything he covers in his first lecture (and more) but may
We will save the fun part, “ Levels of Organization,” at the end (so student can stay awake): I will show the internal stricture of the SS10/20.

Notes to Patterson: You may want to edit the slides in your section or add extra slides to taylor your needs.

CIS 501 | Dr. | ISAs & Single Cycle

Sections 4.1 – 4.4

Recall from CIS240…

CIS 501 | Dr. | ISAs & Single Cycle

240 Review: Applications
Applications (Firefox, iTunes, Skype, Word, Google)
Run on hardware … but how?

CIS 501 | Dr. | ISAs & Single Cycle

System software

240 Review: I/O
Apps interact with us & each other via I/O (input/output)
With us: display, sound, keyboard, mouse, touch-screen, camera
With each other: disk, network (wired or wireless)
Most I/O proper is analog-digital and domain of EE
I/O devices present rest of computer a digital interface (1s and 0s)

CIS 501 | Dr. | ISAs & Single Cycle

System software

240 Review: OS
I/O (& other services) provided by OS (operating system)
A super-app with privileged access to all hardware
Abstracts away a lot of the nastiness of hardware
Virtualizes hardware to isolate programs from one another
Each application is oblivious to presence of others
Simplifies programming, makes system more robust and secure
Privilege is key to this
Commons OSes are Windows, Linux, MacOS

CIS 501 | Dr. | ISAs & Single Cycle

System software

240 Review: ISA
App/OS are software … execute on hardware
HW/SW interface is ISA (instruction set architecture)
A “contract” between SW and HW
Encourages compatibility, allows SW/HW to evolve independently
Functional definition of HW storage locations & operations
Storage locations: registers, memory
Operations: add, multiply, branch, load, store, etc.
Precise description of how to invoke & access them
Instructions (bit-patterns hardware interprets as commands)
CIS 501 | Dr. | ISAs & Single Cycle

System software

240 Review: LC4 ISA
LC4: a toy ISA you know
16-bit ISA (what does this mean?)
16-bit insns
8 registers (integer)
~30 different insns
Simple OS support

Assembly language
Human-readable ISA representation

CIS 501 | Dr. | ISAs & Single Cycle

System software
array .BLKW #100
sum .FILL #0
CONST R5, #0
LEA R1, array
LEA R2, sum
array_sum_loop
LDR R3, R1, #0
LDR R4, R2, #0
ADD R4, R3, R4
STR R4, R2, #0
ADD R1, R1, #1
ADD R5, R5, #1
CMPI R5, #100
BRn array_sum_loop

371/501 Preview: A Real ISA
MIPS: example of real ISA
32/64-bit operations
32-bit insns
64 registers
32 integer, 32 floating point
~100 different insns
Full OS support

CIS 501 | Dr. | ISAs & Single Cycle

System software
array: .space 100
sum: .word 0

array_sum:
la $1, array
la $2, sum
array_sum_loop:
lw $3, 0($1)
lw $4, 0($2)
add $4, $3, $4
sw $4, 0($2)
addi $1, $1, 1
addi $5, $5, 1
li $6, 100
blt $5, $6, array_sum_loop

Example code is MIPS, but
all ISAs are similar at some level

240 Review: Program Compilation
Program written in a “high-level” programming language
C, C++, Java, C#
Hierarchical, structured control: loops, functions, conditionals
Hierarchical, structured data: scalars, arrays, pointers, structures
Compiler: translates program to assembly
Parsing and straight-forward translation
Compiler also optimizes
Compiler itself another application … who compiled compiler?

CIS 501 | Dr. | ISAs & Single Cycle

System software
int array[100], sum;
void array_sum() {
for (int i=0; i<100;i++) { sum += array[i]; 240 Review: Assembly Language Assembly language Human-readable representation Machine language Machine-readable representation 1s and 0s (often displayed in “hex”) Translates assembly to machine CIS 501 | Dr. | ISAs & Single Cycle System software x9A00 CONST R5, #0 x9200 CONST R1, array xD320 HICONST R1, array x9464 CONST R2, sum xD520 HICONST R2, sum x6640 LDR R3, R1, #0 x6880 LDR R4, R2, #0 x18C4 ADD R4, R3, R4 x7880 STR R4, R2, #0 x1261 ADD R1, R1, #1 x1BA1 ADD R5, R5, #1 x2B64 CMPI R5, #100 x03F8 BRn array_sum_loop Machine code Assembly code 240 Review: Insn Execution Model The computer is just finite state machine Registers (few of them, but fast) Memory (lots of memory, but slower) Program counter (next insn to execute) Sometimes called “instruction pointer” A computer executes instructions Fetches next instruction from memory Decodes it (figure out what it does) Reads its inputs (registers & memory) Executes it (adds, multiply, etc.) Write its outputs (registers & memory) Next insn (adjust the program counter) Program is just “data in memory” Makes computers programmable (“universal”) CIS 501 | Dr. | ISAs & Single Cycle System software Read Inputs Write Output Instruction  Insn Role of the Compiler CIS 501 | Dr. | ISAs & Single Cycle CIS 501 | Dr. | ISAs & Single Cycle Compiler Optimizations Primarily goal: reduce instruction count Eliminate redundant computation, keep more things in registers Registers are faster, fewer loads/stores An ISA can make this difficult by having too few registers Reduce branches and jumps (later) Reduce cache misses (later) Reduce dependences between nearby insns (later) An ISA can make this difficult by having implicit dependences How effective are these? Can give 4X performance over unoptimized code Collective wisdom of 40 years (“Proebsting’s Law”): 4% per year Allows higher-level languages to perform adequately (Javascript) Compiler Optimization Example (LC4) Left: common sub-expression elimination Remove calculations whose results are already in some register Right: register allocation Keep temporary in register across statements, avoid stack spill/fill CIS 501 | Dr. | ISAs & Single Cycle ;; temp = *first LDR R7, R5, #2 ; R7=first LDR R4, R7, #0 STR R4, R5, #-1 ;; *first = *second LDR R3, R5, #3 ; R3=second LDR R2, R3, #0 LDR R7, R5, #2 ; redundant STR R2, R7, #0 ;; *second = temp LDR R4, R5, #-1 LDR R3, R5, #3 ; redundant STR R4, R3, #0 ;; temp = *first LDR R7, R5, #2 LDR R4, R7, #0 STR R4, R5, #-1 ; unneeded ;; *first = *second LDR R3, R5, #3 LDR R2, R3, #0 STR R2, R7, #0 ;; *second = temp LDR R4, R5, #-1 ; unneeded STR R4, R3, #0 What is an ISA? CIS 501 | Dr. | ISAs & Single Cycle CIS 501 | Dr. | ISAs & Single Cycle What Is An ISA? ISA (instruction set architecture) A well-defined hardware/software interface The “contract” between software and hardware Functional definition of storage locations & operations Storage locations: registers, memory Operations: add, multiply, branch, load, store, etc Precise description of how to invoke & access them Not in the “contract”: non-functional aspects How operations are implemented Which operations are fast and which are slow and when Which operations take more power and which take less Instructions Bit-patterns hardware interprets as commands Instruction  Insn (instruction is too long to write in slides) CIS 501 | Dr. | ISAs & Single Cycle A Language Analogy for ISAs Communication Person-to-person  software-to-hardware Similar structure Narrative  program Sentence  insn Verb  operation (add, multiply, load, branch) Noun  data item (immediate, register value, memory value) Adjective  addressing mode Many different languages, many different ISAs Similar basic structure, details differ (sometimes greatly) Key differences between languages and ISAs Languages evolve organically, many ambiguities, inconsistencies ISAs are explicitly engineered and extended, unambiguous CIS 501 | Dr. | ISAs & Single Cycle LC4 vs Real ISAs LC4 has the basic features of a real-world ISAs LC4 lacks a good bit of realism Address size is only 16 bits Only one data type (16-bit signed integer) Little support for system software, none for multiprocessing (later) Many real-world ISAs to choose from: Intel x86 (laptops, desktop, and servers) MIPS (used throughout in book) ARM (in all your mobile phones) PowerPC (servers & game consoles) SPARC (servers) Intel’s Itanium Historical: IBM 370, VAX, Alpha, PA-RISC, 68k, … Some Key Attributes of ISAs Instruction encoding Fixed length (16-bit for LC4, 32-bit for MIPS & ARM) Variable length (1 byte to 16 bytes, average of ~3 bytes) Number and type of registers LC-4 has 8 registers MIPS has 32 “integer” registers and 32 “floating point” registers ARM & x86 both have 16 “integer” regs and 16 “floating point” regs Address space LC4: 16-bit addresses at 16-bit granularity (128KB total) ARM: 32-bit addresses at 8-bit granularly (4GB total) Modern x86 and ARM64: 64-bit addresses (16 exabytes!) Memory addressing modes MIPS & LC4: address calculated by “reg+offset” x86 and others have much more complicated addressing modes CIS 501 | Dr. | ISAs & Single Cycle Advantages of fixed-length insn over variable-length: Compact execution Simple to decode Easy to add instructions Access Granularity & Alignment Byte addressability An address points to a byte (8 bits) of data The ISA’s minimum granularity to read or write memory ISAs also support wider load/stores “Half” (2 bytes), “Longs” (4 bytes), “Quads” (8 bytes) Load.byte [6] -> r1 Load.long [12] -> r2

However, physical memory systems operate on even larger chunks

Load.long [4] -> r1 Load.long [11] -> r2 “unaligned”

Access alignment: if address % size != 0, then it is “unaligned”
A single unaligned access may require multiple physical memory accesses
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
01001001 00101101 01101001 11001011 00001001 01011000 00111001 11011101 01001001 00101101 01101001 11001011 00001001 01011000 00111001 11011101

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
01001001 00101101 01101001 11001011 00001001 01011000 00111001 11011101 01001001 00101101 01101001 11001011 00001001 01011000 00111001 11011101

CIS 501 | Dr. | ISAs & Single Cycle

Q: unaligned byte access?

Handling Unaligned Accesses
Access alignment: if address % size != 0, then it is “unaligned”
A single unaligned access may require multiple physical memory accesses

How to handle such unaligned accesses?
1. Disallow (unaligned operations are considered illegal)
MIPS, ARMv5 and earlier took this route
2. Support in hardware? (allow such operations)
x86, ARMv6+ allow regular loads/stores to be unaligned
Unaligned access still slower, adds significant hardware complexity
3. Trap to software routine?
Simpler hardware, but high penalty when unaligned
4. In software (compiler can use regular instructions when possibly unaligned
Load, shift, load, shift, and (slow, needs help from compiler)

CIS 501 | Dr. | ISAs & Single Cycle

What does LC4 do?

How big is this struct?
struct foo {
CIS 501 | Dr. | ISAs & Single Cycle

hint: avoid unaligned accesses
Java doesn’t enforce field ordering

Another Addressing Issue: Endian-ness
Endian-ness: arrangement of bytes in a multi-byte number
Big-endian: sensible order (e.g., MIPS, PowerPC, ARM)
A 4-byte integer: “00000000 00000000 00000010 00000011” is 515
Little-endian: reverse order (e.g., x86)
A 4-byte integer: “00000011 00000010 00000000 00000000” is 515
Why little endian?

00000011 00000010 00000000 00000000
starting address

integer casts are free on little-endian architectures
CIS 501 | Dr. | ISAs & Single Cycle

sign checks cheaper on big-endian

ISA Code Examples

CIS 501 | Dr. | ISAs & Single Cycle

Array Sum Loop: LC4
CIS 501 | Dr. | ISAs & Single Cycle
array .BLKW #100
sum .FILL #0
CONST R5, #0
LEA R1, array
LEA R2, sum
LDR R3, R1, #0
LDR R4, R2, #0
ADD R4, R3, R4
STR R4, R2, #0
ADD R1, R1, #1
ADD R5, R5, #1
CMPI R5, #100

int array[100];
void array_sum() {
for (int i=0; i<100;i++) { sum += array[i]; Array Sum Loop: LC4  MIPS CIS 501 | Dr. | ISAs & Single Cycle array .BLKW #100 sum .FILL #0 CONST R5, #0 LEA R1, array LEA R2, sum LDR R3, R1, #0 LDR R4, R2, #0 ADD R4, R3, R4 STR R4, R2, #0 ADD R1, R1, #1 ADD R5, R5, #1 CMPI R5, #100 array: .space 100 sum: .word 0 array_sum: la $1, array la $2, sum lw $3, 0($1) lw $4, 0($2) add $4, $3, $4 sw $4, 0($2) addi $1, $1, 1 addi $5, $5, 1 li $6, 100 blt $5, $6, L1 Syntactic differences: register names begin with $ immediates are un-prefixed MIPS (right) similar to LC4 Only simple addressing modes syntax: displacement(reg) Left-most register is generally destination register Array Sum Loop: LC4  x86 CIS 501 | Dr. | ISAs & Single Cycle array .BLKW #100 sum .FILL #0 CONST R5, #0 LEA R1, array LEA R2, sum LDR R3, R1, #0 LDR R4, R2, #0 ADD R4, R3, R4 STR R4, R2, #0 ADD R1, R1, #1 ADD R5, R5, #1 CMPI R5, #100 .comm array,400,32 .comm sum,4,4 .globl array_sum array_sum: movl $0, -4(%rbp) movl -4(%rbp), %eax movl array(,%eax,4), %edx movl sum(%rip), %eax addl %edx, %eax movl %eax, sum(%rip) addl $1, -4(%rbp) cmpl $99,-4(%rbp) x86 (right) is different Syntactic differences: register names begin with % immediates begin with $ %rbp is base (frame) pointer Many addressing modes x86 Operand Model x86 uses explicit accumulators Both register and memory Distinguished by addressing mode CIS 501 | Dr. | ISAs & Single Cycle .comm array,400,32 .comm sum,4,4 .globl array_sum array_sum: movl $0, -4(%rbp) movl -4(%rbp), %eax movl array(,%eax,4), %edx movl sum(%rip), %eax addl %edx, %eax movl %eax, sum(%rip) addl $1, -4(%rbp) cmpl $99,-4(%rbp) Register accumulator: %eax = %eax + %edx Memory accumulator: Memory[%rbp-4] = Memory[%rbp-4] + 1 Two operand insns (right-most is typically source & destination) “L” insn suffix and “%e…” reg. prefix mean “32-bit value” CIS 501 | Dr. | ISAs & Single Cycle Implementing an ISA CIS 501 | Dr. | ISAs & Single Cycle CIS 501 | Dr. | ISAs & Single Cycle Implementing an ISA Datapath: performs computation (registers, ALUs, etc.) ISA specific: can implement every insn (single-cycle: in one pass!) Control: determines which computation is performed Routes data through datapath (which regs, which ALU op) Fetch: get insn, translate opcode into control Fetch  Decode  Execute “cycle” CIS 501 | Dr. | ISAs & Single Cycle Two Types of Components Purely combinational: stateless computation ALUs, muxes, control Arbitrary Boolean functions Combinational/sequential: storage PC, insn/data memories, register file Internally contain some combinational components Example Datapath CIS 501 | Dr. | ISAs & Single Cycle Partial implementation of LC4 Leaves out the condition codes – NZP and privilege bit insn[11:9] LC4 Datapath insn[11:9] insn[11:9] 216 by 16 bit CIS 501 | Dr. | ISAs & Single Cycle MIPS Datapath CIS 501 | Dr. | ISAs & Single Cycle CIS 501 | Dr. | ISAs & Single Cycle Unified vs Split Memory Architecture Unified architecture: unified insn/data memory “Harvard” architecture: split insn/data memories Insn/Data Memory CIS 501 | Dr. | ISAs & Single Cycle Datapath for MIPS ISA MIPS: 32-bit instructions, registers are $0, $2… $31 Consider only the following instructions add $1,$2,$3 $1 = $2 + $3 (add) addi $1,$2,3 $1 = $2 + 3 (add immed) lw $1,4($3) $1 = Memory[4+$3] (load) sw $1,4($3) Memory[4+$3] = $1 (store) beq $1,$2,PC_relative_target (branch equal) j absolute_target (unconditional jump) Why only these? Most other instructions are the same from datapath viewpoint The ones that aren’t are left for you to figure out  MIPS Instruction layout CIS 501 | Dr. | ISAs & Single Cycle There is also a J-format instruction Remember there are 32 registers defined in MIPS so 18, 19 and 17 are valid register indices. CIS 501 | Dr. | ISAs & Single Cycle Start With C and instruction memory (split insn/data architecture, for now) A +4 incrementer computes default next instruction PC How would Verilog for this look given insn memory as interface? CIS 501 | Dr. | ISAs & Single Cycle First Instruction: add Add register file Add arithmetic/logical unit (ALU) Wire Select in Verilog How to rip out individual fields of an insn? Wire select wire [31:0] insn; wire [5:0] op = insn[31:26]; wire [4:0] rs = insn[25:21]; wire [4:0] rt = insn[20:16]; wire [4:0] rd = insn[15:11]; wire [4:0] sh = insn[10:6]; wire [5:0] func = insn[5:0]; CIS 501 | Dr. | ISAs & Single Cycle CIS 501 | Dr. | ISAs & Single Cycle Second Instruction: addi Destination register can now be either Rd or Rt Add sign extension unit and mux into second ALU input Verilog Wire Concatenation Recall two Verilog constructs Wire concatenation: {bus0, bus1, … , busn} Wire repeat: {repeat_x_times{w0}} How do you specify sign extension? Wire concatenation wire [31:0] insn; wire [15:0] imm16 = insn[15:0]; wire [31:0] sximm16 = {{16{imm16[15]}}, imm16}; CIS 501 | Dr. | ISAs & Single Cycle CIS 501 | Dr. | ISAs & Single Cycle Third Instruction: lw Add data memory, address is ALU output Add register write data mux to select memory output or ALU output CIS 501 | Dr. | ISAs & Single Cycle Fourth Instruction: sw Add path from second input register to data memory data input CIS 501 | Dr. | ISAs & Single Cycle Fifth Instruction: beq Add left shift unit and adder to compute PC-relative branch target Add PC input mux to select PC+4 or branch target 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com