Formal Processor Verification
Processor Architecture
Copyright By PowCoder代写 加微信 powcoder
Acknowledgement: These slides are based on the textbook
(Computer Systems: A Programmer’s Perspective) and its slides.
Background
Instruction set architecture
Logic design
Sequential implementation of processor
Simple but not fast
Pipelining
Run things simultaneously
Potential hazards caused by data/control dependencies
How to make it correct?
Instruction Set Architecture (ISA)
Assembly Language View
Processor state
Registers, memory, …
Instruction set: Y86-64
A simplified version of the Intel x86-64
How to encode instructions
(e.g., addq, pushq, …) as bytes?
Layer of Abstraction
Layers above ISA: how to use instructions?
Layers below ISA: what needs to be built?
How to make it run fast?
E.g., execute multiple instructions simultaneously
Application
Y86-64 Processor State
Program Registers
15 registers (omit %r15), each 64 bits
Condition Codes
Single-bit flags set by arithmetic or logical instructions
ZF: Zero SF: Negative OF: Overflow
Program Counter
Indicates address of next instruction
Program Status
Indicates either normal operation or some error condition
Byte-addressable storage array
Words stored in little-endian byte order
RF: Program registers
CC: Condition codes
DMEM: Memory
Stat: Program status
Y86-64 Instruction Set
cmovXX rA, rB
irmovq V, rB
rmmovq rA, D(rB)
mrmovq D(rB), rA
OPq rA, rB
Y86-64 Instructions
Format of instruction
Size: 1–10 bytes
The first byte is the instruction type;
it can also be used to determine the instruction length
Each instruction may read or write some part(s) of the program state
Encoding Registers
Each register has 4-bit ID
Register ID 15 (0xF) indicates “no register”
Used to indicate “no register” in some instructions
No Register
Instruction Example
Addition Instruction
Add value in register rA to that in register rB
Store result in register rB
Note: Y86-64 only allows addition on register data
Set condition codes based on result
e.g., addq %rax,%rsi Encoding: 60 06
Two-byte encoding
The first byte indicates instruction type
The second byte gives source and destination registers
addq rA, rB
Encoded Representation
Generic Form
Arithmetic and Logical Instructions
Refer to generically as “OPq”
Encodings differ only by “function code”
Low-order 4 bits in first instruction word
Set condition codes as side effect
addq rA, rB
subq rA, rB
andq rA, rB
xorq rA, rB
Subtract (rA from rB)
Exclusive-Or
Instruction Code
Function Code
Move Instructions
They have different names based on different operand types (e.g., register or memory)
Much simpler than the movq instruction in x86-64
rrmovq rA, rB
Register Register
Immediate Register
irmovq V, rB
Register Memory
rmmovq rA, D(rB)
Memory Register
mrmovq D(rA), rB
Move Instruction Examples:
X86-64 (Intel) vs. Y86-64 (simplified)
irmovq $0xabcd, %rdx
movq $0xabcd, %rdx
30 f2 cd ab 00 00 00 00 00 00
rrmovq %rsp, %rbx
movq %rsp, %rbx
mrmovq -12(%rbp),%rcx
movq -12(%rbp),%rcx
50 51 f4 ff ff ff ff ff ff ff
rmmovq %rsi,0x41c(%rsp)
movq %rsi,0x41c(%rsp)
40 64 1c 04 00 00 00 00 00 00
little endian
Logic Design
Digital Signals and Logic Gates
Logic gates:
Outputs are Boolean functions of inputs
Respond continuously to inputs
With some delay
Digital signal:
Use voltage thresholds to represent discrete values
1-bit signal: either high (1) or low (0)
With guard range between them
The definitions of these functions can be found in lecture 2
Example: Bit Equality
Generate 1 if a and b are equal
Hardware Control Language (HCL)
Simple hardware description language
Boolean operations have syntax similar to C logical operations
bool eq = (a&&b)||(!a&&!b)
HCL Expression
Example: Word Equality (64-bit)
Word-Level Representation
bool Eq = (A == B)
HCL Representation
Example: Word Multiplexor (64-bit)
Select input word A or B depending on control signal s
Word-Level Representation
HCL Representation
int Out = [
Arithmetic Logic Unit
The control signal selects the function to be computed
Corresponding to 4 arithmetic/logical instructions in Y86-64
ALU also computes condition codes
[Exercise] Draw the circuit diagram for X+Y
Background
Instruction set architecture
Logic design
Sequential implementation of processor
Simple but not fast
Pipelining
Run things simultaneously
Potential hazards caused by data/control dependencies
How to make it correct?
Building Blocks
Combinational Logic
Compute Boolean functions of inputs
Continuously respond to inputs
Storage Elements
Addressable memories
Loaded only as clock rises
SEQ Hardware Structure
Program counter register (PC)
Condition code register (CC)
Register File
Data memory: for reading/writing data
Instruction memory: for reading instructions
Instruction Flow
Read instruction at address specified by PC
Process through stages
Update program counter
Instruction
Instruction
Write back
SEQ Stages
instruction from instruction memory
Read operands from registers
Compute value or address
Read or write data
Write Back
Write to registers
Update program counter
Instruction
Instruction
Write back
Instruction Decoding
Instruction Format
Instruction byte icode:ifun
Optional register byte rA:rB
Optional constant word valC
Computed Values
(to be used in the following slides)
icode Instruction code
ifun Instruction function
rA Instr. Register A
rB Instr. Register B
valC Instruction constant
valP Incremented PC
srcA Register ID A
srcB Register ID B
dstE Destination Register E
dstM Destination Register M
valA Register value A
valB Register value B
valE ALU result
Cnd Branch/move flag
valM Value from memory
Executing Arith. / Logical Instruction
OPq rA, rB
icode:ifun M1[PC]
rA:rB M1[PC+1]
valP PC+2
valA R[rA]
valB R[rB]
valE valB OP valA
R[rB] valE
Write back result
OPq rA, rB
Read 2 bytes
Read operand registers
Perform operation
Set condition codes
Executing Move Instruction: rmmovq
(by using ALU)
rmmovq rA, D(rB)
icode:ifun M1[PC]
rA:rB M1[PC+1]
valC M8[PC+2]
valP PC+10
valA R[rA]
valB R[rB]
valE valB + valC
Compute effective address
M8[valE] valA
Write value to memory
rmmovq rA, D(rB)
Read 10 bytes
Read operand registers
Executing Jumps
Compute both addresses
Choose based on setting of condition codes and branch condition
icode:ifun M1[PC]
valC M8[PC+1]
valP PC+9
instruction byte
Read destination address
Fall through address
Cnd Cond(CC,ifun)
Take branch?
PC Cnd ? valC : valP
fall thru:
SEQ Hardware
Blue boxes: predesigned hardware blocks
E.g., memories, ALU
Gray boxes: control logic
Describe in HCL
White ovals: labels for signals
Thick lines: 64-bit word values
Thin lines: 4-8 bit values
Dotted lines: 1-bit values
Background
Instruction set architecture
Logic design
Sequential implementation of processor
Simple but not fast
Pipelining
Run things simultaneously
Potential hazards caused by data/control dependencies
How to make it correct?
Computational Example
Computation: 300 picoseconds
Save result in register: 20 picoseconds
Must have clock cycle of at least 320 ps
Combinational
Delay = 320 ps
Throughput = 3.12 GIPS
3-Way Pipelined Version
Divide combinational logic into 3 parts of 100 ps each
Begin new instruction as soon as previous one completes stage A
Begin new instruction every 120 ps
Overall latency increases
360 ps from start to finish
Delay = 360 ps
Throughput = 8.33 GIPS
Pipeline Diagrams
Unpipelined
Cannot start new instruction until previous one completes
3-Way Pipelined
Up to 3 instructions running simultaneously
Pipeline Stages
current PC
Read instruction
Compute incremented PC
Read program registers
Operate ALU
Read or write data memory
Write Back
Update register file
Predicting the PC
Start fetching new instruction after current one has completed fetch stage
Not enough time to reliably determine next instruction
Guess which instruction will follow
Recover if prediction was incorrect
xorq %rax,%rax
irmovq $1, %rax # case 1
t: irmovq $3, %rdx # case 2
irmovq $4, %rcx
irmovq $5, %rdx
What are potential problems in a pipeline?
How to make a pipeline work correctly?
Data Dependencies
Some instruction depends on result from previous ones
Combinational
0x016: halt
# demo-h0.ys
Data Hazard in Pipeline
0x016: halt
# demo-h0.ys
E: execute
W: write back
Values in %rdx and %rax
have not been updated yet!
Incorrect operands
Solution: Stalling
0x000: irmovq $10,%rdx
0x00a: irmovq $3,%rax
0x014: addq %rdx,%rax
0x016: halt
# demo-h0.ys
W_dstE = %rax
srcA = %rdx
srcB = %rax
M_dstE = %rax
srcA = %rdx
srcB = %rax
e_dstE = %rax
srcA = %rdx
srcB = %rax
addq instruction in Cycle 4: detected data hazards (for source registers)
Inject a bubble, repeats decoding the addq instruction in cycle 5
Finally, the addq instruction
can be decoded in cycle 7
Branch Misprediction
After executing the instruction jne t, there are two branches:
What happens to the pipeline
if the branch cannot be predicted correctly?
0x000: xorq %rax,%rax
0x002: jne target # Not taken
0x00b: irmovq $1, %rax # Fall through
0x015: halt
0x016: target: irmovq $2, %rdx # Target
0x020: irmovq $3, %rbx
# demo-j.ys
0x016: target: irmovq $2, %rdx
0x020: irmovq $3, %rbx
0x00b: irmovq $1, %rax
0x015: halt
Handling Misprediction
Predict branch as taken
Fetch 2 instructions at target
Cancel when mispredicted
Detect branch not-taken in execute stage
On following cycle, replace instructions in execute and decode by bubbles
No side effects have occurred yet
/docProps/thumbnail.jpeg
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com