CS代考 ID 15 (0xF) indicates “no register”

Formal Processor Verification

Processor Architecture

Copyright By PowCoder代写 加微信 powcoder

Acknowledgement: These slides are based on the textbook
(Computer Systems: A Programmer’s Perspective) and its slides.

Background
Instruction set architecture
Logic design

Sequential implementation of processor
Simple but not fast

Pipelining
Run things simultaneously
Potential hazards caused by data/control dependencies
How to make it correct?

Instruction Set Architecture (ISA)
Assembly Language View
Processor state
Registers, memory, …
Instruction set: Y86-64
A simplified version of the Intel x86-64
How to encode instructions
(e.g., addq, pushq, …) as bytes?

Layer of Abstraction
Layers above ISA: how to use instructions?
Layers below ISA: what needs to be built?
How to make it run fast?
E.g., execute multiple instructions simultaneously
Application

Y86-64 Processor State
Program Registers
15 registers (omit %r15), each 64 bits
Condition Codes
Single-bit flags set by arithmetic or logical instructions
ZF: Zero SF: Negative OF: Overflow
Program Counter
Indicates address of next instruction
Program Status
Indicates either normal operation or some error condition
Byte-addressable storage array
Words stored in little-endian byte order
RF: Program registers
CC: Condition codes

DMEM: Memory

Stat: Program status

Y86-64 Instruction Set

cmovXX rA, rB

irmovq V, rB

rmmovq rA, D(rB)

mrmovq D(rB), rA

OPq rA, rB

Y86-64 Instructions
Format of instruction
Size: 1–10 bytes
The first byte is the instruction type;
it can also be used to determine the instruction length
Each instruction may read or write some part(s) of the program state

Encoding Registers
Each register has 4-bit ID

Register ID 15 (0xF) indicates “no register”
Used to indicate “no register” in some instructions

No Register

Instruction Example
Addition Instruction

Add value in register rA to that in register rB
Store result in register rB
Note: Y86-64 only allows addition on register data
Set condition codes based on result
e.g., addq %rax,%rsi Encoding: 60 06
Two-byte encoding
The first byte indicates instruction type
The second byte gives source and destination registers
addq rA, rB

Encoded Representation

Generic Form

Arithmetic and Logical Instructions
Refer to generically as “OPq”
Encodings differ only by “function code”
Low-order 4 bits in first instruction word
Set condition codes as side effect

addq rA, rB

subq rA, rB

andq rA, rB

xorq rA, rB

Subtract (rA from rB)
Exclusive-Or

Instruction Code

Function Code

Move Instructions
They have different names based on different operand types (e.g., register or memory)
Much simpler than the movq instruction in x86-64

rrmovq rA, rB

Register  Register
Immediate  Register
irmovq V, rB

Register  Memory
rmmovq rA, D(rB)

Memory  Register
mrmovq D(rA), rB

Move Instruction Examples:
X86-64 (Intel) vs. Y86-64 (simplified)
irmovq $0xabcd, %rdx
movq $0xabcd, %rdx
30 f2 cd ab 00 00 00 00 00 00
rrmovq %rsp, %rbx
movq %rsp, %rbx
mrmovq -12(%rbp),%rcx
movq -12(%rbp),%rcx
50 51 f4 ff ff ff ff ff ff ff
rmmovq %rsi,0x41c(%rsp)
movq %rsi,0x41c(%rsp)
40 64 1c 04 00 00 00 00 00 00

little endian

Logic Design

Digital Signals and Logic Gates
Logic gates:
Outputs are Boolean functions of inputs
Respond continuously to inputs
With some delay

Digital signal:
Use voltage thresholds to represent discrete values
1-bit signal: either high (1) or low (0)
With guard range between them
The definitions of these functions can be found in lecture 2

Example: Bit Equality
Generate 1 if a and b are equal

Hardware Control Language (HCL)
Simple hardware description language
Boolean operations have syntax similar to C logical operations

bool eq = (a&&b)||(!a&&!b)
HCL Expression

Example: Word Equality (64-bit)

Word-Level Representation
bool Eq = (A == B)
HCL Representation

Example: Word Multiplexor (64-bit)
Select input word A or B depending on control signal s
Word-Level Representation
HCL Representation

int Out = [

Arithmetic Logic Unit
The control signal selects the function to be computed
Corresponding to 4 arithmetic/logical instructions in Y86-64

ALU also computes condition codes

[Exercise] Draw the circuit diagram for X+Y

Background
Instruction set architecture
Logic design

Sequential implementation of processor
Simple but not fast

Pipelining
Run things simultaneously
Potential hazards caused by data/control dependencies
How to make it correct?

Building Blocks
Combinational Logic
Compute Boolean functions of inputs
Continuously respond to inputs

Storage Elements
Addressable memories
Loaded only as clock rises

SEQ Hardware Structure
Program counter register (PC)
Condition code register (CC)
Register File
Data memory: for reading/writing data
Instruction memory: for reading instructions

Instruction Flow
Read instruction at address specified by PC
Process through stages
Update program counter

Instruction

Instruction

Write back

SEQ Stages
instruction from instruction memory
Read operands from registers
Compute value or address
Read or write data
Write Back
Write to registers
Update program counter

Instruction

Instruction

Write back

Instruction Decoding
Instruction Format
Instruction byte icode:ifun
Optional register byte rA:rB
Optional constant word valC

Computed Values
(to be used in the following slides)
icode Instruction code
ifun Instruction function
rA Instr. Register A
rB Instr. Register B
valC Instruction constant
valP Incremented PC
srcA Register ID A
srcB Register ID B
dstE Destination Register E
dstM Destination Register M
valA Register value A
valB Register value B

valE ALU result
Cnd Branch/move flag
valM Value from memory

Executing Arith. / Logical Instruction
OPq rA, rB
icode:ifun  M1[PC]
rA:rB  M1[PC+1]

valP  PC+2

valA  R[rA]
valB  R[rB]

valE  valB OP valA

R[rB]  valE

Write back result

OPq rA, rB

Read 2 bytes

Read operand registers

Perform operation
Set condition codes

Executing Move Instruction: rmmovq
(by using ALU)
rmmovq rA, D(rB)
icode:ifun  M1[PC]
rA:rB  M1[PC+1]
valC  M8[PC+2]
valP  PC+10
valA  R[rA]
valB  R[rB]

valE  valB + valC

Compute effective address

M8[valE]  valA
Write value to memory

rmmovq rA, D(rB)

Read 10 bytes

Read operand registers

Executing Jumps
Compute both addresses
Choose based on setting of condition codes and branch condition
icode:ifun  M1[PC]

valC  M8[PC+1]
valP  PC+9

instruction byte

Read destination address
Fall through address

Cnd  Cond(CC,ifun)

Take branch?

PC  Cnd ? valC : valP

fall thru:

SEQ Hardware

Blue boxes: predesigned hardware blocks
E.g., memories, ALU
Gray boxes: control logic
Describe in HCL
White ovals: labels for signals
Thick lines: 64-bit word values
Thin lines: 4-8 bit values
Dotted lines: 1-bit values

Background
Instruction set architecture
Logic design

Sequential implementation of processor
Simple but not fast

Pipelining
Run things simultaneously
Potential hazards caused by data/control dependencies
How to make it correct?

Computational Example
Computation: 300 picoseconds
Save result in register: 20 picoseconds
Must have clock cycle of at least 320 ps
Combinational

Delay = 320 ps
Throughput = 3.12 GIPS

3-Way Pipelined Version
Divide combinational logic into 3 parts of 100 ps each
Begin new instruction as soon as previous one completes stage A
Begin new instruction every 120 ps
Overall latency increases
360 ps from start to finish

Delay = 360 ps
Throughput = 8.33 GIPS

Pipeline Diagrams
Unpipelined

Cannot start new instruction until previous one completes

3-Way Pipelined

Up to 3 instructions running simultaneously

Pipeline Stages
current PC
Read instruction
Compute incremented PC
Read program registers
Operate ALU
Read or write data memory
Write Back
Update register file

Predicting the PC
Start fetching new instruction after current one has completed fetch stage
Not enough time to reliably determine next instruction
Guess which instruction will follow
Recover if prediction was incorrect

xorq %rax,%rax
irmovq $1, %rax # case 1
t: irmovq $3, %rdx # case 2
irmovq $4, %rcx
irmovq $5, %rdx

What are potential problems in a pipeline?

How to make a pipeline work correctly?

Data Dependencies
Some instruction depends on result from previous ones

Combinational

0x016: halt
# demo-h0.ys

Data Hazard in Pipeline

0x016: halt

# demo-h0.ys

E: execute
W: write back
Values in %rdx and %rax
have not been updated yet!
Incorrect operands

Solution: Stalling
0x000: irmovq $10,%rdx
0x00a: irmovq $3,%rax
0x014: addq %rdx,%rax
0x016: halt
# demo-h0.ys

W_dstE = %rax
srcA = %rdx
srcB = %rax
M_dstE = %rax
srcA = %rdx
srcB = %rax
e_dstE = %rax
srcA = %rdx
srcB = %rax

addq instruction in Cycle 4: detected data hazards (for source registers)
Inject a bubble, repeats decoding the addq instruction in cycle 5
Finally, the addq instruction
can be decoded in cycle 7

Branch Misprediction
After executing the instruction jne t, there are two branches:

What happens to the pipeline
if the branch cannot be predicted correctly?
0x000: xorq %rax,%rax
0x002: jne target # Not taken
0x00b: irmovq $1, %rax # Fall through
0x015: halt
0x016: target: irmovq $2, %rdx # Target
0x020: irmovq $3, %rbx
# demo-j.ys
0x016: target: irmovq $2, %rdx
0x020: irmovq $3, %rbx
0x00b: irmovq $1, %rax
0x015: halt

Handling Misprediction
Predict branch as taken
Fetch 2 instructions at target
Cancel when mispredicted
Detect branch not-taken in execute stage
On following cycle, replace instructions in execute and decode by bubbles
No side effects have occurred yet

/docProps/thumbnail.jpeg

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com