CIS 501 Introduction to Computer Architecture
Unit 2: Instruction Set Architecture
CIS 501 (Martin/Roth): Instruction Set Architectures
• Chapter 2
• Further reading: Appendix C (RISC) and Appendix D
• Available from web page
• The Evolution of RISC Technology at IBM
Much of this chapter will be “on your own reading”
• Hard to talk about ISA features without knowing what they do
many of these issu
Instruction Set Architecture (ISA)
• What is a good ISA?
• Aspects of ISAs
• RISC vs. CISC
• Implementing CISC:
• ISA (instruction set architecture)
• A well-defined hardware/software interface
• The “contract” between software and hardware
• Functional definition of operations, modes, and storage locations supported by hardware
• Precise description of how to invoke and access them
• No guarantees regarding
• How operations are implemented
• Which operations are fast and which are slow
• Which operations take more power
Language Analogy
• Allows communication
• Language: person to person
• ISA: hardware to software
• Need to speak the same language/ISA
• Many common aspects
• Part of speech: verbs, nouns, adjectives, adverbs, etc.
• Common operations: calculation, control/branch, memory
• Many different languages/ISAs, many similarities, many differences
• Different structure
• Both evolve over time
• Key differences: ISAs are explicitly engineered and extended, unambiguous
• Programmability
• Easy to express programs efficiently?
• Implementability
• Easy to design high-performance implementations?
• More recently
• Easy to design low-power implementations?
• Easy to design high-reliability implementations?
• Easy to design low-cost implementations?
• Compatibility
• Easy to maintain programmability (implementability) as languages and programs (technology) evolve?
• x86 (IA32) generations: 8086, 286, 386, 486, Pentium, PentiumII, PentiumIII, Pentium4,…
RISC vs CISC Foreshadowing
• Recall performance equation:
• (instructions/program) * (cycles/instruction) * (seconds/cycle)
• CISC (Complex Instruction Set Computing)
• Improve “instructions/program” with “complex” instructions
• Easy for assembly-level programmers
• RISC (Reduced Instruction Set Computing)
• Improve “cycles/instruction” with many single-cycle instructions
• Increases “instruction/program”, but hopefully not as much
• Help from smart compiler
• Perhaps improve clock cycle time (seconds/cycle)
• via aggressive implementation allowed by simpler instructions
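The performance equation, (instructions/program) * (cycles/instruction) * (seconds/cycle), can be sketched numerically. This is a minimal illustration only: the function name and every number below are hypothetical, chosen to show how a higher instruction count can still win if cycles/instruction drops enough.

```python
def execution_time(insns_per_program, cycles_per_insn, seconds_per_cycle):
    """seconds/program = (insns/program) * (cycles/insn) * (seconds/cycle)."""
    return insns_per_program * cycles_per_insn * seconds_per_cycle

# Hypothetical numbers, not measurements of any real machine.
cisc_time = execution_time(1_000_000, 4.0, 1e-9)  # fewer, more complex insns
risc_time = execution_time(1_500_000, 1.2, 1e-9)  # more, mostly single-cycle insns

print(f"CISC-like: {cisc_time:.4f} s, RISC-like: {risc_time:.4f} s")
```

With these made-up inputs the RISC-like design executes 50% more instructions but still finishes sooner, which is the trade-off the slide foreshadows.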
Programmability
• Easy to express programs efficiently?
• For whom?
• Before 1985: human
• Compilers were terrible, most code was hand-assembled
• Want high-level coarse-grain instructions
• As similar to high-level language as possible
• After 1985: compiler
• Optimizing compilers generate much better code than you or I
• Want low-level fine-grain instructions
• Compiler can’t tell if two high-level idioms match exactly or not
• What makes an ISA easy for a human to program in?
• Proximity to a high-level language (HLL)
• Closing the “semantic gap”
• Semantically heavy (CISC-like) insns that capture complete idioms
• “loop”, “procedure call”
• Example: SPARC save/restore
• Bad example: x86 rep movsb (copy string)
• Ridiculous example: VAX insque (insert-into-queue)
• “Semantic clash”: what if you have many high-level languages?
• Stranger than fiction
• People once thought computers would execute language directly
• Fortunately, never materialized (but keeps coming back around)
Today’s Semantic Gap
• Today’s ISAs are targeted to one language…
• Just so happens that this language is very low level
• The C programming language
• Will ISAs be different when Java/C# become dominant?
• Object-oriented? Probably not
• Support for garbage collection? Maybe
• Smart compilers translate high-level languages to simple instructions
• Any benefit of tailored ISA is likely small
• What makes an ISA easy for a compiler to program in?
• Low level primitives from which solutions can be synthesized
• Wulf: “primitives not solutions”
• Good at breaking complex structures to simple ones
• Not so good at combining simple structures into complex
• Requires search, pattern matching (why AI is hard)
• Easier to synthesize complex insns than to compare them
• Regularity: “principle of least astonishment”
• Orthogonality
• Composability
• One-vs.-all
Implementability
• Every ISA can be implemented
• Not every ISA can be implemented efficiently
• Classic high-performance implementation techniques
• Pipelining, parallel execution, out-of-order execution (more later)
• Certain ISA features make these difficult
– Variable instruction lengths/formats: complicate decoding
– Implicit state: complicates dynamic scheduling
– Variable latencies: complicate scheduling
– Difficult to interrupt instructions: complicate many things
Compatibility
• No-one buys new hardware… if it requires new software
• Intel was the first company to realize this
• ISA must remain compatible, no matter what
• x86 one of the worst designed ISAs EVER, but survives
• As does IBM’s 360/370 (the first “ISA family”)
• Backward compatibility
• New processors must support old programs (can’t drop features)
• Very important
• Forward (upward) compatibility
• Old processors must support new programs (with software help)
• New processors redefine only previously-illegal opcodes
• Allow software to detect support for specific new instructions
• Old processors emulate new instructions in low-level software
ompatibility’s
• Trap: instruction makes low-level “function call” to OS handler
• Nop: “no operation”, instructions with no functional semantics
• Backward compatibility
• Handle rarely used but hard to implement “legacy” opcodes
• Define to trap in new implementation and emulate in software
• Rid yourself of some ISA mistakes of the past
– Performance suffers
• Forward compatibility
• Reserve sets of trap & nop opcodes
• Add ISA functionality by overloading traps
• Release firmware patch to “add” to old implementation
• Add ISA hints by overloading nops
Compatibility
• Easy compatibility requires forethought
• Temptation: use some ISA extension for 5% performance gain
• Frequent outcome: gain diminishes, disappears, or turns to loss
– Must continue to support gadget forever
• Example: register windows (SPARC)
• Adds difficulty to out-of-order implementations of SPARC
• Details shortly
Aspects of ISAs
• Von Neumann model
• Implicit structure of all modern ISAs
• Length and encoding
• Operand model
• Where (other than memory) are operands stored?
• Datatypes and operations
• Overview only
• Read about the rest in the book and appendices
Sequential Model
• Implicit model of all modern ISAs
• Often called Von Neumann, but in ENIAC before
• Basic feature: the program counter (PC)
• Defines total order on dynamic instructions
• Next PC is PC++ unless insn says otherwise
• Order and named storage define computation
• Value flows from insn X to insn Y via storage A if X names A as output, Y names A as input, and Y after X
• Processor logically executes loop (steps include Read Inputs and Write Output)
• Instruction execution assumed atomic
• Instruction X finishes before insn X+1 starts
• Alternatives have been proposed…
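The sequential model (a program counter that defines a total order, with each instruction finishing before the next starts) can be sketched as a tiny interpreter loop. The two-instruction "ISA" here (`addi`, `bnez`), the register names, and the program are all hypothetical, made up purely to illustrate the fetch/execute/next-PC loop.

```python
def run(program, regs):
    """Execute insns one at a time in PC order until the PC leaves the program."""
    pc = 0
    while 0 <= pc < len(program):
        op, *args = program[pc]      # fetch and decode the insn at PC
        next_pc = pc + 1             # default: next PC is PC++
        if op == "addi":             # rd = rs + immediate
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bnez":           # branch to target if rs != 0
            rs, target = args
            if regs[rs] != 0:
                next_pc = target     # the insn "says otherwise"
        else:
            raise ValueError(f"unknown op {op}")
        pc = next_pc                 # insn X is done before insn X+1 starts
    return regs

# Count R1 down from 3 to 0 while counting R2 up.
program = [
    ("addi", "R2", "R2", 1),
    ("addi", "R1", "R1", -1),
    ("bnez", "R1", 0),
]
result = run(program, {"R1": 3, "R2": 0})
print(result)
```

Values flow between instructions only through the named registers, in the order the PC dictates.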
Example: MIPS Format
• 3 formats, simple encoding
• Q: how many instructions can be encoded? A: 127
• R-type: Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6)
• I-type: Op(6) Rs(5) Rt(5) Immed(16)
• J-type: Op(6) Target(26)
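The fixed-width MIPS fields can be packed with shifts and ORs. The sketch below is illustrative: the helper names `encode_r` and `encode_i` are made up, while the field widths and the opcode/function values for `add` and `lw` are the standard MIPS ones. One plausible reading of the "127" answer, offered here as an assumption, is 63 non-zero primary opcodes plus 64 function codes under opcode 0.

```python
def encode_r(op, rs, rt, rd, sh, func):
    """R-type: Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sh << 6) | func

def encode_i(op, rs, rt, immed):
    """I-type: Op(6) Rs(5) Rt(5) Immed(16); the immediate is truncated to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immed & 0xFFFF)

add_word = encode_r(0x00, rs=2, rt=3, rd=1, sh=0, func=0x20)  # add $1,$2,$3
lw_word = encode_i(0x23, rs=2, rt=1, immed=-4)                # lw $1,-4($2)

# Assumed counting argument: opcode 0 selects R-type, Func picks the operation.
encodable = (2**6 - 1) + 2**6

print(hex(add_word), hex(lw_word), encodable)
```

Because every field sits at a fixed bit position, a decoder can extract register numbers before it even knows which instruction it is looking at.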
Length and Encoding
• Fixed length
• Most common is 32 bits
+ Simple implementation: easy to compute next PC
– Code density: 32 bits to increment a register by 1?
• Variable length
– Complex implementation
+ Code density: can do this in one 8-bit instruction
• Compromise: two lengths
• Simple encodings simplify decoder implementation
Operand Model: Memory Only
• Where (other than memory) can operands come from?
• And how are they specified?
• Example: A = B + C
• Several options
• Memory only
add B,C,A mem[A] = mem[B] + mem[C]
• Accumulator: implicit single element (ACC)
load B        ACC = mem[B]
add C         ACC = ACC + mem[C]
store A       mem[A] = ACC
• General-purpose register (GPR): multiple explicit accumulators
load B,R1     R1 = mem[B]
add C,R1      R1 = R1 + mem[C]
store R1,A    mem[A] = R1
• Load-store: GPR and only loads/stores access memory
load B,R1     R1 = mem[B]
load C,R2     R2 = mem[C]
add R1,R2,R1  R1 = R1 + R2
store R1,A    mem[A] = R1
• Stack: TOS implicit in instructions
push B        stk[TOS++] = mem[B]
push C        stk[TOS++] = mem[C]
add           stk[TOS++] = stk[--TOS] + stk[--TOS]
pop A         mem[A] = stk[--TOS]
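The operand models can be compared by running A = B + C under each one and counting instructions and data-memory accesses. This is a rough sketch: the `Memory` class, the function names, and the initial values are hypothetical, and each function simply hard-codes the instruction sequence for its model.

```python
class Memory:
    """Data memory that counts how many loads and stores touch it."""
    def __init__(self, values):
        self.values = dict(values)
        self.accesses = 0

    def load(self, addr):
        self.accesses += 1
        return self.values[addr]

    def store(self, addr, value):
        self.accesses += 1
        self.values[addr] = value

def memory_only(mem):
    mem.store("A", mem.load("B") + mem.load("C"))  # add B,C,A
    return 1                                       # insns executed

def accumulator(mem):
    acc = mem.load("B")                            # load B
    acc = acc + mem.load("C")                      # add C
    mem.store("A", acc)                            # store A
    return 3

def load_store(mem):
    r1 = mem.load("B")                             # load B,R1
    r2 = mem.load("C")                             # load C,R2
    r1 = r1 + r2                                   # add R1,R2,R1
    mem.store("A", r1)                             # store R1,A
    return 4

def stack(mem):
    stk = []
    stk.append(mem.load("B"))                      # push B
    stk.append(mem.load("C"))                      # push C
    stk.append(stk.pop() + stk.pop())              # add
    mem.store("A", stk.pop())                      # pop A
    return 4

results = {}
for model in (memory_only, accumulator, load_store, stack):
    mem = Memory({"A": 0, "B": 2, "C": 3})
    insns = model(mem)
    results[model.__name__] = (insns, mem.accesses, mem.values["A"])

print(results)
```

For a single statement every model touches memory three times; the differences in data traffic only appear once values are reused, because registers can hold them on chip while memory-only code must reload them.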
Operand Model Pros and Cons
• Metric I: static code size
• Number of instructions needed to represent program, size of each
• Want many implicit operands, high level instructions
• Good → bad: memory, accumulator, stack, load-store
• Metric II: data memory traffic
• Number of bytes moved to and from memory
• Want as many long-lived operands as possible in on-chip storage
• Good → bad: load-store, stack, accumulator, memory
• Metric III: cycles per instruction
• Want short (1 cycle?), little variability, few nearby dependences
• Good → bad: load-store, stack, accumulator, memory
• Upshot: most new ISAs are load-store or hybrids
How Many Registers?
• Registers faster than memory, have as many as possible?
• One reason registers are faster is that there are fewer of them
• Small is fast (hardware truism)
• Another is that they are directly addressed
– More registers means larger register specifiers
– Fewer registers per instruction or indirect addressing
• Not everything can be put in registers
• Structures, arrays
• Although compilers are getting better at putting more things in
– More registers means more saving/restoring
• Upshot: trend to more registers: 8 (x86) → 32 (MIPS) → 128 (IA64)
• 64-bit x86 has 16 64-bit integer and 16 128-bit FP registers
Address Size
• What is an n-bit processor?
• Support memory size of 2^n
• Alternative (wrong) definition: size of calculation operations
• Virtual address size
• Determines size of addressable (usable) memory
• 32-bit or 64-bit address spaces
• All ISAs moving to (if not already at) 64 bits
• Most critical, inescapable ISA design decision
• Too small? Will limit the ISA and require nasty hacks (E.g., x86 segments)
• x86 evolution:
• 4-bit (4004), 8-bit (8008), 16-bit (8086),
• 32-bit + protected memory (80386)
• 64-bit (AMD’s Opteron & Intel’s EM64T Pentium4)
• Register windows: hardware activation records
• Sun SPARC (from the RISC I)
• 32 integer registers divided into: 8 global, 8 local, 8 input, 8 output
• Explicit save/restore instructions
• Global registers are not windowed
• save: inputs “pushed”, outputs → inputs, locals zeroed
• restore: locals zeroed, inputs → outputs, inputs “popped”
• Hardware stack provides few (4) on-chip register frames
• Spilled-to/filled-from memory
+ Automatic parameter passing, caller-saved registers
+ No memory traffic on shallow (<4 deep) call graphs
– Hidden memory operations (some restores fast, others slow)
– A nightmare for register renaming (more later)
Memory Addressing
• Addressing mode: way of specifying address
• Used in memory-memory or load/store instructions in register ISA
• Examples
• Register-Indirect: R1=mem[R2]
• Displacement: R1=mem[R2+immed]
• Index-base: R1=mem[R2+R3]
• Memory-indirect: R1=mem[mem[R2]]
• Auto-increment: R1=mem[R2], R2=R2+1
• Auto-indexing: R1=mem[R2+immed], R2=R2+immed
• Scaled: R1=mem[R2+R3*immed1+immed2]
• PC-relative: R1=mem[PC+imm]
• What high-level program idioms are these used for?
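Each addressing mode is just a different way of computing an effective address before the load. The sketch below evaluates several of the listed modes against a made-up register file and memory. The contents, the `load` helper, and the `PC` value are hypothetical, and the high-level idioms in the comments are commonly cited associations rather than something the slide states.

```python
# Hypothetical register file and memory contents.
regs = {"R2": 100, "R3": 4}
mem = {100: 200, 104: 7, 108: 9, 116: 11, 200: 42}
PC = 96

def load(address):
    """R1 = mem[address] for an already-computed effective address."""
    return mem[address]

loaded = {
    "register-indirect": load(regs["R2"]),                       # e.g. *p
    "displacement":      load(regs["R2"] + 8),                   # e.g. p->field
    "index-base":        load(regs["R2"] + regs["R3"]),          # e.g. base plus offset
    "memory-indirect":   load(mem[regs["R2"]]),                  # e.g. **pp
    "scaled":            load(regs["R2"] + regs["R3"] * 2 + 8),  # e.g. a[i].field
    "pc-relative":       load(PC + 8),                           # e.g. constant near code
}

# Auto-increment changes the address register, so it is evaluated last.
loaded["auto-increment"] = load(regs["R2"])                      # e.g. *p++
regs["R2"] = regs["R2"] + 1

print(loaded, regs)
```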
Example: MIPS Addressing
• MIPS implements only displacement
• Why? Experiment on VAX (ISA with every mode) found distribution
• Disp: 61%, reg-ind: 19%, scaled: 11%, mem-ind: 5%, other: 4%
• 80% use small displacement or register indirect (displacement 0)
• I-type instructions: Op(6) Rs(5) Rt(5) Immed(16), 16-bit displacement
• Is 16-bits enough?
• Yes? VAX experiment showed 1% accesses use displacement larger than 16 bits
• SPARC adds Reg+Reg mode
Control Instructions
• One issue: testing for conditions
• Option I: compare and branch insns
branch-less-than R1,10,target
– Two ALUs: one for condition, one for target address
• Option II: implicit condition codes
subtract R2,R1,10      sets “negative”
branch-neg target
– Implicit dependence is tricky
• Option III: condition registers, separate branch insns
set-less-than R2,R1,10
branch-not-equal-zero R2,target
– Additional instructions, + one ALU per, + explicit dependence
• Access alignment: address % size == 0
• Aligned: load-word @XXXX00
• Unaligned: load-word @XXXX10
• Question: what to do with unaligned accesses (uncommon case)?
• Support in hardware? Makes all accesses slow
• Trap to software routine? Possibility
• Use regular instructions: load, shift, load, shift
• MIPS? ISA support: unaligned access using two instructions
lwl @XXXX10; lwr
• Big-endian: sensible order (e.g., MIPS, PowerPC)
• 4-byte integer: “00000000 00000000 00000010 00000011”
• Little-endian: reverse order (e.g., x86)
• 4-byte integer: “00000011 00000010 00000000 00000000”
• Why little endian? To be different? To be annoying? Nobody knows
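Both the alignment test and the two byte orders can be checked in a few lines. The sketch assumes the 32-bit pattern on the slide is the integer 515 (2 * 256 + 3); the helper name `is_aligned` and the sample addresses are made up for illustration.

```python
def is_aligned(address, size):
    """An access is aligned when its address is a multiple of its size."""
    return address % size == 0

value = 515  # bits: 00000000 00000000 00000010 00000011

big_endian = value.to_bytes(4, "big")        # most significant byte first
little_endian = value.to_bytes(4, "little")  # least significant byte first

print(list(big_endian), list(little_endian))
print(is_aligned(0b1100, 4), is_aligned(0b0110, 4), is_aligned(0b0110, 2))
```

The same four bytes come out in opposite orders, and an address ending in binary 10 is aligned for a 2-byte access but not for a 4-byte one.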
Example: MIPS Conditional Branches
• MIPS uses combination of options I/III
• Compare 2 registers and branch: beq, bne
• Equality and inequality only
+ Don’t need an adder for comparison
• Compare 1 register to zero and branch: bgtz, bgez, bltz, blez
• Greater/less than comparisons
+ Don’t need adder for comparison
• Set explicit condition registers: slt, sltu, slti, sltiu, etc.
• More than 80% of branches are (in)equalities or comparisons to 0
• OK to take two insns to do remaining branches (MCCF)
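The "two insns for the remaining branches" case can be sketched as a set-less-than followed by a branch-not-equal against zero. The Python functions below only model the semantics of `slt` and `bne`; the wrapper name `branch_less_than` is hypothetical and stands for the two-instruction sequence a compiler or assembler might emit.

```python
def slt(rs, rt):
    """Set-less-than: produce 1 if rs < rt, else 0."""
    return 1 if rs < rt else 0

def bne(rs, rt):
    """Branch-not-equal: True means the branch is taken."""
    return rs != rt

def branch_less_than(r1, r2):
    """Two-insn sequence: slt tmp,r1,r2 then bne tmp,$0,target."""
    tmp = slt(r1, r2)
    return bne(tmp, 0)

print(branch_less_than(3, 10), branch_less_than(10, 3), branch_less_than(5, 5))
```

The comparison result lives in an ordinary register, so the dependence between the two instructions is explicit.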
Control Instructions
• Another issue: computing targets
• Option I: PC-relative
• Position-independent within procedure
• Used for branches
• Option II: Absolute
• Position independent outside procedure
• Used for procedure calls
• Option III: Indirect (target found in register)
• Needed for jumping to dynamic targets
• Used for returns, dynamic procedure calls, switches
• How far do you need to jump?
• Typically not so far within a procedure (they don’t get very big)
• Farther from one procedure to another
Control Instructions
• Another issue: support for procedure calls
• Link (remember) address of calling insn + 4
• Implicit return address register is $31
• Direct jump-and-link: jal
• Indirect jump-and-link: jalr
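Jump-and-link can be sketched as "save the return address, then jump". The model below follows the slide's "calling insn + 4" description and assumes register 31 as the link register. The function names and addresses are hypothetical, and the sketch deliberately ignores branch delay slots, which change the saved address on real MIPS hardware.

```python
def jal(pc, target, regs):
    """Direct jump-and-link: remember pc + 4 in the link register, go to target."""
    regs[31] = pc + 4
    return target

def jr(regs, reg=31):
    """Indirect jump: the next PC comes from a register (used for returns)."""
    return regs[reg]

regs = {31: 0}
pc = 0x1000
pc = jal(pc, 0x2000, regs)   # call the procedure at 0x2000
callee_pc = pc
pc = jr(regs)                # return to the insn after the call

print(hex(callee_pc), hex(pc))
```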