Computer architecture: processors
Dr Fei Xia and Dr Alex Bystrov

Introduction to processors

• The brain of the computing system, meant to carry out the intended functionality, as and when needed.
A simplified view v1.0
[Figure: simplified processor view; surviving label: Processed data]
29/10/20 Architecture topics, EEE8087 1

Simplified View v2.0 – data types
[Figure: simplified processor view with data types; surviving labels: Instructions, Processed data. The slide also showed a simplified example of an instruction.]

Functional view
[Figure: functional view; surviving label: Instruction / Data]

CPU Structure
[Figure: CPU structure. Labels: Arithmetic and Logic Units, Internal Interconnects, Control Unit, Computer I/O, System bus]

Control unit: data flow
[Figure: control unit data flow. Labels: Arithmetic and Logic Units, Internal Interconnects, Control Unit, Computer I/O, System bus]

CPU control steps: data flow
• Fetch instructions
• Interpret instructions
• Fetch data
• Process data
• Write data
Simplified view: Fetch next instruction → Decode instruction → Execute instruction

Data flow: execute
• Fetch and decode are common to all CPU architectures; the execute flow, however, may take many forms
• Depends on the instruction being executed
• May include
– Memory read/write
– Input/Output
– Register transfers
– ALU operations
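The fetch, decode and execute steps can be sketched as a small interpreter loop. This is only an illustration: the four-instruction toy ISA, the `struct cpu` layout and the function name `run` are all invented for the sketch and do not correspond to any real processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy ISA, invented for this sketch only. */
enum opcode { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, OP_STORE = 3 };

struct instr {
    enum opcode op;   /* operation selected at decode time */
    uint8_t     reg;  /* register index (0..3) */
    int32_t     arg;  /* immediate, second register index, or memory address */
};

struct cpu {
    int32_t regs[4];  /* register file */
    int32_t mem[16];  /* tiny data memory */
    size_t  pc;       /* program counter */
};

/* Run the program until HALT or the end of the program.
 * Returns the number of instructions executed. */
size_t run(struct cpu *c, const struct instr *prog, size_t len)
{
    size_t executed = 0;

    assert(c != NULL && prog != NULL);
    c->pc = 0;
    while (c->pc < len) {
        struct instr in = prog[c->pc++];   /* fetch, then advance the PC */
        executed++;
        switch (in.op) {                   /* decode */
        case OP_LOADI:                     /* execute: register transfer */
            c->regs[in.reg & 3] = in.arg;
            break;
        case OP_ADD:                       /* execute: ALU operation */
            c->regs[in.reg & 3] += c->regs[in.arg & 3];
            break;
        case OP_STORE:                     /* execute: memory write */
            c->mem[in.arg & 15] = c->regs[in.reg & 3];
            break;
        case OP_HALT:
            return executed;
        }
    }
    return executed;
}
```

Fetch and decode are identical for every instruction here; only the `switch` arms differ, which mirrors the point that the execute flow depends on the instruction being executed.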

• Some architectures add a prefetch step to improve performance
• The next instruction can be fetched during execution of the current instruction (pipelining)
• Called instruction prefetch
• Prefetch can require accessing main memory
[Figure: timing diagram; surviving labels: Instr. N+1, Execute, (Pre-)]

Improved performance through prefetch
• Prefetch improves performance by hiding main-memory latency: the next fetch overlaps with the current execution
• But performance is not doubled:
– Fetch is usually shorter than execution
• Prefetch more than one instruction?
– Any jump or branch means that the prefetched instructions are not the required instructions
• Add more stages, or time-multiplex the stages, to improve performance
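A short calculation shows why overlapping fetch with execute does not double performance. The sketch below assumes a two-step model with no branches and no memory contention; the function names are illustrative only.

```c
#include <assert.h>

/* Time per instruction when fetch and execute run back to back. */
double sequential_time(double fetch, double exec)
{
    assert(fetch > 0.0 && exec > 0.0);
    return fetch + exec;
}

/* Steady-state time per instruction when the next fetch overlaps the
 * current execute: the longer of the two steps sets the pace. */
double overlapped_time(double fetch, double exec)
{
    assert(fetch > 0.0 && exec > 0.0);
    return fetch > exec ? fetch : exec;
}

/* Speedup of the overlapped scheme over the sequential one. */
double prefetch_speedup(double fetch, double exec)
{
    return sequential_time(fetch, exec) / overlapped_time(fetch, exec);
}
```

With equal fetch and execute times the speedup is exactly 2. With a fetch of 1 time unit and an execute of 4, it is only 5/4 = 1.25, because the fetch unit sits idle for most of each execute step.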

Pipelining
• Detailed data flow
– Fetch instruction
– Decode instruction
– Calculate operand addresses
– Fetch operands
– Execute instruction
– Write result
• Overlap these operations

Timing of Pipeline – 6 stages
• FI: fetch instruction
• DI: decode instruction
• CO: calculate operand addresses
• FO: fetch operands
• EI: execute instruction
• WO: write-back operands
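The timing of the six stages can be sketched numerically. The code assumes an ideal pipeline: every stage takes one clock cycle and there are no stalls or branches. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define DEPTH 6  /* the six stages listed above */

static const char *const STAGE[DEPTH] = { "FI", "DI", "CO", "FO", "EI", "WO" };

/* Cycles to complete n instructions when the stages overlap:
 * DEPTH cycles for the first instruction, then one more per instruction. */
unsigned pipelined_cycles(unsigned n)
{
    return n == 0 ? 0 : DEPTH + n - 1;
}

/* Cycles to complete n instructions with no overlap at all. */
unsigned sequential_cycles(unsigned n)
{
    return DEPTH * n;
}

/* Stage occupied by instruction i during cycle t (both 0-based),
 * or NULL if the instruction is not in the pipeline in that cycle. */
const char *stage_at(unsigned i, unsigned t)
{
    if (t < i || t - i >= DEPTH)
        return NULL;
    return STAGE[t - i];
}
```

For 9 instructions this gives 14 cycles pipelined against 54 without overlap. As the instruction count grows, the ratio approaches the pipeline depth of 6.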

Branch in a Pipeline
Instruction 3 causes a branch to instruction 15
Instructions 4-7 have stalls
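The cost of a taken branch can be estimated with a simple model. The sketch assumes that the branch outcome becomes known at the end of a given stage, here called `resolve_stage`, and that every instruction fetched after the branch in the meantime is wasted. Choosing stage 5 (EI) for the 6-stage pipeline is an assumption, but it is consistent with four instructions (4 to 7) being affected.

```c
#include <assert.h>

/* Cycles lost per taken branch: one for each instruction that entered
 * the pipeline behind the branch before its outcome was known. */
unsigned branch_penalty(unsigned resolve_stage)
{
    assert(resolve_stage >= 1);
    return resolve_stage - 1;
}

/* Cycles to complete n useful instructions in a depth-stage pipeline
 * when `taken` of them are taken branches (no other stalls modelled). */
unsigned cycles_with_taken_branches(unsigned depth, unsigned n,
                                    unsigned taken, unsigned resolve_stage)
{
    assert(depth >= 1 && n >= 1 && resolve_stage <= depth);
    return depth + n - 1 + taken * branch_penalty(resolve_stage);
}
```

Under these assumptions, five useful instructions containing one taken branch need 6 + 5 - 1 + 4 = 14 cycles instead of 10.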

Resource conflict stalls
[Figure: pipeline timing diagram. Axes: Time (clock cycles) against Instr.; rows: Load, Instr 1, Instr 2, Instr 3]
Apart from branching, stalls can also arise from resource conflicts
Avoiding them needs careful processor pipeline design with appropriate arbitration between streams (e.g. skip cycle 4)

Dealing with Branches
• Multiple Streams
• Prefetch Branch Target
• Loop buffer
• Branch prediction
• Delayed branching

Prefetching the branch target
Prefetch the instructions at the branch target and store them somewhere non-conflicting
The target of the branch is prefetched in addition to the instructions following the branch
Keep the target until the branch is executed
Used as far back as the IBM 360/91

Loop Buffer
• Jump targets are often inside a loop: a short sequence of instructions that is executed repeatedly
• Very fast memory (instruction registers, IRs) stores these N instructions in sequence
• The instructions in the loop can be pipelined
• Maintained by the fetch stage of the pipeline
• Check the buffer before fetching from memory
• Very good for small loops or jumps
• Used by the CRAY-1

Branch Prediction (1)
• Predict never taken (pessimistic)
– Assume that the jump will not happen
– Always fetch the next instruction
– Examples: 68020 and VAX 11/780 (the latter manufactured by DEC)
– Do not prefetch after branch
• Predict always taken (optimistic)
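The two static strategies can be compared by counting correct guesses over a recorded sequence of branch outcomes. The helper below is a sketch; the function name and the idea of passing the outcomes as a `bool` array are choices made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count how many branch outcomes a fixed guess gets right.
 * taken[i] is true when the i-th execution of the branch was taken;
 * guess is true for "predict always taken", false for "predict never taken". */
size_t correct_predictions(const bool *taken, size_t n, bool guess)
{
    size_t correct = 0;

    assert(taken != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (taken[i] == guess)
            correct++;
    return correct;
}
```

For a loop-closing branch that is taken nine times and then falls through once, "always taken" is right 9 times out of 10 and "never taken" only once. Neither strategy adapts to the program, so which one wins depends entirely on the branch.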

        mov     r3, ?               @ operand missing in the source
        str     r3, [fp, #-16]
        mov     r3, #0
        str     r3, [fp, #-20]
        b       .L2
.L3:
        ldr     r1, [fp, #-20]
        ldr     r2, [fp, #-20]
        ldr     r3, [fp, #-16]
        mul     r0, r3, r2
        mvn     r2, #207
        mov     r3, r1, asl #2
        sub     r1, fp, #12
        add     r3, r3, r1
        add     r3, r3, r2
        str     r0, [r3, #0]
        ldr     r3, [fp, #-20]
        add     r3, r3, #1
        str     r3, [fp, #-20]
.L2:
        ldr     r3, [fp, #-20]
        cmp     r3, #49
        ble     .L3
        sub     sp, fp, #12
        ldmfd   sp, {fp, sp, pc}
    int a, b, c[50];
    for (a = 0; a < 50; a++)
        c[a] = a * b;
    }   /* closes the enclosing function, whose header is not shown */

For this loop, predict-always-taken succeeds roughly 49 times in 50, while predict-never-taken succeeds only about 1 time in 50.

Branch Prediction (2)
• Predict by opcode
– Some instructions are more likely to result in a jump than others
– For example COMPARE instructions
– Can get up to 75% success
• Taken/Not taken switch
– Based on previous history (machine learning aided)
– Good for loops
• Delayed branch
– Do not take the jump until you have to
– Execute all current instructions in sequence up to the jump instruction
– Rearrange instructions

Speedup from pipelining
• Ideally equals the number of pipeline stages (pipeline depth)
• Without pipelining, CPI equals the number of stages in the data flow, assuming each stage requires 1 cycle (= Ideal CPI × Pipeline depth)
• CPI = clocks per instruction, ideally = 1

$$\mathrm{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

Pipelined architecture examples
[Figure: ARM7TDMI, 3-stage pipeline. Surviving labels: Instruction Fetch (FETCH); Thumb→ARM decompress, ARM decode, Reg Select (DECODE)]
[Figure: ARM9TDMI, 5-stage pipeline. Surviving labels: Instruction Fetch (FETCH); ARM or Thumb Inst Decode, Reg Decode (DECODE); Shift + ALU (EXECUTE); Memory Access]

Control unit: CPU types
[Figure: CPU block diagram. Labels: Arithmetic and Logic Units, Internal Interconnects, Control Unit, Computer I/O, System bus]

Von Neumann architecture
• Also called "Princeton architecture"
• Data and instructions share the same memory and the same memory interface with the CPU
• Input and output may be on separate interconnects
• Usually simplified to a single bus for all data and instruction transfers
• Most classical and current systems follow this architecture to some degree
[Figure: Von Neumann architecture diagram. Source: Kapooht]

Harvard architecture
• Separate instruction and data memories, connected to the processor's control unit using separate interconnects
• I/O shares the same interconnect

A bit of both
• Modified Harvard architecture
– Sometimes called "almost Von Neumann architecture"
– Memories inside and close to the CPU are divided into instruction and data
  • Instruction registers and data registers
  • Instruction cache and data cache (usually L1 cache)
  • Connected with separate interconnects
– Memories further away from the CPU are organized in Von Neumann fashion
– Current ARM and Intel processors use this
– Review pipeline stalls when a fetch clashes with a data store (Resource conflict stalls, slide 13)

CISC and RISC
• CISC: complex instruction set computer
• RISC: reduced instruction set computer
• The Berkeley group coined the term RISC and made a CPU called RISC I; soon after, Stanford made a similar CPU called MIPS
• SPARC also emerged from Sun
• ARM has a range of RISC architectures
• Early RISC CPUs had about 50 instructions, compared to the 200-300 common for CISC
– The aim was to simplify the CPU so that it processes (and starts) instructions faster

RISC philosophy
• Instructions of fixed length executing in a single clock cycle
• Pipelines to achieve a throughput of one instruction per clock cycle (need to predict branches in program flow in advance)
• Simple control logic to increase clock speed, no micro-code
• Operations performed on internal registers only; only LOAD and STORE instructions access external memory

MIPS example: add $rd, $rs, $rt
Bits    B31-26   B25-21       B20-16       B15-11       B10-6          B5-0
Field   opcode   register s   register t   register d   shift amount   function

CISC characteristics
• Binary compatibility
– Old binary code can run on newer versions
• Complex control logic to support many instructions
• Use of micro-code
– One program instruction can execute in many cycles
• Variable-length instructions to save program memory
• Small internal register sets compared with RISC
• Complex addressing modes; operands can reside in external memory or internal registers

A CISC versus RISC example

One way of looking at it...
• Runtime = clock period × CPI × N_instr
• CISC tries to reduce the number of instructions
– Fewer instructions to do more
– Increased CPI
– Complex CPU design (multi-mode registers and multi-cycle execution)
• RISC tries to reduce the clock cycles per instruction
– Fewer cycles per instruction
– More instructions
– Simpler CPU design
• Obvious trade-offs can be seen!

Another way of looking at it
• CISC assembler code may be easier for human programmers to handle
– When manually coding
• But is this advantage really relevant these days?
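The runtime relation from the "One way of looking at it..." slide, together with the pipelined CPI relation from the speedup slide, can be turned into a small calculator. This is a sketch only: the function names are illustrative, and any numbers fed into it are hypothetical inputs rather than measurements of real processors.

```c
#include <assert.h>

/* CPI of a pipelined processor: ideal CPI plus the average number of
 * stall cycles per instruction. */
double pipelined_cpi(double ideal_cpi, double stall_cycles_per_instr)
{
    assert(ideal_cpi > 0.0 && stall_cycles_per_instr >= 0.0);
    return ideal_cpi + stall_cycles_per_instr;
}

/* Runtime = clock period x CPI x number of instructions.
 * The result is in nanoseconds when the clock period is given in ns. */
double runtime_ns(double clock_period_ns, double cpi, double n_instr)
{
    assert(clock_period_ns > 0.0 && cpi > 0.0 && n_instr >= 0.0);
    return clock_period_ns * cpi * n_instr;
}
```

As an invented illustration, a CISC-style program of 100 instructions at CPI 4 with a 2 ns clock takes 800 ns. A RISC-style version needing 300 instructions at CPI 1.5 (ideal CPI 1 plus 0.5 stall cycles) with a 1 ns clock takes 450 ns. Different inputs can reverse the result, which is exactly the trade-off the slide points to.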