MEC302 Embedded Computer Systems
Parallelism: Pipelining
Types of Processors
• Microprocessors and Microcontrollers
• DSP Processors
• Graphics Processors
Dr. Sanghyuk Lee
Parallelism
• Parallelism vs Concurrency
• Pipelining
• Instruction-Level Parallelism
• Multicore Architectures

Concurrency in Embedded Systems
• Embedded programs interact with physical processes and many activities progress at the same time.
• An embedded program often needs to monitor and react to multiple concurrent sources of stimulus and simultaneously control multiple output devices.
• Embedded programs are almost always concurrent programs.
− Timeliness matters: actions in the physical world should be performed at the right time.
• Both imperative and concurrent programs can be executed either sequentially or in parallel.

Parallelism in Hardware
• The application does not (necessarily) demand multiple activities execute simultaneously
− it demands that things be done very quickly
• Of course, many other applications will combine both forms of concurrency, arising from parallelism and from application requirements.
• Here we will focus on hardware approaches to deliver parallelism:
− Pipelining
− Instruction-level parallelism
− Multicore architectures
• Later we will look at memory systems. These strongly influence how parallelism is handled.

Pipelining
• Pipelining is the process of fetching the next instruction while the current instruction is being executed.
• Each instruction can be broken down into steps:
− Fetch instruction from memory (fetch)
− Decode instruction (dec)
− Access operands from register bank (reg)
− Combine operands to form either a result or memory address (ALU)
− Access memory to read or write data (mem)
− Write result into the register bank (res)
• Some instructions do not need all of these steps.
• These 6 steps can occur concurrently in a 6 stage pipeline.

Pipelined instruction execution
Pipelining allows more than one instruction to be executed at a time, each in a different phase. In the pipeline diagram:
1. instruction 3 is being fetched,
2. while instruction 2 is being decoded,
3. and the operands for instruction 1 are being read from the register bank.

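Why Pipelining?
• Fast processing time
• For example, take 100 commands on a 4 stage pipeline, with each command taking 4 µs (1 µs per stage). The first command completes after 4 µs and each of the remaining 99 completes 1 µs after the one before it, so the total processing time is 4 + 99 = 103 µs. Without pipelining it would be 100 × 4 = 400 µs.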
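The arithmetic in this example can be checked with a short Python sketch. The function names and the 1 µs stage time are assumptions made for illustration; the model is an ideal pipeline with no stalls.

```python
def unpipelined_time(n_instructions, n_stages, stage_time_us):
    """Every instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages * stage_time_us


def pipelined_time(n_instructions, n_stages, stage_time_us):
    """The first instruction fills the pipeline; each later one finishes
    one stage time after its predecessor (ideal case, no stalls)."""
    return (n_stages + (n_instructions - 1)) * stage_time_us


print(unpipelined_time(100, 4, 1))  # 400
print(pipelined_time(100, 4, 1))    # 103
```

As the number of instructions grows, the ratio of the two times approaches the number of stages, which is the "up to n times" speed-up of an n stage pipeline.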

ARM Instruction Pipelines
• The ARM processor was developed by Advanced RISC (reduced instruction set computer) Machines.
• Extensively used in consumer electronic devices (smartphones, tablets etc.).
− Requires fewer transistors, which enables a smaller size
• For the same basic speed of transistor operation, an n stage instruction pipeline allows the microprocessor to execute up to n times as many instructions in a given time.
• The ARM7 core has a three stage pipeline whereas the ARM9 core has a five stage pipeline.

3 Stage Pipeline – ARM7
• In a three stage pipeline (e.g. ARM7) the CPU can simultaneously execute an instruction, decode the next instruction and fetch the next but one instruction.
Fetch: the processor fetches the instruction from memory.
Decode: the processor identifies the instruction that is to be executed.
Execute: the processor processes the instruction and writes the result back to the destination register.

ARM7 3 Stage Pipeline: detail
• In each stage of the ARM7 pipeline several things happen, normally consecutively.
• If the 3 stages of execution are overlapped, a higher speed of execution is achieved (as in the ARM7 processor).

5 Stage Pipeline
• Later processors use a 5 stage pipeline (this is more common now!)
• Patterson, D. A. and J. L. Hennessy, 1996:

Summary: FETCH and DECODE
• In the fetch stage the program counter provides an address to the instruction memory.
• The instruction memory provides encoded instructions (32 bits wide for the ARM7).
• In the fetch stage the program counter is incremented by 4 bytes to become the address of the next instruction (unless a conditional branch instruction provides an entirely new address for the program counter).
• The decode pipeline stage extracts register addresses from the 32-bit instruction and fetches the data in the specified registers from the register bank.

Summary: Execute, memory and writeback
• The execute pipeline stage operates on the data fetched from the registers (or on the PC for a computed branch) using the ALU.
• The memory pipeline stage reads or writes to a memory location given by a register.
• The writeback pipeline stage stores results in a register file.
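The five stages summarised above can be walked through for a single instruction in a small Python sketch. This is only an illustration: the function `run_add`, the dictionary used as instruction memory and the list used as register bank are hypothetical, and the bit positions of the register fields and the example word `0xE0811002` assume the ARM register-operand data-processing layout (taken here to encode `ADD R1, R1, R2`). The stages run one after another, not overlapped.

```python
def run_add(instruction_memory, registers, pc):
    """Walk one register-operand ADD through the five stages in sequence."""
    word = instruction_memory[pc]         # fetch: PC addresses instruction memory
    next_pc = pc + 4                      # fetch: PC advances by 4 bytes
    rn = (word >> 16) & 0xF               # decode: first operand register number
    rd = (word >> 12) & 0xF               # decode: destination register number
    rm = word & 0xF                       # decode: second operand register number
    a, b = registers[rn], registers[rm]   # decode: read the register bank
    result = (a + b) & 0xFFFFFFFF         # execute: ALU combines the operands
    # memory: an ADD does not access data memory, so this stage does nothing
    registers[rd] = result                # writeback: store the result
    return next_pc


memory = {0x8000: 0xE0811002}  # assumed encoding of ADD R1, R1, R2
regs = [0] * 16
regs[1], regs[2] = 10, 20
next_pc = run_add(memory, regs, 0x8000)
print(hex(next_pc), regs[1])  # 0x8004 30
```

In a real pipeline these steps belong to different instructions in the same clock cycle; the sketch only shows what each stage contributes.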

Reservation table for the 5 stage pipeline
• In cycle 5, E is being fetched while D is reading from the register bank, while C is using the ALU, while B is reading from or writing to data memory, while A is writing results to the register bank.
• The write by A occurs in cycle 5, but the read by B occurs in cycle 3. If B reads a register that A writes, the value that B reads will not be the value that A writes: this is a data hazard.
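The reservation table can be rebuilt with a few lines of Python. This is a sketch assuming an ideal pipeline with no stalls, one instruction entering per cycle; the stage names and the helper functions are chosen here for illustration.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def reservation_table(instructions):
    """Map cycle -> {stage: instruction} for an ideal pipeline with no stalls."""
    table = {}
    for i, name in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i enters fetch in cycle i + 1
            table.setdefault(cycle, {})[stage] = name
    return table


def cycle_of(table, instruction, stage):
    """Return the cycle in which `instruction` occupies `stage`."""
    return next(c for c, row in table.items() if row.get(stage) == instruction)


table = reservation_table(["A", "B", "C", "D", "E"])
print(table[5])                         # all five stages are busy in cycle 5
print(cycle_of(table, "B", "decode"))   # 3: B reads the register bank
print(cycle_of(table, "A", "writeback"))  # 5: A writes the register bank
```

Since B's register read (cycle 3) comes before A's write (cycle 5), B would see a stale value if it depends on A.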

Without a stall:
MOV R1, 10 ; R1=10
MOV R2, 20 ; R2=20
ADD R1, R2 ; R1=R1+R2
MOV R3, 30 ; R3=30

With a stall:
MOV R1, 10 ; R1=10
MOV R2, 20 ; R2=20
NOP ; stall
ADD R1, R2 ; R1=R1+R2
MOV R3, 30 ; R3=30

More stalls mean more processing time.

Pipeline Hazards – data hazard
• A variety of techniques have been developed to handle pipeline hazards.
• The simplest technique is known as an explicit pipeline.
• The pipeline hazard is documented and the compiler deals with it.
• For example, where B reads a register written by A, the compiler will insert three no-op instructions (do nothing) between A and B.
− to ensure the write occurs before the read
• No-op instructions form a pipeline bubble
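What the compiler does for an explicit pipeline can be sketched in Python. This is a simplified illustration, not a real compiler pass: each instruction is a hypothetical tuple of (text, destination register, source registers), and the gap of three no-ops is the figure quoted above for the 5 stage pipeline.

```python
NOPS_NEEDED = 3  # bubbles required between a write and a dependent read


def insert_nops(program):
    """program: list of (text, dest_register, source_registers) tuples."""
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for text, dest, sources in program:
        for reg in sources:
            if reg in last_write:
                gap = len(out) - last_write[reg] - 1  # instructions in between
                out.extend(["NOP"] * max(0, NOPS_NEEDED - gap))
        if dest is not None:
            last_write[dest] = len(out)  # index this instruction will occupy
        out.append(text)
    return out


program = [
    ("MOV R1, 10", "R1", []),
    ("MOV R2, 20", "R2", []),
    ("ADD R1, R2", "R1", ["R1", "R2"]),
    ("MOV R3, 30", "R3", []),
]
for line in insert_nops(program):
    print(line)
```

Here ADD reads R2, which is written by the instruction immediately before it, so three NOPs are placed between them. That also leaves more than three instructions between ADD and the earlier write to R1.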

Pipeline Hazards – data hazard
• Another method is to provide interlocks.
• In this technique the instruction decode hardware detects the hazard and delays the execution of B until A has completed the writeback stage (a delay of 3 cycles), assuming that instruction B reads a register that is written by instruction A.
• The delay can be reduced to two cycles if more complex forwarding logic is provided.
• Interlocks therefore provide hardware that automatically inserts pipeline bubbles.

Pipeline Hazards – data hazard
• Out-of-order execution
• Hardware is provided that detects a hazard but, instead of simply delaying the execution of B, proceeds to fetch C. If C does not read registers written by A or B, and does not write registers read by B, the hardware proceeds to execute C before B.
• This further reduces the number of pipeline bubbles (stalls).
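The condition for executing C ahead of B can be written as a small Python check. This is a sketch of the rule as stated above, not of real out-of-order hardware; representing each instruction as a dictionary with `reads` and `writes` register sets is an assumption made for illustration.

```python
def can_run_before(c, a, b):
    """True if C may execute ahead of the stalled B.

    C must not read a register written by A or B,
    and must not write a register read by B.
    """
    reads_pending_result = c["reads"] & (a["writes"] | b["writes"])
    overwrites_b_input = c["writes"] & b["reads"]
    return not reads_pending_result and not overwrites_b_input


a = {"reads": {"R2"}, "writes": {"R1"}}
b = {"reads": {"R1"}, "writes": {"R3"}}        # B depends on A through R1

independent = {"reads": {"R4"}, "writes": {"R5"}}
needs_b = {"reads": {"R3"}, "writes": {"R6"}}   # reads B's result
clobbers = {"reads": {"R4"}, "writes": {"R1"}}  # overwrites B's input

print(can_run_before(independent, a, b))  # True
print(can_run_before(needs_b, a, b))      # False
print(can_run_before(clobbers, a, b))     # False
```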

Pipeline Hazards – control hazard
• A conditional branch instruction changes the value of the program counter (PC) if a specified register has value zero.
• The new value of the PC is provided by the result of an ALU operation.
• In this case, if A is a conditional branch instruction, then it has to have reached the memory stage before the PC can be updated.
• The instructions following A in memory will already have been fetched and will be at the decode and execute stages before it is determined that they should not be executed.

• Pipelines only work properly when instructions are stored in memory at consecutive addresses (0x00008000, 0x00008001, 0x00008002, …) or regularly spaced addresses (0x00008000, 0x00008004, 0x00008008, …).
• Generally this is the case except when the instruction fetched is a branch (or jump).
• The purpose of a branch is to take the program to a different part of memory.
• For a pipeline this means losing a number of instructions that have been fetched but do not need to be executed.
• This results in a ‘pipeline flush’.

Pipeline flush on branch
For any given instruction that the CPU must process, there are multiple stages of processing, called instruction or machine cycles. These stages include fetching the instruction from memory, decoding it and executing it. CPUs pipeline their instructions, which means multiple instructions can be in different stages of the machine cycle at any given time.
When a branch is taken, the instructions already fetched behind it are no longer the ones that should run. In such a case, the CPU may need to clear (or “flush”) the instruction pipeline to ensure the calculations are not corrupted by the pipelining process.
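The cost of flushing can be estimated with a simple Python model. This is a sketch under stated assumptions: an otherwise ideal pipeline, and a fixed number of wasted cycles per taken branch. The penalty of 2 used below is an assumption corresponding to the two instructions at the decode and execute stages being discarded; the function name and parameters are hypothetical.

```python
def total_cycles(n_instructions, n_stages, taken_branches, flush_penalty):
    """Cycles to finish a program when every taken branch flushes the pipeline."""
    ideal = n_stages + (n_instructions - 1)  # fill the pipeline, then one per cycle
    return ideal + taken_branches * flush_penalty  # each flush wastes extra cycles


print(total_cycles(100, 5, 0, 2))   # 104: no taken branches
print(total_cycles(100, 5, 10, 2))  # 124: ten taken branches
```

The more often branches are taken, the further the pipeline falls short of its ideal throughput.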

Pipeline Hazards – control hazard
• Delayed Branch
• The processor documentation states that the branch takes effect some time after it is encountered. The compiler ensures that the instructions that follow it are harmless.
• Interlock
• Hardware solution – insert pipeline bubbles as needed
• Speculative execution
• The hardware estimates whether a branch is likely to be taken and begins executing the instructions it expects to execute.
• If its expectation is not met, it undoes any side effects.

Why do pipeline hazards cause problems?
• Timing may be very important and the techniques used can introduce variability in the timing of execution of an instruction sequence.
• Analysis of the timing of a program is difficult when there is a deep pipeline with very elaborate forwarding and speculation.
• Where explicit pipelines and delayed branches are used:
• For example, DSP processors are often used in applications where precise timing is essential. They normally use explicit pipelines.
• Out-of-order and speculative execution are common in general-purpose processors, where precise timing is not so critical.

The End of Lecture
Parallelism: Pipelining
