Embedded Computer Systems
Dr. Sanghyuk Lee
Email: Dept. Mechatronics and Robotics
• Introduction to embedded systems
• Modelling of the system
• Sensor and actuator
• Programming examples
• Embedded processors
• Memory architectures
• Input and output mechanism
Learning outcomes
• Discuss what is meant by an Embedded Computer System.
• Describe different types of embedded processors and their applications.
• Understand how parallelism relates to Embedded systems (timing, pipelines and parallel resources).
• Explain Memory Architectures and their importance in Embedded system design.
• Be able to understand the design issues facing an Embedded system designer in relation to input/output hardware and software.
References
Why is it important?
• The most visible use of computers and software is processing information for human consumption – writing books, searching, communicating, etc.
• Most computers in use are much less visible – they run engines, command robots, and so on.
These less visible computers are called embedded systems, and the software they run is called embedded software.
• Design of Cyber-Physical Systems (CPS) requires understanding the joint dynamics of computers, software, networks, and physical processes
– mentioned by , in NSF application in US.
• Heart surgery; surgical tools can be robotically controlled so that they move with the motion of the heart, while a stereoscopic video system presents the surgeon with a video illusion of a still heart
• City traffic systems; traffic lights and cars cooperate to ensure efficient flow of traffic
• Flight control that refuses to crash; the Soft Walls system by Lee (2001) tracks the location of the aircraft on which it is installed and prevents it from flying into obstacles such as mountains and buildings
Two networked platforms each with its own sensors and/or actuators.
• The action taken by the actuators affects the data provided by the sensors through the physical plant.
• Platform 2 controls the physical plant via Actuator 1. It measures the processes in the physical plant using Sensor 2.
• Computation 2 implements a control law, which determines, based on the sensor data, what commands to issue to the actuator. Such a loop is called a feedback control loop (a minimal sketch appears below).
• Platform 1 makes additional measurements using Sensor 1, and sends messages to Platform 2 via the network.
• Computation 3 realizes an additional control law, which is merged with that of Computation 2, possibly preempting it.
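A minimal C sketch of such a feedback control loop is shown below. The functions read_sensor(), write_actuator() and wait_for_next_period(), the setpoint and the gain are hypothetical placeholders, not part of any particular platform; the loop only illustrates the sense–compute–actuate cycle.

    /* Hedged sketch of a periodic feedback control loop.
     * read_sensor(), write_actuator() and wait_for_next_period()
     * are hypothetical platform functions, assumed for illustration. */
    #define SETPOINT 25.0f   /* desired plant output, e.g. temperature in C */
    #define KP       0.8f    /* proportional gain (illustrative value) */

    extern float read_sensor(void);            /* Sensor measurement */
    extern void  write_actuator(float cmd);    /* command to the actuator */
    extern void  wait_for_next_period(void);   /* blocks until the next sample time */

    void control_loop(void)
    {
        for (;;) {
            float measurement = read_sensor();     /* sample the plant */
            float error = SETPOINT - measurement;  /* control error */
            float command = KP * error;            /* simple proportional control law */
            write_actuator(command);               /* act on the plant */
            wait_for_next_period();                /* enforce the sample period */
        }
    }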
Embedded Systems Construction
• Modelling
Continuous systems / Discrete systems / Hybrid systems / State machines / Concurrent models of computation
• Design
Sensors and actuators / Embedded processors / Memory architectures / Input and output / Multitasking / Scheduling
• Analysis
Invariants and temporal logic / Equivalence and refinement / Reachability analysis / Quantitative analysis / Security and privacy
• The focus of this module is design and analysis, with lab work
Characteristics of Embedded Systems
• Concurrent
Composed of multi-tasking and/or distributed processes
• Communicating
Specialized processes communicate in order to achieve the overall system function
• Real-time
Timing requirements are established by the environment
• Resource-constrained
Limited resources: processing, memory, peripherals, power, etc….
Discrete Model; example
Hybrid Model
• Each state refinement has a clock s, a continuous-time signal with ṡ(t) = 1.
• In the initial state, s(t) = T_c.
• The temperature is τ(t).
• The guard s(t) ≥ T_h ensures that the heater will always be on for at least time T_h.
• The guard s(t) ≥ T_c specifies that once the heater goes off, it will remain off for at least time T_c (a discrete sketch of this thermostat follows below).
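A discrete-time C sketch of this thermostat is given below, assuming hypothetical read_temperature() and set_heater() hooks and illustrative thresholds; the continuous clock s(t) is approximated by counting sample periods, so this is a sketch of the idea rather than the hybrid model itself.

    #include <stdbool.h>

    /* Hypothetical hardware hooks, assumed for illustration. */
    extern float read_temperature(void);
    extern void  set_heater(bool on);

    #define TEMP_LOW      18.0f  /* turn heater on at or below this (illustrative) */
    #define TEMP_HIGH     22.0f  /* turn heater off at or above this (illustrative) */
    #define MIN_ON_TICKS  30     /* approximates the "on for at least T_h" guard */
    #define MIN_OFF_TICKS 30     /* approximates the "off for at least T_c" guard */

    typedef enum { COOLING, HEATING } state_t;

    void thermostat_step(void)
    {
        static state_t state = COOLING;  /* initial state */
        static unsigned ticks = 0;       /* discrete stand-in for the clock s(t) */
        float tau = read_temperature();  /* temperature tau(t) */

        ticks++;
        switch (state) {
        case COOLING:
            if (ticks >= MIN_OFF_TICKS && tau <= TEMP_LOW) {
                state = HEATING; ticks = 0; set_heater(true);
            }
            break;
        case HEATING:
            if (ticks >= MIN_ON_TICKS && tau >= TEMP_HIGH) {
                state = COOLING; ticks = 0; set_heater(false);
            }
            break;
        }
    }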
Sensor Model: Quantization
• An n-bit sensor distinguishes 2^n distinct values (a continuous quantity can take infinitely many values).
• Ex.: a 3-bit sensor measuring from 0 V to 1 V can be modelled as the function f : R → {0, 1, ..., 7}. With L = 0 and H = 1, the precision is p = 1/8 V.
• Its dynamic range in decibels is D = 20 log10((H − L)/p) ≈ 18 dB.
Sensor Model: Quantization
• The precision is given by
p = (H − L) / 2^n
• and the dynamic range in decibels is
D = 20 log10((H − L)/p) = 20 log10(2^n) ≈ 6n dB
• Digitization introduces a quantization error of at most one bit (one LSB); its effect can be reduced by fast (over-)sampling. A small C sketch of the quantization function follows below.
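The following C sketch models the quantization function f over a range [lo, hi] for an n-bit sensor; the parameter values reproduce the 3-bit, 0 V to 1 V example above, and the function name and saturation behaviour are illustrative choices.

    #include <stdio.h>

    /* Model of an n-bit sensor quantizing the range [lo, hi].
     * Returns a code in {0, ..., 2^n - 1}; precision p = (hi - lo) / 2^n. */
    static unsigned quantize(double x, double lo, double hi, unsigned n_bits)
    {
        unsigned levels = 1u << n_bits;     /* 2^n codes */
        double p = (hi - lo) / levels;      /* precision (one LSB) */
        if (x <= lo) return 0;
        if (x >= hi) return levels - 1;     /* saturate at the ends of the range */
        return (unsigned)((x - lo) / p);
    }

    int main(void)
    {
        /* 3-bit sensor over 0 V..1 V: p = 1/8 V, codes 0..7 */
        printf("0.30 V -> code %u\n", quantize(0.30, 0.0, 1.0, 3)); /* prints 2 */
        printf("0.95 V -> code %u\n", quantize(0.95, 0.0, 1.0, 3)); /* prints 7 */
        return 0;
    }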
Sensor Model: Noise and SNR
• Relation between the actual measurement and the pure information:
x′(t) = x(t) + n(t)
where x′(t) is the actual measurement and x(t) is the pure information; after passing through the sensor we get f(x(t)) = x(t) + n(t).
• Noise power:
N = lim_{T→∞} (1/(2T)) ∫_{−T}^{T} n(τ)² dτ
• The signal-to-noise ratio (SNR), defined in terms of RMS amplitudes, is
SNR_dB = 20 log10(X_RMS / N_RMS)
where X_RMS and N_RMS are the RMS amplitudes of the signal and the noise (a sketch computing this from samples follows below).
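Below is a small C sketch that estimates SNR in dB from sampled data by approximating the limit above with a finite average; the sample arrays are made-up illustrative values.

    #include <math.h>
    #include <stdio.h>

    /* Root-mean-square of a finite sample sequence. */
    static double rms(const double *v, int n)
    {
        double acc = 0.0;
        for (int i = 0; i < n; i++)
            acc += v[i] * v[i];
        return sqrt(acc / n);
    }

    int main(void)
    {
        /* x[] is the pure signal, nse[] the noise; x'[i] = x[i] + nse[i]. */
        double x[]   = { 1.0, -1.0, 1.0, -1.0 };   /* illustrative samples */
        double nse[] = { 0.1, -0.05, 0.02, 0.08 };
        int n = 4;

        double snr_db = 20.0 * log10(rms(x, n) / rms(nse, n));
        printf("SNR = %.1f dB\n", snr_db);
        return 0;
    }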
Microcontrollers
• A microcontroller (μC) is a small computer on a single integrated circuit.
• It consists of a relatively simple CPU combined with peripheral devices (memories, I/O devices, timers, etc.).
− More than half the CPUs sold in the world are microcontrollers.
• A microcontroller is a small and low-cost computer built for the purpose
of dealing with specific tasks.
• Microcontrollers are mainly used in products that require a degree of control to be exerted by the user.
Microprocessor vs microcontroller
• A microprocessor is an IC that contains only a CPU
• i.e. only the processing core, such as Intel’s Pentium 1/2/3/4, Core 2 Duo, i3, i5, etc.
• These microprocessors have no RAM, ROM, or other peripherals on the chip.
• A system designer has to add them externally to make a working system.
• Applications of microprocessors include desktop PCs, laptops, notepads, etc.
• But this is not the case with microcontrollers.
• A microcontroller has a CPU together with a fixed amount of RAM, ROM and other peripherals, all embedded on a single chip.
• At times it is also termed a mini computer, or a computer on a single chip.
• Today different manufacturers produce microcontrollers with a wide range of features, available in different versions.
What is a microprocessor system?
Parallelism vs Concurrency
• Concurrent
A computer program is concurrent if different parts of the program conceptually execute simultaneously
• Parallel
A program is parallel if different parts of the program physically execute simultaneously on distinct hardware.
(Example: several threads of one program sharing access to a counter variable.)
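As a concrete illustration of concurrency, the sketch below (using POSIX threads, assuming a platform that provides them) has two threads incrementing a shared counter; the mutex makes the concurrent accesses safe, and without it the final value would be unpredictable.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                      /* shared state */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);            /* protect the shared counter */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);  /* two conceptually simultaneous */
        pthread_create(&t2, NULL, worker, NULL);  /* parts of the same program     */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);       /* 200000 with the mutex held */
        return 0;
    }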
Parallelism in Hardware
• The application does not (necessarily) demand that multiple activities execute simultaneously
− it demands that things be done very quickly
• Of course, many other applications will combine both forms of concurrency, arising from parallelism and from application requirements.
• Here we will focus on hardware approaches to deliver parallelism
Pipelining
Instruction-level parallelism
Multicore architectures
• Later we will look at memory systems. These strongly influence how parallelism is handled.
Pipelining
The process of fetching the next instruction while the current instruction is being executed.
− Each instruction can be broken down into steps:
Fetch instruction from memory (fetch)
Decode instruction (dec)
Access operands from register bank (reg)
Combine operands to form either a result or memory address (ALU)
Access memory to read or write data (mem)
Write result into the register bank (res)
Not all of these steps will be needed for some instructions.
These 6 steps can occur concurrently in a 6 stage pipeline.
Pipelined instruction execution
Pipelining allows more than one instruction to be executed at a time; but in different phases. In the diagram above
1. instruction 3 is being fetched
2. whilst instruction 2 is being decoded
3. & operands for inst. 1 are being accessed from a register
Why Pipelining? Faster processing time
For example, consider 100 instructions on a 4-stage pipeline where each instruction takes 4 μs (1 μs per stage). Without pipelining the total processing time is 100 × 4 μs = 400 μs; with pipelining it is (4 + 99) μs = 103 μs, since the first instruction takes 4 μs and thereafter one instruction completes every 1 μs. A small C sketch of this calculation follows below.
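A tiny C sketch of the same calculation, with the one-clock-per-stage assumption made explicit:

    #include <stdio.h>

    int main(void)
    {
        int instructions = 100;
        int stages = 4;
        double stage_time_us = 1.0;   /* assumed time per pipeline stage */

        /* Non-pipelined: every instruction occupies all stages in turn. */
        double serial_us = instructions * stages * stage_time_us;

        /* Pipelined: first result after 'stages' cycles, then one per cycle. */
        double pipelined_us = (stages + (instructions - 1)) * stage_time_us;

        printf("serial: %.0f us, pipelined: %.0f us\n", serial_us, pipelined_us);
        return 0;   /* prints 400 us vs 103 us for this example */
    }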
ARM Instruction Pipelines
ARM processor developed by Advanced RISC
(reduced instruction set computer) Machines.
Extensively used in consumer electronic devices
(smartphones, tablets etc.).
− Require fewer transistors, enables smaller size
For the same basic speed of transistor operation, an n stage instruction pipeline allows the microprocessor to execute up to n times as many instructions in a given time.
The ARM7 core has a three stage pipeline whereas the ARM9 core has a five stage pipeline.
3 Stage Pipeline – ARM7
• In a three stage pipeline (e.g. ARM7) the CPU can simultaneously execute an instruction, decode the next instruction and fetch the next but one instruction.
Fetch: in this stage the ARM processor fetches the instruction from memory
Decode: in this stage the processor recognizes (decodes) the instruction that is to be executed
Execute: the processor processes the instruction and writes the result back to the desired register
ARM7 3 Stage Pipeline: detail
• In each stage of the ARM7 pipeline several things happen, normally consecutively.
• When the 3 stages of execution are overlapped, a higher speed of execution is achieved (as in the ARM7 processor).
Reservation table for the 5 stage pipeline
In cycle 5, E is being fetched while D is reading from the register bank, while C is using the ALU, while B is reading from or writing to data memory, while A is writing results to the register bank.
The write by A occurs in cycle 5, but the read by B occurs in cycle 3. The value that B reads will not be the value that A writes − data hazard
Without a stall:
MOV R1, 10 ; R1 = 10
MOV R2, 20 ; R2 = 20
ADD R1, R2 ; R1 = R1 + R2
MOV R3, 30 ; R3 = 30
With a stall inserted to resolve the hazard:
MOV R1, 10 ; R1 = 10
MOV R2, 20 ; R2 = 20
NOP        ; stall
ADD R1, R2 ; R1 = R1 + R2
MOV R3, 30 ; R3 = 30
More stalls mean more processing time.
Pipeline Hazards – data hazard
A variety of techniques have been developed to handle pipeline hazards
The simplest technique is known as an explicit pipeline.
• The pipeline hazard is documented and the compiler deals with it.
• For example where B reads a register written by A, the compiler will insert three no-op instructions (do nothing) between A and B.
− to ensure the write occurs before the read
• No-op instructions form a pipeline bubble
Pipeline Hazards – data hazard
Another method is to provide interlocks
• In this technique the instruction decode hardware will detect the hazard and delay the execution of B until A has completed the writeback stage (delayed by 3 cycles).
• Can be reduced to two cycles – complex forwarding logic
Interlocks therefore provide hardware that automatically inserts pipeline bubbles
(again assuming that instruction B reads a register that is written by instruction A).
Instruction Level Parallelism (ILP)
• A processor supporting ILP is able to perform multiple independent operations in each instruction cycle.
• There are 4 major forms of ILP:
CISC instructions (complex instruction set computer, in contrast to RISC, reduced instruction set computer)
Subword parallelism
Superscalar
VLIW (very long instruction word)
Complex Instruction Set Computer (CISC)
• Processor with complex instructions − CISC
• The philosophy behind such processors is different from that of RISC
(reduced instruction set computers).
When do we use CISC machines?
• DSPs are CISC machines; they include instructions supporting FIR filtering
• In fact, to qualify as a DSP, a processor must be able to perform FIR filtering in one instruction cycle per tap (a plain-C FIR sketch follows below)
• Disadvantage: it is extremely challenging for a compiler to make optimal use of such an instruction set
− DSPs used with code libraries written and optimized in assembly language
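For reference, an FIR filter computes y[n] = Σ_k h[k]·x[n−k]. The plain C sketch below shows the per-tap multiply–accumulate that a DSP performs in a single instruction cycle; it is a generic illustration, not code for any particular DSP.

    /* Direct-form FIR filter: y[n] = sum over k of h[k] * x[n-k].
     * Plain C; on a DSP the inner loop maps to one MAC instruction per tap. */
    void fir(const float *x, float *y, int n_samples,
             const float *h, int n_taps)
    {
        for (int n = 0; n < n_samples; n++) {
            float acc = 0.0f;
            for (int k = 0; k < n_taps; k++) {
                if (n - k >= 0)              /* treat x as zero before time 0 */
                    acc += h[k] * x[n - k];  /* multiply-accumulate per tap */
            }
            y[n] = acc;
        }
    }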
RISC Vs CISC: An Example Multiplying Two Numbers in Memory
On the right is a diagram representing the storage scheme for a generic computer. The main memory is divided into locations numbered from (row) 1: (column) 1 to (row) 6: (column) 4.
The execution unit is responsible for carrying out all computations. However, the execution unit can only operate on data that has been loaded into one of the six registers (A, B, C, D, E, or F).
Let’s say we want to find the product of two numbers – one stored in location 2:3 and another stored in location 5:2 – and then store the product back in the location 2:3.
CISC Approach
• The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible.
• This is achieved by building processor hardware that is capable of understanding and executing a series of operations. For this particular task, a CISC processor would come prepared with a specific instruction (we’ll call it “MULT”).
• When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:
MULT 2:3, 5:2
• MULT is what is known as a “complex instruction.” It operates directly on the computer’s memory banks and does not require the programmer to explicitly call any loading or storing functions.
• One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of code is relatively short, very little RAM is required to store instructions.
• The emphasis is put on building complex instructions directly into the hardware.
RISC Approach
• RISC processors only use simple instructions that can be executed within one clock cycle. The “MULT” command is divided into 3 commands:
• “LOAD” moves data from the memory bank to a register; “PROD” finds the product of two operands located within the registers; “STORE” moves data from a register to the memory banks.
• In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly.
RISC Vs CISC (Advantages)
• At first, the RISC approach may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly level instructions.
• The compiler must also perform more work to convert a high-level language statement into code of this form
Because each instruction requires only one clock cycle to execute, the entire program will execute in approximately the same amount of time as the multi-cycle “MULT” command.
RISC “reduced instructions” require fewer transistors and less hardware space than the complex instructions, leaving more room for general-purpose registers. Because all of the instructions execute in a uniform amount of time (i.e. one clock), pipelining is possible.
Memory Technologies
Volatile Memory
• RAM
Non-Volatile Memory
• Flash memory
Introduction
Memory systems have a significant impact on overall system performance.
• There are three main sources of complexity in memory:
It is usually necessary to mix a variety of memory technologies in the same embedded system.
i. Some volatile and some non-volatile memory is required.
A memory hierarchy is needed.
i. Memories with larger capacity or lower power consumption are slower.
ii. To achieve good performance there needs to be a mix of fast and slow memories.
The address space of a processor architecture is divided up to provide access to various kinds of memory.
i. To provide support for common programming models.
ii. To designate addresses for interaction with non-memory devices (I/O).
Memory Hierarchy
Many applications require more memory than is available on-chip in a microcontroller.
A memory hierarchy combines various memory technologies
• to increase overall memory capacity while optimizing cost and energy consumption
• e.g. a small amount of on-chip SRAM + a large amount of off-chip DRAM + disk drives
Virtual memory makes these diverse technologies look to the compiler like a single address space; the OS/hardware provides address translation
• it converts logical addresses in the address space to physical locations in one of the memory technologies
These techniques can create serious problems
• because they make it very difficult to predict how long memory accesses will take
Memory Maps
A memory map for a processor defines how addresses are mapped to hardware.
The total size of the address space is constrained by the address width of the processor.
How many memory locations can be addressed by a 32-bit processor?
It can address 2^32 locations, or 4 GB, assuming each address refers to one byte.
The address width usually matches the word width, except on 8-bit processors, where the address width is typically higher (often 16 bits). A sketch of accessing a memory-mapped peripheral register follows below.
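In a memory map, some addresses designate peripheral registers rather than RAM. The C sketch below shows the usual volatile-pointer idiom; the address 0x40021000 and the bit position are made-up values for illustration, not any particular chip's map.

    #include <stdint.h>

    /* Hypothetical memory-mapped GPIO data register; the address is
     * illustrative only and must come from the device's memory map. */
    #define GPIO_DATA (*(volatile uint32_t *)0x40021000u)

    void led_on(void)
    {
        GPIO_DATA |= (1u << 3);   /* set bit 3: drive the (assumed) LED pin high */
    }

    void led_off(void)
    {
        GPIO_DATA &= ~(1u << 3);  /* clear bit 3: drive the pin low */
    }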
Register Files
The register file is tightly integrated memory in a processor. Each register stores a word.
• The size of the word is a key property of a processor architecture (four bytes for a 32-bit architecture, 8 bytes on 64-bit).
The number of registers in a processor is usually small.
• This is because each register reference must be encoded in the bits of an instruction word.
If the register file has 16 registers, each reference to a register requires 4 bits.
– If an instruction can refer to 3 registers, that requires a total of 12 bits.
I/O Hardware
Luminary Micro Stellaris microcontroller, which is based on a 32-bit ARM Cortex™-M3 processor.
Embedded processors include I/O mechanisms on-chip
Single-board computer
Serial Interfaces
• Key constraints − small packages and low power consumption
• The number of pins on the processor integrated circuit is limited
− each pin must be used efficiently, and wires must also be used efficiently
− one way to use them efficiently is to send information over them serially, as sequences of bits; such an interface is called a serial interface
• RS-232 – sender and receiver must agree on a transmission rate
− the sender initiates transmission of a byte with a start bit
− the sender then clocks out the sequence of bits at the agreed-upon rate
− the receiver's clock resets upon receiving the start bit (a bit-banged transmit sketch follows below)
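A sketch of the transmit side of such a start-bit-framed serial link, bit-banged in C over a single output line; set_tx_pin() and delay_one_bit() are hypothetical platform hooks, and the frame format (1 start bit, 8 data bits LSB-first, 1 stop bit) is the common RS-232-style convention.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical platform hooks, assumed for illustration. */
    extern void set_tx_pin(bool level);   /* drive the serial output line */
    extern void delay_one_bit(void);      /* wait one bit period at the agreed rate */

    /* Transmit one byte: start bit, 8 data bits (LSB first), stop bit. */
    void serial_send_byte(uint8_t byte)
    {
        set_tx_pin(false);                /* start bit: tells the receiver to reset its clock */
        delay_one_bit();

        for (int i = 0; i < 8; i++) {     /* clock out the bits at the agreed-upon rate */
            set_tx_pin((byte >> i) & 1u);
            delay_one_bit();
        }

        set_tx_pin(true);                 /* stop bit: line returns to idle */
        delay_one_bit();
    }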
Serial Interfaces
• RS-232 connection may be provided via DB-9 connector
• USB is electrically simpler than RS- 232, uses robust connectors
• JTAG (Joint Test Action Group) serial interface is widely implemented in embedded processors
Parallel Interfaces
• A serial interface sends or receives sequence of bits sequentially over a single line
• Parallel interface uses multiple lines to simultaneously send bits
− each line is itself a serial interface, but the logical grouping makes the whole a parallel interface
• With careful programming, a group of GPIO pins can be used together to realize a parallel interface (see the sketch below)
• Parallel interfaces deliver higher performance than serial interfaces − because more wires are used for the interconnection
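A sketch of driving an 8-bit parallel interface from a group of GPIO pins; write_gpio_pin() and pulse_strobe() are hypothetical helpers, assumed for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical platform hooks, assumed for illustration. */
    extern void write_gpio_pin(int pin, bool level);  /* drive one GPIO line */
    extern void pulse_strobe(void);                   /* signal that the byte is valid */

    /* Drive 8 data lines at once: each line is itself serial, but grouping
     * them lets a whole byte be presented in a single step. */
    void parallel_write_byte(const int data_pins[8], uint8_t byte)
    {
        for (int i = 0; i < 8; i++)
            write_gpio_pin(data_pins[i], (byte >> i) & 1u);
        pulse_strobe();
    }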
• A bus is an interface shared among multiple devices
• Buses can be serial (USB) or parallel interfaces
• Any bus architecture must include media-access control (MAC) to arbitrate competing accesses
− one common MAC protocol has a single bus master that interrogates bus slaves − USB uses such a mechanism
• An alternative is a time-triggered bus: devices are assigned time slots during which they can transmit (sketched below)
• In a token ring, devices must acquire a token before they can use the shared medium
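A sketch of the time-triggered idea: a device transmits only during its assigned slot, computed from a shared time base. current_time_ms() and transmit() are hypothetical, and the slot parameters are illustrative.

    #include <stdint.h>

    /* Hypothetical platform hooks, assumed for illustration. */
    extern uint32_t current_time_ms(void);             /* shared/global time base */
    extern void transmit(const uint8_t *buf, int len); /* put bytes on the bus */

    #define SLOT_LENGTH_MS 10u   /* illustrative slot width */
    #define NUM_SLOTS      4u    /* illustrative number of devices on the bus */

    /* Send only when the schedule says it is this device's turn. */
    void send_if_my_slot(uint32_t my_slot, const uint8_t *buf, int len)
    {
        uint32_t slot = (current_time_ms() / SLOT_LENGTH_MS) % NUM_SLOTS;
        if (slot == my_slot)
            transmit(buf, len);   /* no arbitration needed: slots never overlap */
    }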
The End of Lecture
Review Class