02 Computer Evolution and Performance
COMP228: System Hardware
Tutorial Chapter #04: MARIE
General Purpose Register (GPR)
Architecture
Its functional units are:
Data Registers: D0, D1, D2,…, D7 for arithmetic operations – hold
any kind of data
Address Registers: A0, A1, A2,…, A7 serve as pointers to memory
addresses
Working Registers: several such registers – serve as scratch pads for
CPU
Program Counter (PC) holds the address in memory of the next
instruction to be executed. After an instruction is fetched from
memory, the PC is automatically incremented to hold the address of,
or point to, the next instruction to be executed.
Instruction Register (IR) holds the most recently read instruction from
memory while it is being decoded by the Instruction Interpreter.
Memory Address Register (MAR) holds the address of the next location
to be accessed in memory.
Memory Buffer Register (MBR or MDR) holds the data just read from
memory, or the data which is about to be written to memory. "Buffer"
refers to holding data temporarily.
Status Register (SR) records status information
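[Figure: GPR CPU block diagram. The CPU contains the MBR, MAR, PC (with increment logic), IR (with instruction interpreter), a register file (registers 0 to 3), and the ALU. It connects to memory through the data bus, the address bus, and control lines. Bus widths labelled in the figure: 16 bit and 8 bit.]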
Figure 04.F08: MARIE’s Architecture
Figure 04.F09: Datapath in MARIE
[Figure: register widths labelled in the datapath: 12 bits, 12 bits, 16 bits, 16 bits, 8 bits, 16 bits, 8 bits]
BUS Protocol example
Asynchronous Buses using complex handshaking protocol
Instruction Execution Time
Clock Cycles (P) – regular time
intervals defined by the CPU clock
Clock Frequency (clock Rate), R = 1/P
cycles per second (Hz)
500 MHz => P = 2ns
1.25 GHz => P = 0.8ns
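The relation R = 1/P can be checked with a few lines of Python. This is a minimal sketch; the function name clock_period_ns is chosen here for illustration and is not part of the course material.

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Return the clock period P in nanoseconds for a clock rate R in Hz (P = 1/R)."""
    return 1e9 / frequency_hz


# The two examples from the slide.
print(clock_period_ns(500e6))    # 500 MHz
print(clock_period_ns(1.25e9))   # 1.25 GHz
```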
Micro Step           Number of Clock Cycles
Register Transfer    1
Decoding             2
Add                  2
Multiply             5
Memory Access        10
For each instruction:
Fetch: Total 12 clock cycles
MAR ← PC 1
MBR ← M[MAR] 10
IR ← MBR 1
Decode: 2 clock cycles
Execute: depends on instruction
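The cycle counts above can be added up mechanically. The sketch below assumes, purely for illustration, an instruction whose execute phase is a single Add micro step (2 cycles); the names CYCLES, FETCH_STEPS, instruction_cycles, and instruction_time_ns are hypothetical helpers, not part of the course material.

```python
# Clock cycles per micro step, taken from the table above.
CYCLES = {
    "register_transfer": 1,
    "decoding": 2,
    "add": 2,
    "multiply": 5,
    "memory_access": 10,
}

# Fetch = MAR <- PC, MBR <- M[MAR], IR <- MBR.
FETCH_STEPS = ["register_transfer", "memory_access", "register_transfer"]


def instruction_cycles(execute_steps):
    """Total cycles = fetch (12) + decode (2) + the given execute micro steps."""
    fetch = sum(CYCLES[step] for step in FETCH_STEPS)
    decode = CYCLES["decoding"]
    execute = sum(CYCLES[step] for step in execute_steps)
    return fetch + decode + execute


def instruction_time_ns(execute_steps, clock_hz):
    """Execution time in nanoseconds at the given clock rate."""
    return instruction_cycles(execute_steps) * 1e9 / clock_hz


# Assumed example: the execute phase is one Add micro step, on a 500 MHz clock.
print(instruction_cycles(["add"]))
print(instruction_time_ns(["add"], 500e6))
```

Under that assumption the instruction takes 12 + 2 + 2 = 16 cycles, which is 32 ns at P = 2 ns.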
Exercise 1
What are the main functions of the CPU?
Ans.
The CPU is responsible for fetching program instructions,
decoding each instruction that is fetched and performing the
indicated sequence of operations on the correct data.
How is the ALU related to the CPU? What are its main functions?
Ans.
The ALU is part of the CPU. It carries out arithmetic operations
(typically only integer arithmetic) and can carry out logical
operations such as AND, OR, and XOR, as well as shift
operations.
Central Processing Unit (CPU)
• CPU is the heart and brain
• It interprets and executes machine level
instructions
• Controls data transfer from/to Main Memory
(MM) and CPU
• Detects any errors
Memory
• Smallest unit of storage is a Bit
• However, smallest addressable unit is a
Byte (8 bits)
bit 7 6 5 4 3 2 1 0
msb lsb
• Most computers permit access of memory
through words (16 bits, 32 bits or 64 bits)
bits 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
MSByte LSByte
0 1 0 1 1 0 1 1
1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 1
Figure 04.F04: a) N 8-Bit Memory Locations b) M 16-Bit Memory
Locations
Memory
Main Memory
[Figure: CPU connected to main memory by a k-bit address bus and an n-bit data bus]
Main Memory
• Internally data is always represented in binary, although Hex is
more readable
[Figure: read operation. The CPU places address $FD (1111 1101) on the address bus; main memory returns the contents of the $FD location (0100 0010) on the data bus.]
Memory Addressing
• Successive addresses refer to
successive byte locations in memory.
• Byte locations have addresses 0, 1, 2,
….
• If word length of the machine is 16 bits,
successive words are located at
addresses 0, 2, 4, …. (these even
addresses are also called word
boundaries)
• If word length of the machine is 32 bits
(long word), successive words are
located at addresses 0, 4, 8, ….
• Words must be accessed at their word
boundaries, otherwise an exception occurs
• Some machines allow long words to be
accessed at any even address: address 0
for bytes at locations 0,1,2,3, address
2 for bytes at locations 2,3,4,5
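The word-boundary rule can be expressed as a simple divisibility check. A minimal sketch follows; word_addresses and is_aligned are illustrative names, and the check models the strict case where a misaligned access raises an exception.

```python
def word_addresses(word_bytes: int, count: int) -> list:
    """Byte addresses of the first `count` successive words."""
    return [i * word_bytes for i in range(count)]


def is_aligned(address: int, word_bytes: int) -> bool:
    """True if `address` lies on a word boundary for the given word size."""
    return address % word_bytes == 0


print(word_addresses(2, 3))   # 16-bit words
print(word_addresses(4, 3))   # 32-bit long words
print(is_aligned(3, 2))       # odd address, 16-bit word
```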
[Figure: byte locations at addresses 0 to 8, each 8 bits wide, grouped for a word length of 16 bits and for a word length of 32 bits]
Big-Endian and Little-Endian
Big-Endian:
• The lower memory address corresponds to the MSByte
• The address of a word is defined as the address of its MSByte
Little-Endian:
• The lower memory address corresponds to the LSByte
• The address of a word is defined as the address of its LSByte
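The two byte orders can be seen directly with Python's built-in int.to_bytes. The value 0x1234 is an arbitrary example chosen here; index 0 of each result is the byte stored at the lower address.

```python
value = 0x1234  # arbitrary 16-bit example: MSByte 0x12, LSByte 0x34

# Index 0 is the byte at the lower memory address.
big = value.to_bytes(2, byteorder="big")
little = value.to_bytes(2, byteorder="little")

print([hex(b) for b in big])      # MSByte first
print([hex(b) for b in little])   # LSByte first
```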
Memory Capacity
[Figure: byte organized memory (m = 8 bits) and word organized memory (m = 16 bits), locations 0 to 8, selected by a k-bit address]
Exercise 2
How many bits would you need to address a 2M × 32 memory if
a) The memory is byte-addressable?
b) The memory is word-addressable?
Ans.
a) There are 2M × 4 bytes which equals 2 × 2^20× 2^2 = 2^23
total bytes, so 23 bits are needed for an address.
b) There are 2M words which equals 2 × 2^20 = 2^21, so 21 bits
are required for an address.
Exercise 3
How many bits are required to address a 4M × 16 main memory if
a) Main memory is byte-addressable?
b) Main memory is word-addressable?
Ans.
a) There are 4M × 2 bytes which equals 2^2× 2^20× 2 = 2^23 total
bytes, so 23 bits are needed for an address
b) There are 4M words which equals 2^2× 2^20 = 2^22, so 22 bits
are required for an address
Exercise 4
How many bits are required to address a 1M × 8 main memory if
a) Main memory is byte-addressable?
b) Main memory is word-addressable?
Ans.
a) There are 1M ×1 bytes, or 2^20 total bytes, so 20 bits are needed
for an address
b) There are 1M words, or 2^20 total words, so 20 bits are required
for an address
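The reasoning in Exercises 2 to 4 follows one pattern: count the addressable units, then take the base-2 logarithm. A small sketch, with address_bits as an illustrative helper name and M taken as 2^20:

```python
M = 2 ** 20


def address_bits(num_words: int, word_bits: int, byte_addressable: bool) -> int:
    """Number of address bits needed for a (num_words x word_bits) memory."""
    if byte_addressable:
        units = num_words * (word_bits // 8)   # every byte has its own address
    else:
        units = num_words                      # every word has its own address
    # (units - 1).bit_length() is the smallest b with 2**b >= units.
    return (units - 1).bit_length()


for words, bits in [(2 * M, 32), (4 * M, 16), (1 * M, 8)]:
    print(words // M, "M x", bits,
          address_bits(words, bits, True),
          address_bits(words, bits, False))
```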
Exercise 5
A byte organized memory chip with 11 bit address bus is used as a building block in a
larger memory organization.
a) Calculate the capacity of the above chip.
C = 2^11 bytes = 2 Kbytes
b) If the above chip is used to build a 64 KB long word organized (32 bit) memory,
calculate the number of chips needed.
For 64 Kbytes of memory, we would need 64/2 = 32 memory chips, which are arranged as 8
rows and 4 columns.
4 columns because the memory is long word organized (4 bytes per long word).
c) Specify the new Memory Address map, and the Memory Connections to the CPU for this
new 32-bit word-organized memory with 11-bit address bus.
Each row would need 11 address lines from the CPU and 32 data bus lines.
3 address lines from the CPU are required to decode which one of the 8 rows of memory will
be chosen.
Therefore, a total of 14 address lines are required from the CPU.
Exercise 6
A byte organized memory chip with 12 bit address bus is used as a building block in a
larger memory organization.
a) Calculate the capacity of the above chip.
2^12 bytes = 4 KB
b) If the above chip is used to build a 32 KB word organized (16 bit) memory, calculate the
number of chips needed.
32 KB / 4 KB = 8 chips, arranged as 4 rows and 2 columns
c) How many address lines should the CPU have, and how many of these address lines are
used for the decoder.
2 lines select one of the 4 rows through the decoder and 12 lines go to every chip.
Total 2 + 12 = 14 lines
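The chip-array calculations in Exercises 5 and 6 can be reproduced with a short function. This is a sketch under the assumption used in both answers: byte organized chips are placed side by side to form one word per row, and a decoder selects the row. The name memory_array and the dictionary keys are illustrative.

```python
def memory_array(chip_address_bits: int, total_bytes: int, word_bits: int) -> dict:
    """Layout of a memory built from byte organized chips."""
    chip_bytes = 2 ** chip_address_bits       # capacity of one chip
    chips = total_bytes // chip_bytes         # number of chips needed
    columns = word_bits // 8                  # chips side by side form one word
    rows = chips // columns                   # rows selected by the decoder
    decoder_bits = (rows - 1).bit_length()    # address lines feeding the decoder
    return {
        "chip_bytes": chip_bytes,
        "chips": chips,
        "rows": rows,
        "columns": columns,
        "decoder_bits": decoder_bits,
        "cpu_address_lines": decoder_bits + chip_address_bits,
    }


print(memory_array(11, 64 * 1024, 32))   # 11-bit chip, 64 KB, 32-bit words
print(memory_array(12, 32 * 1024, 16))   # 12-bit chip, 32 KB, 16-bit words
```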
MARIE
• A Machine Architecture that is Really
Intuitive and Easy
Instruction Set Architecture (ISA)
• CPU operation is determined by the instruction it
executes
• Collection of these instructions that a CPU can
execute forms its Instruction Set
• An instruction is represented as sequence of bits,
for example:
• Instruction is divided into fields
• Opcode indicates the operation to be performed,
eg., 1 above indicates a load operation
• Operand field represents the nature of the operands, e.g., a memory address
0001 1011 1000 0001
1 B 8 1
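Splitting the example word into its fields can be sketched with shifts and masks. The sketch assumes MARIE's usual 16-bit format with a 4-bit opcode in the high bits and a 12-bit address in the low bits; decode is an illustrative name.

```python
def decode(instruction: int):
    """Split a 16-bit instruction word into (opcode, address).

    Assumes a 4-bit opcode in bits 15..12 and a 12-bit address in bits 11..0.
    """
    opcode = (instruction >> 12) & 0xF
    address = instruction & 0xFFF
    return opcode, address


opcode, address = decode(0x1B81)       # 0001 1011 1000 0001
print(opcode, format(address, "03X"))  # opcode 1 is the load operation
```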
MARIE Instructions SET
[Figure: clock timing diagrams for Load, Store, and R-type instructions. Single cycle implementation: each instruction takes one long cycle, with wasted time for the faster instructions. Multiple cycle implementation: instructions run one after another through the steps Ifetch, Reg, Exec, Mem, Wr over cycles 1 to 10. Pipelined implementation: the same steps overlap across consecutive instructions.]
Single Cycle vs. Multiple Cycle vs. Pipelined
CPU Designs: Summary
• Disadvantages of the Single Cycle Processor
—Long cycle time
—Cycle time wasted for the faster instructions
• Multiple Clock Cycle Processor
—Divide the instructions into smaller steps
—Execute each step (instead of the entire
instruction) in 1 cycle
• Pipelined Processor
—Natural enhancement of the multiple clock cycle
processor
—Each functional unit used only once per instruction
—If an instruction is going to use a functional unit:
– it must use it at the same stage as all other instructions
—Pipeline Control:
– each stage’s control signal depends ONLY on the
instruction that is currently in that stage
What Is Pipelining?
Pipelining: It’s Natural!
• Laundry Example
• Ann, Brian, Cathy, Dave
each have one load of clothes
to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• Folder takes 20 minutes
Sequential Laundry
[Figure: timeline from 6 PM to midnight, task order A, B, C, D. Each load runs 30, 40, and 20 minutes, and the four loads run one after another.]
•Sequential laundry takes 6 hours for 4 loads
•If they learned pipelining, how long would laundry take?
Pipelined Laundry Start work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline from 6 PM to midnight, task order A, B, C, D. The loads overlap, giving segments of 30, 40, 40, 40, 40, and 20 minutes.]
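The 6 hour and 3.5 hour figures can be reproduced by scheduling the loads stage by stage. A minimal sketch, assuming each machine handles one load at a time and a load moves on as soon as the next machine is free; the function names are illustrative.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load starts a stage as soon as it and the stage are both ready."""
    finish = [0] * len(stage_minutes)   # when each stage last became free
    for _ in range(loads):
        ready = 0                       # when this load left the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, finish[s])
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


stages = [30, 40, 20]   # washer, dryer, folder
print(sequential_minutes(stages, 4) / 60)   # hours
print(pipelined_minutes(stages, 4) / 60)    # hours
```

The dryer is the slowest stage, so after the first wash every additional load adds 40 minutes.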
Key Definitions
Pipelining is a key implementation technique
used to build fast processors. It allows the
execution of multiple instructions to overlap in
time.
A pipeline within a processor is similar to a car
assembly line. Each assembly station is called a
pipe stage or a pipe segment.
The throughput of an instruction pipeline is
the measure of how often an instruction exits the
pipeline.
Characteristics Of Pipelining
• If the stages of a pipeline are not balanced
and one stage is slower than another, the
entire throughput of the pipeline is
affected.
• In terms of a pipeline within a CPU, each
instruction is broken up into different
stages. Ideally if each stage is balanced (all
stages are ready to start at the same time
and take an equal amount of time to
execute) the time taken per instruction
(pipelined) is defined as:
Time per instruction (unpipelined) /
Number of stages
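The ideal formula and the effect of an unbalanced stage can be compared numerically. In this sketch the stage times are hypothetical values (the laundry durations reused as time units), and the function names are chosen for illustration.

```python
def ideal_time_per_instruction(unpipelined_time, num_stages):
    """Balanced pipeline: time per instruction (unpipelined) / number of stages."""
    return unpipelined_time / num_stages


def actual_time_per_instruction(stage_times):
    """Unbalanced pipeline: the slowest stage sets the pace."""
    return max(stage_times)


stage_times = [30, 40, 20]   # hypothetical stage durations
print(ideal_time_per_instruction(sum(stage_times), len(stage_times)))
print(actual_time_per_instruction(stage_times))
```

With these numbers the balanced ideal is 30 time units per instruction, but the 40-unit stage limits the pipeline to one instruction every 40 units.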
CPU Structure
• CPU must:
—Fetch instructions (Fetch)
—Interpret instructions (Decode)
—Process data (Execute)
Pipeline Stages
We can divide the execution of an instruction
into the following stages:
IF: Instruction Fetch
ID: Instruction Decode, register fetch
EX: Execution
Program Execution
Fetch Cycle:
—Processor fetches one instruction at a time from
successive memory locations until a branch/jump occurs.
—Instructions are located in the memory location pointed
to by the PC
—Instruction is loaded into the IR
—Increment the contents of the PC by the size of an
instruction
Decode Cycle:
—Instruction is decoded/interpreted, opcode will provide
the type of operation to be performed, the nature and
mode of the operands
—Decoder and control logic unit is responsible to select the
registers involved and direct the data transfer.
Execute Cycle:
—Carry out the actions specified by the instruction in the IR
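The three cycles above can be sketched as a loop. This is a toy model, not MARIE's full definition: it assumes a word-addressable memory of 16-bit words, a 4-bit opcode with a 12-bit address, and only four opcodes (1 = Load, 2 = Store, 3 = Add, 7 = Halt). The function run and the sample program are illustrative.

```python
def run(memory, pc=0):
    """Fetch, decode, and execute until Halt; returns the accumulator."""
    ac = 0
    while True:
        # Fetch: read the instruction the PC points to, then increment the PC.
        ir = memory[pc]
        pc += 1
        # Decode: split the instruction into opcode and address fields.
        opcode, address = (ir >> 12) & 0xF, ir & 0xFFF
        # Execute: carry out the action specified by the instruction in the IR.
        if opcode == 0x1:        # Load
            ac = memory[address]
        elif opcode == 0x2:      # Store
            memory[address] = ac
        elif opcode == 0x3:      # Add
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == 0x7:      # Halt
            return ac
        else:
            raise ValueError(f"opcode {opcode:X} not modelled in this sketch")


# Load 004, Add 005, Store 006, Halt, followed by two data words and a result slot.
program = [0x1004, 0x3005, 0x2006, 0x7000, 0x0023, 0x0012, 0x0000]
print(hex(run(program)), hex(program[6]))
```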
Register Transfer Notation RTN
Instruction Processing Full view
Interrupt
Exercise 7
Exercise 8
Exercise 9
Exercise 10
Exercise 11
Exercise 12
MARIE’s Extended Instruction Set
Example 4.2
Example 4.3
Example 4.4
Example 4.5
Directives
Exercise 13
Exercise 14
Exercise 15
Exercise 16
Exercise 17
Exercise 18
Exercise 19
Exercise 20
Exercise 21
ALU Control Signals and Response
Timing Diagram for the Microoperations of MARIE’s Add
Instruction
P3 P2 P1 P0 T0: MAR ← X
P4 P3 T1: MBR ← M[MAR]
A0 P5 P1 P0 T2: AC ← AC + MBR
Cr T3: [Reset counter]
Exercise 22
P3 P2 P1 P0 T0: MAR ← X
P4 P3 T1: MBR ← M[MAR]
P5 P1 P0 T2: AC ← MBR
Cr T3: [Reset counter]
Hardwired Control Unit
Exercise 22
Partial Instruction Decoder for MARIE’s Instruction Set
Ring Counter Using D Flip-Flops
Combinational Logic for Signal Controls of MARIE’s Add
Instruction
P3 P2 P1 P0 T3: MAR ← X
P4 P3 T4 MR: MBR ← M[MAR]
Cr A0 P5 T5 LALT: AC ← AC + MBR [Reset counter]
Microprogrammed Control Unit
MARIE’s Microinstruction Format
Microoperation Codes and Corresponding MARIE RTL
Selected Statements in MARIE’s Microprogram
CISC and RISC
Reduced Instruction Set Computers (RISC)
— Performs simple instructions that require small number of
basic steps to execute (smaller S)
— Requires large number of instructions to perform a given
task – large code size (larger N)
— more RAM is needed to store the assembly level instructions
— Advantage: Few cycles per instruction – each simple instruction
can typically execute in about one clock cycle (smaller S)
— Example: Advanced RISC Machines (ARM) processor
Complex Instruction Set Computers (CISC)
— Complex instructions that involve large number of steps
(larger S)
— Fewer instructions needed (smaller N) – small code size
— Instructions correspond more closely to high-level language statements
— Less RAM required to store the program
— Disadvantage: Many cycles per instruction
— Example: Motorola 68000 processor, Intel x86
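The symbols N, S, and R used above fit the basic performance equation T = (N × S) / R, which is assumed here rather than stated on the slides: N is the number of instructions executed, S the average number of basic steps (clock cycles) per instruction, and R the clock rate. The sketch below uses made-up values purely to show the trade-off; they are not measurements of any real processor.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Assumed basic performance equation: T = (N * S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz


# Hypothetical numbers: the RISC program needs more instructions (larger N)
# but fewer steps per instruction (smaller S) at the same clock rate.
risc = execution_time(1_500_000, 1.2, 500e6)
cisc = execution_time(1_000_000, 4.0, 500e6)
print(risc, cisc)
```

Which design wins depends on whether the drop in S outweighs the growth in N for the program at hand.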