02 Computer Evolution and Performance

COMP228: System Hardware

Tutorial Chapter #04: MARIE

General Purpose Register (GPR) Architecture

Its functional units are:
Data Registers: D0, D1, D2, …, D7 for arithmetic operations; each can hold
any kind of data

Address Registers: A0, A1, A2,…, A7 serve as pointers to memory
addresses

Working Registers: several such registers – serve as scratch pads for
CPU

Program Counter (PC) holding the address in memory of the next
instruction to be executed. After an instruction is fetched from
memory, the PC is automatically incremented to hold the address of,
or point to, the next instruction to be executed.

Instruction Register (IR) holds the most recently read instruction from
memory while it is being decoded by the Instruction Interpreter.

Memory Address Register (MAR) holds the address of the next location
to be accessed in memory.

Memory Buffer Register (MBR or MDR) holds the data just read from
memory, or the data which is about to be written to memory. "Buffer"
refers to holding data temporarily.

Status Register (SR) records status information

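GPR CPU

[Figure: block diagram of the GPR CPU. Inside the CPU: MBR, MAR, PC with an Increment unit, Interpreter, IR, a Register File (registers 0 to 3) and the ALU. The CPU connects to Memory over the data bus, the address bus and control lines.]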

Figure 04.F08: MARIE’s Architecture (figure labels: 16 bit, 8 bit)

MARIE

Figure 04.F09: Datapath in MARIE

[Figure: register widths labelled in the datapath: 12, 12, 16, 16, 8, 16 and 8 bits]

BUS Protocol example

Asynchronous Buses using complex handshaking protocol

Instruction Execution Time

Clock Cycle (period P): the regular time
interval defined by the CPU clock

Clock Frequency (Clock Rate), R = 1/P
cycles per second (Hz)

500 MHz => P = 2ns

1.25 GHz => P = 0.8ns

Micro Step           Number of Clock Cycles
Register Transfer    1
Decoding             2
Add                  2
Multiply             5
Memory Access        10

For each instruction:
Fetch: total 12 clock cycles

MAR ← PC          1
MBR ← M[MAR]      10
IR ← MBR          1

Decode: 2 clock cycles
Execute: depends on instruction
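The timing figures above can be combined into a small calculation. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, and the execute phase of the example (a single Add micro step) is an assumption chosen only to illustrate the arithmetic.

```python
# Cycle cost of each micro step, taken from the table above.
CYCLES = {
    "register_transfer": 1,
    "decoding": 2,
    "add": 2,
    "multiply": 5,
    "memory_access": 10,
}

# Fetch = MAR <- PC, MBR <- M[MAR], IR <- MBR (1 + 10 + 1 = 12 cycles).
FETCH_STEPS = ["register_transfer", "memory_access", "register_transfer"]


def clock_period_ns(frequency_hz):
    """Return the clock period P = 1/R in nanoseconds."""
    return 1e9 / frequency_hz


def instruction_cycles(execute_steps):
    """Total cycles: fetch + decode + the given execute micro steps."""
    fetch = sum(CYCLES[step] for step in FETCH_STEPS)
    decode = CYCLES["decoding"]
    execute = sum(CYCLES[step] for step in execute_steps)
    return fetch + decode + execute


def instruction_time_ns(execute_steps, frequency_hz):
    """Execution time in nanoseconds at the given clock frequency."""
    return instruction_cycles(execute_steps) * clock_period_ns(frequency_hz)


if __name__ == "__main__":
    print(clock_period_ns(500e6))                # period at 500 MHz
    print(instruction_cycles(["add"]))           # assumed execute phase: one Add
    print(instruction_time_ns(["add"], 500e6))   # time for that instruction
```

With the assumed one-step execute phase, the instruction costs 12 + 2 + 2 = 16 cycles, which is 32 ns at 500 MHz.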

Exercise 1

What are the main functions of the CPU?

Ans.

The CPU is responsible for fetching program instructions, decoding each instruction that is fetched and performing the indicated sequence of operations on the correct data.

How is the ALU related to the CPU? What are its main functions?

Ans.

The ALU is part of the CPU. It carries out arithmetic operations (typically only integer arithmetic) and can carry out logical operations such as AND, OR, and XOR, as well as shift operations.

Central Processing Unit (CPU)

• CPU is the heart and brain

• It interprets and executes machine level
instructions

• Controls data transfer from/to Main Memory
(MM) and CPU

• Detects any errors

Memory

• Smallest unit of storage is a Bit

• However, smallest addressable unit is a
Byte (8 bits)

bit     7 6 5 4 3 2 1 0
        0 1 0 1 1 0 1 1
        msb         lsb

• Most computers permit access of memory
through words (16 bits, 32 bits or 64 bits)

bits    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
        1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 1
        MSByte          LSByte

Figure 04.F04: a) N 8-Bit Memory Locations b) M 16-Bit Memory Locations

Memory

Main Memory

[Figure: the CPU connected to Main Memory by a k-bit address bus and an n-bit data bus]

Main Memory

• Internally data is always represented in binary, although Hex is
more readable

[Figure: read operation. The CPU places address $FD (1111 1101) on the address bus, and Main Memory returns the contents of location $FD (0100 0010) on the data bus.]

Memory Addressing

• Successive addresses refer to
successive byte locations in memory.

• Byte locations have addresses 0, 1, 2,
….

• If the word length of the machine is 16 bits,
successive words are located at
addresses 0, 2, 4, …. (these even
addresses are also called word
boundaries)

• If the word length of the machine is 32 bits
(long word), successive words are
located at addresses 0, 4, 8, ….

• Words must be accessed at their word
boundaries; otherwise an exception occurs

• Some machines allow long words to be
accessed at even addresses – address 0
for bytes at locations 0,1,2,3 – address
2 for bytes at locations 2,3,4,5
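The addressing rules above can be sketched as a small check. This is an illustrative sketch only: the function names and the `even_ok` flag are hypothetical, with `even_ok=True` standing in for the machines that accept long words at any even address.

```python
def is_aligned(address, word_bytes):
    """True if a byte address lies on a boundary for words of word_bytes bytes."""
    return address % word_bytes == 0


def word_byte_addresses(address, word_bytes, even_ok=False):
    """Return the byte addresses covered by a word starting at `address`.

    By default the word must start on its own boundary (0, 2, 4, ... for
    16-bit words; 0, 4, 8, ... for 32-bit words). With even_ok=True any
    even address is accepted, as on the machines mentioned above.
    """
    boundary = 2 if even_ok else word_bytes
    if address % boundary != 0:
        raise ValueError(f"misaligned access at address {address}")
    return list(range(address, address + word_bytes))


if __name__ == "__main__":
    print(word_byte_addresses(0, 4))                 # long word at address 0
    print(word_byte_addresses(2, 4, even_ok=True))   # long word at address 2
```

Raising an exception for a misaligned access mirrors the exception the slide describes; real hardware would signal it through its own trap mechanism.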

[Figure: byte addresses 0 to 8 in two columns of 8 bits each, showing how successive locations group into words for a word length of 16 bits and a word length of 32 bits]

Big-Endian and Little-Endian

Big-Endian:

• The lower memory address corresponds to the MSByte

• The address of a word is defined as the address of its MSByte

Little-Endian:

• The lower memory address corresponds to the LSByte

• The address of a word is defined as the address of its LSByte
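To make the two byte orders concrete, here is a minimal sketch using Python's built-in `int.to_bytes`. The helper name `memory_layout` and the choice of base address 0 are assumptions for illustration; the 16-bit value 0xD35B is the word 1101 0011 0101 1011 shown in the Memory slide.

```python
def memory_layout(word, base_address, byteorder, size=2):
    """Map each byte address to the byte stored there for the given order.

    byteorder is "big" or "little", as accepted by int.to_bytes.
    """
    data = word.to_bytes(size, byteorder)
    return {base_address + offset: byte for offset, byte in enumerate(data)}


if __name__ == "__main__":
    word = 0xD35B  # MSByte = 0xD3, LSByte = 0x5B
    for order in ("big", "little"):
        layout = memory_layout(word, 0, order)
        print(order, {addr: hex(b) for addr, b in layout.items()})
```

In the big-endian layout address 0 holds the MSByte (0xD3); in the little-endian layout address 0 holds the LSByte (0x5B).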

Memory Capacity

[Figure: locations 0 to 8 of a Byte Organized Memory (m = 8 bits) and of a Word Organized Memory (m = 16 bits), each selected by a k-bit address]

Exercise 2

How many bits would you need to address a 2M × 32 memory if

a) The memory is byte-addressable?

b) The memory is word-addressable?

Ans.

a) There are 2M × 4 bytes, which equals 2 × 2^20 × 2^2 = 2^23 total bytes, so 23 bits are needed for an address.

b) There are 2M words, which equals 2 × 2^20 = 2^21, so 21 bits are required for an address.

Exercise 3

How many bits are required to address a 4M × 16 main memory if

a) Main memory is byte-addressable?

b) Main memory is word-addressable?

Ans.

a) There are 4M × 2 bytes, which equals 2^2 × 2^20 × 2 = 2^23 total bytes, so 23 bits are needed for an address.

b) There are 4M words, which equals 2^2 × 2^20 = 2^22, so 22 bits are required for an address.

Exercise 4

How many bits are required to address a 1M × 8 main memory if

a) Main memory is byte-addressable?

b) Main memory is word-addressable?

Ans.

a) There are 1M × 1 bytes, or 2^20 total bytes, so 20 bits are needed for an address.

b) There are 1M words, or 2^20 total words, so 20 bits are required for an address.
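The answers to Exercises 2 to 4 all follow the same rule: count the addressable units, then take the base-2 logarithm. A minimal sketch of that rule follows; the function name `address_bits` is hypothetical, and the code assumes the number of units is a power of two, as in the exercises.

```python
M = 2 ** 20  # 1M = 2^20


def address_bits(num_words, word_bits, byte_addressable):
    """Number of address bits needed for a num_words x word_bits memory."""
    if byte_addressable:
        units = num_words * (word_bits // 8)  # every byte gets an address
    else:
        units = num_words                     # every word gets an address
    # For a power of two 2^k, (2^k - 1).bit_length() equals k.
    return (units - 1).bit_length()


if __name__ == "__main__":
    for words, bits in ((2 * M, 32), (4 * M, 16), (1 * M, 8)):
        print(words // M, "M x", bits,
              "byte:", address_bits(words, bits, True),
              "word:", address_bits(words, bits, False))
```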

Exercise 5

A byte-organized memory chip with an 11-bit address bus is used as a building block in a larger memory organization.

a) Calculate the capacity of the above chip.

C = 2^11 bytes = 2 KB

b) If the above chip is used to build a 64 KB long-word-organized (32-bit) memory, calculate the number of chips needed.

A 64 KB memory needs 64/2 = 32 memory chips, arranged as 8 rows and 4 columns. There are 4 columns because the memory is long word organized (4 chips × 8 bits = 32 bits per row).

c) Specify the new Memory Address map, and the Memory Connections to the CPU, for this new 32-bit word-organized memory built from chips with an 11-bit address bus.

Each row needs 11 address lines from the CPU and 32 data bus lines. 3 more address lines from the CPU are decoded to choose one of the 8 rows of memory. Therefore, a total of 11 + 3 = 14 address lines are required from the CPU.

Exercise 6

A byte-organized memory chip with a 12-bit address bus is used as a building block in a larger memory organization.

a) Calculate the capacity of the above chip.

2^12 bytes = 4 KB

b) If the above chip is used to build a 32 KB word-organized (16-bit) memory, calculate the number of chips needed.

32 KB / 4 KB = 8 chips, arranged as 4 rows and 2 columns (2 chips × 8 bits = 16 bits per row).

c) How many address lines should the CPU have, and how many of these address lines are used for the decoder?

12 address lines go to every chip and 2 address lines drive the decoder that selects one of the 4 rows, for a total of 2 + 12 = 14 lines.
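The chip-count reasoning in Exercises 5 and 6 can be sketched as follows. This is an illustrative sketch: the function name `memory_array` and the returned field names are hypothetical, and the code assumes byte-wide chips and power-of-two sizes as in the exercises.

```python
def memory_array(chip_address_bits, total_bytes, word_bits):
    """Plan a word-organized memory built from byte-organized chips."""
    chip_bytes = 2 ** chip_address_bits        # capacity of one chip
    chips = total_bytes // chip_bytes          # chips needed in total
    columns = word_bits // 8                   # byte-wide chips per word
    rows = chips // columns                    # rows selected by the decoder
    decoder_lines = (rows - 1).bit_length()    # log2(rows) for a power of two
    return {
        "chip_bytes": chip_bytes,
        "chips": chips,
        "rows": rows,
        "columns": columns,
        "decoder_lines": decoder_lines,
        "cpu_address_lines": chip_address_bits + decoder_lines,
    }


if __name__ == "__main__":
    KB = 1024
    print(memory_array(11, 64 * KB, 32))  # Exercise 5
    print(memory_array(12, 32 * KB, 16))  # Exercise 6
```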

MARIE

• A Machine Architecture that is Really
Intuitive and Easy

Instruction Set Architecture (ISA)

• CPU operation is determined by the instruction it
executes

• Collection of these instructions that a CPU can
execute forms its Instruction Set

• An instruction is represented as a sequence of bits,
for example:

0001 1011 1000 0001

   1    B    8    1

• An instruction is divided into fields

• The opcode indicates the operation to be performed,
e.g., 1 above indicates a load operation

• The operand field gives the nature of the operands
and their address
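Splitting an instruction into its fields is a matter of shifting and masking. The sketch below assumes MARIE's 16-bit instruction format, a 4-bit opcode followed by a 12-bit address, and applies it to the example word with hex digits 1 B 8 1; the function name `decode` is hypothetical.

```python
def decode(instruction):
    """Split a 16-bit MARIE-style instruction into (opcode, address).

    Assumes the top 4 bits are the opcode and the low 12 bits the address.
    """
    opcode = (instruction >> 12) & 0xF
    address = instruction & 0xFFF
    return opcode, address


if __name__ == "__main__":
    opcode, address = decode(0x1B81)
    print(f"opcode = {opcode:X}, address = {address:03X}")
```

For 0x1B81 the opcode field is 1 (load) and the address field is B81.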

MARIE Instruction Set

Single Cycle vs. Multiple Cycle vs. Pipelined

[Figure: clock timing for a Load, a Store and an R-type instruction under three implementations.
Single Cycle Implementation: Load in Cycle 1 and Store in Cycle 2, with part of the Store cycle wasted.
Multiple Cycle Implementation (Cycles 1 to 10): Load uses Ifetch, Reg, Exec, Mem, Wr; Store uses Ifetch, Reg, Exec, Mem; R-type begins with Ifetch.
Pipelined Implementation: Load, Store and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, with their stages overlapping.]

CPU Designs: Summary

• Disadvantages of the Single Cycle Processor
—Long cycle time
—Cycle time wasted for the faster instructions

• Multiple Clock Cycle Processor
—Divide the instructions into smaller steps
—Execute each step (instead of the entire instruction) in 1 cycle

• Pipelined Processor
—Natural enhancement of the multiple clock cycle processor
—Each functional unit used only once per instruction
—If an instruction is going to use a functional unit:
– it must use it at the same stage as all other instructions
—Pipeline Control:
– each stage’s control signal depends ONLY on the instruction that is currently in that stage

What Is Pipelining?

Pipelining: It’s Natural!

• Laundry Example

• Ann, Brian, Cathy, Dave
each have one load of clothes
to wash, dry, and fold

• Washer takes 30 minutes

• Dryer takes 40 minutes

• Folder takes 20 minutes

Sequential Laundry

[Figure: timeline from 6 PM to Midnight, with task order on the vertical axis and time on the horizontal axis. Loads A, B, C and D run one after another, each taking 30 + 40 + 20 minutes.]

•Sequential laundry takes 6 hours for 4 loads

•If they learned pipelining, how long would laundry take?

Pipelined Laundry Start work ASAP

• Pipelined laundry takes 3.5 hours for 4 loads

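[Figure: timeline starting at 6 PM, with task order on the vertical axis and time on the horizontal axis. Loads A, B, C and D overlap; the time segments are 30, 40, 40, 40, 40 and 20 minutes.]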
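The two laundry totals can be checked with a short simulation. This is a sketch under one stated assumption: a load may wait between stages until the next machine is free. The function names are hypothetical.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipeline_finish_times(stage_minutes, loads):
    """Finish time of each load when loads overlap.

    A load enters a stage once it has left the previous stage and the
    previous load has vacated this stage.
    """
    prev_finish = [0] * len(stage_minutes)  # when each stage last became free
    done = []
    for _ in range(loads):
        t = 0
        cur_finish = []
        for stage, minutes in enumerate(stage_minutes):
            start = max(t, prev_finish[stage])
            t = start + minutes
            cur_finish.append(t)
        prev_finish = cur_finish
        done.append(t)
    return done


if __name__ == "__main__":
    stages = [30, 40, 20]  # washer, dryer, folder
    print(sequential_minutes(stages, 4) / 60)         # hours, sequential
    print(pipeline_finish_times(stages, 4)[-1] / 60)  # hours, pipelined
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: the 40-minute dryer is the slowest stage and sets the pace.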

Key Definitions

Pipelining is a key implementation technique
used to build fast processors. It allows the
execution of multiple instructions to overlap in
time.

A pipeline within a processor is similar to a car
assembly line. Each assembly station is called a
pipe stage or a pipe segment.

The throughput of an instruction pipeline is
the measure of how often an instruction exits the
pipeline.

Characteristics Of Pipelining

• If the stages of a pipeline are not balanced
and one stage is slower than another, the
entire throughput of the pipeline is
affected.

• In terms of a pipeline within a CPU, each
instruction is broken up into different
stages. Ideally if each stage is balanced (all
stages are ready to start at the same time
and take an equal amount of time to
execute) the time taken per instruction
(pipelined) is defined as:

Time per instruction (unpipelined) /
Number of stages
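The formula above, and the effect of an unbalanced stage, can be sketched as follows. The stage times in the example are hypothetical values chosen for illustration, and the function names are not from the slides.

```python
def ideal_time_per_instruction(unpipelined_time, stages):
    """Ideal case: perfectly balanced stages."""
    return unpipelined_time / stages


def actual_time_per_instruction(stage_times):
    """With unbalanced stages, the slowest stage sets the pace."""
    return max(stage_times)


if __name__ == "__main__":
    balanced = [2, 2, 2, 2, 2]    # hypothetical stage times in ns
    unbalanced = [2, 2, 4, 2, 2]  # one slow stage, same stage count
    print(ideal_time_per_instruction(sum(balanced), len(balanced)))
    print(actual_time_per_instruction(balanced))
    print(ideal_time_per_instruction(sum(unbalanced), len(unbalanced)))
    print(actual_time_per_instruction(unbalanced))
```

In the balanced case the ideal and actual figures agree; in the unbalanced case the pipeline delivers one instruction per 4 ns even though the ideal figure is lower.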

CPU Structure

• CPU must:

—Fetch instructions (Fetch)

—Interpret instructions (Decode)

—Process data (Execute)

Pipeline Stages

We can divide the execution of an instruction
into the following stages:

IF: Instruction Fetch
ID: Instruction Decode, register fetch
EX: Execution

Program Execution

Fetch Cycle:
—Processor fetches one instruction at a time from successive memory locations until a branch/jump occurs.

—Each instruction is located in the memory location pointed
to by the PC

—Instruction is loaded into the IR

—Increment the contents of the PC by the size of an
instruction

Decode Cycle:
—Instruction is decoded/interpreted; the opcode provides the type of operation to be performed, and the nature and mode of the operands

—Decoder and control logic unit are responsible for selecting the
registers involved and directing the data transfer.

Execute Cycle:
—Carry out the actions specified by the instruction in the IR
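The three cycles above can be sketched as a loop. This is a minimal illustration, not a full MARIE simulator: it assumes MARIE's 4-bit opcode and 12-bit address format and handles only a handful of opcodes (Load = 1, Store = 2, Add = 3, Subt = 4, Halt = 7, Jump = 9). The function name `run` and the sample program are hypothetical.

```python
def run(memory, pc=0, max_steps=1000):
    """Fetch, decode and execute until Halt; return the accumulator."""
    ac = 0
    for _ in range(max_steps):
        # Fetch: read the instruction the PC points to, then increment the PC.
        ir = memory[pc]
        pc += 1  # every instruction occupies one word
        # Decode: split the IR into opcode and address fields.
        opcode, addr = (ir >> 12) & 0xF, ir & 0xFFF
        # Execute: carry out the action the opcode specifies.
        if opcode == 0x1:      # Load
            ac = memory[addr]
        elif opcode == 0x2:    # Store
            memory[addr] = ac
        elif opcode == 0x3:    # Add
            ac = (ac + memory[addr]) & 0xFFFF
        elif opcode == 0x4:    # Subt
            ac = (ac - memory[addr]) & 0xFFFF
        elif opcode == 0x7:    # Halt
            return ac
        elif opcode == 0x9:    # Jump
            pc = addr
        else:
            raise ValueError(f"opcode {opcode:X} not handled in this sketch")
    raise RuntimeError("step limit reached without Halt")


if __name__ == "__main__":
    program = [
        0x1004,  # Load  004
        0x3005,  # Add   005
        0x2006,  # Store 006
        0x7000,  # Halt
        0x0023,  # data: 35
        0x0019,  # data: 25
        0x0000,  # result goes here
    ]
    print(run(program), program[6])
```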

Register Transfer Notation RTN

Instruction Processing Full view

Interrupt

Exercise 7

Exercise 8

Exercise 9

Exercise 10

Exercise 11

Exercise 12

MARIE’s Extended Instruction Set

MARIE Extended Instruction set

Example 4.2

Example 4.3

Example 4.4

Example 4.5

Directives

Exercise 13

Exercise 14

Exercise 15

Exercise 16

Exercise 17

Exercise 18

Exercise 19

Exercise 20

Exercise 21

ALU Control Signals and Response

Timing Diagram for the Microoperations of MARIE’s Add Instruction

P3 P2 P1 P0    T0: MAR ← X

P4 P3          T1: MBR ← M[MAR]

A0 P5 P1 P0    T2: AC ← AC + MBR

Cr             T3: [Reset counter]
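The microoperation sequence for Add can be traced one time step at a time. The sketch below models only the register transfers (MAR, MBR, AC), not the control signals P0 to P5, A0 or Cr; the dictionary-based machine state and the function name `execute_add` are assumptions made for illustration.

```python
def execute_add(state, x):
    """Apply the Add microoperations to `state`; return the time steps taken.

    `state` holds the registers AC, MAR, MBR and a memory mapping M.
    """
    trace = []

    state["MAR"] = x                          # T0: MAR <- X
    trace.append("T0")

    state["MBR"] = state["M"][state["MAR"]]   # T1: MBR <- M[MAR]
    trace.append("T1")

    state["AC"] = (state["AC"] + state["MBR"]) & 0xFFFF  # T2: AC <- AC + MBR
    trace.append("T2")

    trace.append("T3")                        # T3: reset the step counter
    return trace


if __name__ == "__main__":
    state = {"AC": 5, "MAR": 0, "MBR": 0, "M": {0x010: 7}}
    print(execute_add(state, 0x010), state["AC"])
```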

Exercise 22

P3 P2 P1 P0    T0: MAR ← X

P4 P3          T1: MBR ← M[MAR]

P5 P1 P0       T2: AC ← MBR

Cr             T3: [Reset counter]

Hardwired Control Unit

Partial Instruction Decoder for MARIE’s Instruction Set

Ring Counter Using D Flip-Flops

Combinational Logic for Signal Controls of MARIE’s Add Instruction

P3 P2 P1 P0    T3: MAR ← X

P4 P3          T4 MR: MBR ← M[MAR]

Cr A0 P5       T5 LALT: AC ← AC + MBR
               [Reset counter]

Microprogrammed Control Unit

MARIE’s Microinstruction Format

Microoperation Codes and Corresponding MARIE RTL

Selected Statements in MARIE’s Microprogram

CISC and RISC

Reduced Instruction Set Computers (RISC)
— Perform simple instructions that require a small number of basic steps to execute (smaller S)
— Require a large number of instructions to perform a given task: large code size (larger N)
— More RAM is needed to store the assembly-level instructions
— Advantage: low cycles per instruction; each instruction executes quickly, typically in one clock cycle
— Example: Advanced RISC Machines (ARM) processor

Complex Instruction Set Computers (CISC)
— Complex instructions that involve a large number of steps (larger S)
— Fewer instructions needed (smaller N): small code size
— Instructions correspond more closely to high-level language statements
— Less RAM required to store the program
— Disadvantage: high cycles per instruction
— Example: Motorola 68000 processor, Intel x86
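The symbols N and S used above fit the basic performance equation T = (N × S) / R, where N is the number of instructions executed, S the average number of basic steps (clock cycles) per instruction and R the clock rate. That equation is an assumption here, since the slides use the symbols without stating it. The sketch below evaluates it; all numbers are hypothetical and are not measurements of any real processor.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Assumed basic performance equation: T = (N * S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz


if __name__ == "__main__":
    R = 500e6  # hypothetical 500 MHz clock for both designs
    # Hypothetical workloads: the RISC version needs more instructions (larger N)
    # but fewer steps per instruction (smaller S).
    risc = execution_time(n_instructions=1500, steps_per_instruction=1, clock_rate_hz=R)
    cisc = execution_time(n_instructions=1000, steps_per_instruction=4, clock_rate_hz=R)
    print(f"RISC: {risc * 1e6:.2f} us, CISC: {cisc * 1e6:.2f} us")
```

Which design wins depends on how much N grows compared with how much S shrinks, which is the trade-off the two lists describe.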