CS计算机代考程序代写 computer architecture assembly Department of Electrical and Computer Engineering

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

Computer Architecture and Assembly Language Lab

Fall 2021
Lab 6

CPU Structure, Pipeline Programming and Hazards

Goal
After completing this lab, you will:

• Know the difference between pipelined and non-pipelined processors
• Learn how the pipeline works; know how to prevent hazards in the pipeline

Introduction
A CPU has 5 basic components, the Arithmetic and Logic Unit (ALU), the Instruction

Memory, the Register File, the Data Memory and its Control block. The ALU is

responsible for executing the arithmetic functions, such as addition, subtraction, and the

logical functions, e.g. AND, OR, logical shift, as well as computing branch, jump and

memory addresses. The instruction memory is the memory where an assembly program

is stored before execution. The Program Counter contains the address of the instruction

being fetched. The data memory is the memory where all the data produced or not are

stored. Finally, the control block is responsible for decoding an instruction and setting all

necessary control signals with the proper values. Figure 1 shows a diagram of a simple

CPU. Computers execute instructions in a serial manner. A way to speed up a

program’s execution is by using a Pipeline. Pipelining is a method to increase the

throughput of the executing instruction without actually affecting the time a single

instruction needs to execute. Finally, control is the most challenging aspect of processor

design: it is both the hardest part to get right and the hardest part to make fast. One of

the difficult parts of control design is implementing exceptions and interrupts. An

exception is an unscheduled event that disrupts program execution. An interrupt is an

exception that comes from outside of the processor, such as when an Input/Output

device needs servicing.

CPU Structure
In order to examine the structure of the CPU, the implementation of actual instructions

must be analyzed, so that the organization of the memory units, the ALU and the control

of the CPU are understood. Figure 1 depicts the parts of a CPU. The most important

parts are:

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

1. The Program Counter (PC) points to the address of the instruction that is being

fetched from the instruction memory.
2. The Register File contains registers that store the state arguments for the ALU or

results of Arithmetic operations, at every clock cycle.
3. The Arithmetic and Logic Unit (ALU) is responsible to execute every arithmetic or

logical operation that is needed by an instruction.
4. The Data Memory contains all the data that either are generated by the execution

if the instructions or are the initial data that a program may need.
5. The Instruction Memory holds the compiled and linked instructions before

execution.
6. The Control Block of the processor decodes each instruction and sets all the

control signals according to that instruction.
7. The ALU control is responsible to pass the correct control signal to the ALU

according to the arithmetic or logical operation that is requested by the Control

unit.

Figure 1. The simple data path with the control unit shown.

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

In order to understand how an instruction is executed and how the data path operates,

two examples will be given below. The first example explains the instruction add x5,

x6, x7 and the second explains the instruction sw x5, offset(x6). Table 1 shows

the values the control signals have for the two examples.

Example 1
add x5, x6, x7

The execution can be shown in four steps:
1. The instruction is fetched and the PC is incremented by 4.
2. Two registers, x6 and x7, are read from the register file. Also, the main

control unit sets of the control lines for an add operation.

3. The ALU operates on the data read from the register file, using the

function code (bits: 2 ALUOp of the instruction) to generate the ALU

function.
4. The result of the ALU is written back into the register file using the bits

15:11 of the instruction to select the destination register x5.

Example 2
sw x5, offset(x6)

The execution can be shown in 5 steps [1]:
1. The instruction is fetched and the PC is incremented.
2. A register (x5) value is read from the register file.

3. The ALU computes the sum of the value read from the register file and the

sign-extended, lower 16 bits of the instruction (offset).
4. The sum of the ALU is used as the address for the data memory.
5. The data from the register file is written into the memory unit; the register

destination is given by x6.

Instruction ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp [1:0]

add x5, x6, x7 0 0 1 0 0 0 10

sw x5, offset(x6) 1 0 0 0 1 0 00

Table 1. The setting of the control lines of examples 1 and 2 based on Figure 1.

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

The explanation of the control signals is as follows:

Signal Name Explanation

ALUSrc Controls the MUX to select the second input of the ALU either from the second

output of the register file or the sign extended lower 16 bits of the instruction

MemtoReg Selects the data fed to the register Write data input port either from the ALU

output or the data memory output

RegWrite The register on the Write register input is written with the value on the Write

data input

MemRead Data memory contents designated by the address input are put on the Read

data output

MemWrite Data memory contents designated by the address input are replaced by the

value on the Write data input

Branch Line goes HI if the instruction is a branch instruction

ALUOp[1:0] The ALU operation.

Pipeline Programming and Hazards

When a processor supports pipeline, an instruction can be divided in four or five steps

which are 1) Instruction Fetch [IF], 2) Instruction Decode [ID], 3) Execution [EX], 4)

Data Memory Access [MEM] and 5) Write Back the result into the register file [WB].

For a processor to achieve its maximum throughput, the pipeline must always be kept

full (i.e. each stage of the pipeline needs to be occupied at each clock cycle). Also,

the clock cycle time is kept fixed and is defined by the slowest step of an instruction.

Let us now see an example. Consider 3 consecutive instructions A, B and C. Each

instruction has the above-mentioned five steps. When a processor does not support

pipelining, each instruction will be executed after the previous one has finished, as it

is shown in Table 2. Also, suppose that instruction A starts at cycle 1.

Instr/Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

A IF ID EX MEM WB

B IF ID EX MEM WB

C IF ID EX MEM WB

Table 2. Instruction execution without pipelining.

On the other hand, when pipeline is supported, each instruction can start executing

even though the previous one has not necessarily finished.

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

Instr/Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

A IF ID EX MEM WB

B IF ID EX MEM WB

C IF ID EX MEM WB

Table 3. Instruction execution with pipeline support.

Note that there are situations in pipelining when the next instruction cannot

execute in the following clock cycle. These events are called Hazards and there

are three different types of hazards:

A. Structural Hazard. When a planned instruction cannot execute in the

proper clock cycle because the hardware does not support the combination

of instructions that are set to execute.

B. Data Hazard or Pipeline Data Hazard. When a planned instruction cannot

execute in the proper clock cycle because data that is needed to execute

the instruction is not yet available.

C. Control Hazard or Branch Hazard. When the instruction cannot execute

in the proper pipeline clock cycle because the instruction that was fetched

is not the one that is needed; that is, the flow of instruction addresses is not

what the pipeline expected.

To understand a Data Hazard, consider the following code fragment:

lw x5, 0(x6)

sub x7, x5, x6

The execution of the code with pipeline support is as follows:

Instr/Cycle 1 2 3 4 5 6

lw x5,0(x6)
IF ID EX MEM WB

sub x7,x5,x6
IF ID EX MEM WB

The lw will write the value from the memory in the register file in step WB, in the

meantime the sub instruction will have retrieved its operands in the ID step. This

actually means that the old value of register x5 will be used for the addition.

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

Such hazards can be prevented by either writing a program that does not provoke

any hazard or by adding an ample number of no-operation instructions (NOP)

bubbles between the instructions that depend on the result of each other. NOP

operations act like bubbles inside the pipeline, delaying the execution of the second

instruction (and subsequent instructions) until the values it requires as input are

ready. When the NOP technique is used in the above example, the following

happens:

Instr/Cycl 1 2 3 4 5 6 7 8

lw x5, 0(x6)
IF ID EX MEM WB

sub x7,x5,x6 IF Stall Stall ID EX MEM WB

Unfortunately, Stalling slows down the pipeline and degrades its efficiency. An

alternative technique instead of NOPs is Instruction Forwarding. In this technique,

the result of an instruction will be sent forward to the next instruction before it is

written back to the register file and it is possible that stalling is not needed. The

above data hazard between lw and sub can be resolved with instruction

forwarding:

Instr/Cycl 1 2 3 4 5 6 7 8

lw x5, 0(x6)
IF ID EX MEM WB

sub x7, x5, x6
IF Stall ID EX MEM WB

It can be easily seen that a Stall cycle still exists. The reason is that the lw

instruction will not have the data ready for forwarding before it is read from the

memory. When it writes back the value in the register file, it also forwards it to the

input of the ALU for immediate use (as symbolized by the arrow). If that stall does

not exist even with instruction forwarding, the forwarded result would arrive too late

and the result of the sub instruction would be wrong.

The third technique that a programmer can use to avoid data hazards is

Instruction Reordering. As implied by its name, the independence between

instructions can be exploited in order to reorder them in a way that keeps the

pipeline as full as possible.

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

Consider the following code segment:

lw x5, 0(x6)

addi x7, x5, 2

sub x28, x29, x9

Without instruction reordering, the execution will be the following:

Instr/Cycle
1 2 3 4 5 6 7 8 9

lw x5,0(x6)
IF ID EX MEM WB

addi x7,x5,2 IF Stall Stall ID EX MEM WB

sub x28,x29,x9 IF ID EX MEM WB

It is clear that two cycles are lost due to the NOP instruction. Also, it is easily seen

that sub does not need the result of the lw or the add. In this case sub is

independent of the other two and it can be moved between them. After reordering

the execution will look like this:

Instr/Cycle
1 2 3 4 5 6 7 8 9

lw x5,0(x6)
IF ID EX MEM WB

sub x28,x29,x9
IF ID EX MEM WB

addi x7,x5,2 IF Stall ID EX MEM WB

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

Exercises

1. [30 pts] Suppose you have the CPU of Figure 1 and that the architecture is not pipelined.

1. [15 pts] For the following program indicate the values of the control signals for each

instruction.

sw x8, 0(x10)

lw x15, 4(x8)

beq x0, x0, BoolTrue

2. [15 pts] Given the following control signals, write the corresponding assembly

instruction(s).

ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp[1:0]

1 0 1 0 0 0 10

ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp[1:0]

0 0 1 0 0 0 10

2. [30 pts] Suppose you have a machine that supports the five pipeline stages mentioned

above. Using the program below, answer the following questions:

lw x20, 4(x19)

add x20, x20, x11

sub x11, x19, x11

addi x10, x19, 4

sub x17, x17, x10

1. [5 pts] Complete the table of execution with five pipeline stages. Ignore all hazards.

2. [5 pts] Suppose that registers x10, x11, x17, x19, and x20 have the initial values of 2,

5, 8, 0, and 1, respectively. Give the values of the registers after running the program

without the processor handling any hazards.

3. [5 pts] Assume that the initial value of the PC is 0x0…0BF. What is the new PC value

at cycle 4 when ignoring hazards?

4. [5 pts] Resolve the hazards through stalling. Insert NOP instructions so that the

registers would get the correct values when the processor executes the program one

instruction per cycle.

5. [5 pts] Resolve all hazards (if possible) through instruction reordering. Can all

hazards be resolved through instruction reordering only? Explain.

6. [5 pts] Resolve the hazards (if possible) through instruction forwarding. Can all

hazards be resolved through instruction forwarding only? Explain.

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

3. [25 pts] Consider the following code:

addi x10, x0, 1

add x12, x0, x10

slli x13, x12, 2

addi x7, x12, 0

addi x29, x29, 4

Loop: sw x7, 10(x29)

addi x28, x0, 2

bne x7, x29, Loop

1. [5 pts] Determine how many cycles it would take for the program to execute without

hazard handling hardware.

2. [10 pts] Draw the execution table for one iteration. Identify the hazards and their types.

3. [10 pts] Assuming perfect hazard handling, what is the minimum number of cycles to

execute one iteration of the program? Explain what combination of hazard resolution

techniques must be used to achieve this and how. Draw the execution table to justify

your explanation.

Department of Electrical and Computer Engineering
Rutgers, The State University of New Jersey

4. [15 pts] Please read the following code, and answer the questions:

main:

addi x5, x0, 10

addi x6, x0, 0

jal x0, block_b

block_a:

add x6, x6, x7

addi x10, x0, 1

addi x11, x6, 0

ecall

jal x0, end

block_b:

addi x7, x0, 0

addi x8, x0, 0

jal x0, block_c

block_c:

sub x9, x5, x7

bge x8, x9, block_a

add x7, x7, x8

add x12, x8, x7

addi x8, x8, 1

jal x0, block_c

end:

Using any combination of techniques you like, identify and resolve all hazards in the program.

Draw execution tables showing the pipelined instructions before and after implementing your

hazard resolution method to illustrate how your solution fixes the issue.

For simplicity, you do not need to draw the table for every instruction in the program, only the

instructions affected by each hazard you identify.