CS计算机代考程序代写 mips compiler ER assembly CSE141-s121-L4-before

CSE141-s121-L4-before

Week 2 — Tuesday July 6

ISA (Cont’d) & Performance
CSE-141 Summer Session I 2021

Many slides adapted from Dean Tullsen

Administrative

• Weekly Check-In will be released today and due tomorrow night

• Homework 1

• Due next Thursday 11:59 pm

Be a human …

• 2 min breakout rooms

• Anything exciting you did on the weekend?

• Feel free to not join and hang out in the main

room…

Previously on CSE-141…

The Instruction Set Architecture

I/O systemInstr. Set Proc.

Compiler

Operating
System

Application

Digital Design

Circuit Design

Instruction Set
Architecture

• An ISA is “the agreed-upon interface between all the software that runs on the
machine and the hardware that executes it.”

Hardware

Software

The Instruction Set Architecture

I/O systemInstr. Set Proc.

Compiler

Operating
System

Application

Digital Design

Circuit Design

Instruction Set
Architecture

• An ISA is “the agreed-upon interface between all the software that runs on the
machine and the hardware that executes it.”

Hardware

Software

Key ISA decisions

• operations

– how many?
– which ones?

• operands

– how many?
– location
– types
– how to specify?

• instruction format

– size
– how many formats?

y = x + b

operation

source operands

Destination operand

how does the computer know what
0001 0101 0001 0010 means?

add r5, r1, r2

MIPS Instruction Formats

opcode

opcode

opcode

rs rt rd sa funct

rs rt immediate

target

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-type

I-type

J-type

• An instruction has opcode == 0 ————————-> R-type ✅
• An instruction is R-type ————————-> opcode == 0 ❌

MIPS Instruction Formats

• the opcode tells the machine which format

opcode

opcode

opcode

rs rt rd sa funct

rs rt immediate

target

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-type

I-type

J-type

• An instruction has opcode == 0 ————————-> R-type ✅
• An instruction is R-type ————————-> opcode == 0 ❌

MIPS addressing modes

register direct
add $1, $2, $3

immediate
add $1, $2, #35

base + displacement
lw $1, disp($2)

OP rs rt rd sa funct

OP rs rt immediate

rt

rs

immediate

(R1 = M[R2 + disp])

MIPS addressing modes

register direct
add $1, $2, $3

immediate
add $1, $2, #35

base + displacement
lw $1, disp($2)

OP rs rt rd sa funct

OP rs rt immediate

rt

rs

immediate register indirect ! disp = 0
absolute ! (rs) = 0

(R1 = M[R2 + disp])

Control Flow

• Jumps
• Procedure call (jump subroutine)
• Conditional Branch
• Used to implement, for example, if-then-else logic, loops, etc.

Control Flow

• Jumps
• Procedure call (jump subroutine)
• Conditional Branch
• Used to implement, for example, if-then-else logic, loops, etc.

• A conditional branch must specify two things
• Condition under which the branch is taken
• Location that the branch jumps to if taken (target)

Conditional branch

• How do you specify the destination (target) of a branch/jump?

Conditional branch

• How do you specify the destination (target) of a branch/jump?

• studies show that almost all conditional branches go short distances from the

current program counter (loops, if-then-else).

Conditional branch

• How do you specify the destination (target) of a branch/jump?

• studies show that almost all conditional branches go short distances from the

current program counter (loops, if-then-else).

• we can specify a relative address in much fewer bits than an absolute address

• e.g., beq $1, $2, 100 => if ($1 == $2) PC = (PC+4) + 100 * 4

Conditional branch

• How do you specify the destination (target) of a branch/jump?

• studies show that almost all conditional branches go short distances from the

current program counter (loops, if-then-else).

• we can specify a relative address in much fewer bits than an absolute address

• e.g., beq $1, $2, 100 => if ($1 == $2) PC = (PC+4) + 100 * 4

• How do we specify the condition of the branch?

MIPS conditional branches

MIPS conditional branches

• beq, bne beq r1, r2, addr => if (r1 == r2) goto addr

MIPS conditional branches

• beq, bne beq r1, r2, addr => if (r1 == r2) goto addr

• slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0 MIPS conditional branches • beq, bne beq r1, r2, addr => if (r1 == r2) goto addr

• slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0 • these, combined with $0, can implement all fundamental branch conditions Always, never, !=, = =, >, <=, >=, <, >(unsigned), <= (unsigned), ... MIPS conditional branches • beq, bne beq r1, r2, addr => if (r1 == r2) goto addr

• slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0 • these, combined with $0, can implement all fundamental branch conditions Always, never, !=, = =, >, <=, >=, <, >(unsigned), <= (unsigned), ... if (i PC = 10000

Jumps

• need to be able to jump to an absolute address sometimes
• need to be able to do procedure calls and returns

• jump — j 10000 => PC = 10000
• jump and link — jal 100000 => $31 = PC + 4; PC = 10000

Jumps

• need to be able to jump to an absolute address sometimes
• need to be able to do procedure calls and returns

• jump — j 10000 => PC = 10000
• jump and link — jal 100000 => $31 = PC + 4; PC = 10000

• used for procedure calls
OP target

Jumps

• need to be able to jump to an absolute address sometimes
• need to be able to do procedure calls and returns

• jump — j 10000 => PC = 10000
• jump and link — jal 100000 => $31 = PC + 4; PC = 10000

• used for procedure calls

• jump register — jr $31 => PC = $31
• used for returns, but can be useful for lots of other things.

OP target

Branch and Jump Addressing Modes

• Branch (e.g., beq) uses PC-relative addressing mode (uses few bits if address typically
close). That is, it uses base+displacement mode, with the PC being the base. If opcode
is 6 bits, how many bits are available for displacement? How far can you jump?

• Jump uses pseudo-direct addressing mode. 26 bits of the address is in the instruction,
the rest is taken from the PC.

instruction program counter

jump destination address
| |00

6 | 26 4 | 28

Review — Instruction Execution in a CPU

Memory2
20000
20004

80000

Address10
10001100010000110100111000100000
00000000011000010010100000100000

00000000000000000000000000111001

Registers10

R0
R1
R2
R3
R4
R5

0
36
60000
45
198
12

Program Counter10
20000

Instruction Buffer

op rtrs rd shamt

immediate/disp

in1 in2

out

ALUoperation
Load/Store
Unit

addr

data

C
P
U

M
e
m
o
r
y

MIPS ISA Tradeoffs

What if?

• 64 registers

• 20-bit immediates

opcode

opcode

opcode

rs rt rd sa funct

rs rt immediate

target

6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-type

I-type

J-type

RISC Architectures

• MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction
Set Computer) ISA.

• fixed instruction length

• few instruction formats

• load/store architecture

• RISC architectures worked because they enabled pipelining. They continue
to thrive because they enable parallelism.

Alternative Architectures

Alternative Architectures

• Design alternative:

• provide more powerful operations

• goal is to reduce number of instructions executed

• danger is a slower cycle time and/or a higher CPI (cycles per instruction)

Alternative Architectures

• Design alternative:

• provide more powerful operations

• goal is to reduce number of instructions executed

• danger is a slower cycle time and/or a higher CPI (cycles per instruction)

• Sometimes referred to as “RISC vs. CISC”

• CISC = Complex Instruction Set Computer (as alt to RISC)

• virtually all new instruction sets since 1982 have been RISC

• VAX: minimize code size, make assembly language easy

instructions from 1 to 54 bytes long!

Alternative Architectures

• Design alternative:

• provide more powerful operations

• goal is to reduce number of instructions executed

• danger is a slower cycle time and/or a higher CPI (cycles per instruction)

• Sometimes referred to as “RISC vs. CISC”

• CISC = Complex Instruction Set Computer (as alt to RISC)

• virtually all new instruction sets since 1982 have been RISC

• VAX: minimize code size, make assembly language easy

instructions from 1 to 54 bytes long!

• We’ll look (briefly!) at 80×86

Alternative Architectures

• Design alternative:

• provide more powerful operations

• goal is to reduce number of instructions executed

• danger is a slower cycle time and/or a higher CPI (cycles per instruction)

• Sometimes referred to as “RISC vs. CISC”

• CISC = Complex Instruction Set Computer (as alt to RISC)

• virtually all new instruction sets since 1982 have been RISC

• VAX: minimize code size, make assembly language easy

instructions from 1 to 54 bytes long!

• We’ll look (briefly!) at 80×86
• What is ARM?

80×86

• 1978: The Intel 8086 is announced (16 bit architecture)

• 1980: The 8087 floating point coprocessor is added

• 1982: The 80286 increases address space to 24 bits, +instructions

• 1985: The 80386 extends to 32 bits, new addressing modes

• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions


(mostly designed for higher performance)

• 1997: MMX is added

• 1999: Pentium III (same architecture)

• 2001: Pentium 4 (144 new multimedia instructions), simultaneous multithreading (hyperthreading)

• 2005: dual core Pentium processors

• 2006: quad core (sort of) Pentium processors

• 2009: Nehalem – eight-core multithreaded processors

• 2015: Skylake – 4-core, multithreaded, added hw security features, transactional memory, …

80×86

• Complexity:

• Instructions from 1 to 17 bytes long

• one operand must act as both a source and destination

• one operand can come from memory

• complex addressing modes

e.g., “base or scaled index with 8 or 32 bit displacement”

• Saving grace:

• the most frequently used instructions are not too difficult to build

• compilers avoid the portions of the architecture that are slow

• Some other tricks we’ll talk about later.


ISA Key Points

• MIPS is a general-purpose register, load-store, fixed-instruction-length
architecture.

• MIPS is optimized for fast pipelined performance, not for low instruction
count

• Historic architectures favored code size over parallelism.

• MIPS most complex addressing mode, for both branches and loads/stores is

base + displacement.

Thought Experiment

• What is the fastest way to send a picture of a black hole to Boston?
• What is the fastest way to send 5 petabytes of data to Boston?

Thought Experiment

• What is the fastest way to send a picture of a black hole to Boston?
• What is the fastest way to send 5 petabytes of data to Boston?

Thought Experiment

• What is the fastest way to send a picture of a black hole to Boston?
• What is the fastest way to send 5 petabytes of data to Boston?

(5 petabytes) / (940 Megabits/second) = 1.35 years

Measuring and Discussing
Computer System Performance

or
“My computer is faster than your computer”

Performance Trends

Specint2000

1.00

10.00

100.00

1000.00

10000.00

85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05

Intel

Alpha

Sparc

Mips

HP PA

Pow er PC

AMD

Year of introduction

P
er

fo
rm

an
ce

• But what is performance?

The bottom line: Performance
Car Speed

160 mph

65 mph

Time to
Bay Area

3.1 hours

7.7 hours

Passengers

2

60

Throughput
(pmph)

320

3900

The bottom line: Performance

° Time to do the task
– execution time, response time, latency
° Tasks per day, hour, week, sec, ns. ..
– throughput, bandwidth

Car Speed

160 mph

65 mph

Time to
Bay Area

3.1 hours

7.7 hours

Passengers

2

60

Throughput
(pmph)

320

3900

Measures of “Performance”

Measures of “Performance”

• Execution Time

Measures of “Performance”

• Execution Time

• Frame Rate

Measures of “Performance”

• Execution Time

• Frame Rate

• Throughput (operations/time)
• Transactions/sec, queries/day, etc.

Measures of “Performance”

• Execution Time

• Frame Rate

• Throughput (operations/time)
• Transactions/sec, queries/day, etc.

• Responsiveness

Measures of “Performance”

• Execution Time

• Frame Rate

• Throughput (operations/time)
• Transactions/sec, queries/day, etc.

• Responsiveness

• Performance / Cost

Measures of “Performance”

• Execution Time

• Frame Rate

• Throughput (operations/time)
• Transactions/sec, queries/day, etc.

• Responsiveness

• Performance / Cost

• Performance / Power

Measures of “Performance”

• Execution Time

• Frame Rate

• Throughput (operations/time)
• Transactions/sec, queries/day, etc.

• Responsiveness

• Performance / Cost

• Performance / Power

• Performance / Energy

How to measure Execution Time?

How to measure Execution Time?

% time program
… program results …
real 0m0.921s
user 0m0.179s
sys 0m0.389s
%

How to measure Execution Time?

• Wall-clock time?

% time program
… program results …
real 0m0.921s
user 0m0.179s
sys 0m0.389s
%

How to measure Execution Time?

• Wall-clock time?
• user CPU time? % time program

… program results …
real 0m0.921s
user 0m0.179s
sys 0m0.389s
%

How to measure Execution Time?

• Wall-clock time?
• user CPU time?
• user + kernel CPU time?

% time program
… program results …
real 0m0.921s
user 0m0.179s
sys 0m0.389s
%

How to measure Execution Time?

• Wall-clock time?
• user CPU time?
• user + kernel CPU time?
• Answer:

% time program
… program results …
real 0m0.921s
user 0m0.179s
sys 0m0.389s
%

Our definition of Performance

Execution TimeX
, for program X

Our definition of Performance

PerformanceX =
1

Execution TimeX
, for program X

Our definition of Performance

• only has meaning in the context of a program or workload

PerformanceX =
1

Execution TimeX
, for program X

Our definition of Performance

• only has meaning in the context of a program or workload

• Not very intuitive as an absolute measure, but most of the time we’re more

interested in relative performance.

PerformanceX =
1

Execution TimeX
, for program X

Relative Performance

• can be confusing

A runs in 12 seconds

B runs in 20 seconds


• A/B = 0.60, so A is 40% faster, or 1.40X faster, or B is 40% slower

• B/A = 1.67, so A is 67% faster, or 1.67X faster, or B is 67% slower

• needs a precise definition

Relative Performance (Speedup), the Definition

PerformanceX
Execution TimeXPerformanceY

Speedup Execution TimeY= = = n(X/Y)

Example

• Machine A runs program C in 9 seconds, Machine B runs the same program
in 6 seconds. What is the speedup we see if we move to Machine B from
Machine A?

Example

• Machine A runs program C in 9 seconds, Machine B runs the same program
in 6 seconds. What is the speedup we see if we move to Machine B from
Machine A?

• Machine B gets a new compiler, and can now run the program in 3 seconds.
Speedup from the new compiler?

What is Time?

What is Time?

CPU Execution Time = CPU clock cycles * Clock cycle time

What is Time?

CPU Execution Time = CPU clock cycles * Clock cycle time

• Every conventional processor has a clock with an associated clock cycle time or clock rate

What is Time?

CPU Execution Time = CPU clock cycles * Clock cycle time

• Every conventional processor has a clock with an associated clock cycle time or clock rate

• Every program runs in an integral number of clock cycles

Cycle Time

MHz = millions of cycles/second, GHz = billions of cycles/second

X MHz = 1000/X nanoseconds cycle time

Y GHz = 1/Y nanoseconds cycle time

How many clock cycles?

How many clock cycles?

Number of CPU clock cycles =

[Instruction count] * [Average Clock Cycles per Instruction (CPI)]

How many clock cycles?

Number of CPU clock cycles =

[Instruction count] * [Average Clock Cycles per Instruction (CPI)]

Computer A runs program C in 3.6 billion cycles. Program C requires 2 billion dynamic
instructions. What is the CPI?

All Together Now

CPU Execution Time = [CPU clock cycles] * [Clock cycle time]

CPU clock cycles = [Instruction count] * [Average Clock Cycles per Instruction (CPI)]

All Together Now

CPU Execution
Time

Instruction
Count

CPI Clock Cycle
Time= X X

All Together Now

CPU Execution
Time

Instruction
Count

CPI Clock Cycle
Time= X X

instructions
cycles/instruction seconds/cycle

seconds

Putting it all together

• IC = 4 billion, 2 GHz processor, execution time of 3 seconds. What is the CPI
for this program?

• Suppose we reduce CPI to 1.2 (through an architectural improvement). What
is the new execution time?

CPU Execution
Time

Instruction
Count

CPI Clock Cycle
Time= X X