Welcome to Computer Organization and Assembly!

The Register File, ALU,
and Memory
CS/COE 0447
Jarrett Billingsley


Class announcements
don’t forget the laaaaaab
CS447

The register files

– spookyyyyy

The MIPS register file
soooo we’re gonna need 32 registers, right?
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31

this would be a pain to put on slides sooooo
let’s just look at the first 4 instead.
0 1 2 3

– everything we talk about generalizes to n registers anyway.

1/8 of the MIPS register file (animated)
soooo we’re gonna need 4 registers, right?
[diagram: four registers, numbered 0 to 3, each with D, Q, and en pins]
except, register 0 is named zero, and it’s always 0, and if you write to it, it’s still 0, and if you read from it, it’s always 0, so maybe it should just be 0. the constant 0.
0
now let’s work on reading from the registers.
registers constantly output whatever value they store.
how will we choose which register to read, though?

– 0.
– choOOOoooooOOOOooooose……….. that means a mux.

Reading from one register (animated)
we’ll use a mux to decide which register to read.
[diagram: a mux wired to the constant 0 and to registers 1, 2, and 3, which hold 83, 4, and 29. selecting 1 outputs 83; selecting 3 outputs 29.]

this is a muxtopus: a mux with a bunch of legs coming out of it.
so what decides which register to read?
add t0, t1, t2

uh, but this mux only reads one register.
so what do you think we’ll need?

– this muxtopus is missing a few legs 
– with many legs, you get a muxtipede. very serious technical terms here.
– how the registers get from your assembly text into that control signal input is next week
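a rough Python sketch of that read path, not the real circuit: the names `mux` and `regs` are made up for illustration, and the values 83, 4, and 29 are the ones from the slide's example. every register outputs its value all the time; the select input just picks which one comes out.

```python
# Hypothetical model of one read port: a 4-input mux over the register outputs.
# Index 0 is the constant 0; registers 1-3 hold the example values from the slide.

def mux(inputs, select):
    """Forward exactly one of the inputs to the output, chosen by select."""
    return inputs[select]

regs = [0, 83, 4, 29]  # every register constantly outputs its stored value

print(mux(regs, 1))  # 83 (reading register 1)
print(mux(regs, 3))  # 29 (reading register 3)
```

in `add t0, t1, t2`, the select value would come from one of the source register numbers.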
Reading from two registers (animated)
we duplicate that circuitry to be able to read 2 registers at once!
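[diagram: the same constant 0 and registers 1, 2, and 3 (holding 83, 4, and 29), now wired to two muxes, each with its own select input and its own output]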

now we can read from two different registers at the same time.
or, if we like, the same register twice…
do you think we could read from three at once?

– of course we could. it just means more wires and another mux. but that stuff DOES take up a lot of space.

Writing to a register
writing is sort of the mirror image. sort of.
[diagram: the constant 0 and registers 1, 2, and 3 (holding 83, 4, and 29), each register with D, Q, and en pins]
do we always write to a register?
what decides which register to write?
add t0, t1, t2

bne t0, t2, top
we’re making a choice here, but a mux doesn’t really make sense…
we can also use the write enables to make choices.
so let’s think about the logic of when each of these registers should change.

– register 1 should change if we’ve got a destination AND the destination is 1.
– register 2 should change if we’ve got a destination AND the destination is 2.
– etc…
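the rule in those notes can be written down directly. this is only a sketch: the function name `write_enables` and the 4-register size are assumptions for illustration, and register 0 gets no enable because it's the constant 0.

```python
def write_enables(we, dest, num_regs=4):
    """Per-register enable: register i changes only if WE is on AND dest == i.

    Returns the enables for registers 1..num_regs-1 (register 0 is the
    constant 0 and never changes, so it has no enable at all).
    """
    return [bool(we) and dest == i for i in range(1, num_regs)]

print(write_enables(True, 1))   # [True, False, False]  (like add t0, t1, t2: has a destination)
print(write_enables(False, 1))  # [False, False, False] (like bne: no destination)
```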

A first, but ugly and bad attempt
we only want to write to one register at a time… if any.
[diagram: the constant 0 and registers 1, 2, and 3 (holding 83, 4, and 29), each register with D, Q, and en pins]
we’ll need a write enable signal (WE) for the register file as a whole.
it’s 1 (true) for instructions that have a destination.

every register needs an AND gate, cause that’s what we said.
but… how do I check if the destination (dest) == 1? or 2, or 3?

could try a comparator (dest = 1)…
this feels… verbose.

and repeat for every single register…

– when the register file’s WE=0, no registers change. when its WE=1, 1 register changes.
– if you’re doing something that’s really wordy and repetitive, it’s probably the wrong way to solve it.
Chekhov’s Gun
I said this component was pointless 95% of the time.
well, now it has a point.
[diagram: WE goes into a DEMUX whose select input is dest. its outputs go to the en pins of registers 1, 2, and 3 (holding 83, 4, and 29)]
a demultiplexer forwards its input to one of its outputs. the rest will be 0.
now, a register is only written when WE=1 and it is selected as the destination register.
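for example, with WE=1 and dest=1, the enables for registers 1, 2, and 3 are 1, 0, 0.
this first output (the one for register 0) just isn’t connected.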

– is this a demuxtopus?
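here is a small sketch of the demux doing that job. the names `demux` and `register_enables` are hypothetical, and it assumes 4 registers like the slides do.

```python
def demux(value, select, num_outputs=4):
    """Forward value to the selected output; every other output is 0."""
    outputs = [0] * num_outputs
    outputs[select] = value
    return outputs

def register_enables(we, dest):
    """Feed WE into the demux, with dest as the select input."""
    outputs = demux(we, dest)
    return outputs[1:]  # output 0 isn't connected: register 0 is the constant 0

print(register_enables(1, 1))  # [1, 0, 0]
print(register_enables(0, 1))  # [0, 0, 0]
```

compared to one comparator and one AND gate per register, this is a single component.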

The last thing to hook up (animated)
the input data has to go to the destination register.
but because of our write enable circuitry…
[diagram: a single Data input, plus the constant 0 and registers 1, 2, and 3 (holding 83, 4, and 29)]

we can hook up the data input to all the registers’ data inputs.
the write enable signal is like a door: only the register with WE=1 will be changed.
[diagram: Data = 74 reaches every register, but the enables are 1, 0, 0, so only register 1 changes to 74]

we’re done!


The MIPS register file
now we have the completed register file:

Register File
there’s one input or write port, for the data to write.
there are two output or read ports, for the data being read.
each port can read a different register.

there’s the clock input, of course…
one write enable, which is on when the instruction changes a register…
and inputs to select the registers used for the read/write ports.
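[diagram: Register File block with a WE input, register-select inputs rd, rs, and rt, and data ports for rd, rs, and rt]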
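putting the pieces together, a behavioral sketch of the whole thing might look like this. the class and method names are made up for illustration; the 32-bit mask reflects that MIPS registers are 32 bits wide, and the `clock` method stands in for a clock edge.

```python
class RegisterFile:
    """Behavioral sketch: two read ports, one write port, one write enable."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def read(self, rs, rt):
        """The two read ports: each one can select a different register."""
        return self.regs[rs], self.regs[rt]

    def clock(self, we, rd, data):
        """On a clock edge, write data to register rd, but only if WE is on."""
        if we and rd != 0:  # register 0 is the constant 0
            self.regs[rd] = data & 0xFFFFFFFF

rf = RegisterFile()
rf.clock(we=True, rd=9, data=83)
rf.clock(we=True, rd=10, data=4)
rf.clock(we=False, rd=9, data=999)  # no destination: nothing changes
rf.clock(we=True, rd=0, data=74)    # writes to register 0 are ignored
print(rf.read(9, 10))  # (83, 4)
print(rf.read(0, 0))   # (0, 0)
```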


The ALU

Arithmetic and Logic Unit
remember this? it does arithmetic and logic. cool.
squish two values together, get new value.
I said it was entirely combinational.
if we ignore multiplication and division, it is!
so get ready for the simplest thing ever.

[diagram: the ALU, with inputs A and B]

Here it is
it’s a mux and some things.
[diagram: inputs A and B feed an adder (+) and the other operations, and all of their results go into a mux]
do everything, but only pick the thing you need.
the ALUOp (ALU Operation) signal controls what the ALU does.
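"do everything, but only pick the thing you need" can be sketched like this. the operation names ("add", "sub", "and", "or") and the string encoding of ALUOp are assumptions for illustration; a real ALUOp is a few bits wide.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

def alu(a, b, alu_op):
    """Compute every operation, then let alu_op pick one, like a mux would."""
    results = {
        "add": (a + b) & MASK,
        "sub": (a - b) & MASK,
        "and": a & b,
        "or": a | b,
    }
    return results[alu_op]  # the "mux": alu_op is the select input

print(alu(3, 4, "add"))                   # 7
print(bin(alu(0b1100, 0b1010, "and")))    # 0b1000
```

no state and no clock anywhere in there, which is what "entirely combinational" means.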

Save Our Silicon
despite its simplicity, the ALU can be a pretty big piece of silicon
you might reuse the ALU hardware for non-arithmetic instructions
instruction          ALU input A   ALU input B   ALU output
and t0, t2, t5       t2            t5            t2 & t5
lw t0, 4(s0)         s0            4             s0 + 4
bne t0, 10, lab1     t0            10            t0 - 10
some instructions might present a problem, though…
(we’ll come back to that)

– the problem is: what if you need to do TWO calculations in one instruction?
– actually the ‘bne’ there might. it adds an offset to the PC register. so it needs to subtract and add at the same time??

What about multiply and divide though??
let’s say multiply takes as long as 6 regular instructions.
[diagram: the Main ALU next to a separate ×÷ Unit, which has its own adder/subtractor (+-), two registers, and control logic (ctrl)]
mult t0, t1
add t2, t3, t4
and t2, t2, a2

move v0, t2
mflo v1

it sends the multiplication off to a separate unit.
then we run other instructions while the multiplication happens.
…and when the multiplication is done, we can get the product!

– of course, this means having separate silicon for a dedicated adder/subtractor in the multiply/divide unit.
– but that’s probably less than a full regular ALU.

Hey, that’s neat actually
by making the multiply/divide unit separate from the rest of the CPU, we can do fun things like overclock it.
maybe the CPU runs at 2 GHz, but the divider at 8 GHz
now the divider does 4 steps for each main CPU cycle!
we can do this cause it’s a smaller circuit, which means it can likely run faster than the CPU as a whole.
but…
what should happen if the program tries to get the product or quotient before the operation is done?
should we “pause” the CPU when that happens? how?
what if it happens a lot? is that gonna be a performance issue?
there are many possible solutions…

– the dead-simplest approach would be to either give garbage results or crash. but that’s… not great. for a number of reasons.
– you could pause the CPU until the operation completes, but that destroys performance.
– you could keep running other instructions that don’t depend on the result of the operation…
– ooh MAN does that get complicated fast!
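a toy model of the "pause the CPU" option, to show why instruction order matters. everything here is an assumption for illustration: one instruction per cycle, a multiply that finishes 6 cycles after it starts (the number from the earlier slide), and a `run` function that only counts cycles.

```python
MULT_LATENCY = 6  # assumed: the product is ready 6 cycles after mult starts

def run(program, mult_latency=MULT_LATENCY):
    """Return (total_cycles, stall_cycles) for a list of instruction names."""
    cycle = 0
    stalls = 0
    product_ready_at = 0
    for op in program:
        if op == "mult":
            product_ready_at = cycle + mult_latency  # hand it to the x/div unit
        elif op == "mflo" and cycle < product_ready_at:
            stalls += product_ready_at - cycle  # pause until the product exists
            cycle = product_ready_at
        cycle += 1
    return cycle, stalls

print(run(["mult", "add", "and", "move", "mflo"]))  # (7, 2)
print(run(["mult", "mflo"]))                        # (7, 5)
```

both programs take 7 cycles in this model, but the first one gets three extra instructions done while the multiplier is busy.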

Memory

A big register file. A really big one
remember: memory is an array of bytes.
each byte has an address, and you can read and write it.
so think about a register file.
[diagram: Register File block with WE, register selects rd, rs, and rt, and data ports for rd, rs, and rt]
remove one read port…
use one address to select the location to read/write…
boom. memory.
[diagram: Memory block labeled WE, addr, and data at addr]

– or from another view, a register file is a small, special-purpose memory with extra read/write ports.

Bytes? Words?
wait, if memory is an array of bytes, how can we get whole words?
physically, we make it an array of words, and get bytes out of them.
here, the numbers in the cells are the addresses, and the first column is the word number:

word   byte addresses
0      00 01 02 03
1      04 05 06 07
2      08 09 0A 0B
3      0C 0D 0E 0F

I want to load a byte from address 0x07 = 0b0111:
the upper bits (01, red on the slide) pick which word to load: word 1.

byte in word:   0  1  2  3
address:        04 05 06 07

the low two bits (11, blue on the slide) pick which byte to extract from it: byte 3.

now I can sign- or zero-extend that byte value and put it in the register file.

– I know this diagram is gonna confuse people, I KNOW IT, I CAN FEEL IT IN MY BOOOOOONES
– this also visually explains why you can’t load/store words at unaligned addresses: their bytes are in different rows of the word-addressed memory.
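the address split above, as a sketch. the function name `load_byte` is made up, and the byte order inside a word (little-endian here) is an assumption; the slides don't say which one the hardware uses.

```python
def load_byte(words, addr, signed=True):
    """Load one byte from a memory that is physically an array of 32-bit words."""
    word = words[addr >> 2]               # upper bits pick which word (row)
    offset = addr & 0b11                  # low two bits pick the byte in that word
    byte = (word >> (8 * offset)) & 0xFF  # ASSUMPTION: little-endian byte order
    if signed and byte & 0x80:
        byte -= 0x100                     # sign-extend: top bit set means negative
    return byte

# word 1 covers addresses 0x04-0x07; put 0xF0 in its last byte (address 0x07)
words = [0x00000000, 0xF0000000, 0x00000000, 0x00000000]

print(load_byte(words, 0x07))                # -16 (sign-extended)
print(load_byte(words, 0x07, signed=False))  # 240 (zero-extended)
```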

☢️ Structural Hazards ☢️
where are your program’s instructions stored?
if we try to do lw t0, (s0) with one memory in a single cycle…
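[diagram: PC sends the Instruction Address to Memory, and Memory sends the Instruction to Control. that already uses the memory’s one port, so where do the load word address…? and the loaded word…?? go]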
in general, memory can only read/write a single address at a time.
what about sw?!?
adding a second read port would take a lot of wires.

– instructions are in memory too.
– each extra read port costs space that grows linearly with the size of the memory.
– for the register file, there are only 32 addresses, so it’s an acceptable tradeoff.
– for memory, it’s usually impractical.
– there are memories which are “dual-ported” and let you read/write to two addresses simultaneously
– but they are limited in size and are more expensive.

Von Neumann vs Harvard
one way to solve this problem is to have two memories!
this is a Harvard Architecture.
a Von Neumann Architecture has one memory.
“Von Neumann” is 2 words for 1 memory… “Harvard” is 1 word for 2 memories…
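[diagram, Harvard: PC addresses the Instruction Memory (instructions!), while loads and stores go to a separate Data Memory]
[diagram, Von Neumann: PC, instructions, loads, and stores all go through one Memory]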
but… how? didn’t I just say that was impossible?

– “Neumann” is said “noy-man”. like “annoy man”.
– you are not going to remember how to spell it on the exam, I bet. that’s fine. I’m not an English teacher.

All things are possible through space-time tradeoffs
a very general rule: if you have one kind of resource, and you want two users to use it, you have two options:
use two, instead of just one (this takes more space), OR…
make the users take turns (and this takes more time).

– the latter is often called “time multiplexing.” very very common design pattern.

Multi-cycle (animated)
a Von Neumann machine has one memory, but uses multiple clock cycles to execute each instruction
lw t0, (s0)
Cycle 1: PC sends the Instruction Address to Memory, and the Instruction goes to Control.
Cycle 2: the Load word address goes to Memory, and the Loaded word comes back.
multi-cycle machines are by far the most common today
but they’re more complex…

– but this isn’t even the whole picture, cause once you start pipelining, you have to go back to a Harvard architecture! but only within the CPU. outside, it’s still von Neumann. AAAAAAHHH.
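a sketch of the "take turns" idea with one single-ported memory. the class `SinglePortMemory`, the two `lw_...` functions, and the addresses are all hypothetical; the only rule modeled is "one access per cycle".

```python
class SinglePortMemory:
    """A memory that can serve only one address per clock cycle."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.used_this_cycle = False

    def tick(self):
        """Start a new clock cycle: the single port is free again."""
        self.used_this_cycle = False

    def read(self, addr):
        if self.used_this_cycle:
            raise RuntimeError("structural hazard: memory already used this cycle")
        self.used_this_cycle = True
        return self.contents[addr]

def lw_single_cycle(mem, pc, s0):
    mem.tick()
    instruction = mem.read(pc)  # fetch the instruction...
    loaded = mem.read(s0)       # ...and load the word in the same cycle: not possible
    return instruction, loaded

def lw_multi_cycle(mem, pc, s0):
    mem.tick()
    instruction = mem.read(pc)  # cycle 1: fetch the instruction
    mem.tick()
    loaded = mem.read(s0)       # cycle 2: load the word
    return instruction, loaded

# made-up addresses: one holds the instruction, the other holds the data
mem = SinglePortMemory({0x00400000: "lw t0, (s0)", 0x10010000: 42})
print(lw_multi_cycle(mem, 0x00400000, 0x10010000))  # ('lw t0, (s0)', 42)
```

calling `lw_single_cycle` with the same arguments raises the `RuntimeError`, which is the structural hazard from earlier.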