Assembly Language
Programming the processor
Things you’ll need to know:
Control unit signals to the datapath
Machine code instructions
Assembly language instructions
Programming in assembly language
How things fit together
[Diagram: Assembly Language, Machine Code, Control Unit, Datapath Signals]
Machine Code Instructions
Intro to Machine Code
Now that we have a processor, operations are performed by:
The instruction register:
Sending instruction components to the processor.
The control unit:
Based on the opcode value (sent from the instruction register), sending a sequence of signals to the rest of the processor.
Only questions remaining:
Where do these instructions come from?
How are they provided to the instruction memory?
Assembly language
Each processor type has its own language for representing its machine instructions (32 bits each, in our case) as user-level code words.
Example: C = A + B
Assume A is stored in $t1, B in $t2, C in $t3.
Assembly language instruction:
add $t3, $t1, $t2
Machine code instruction:
000000 01001 01010 01011 XXXXX 100000
Encoding the instruction
Machine code instructions contain all the details about a processor operation, such as:
What operation is being performed (opcode),
What registers are being used in this operation,
What other information might be needed to make this operation happen (immediate or shift values)
When we write (or interpret) a machine code instruction, we need to know how to encode (or decode) these details into these 32 bits.
R-type instructions
For instance, how do we encode the earlier instruction that adds registers $t1 and $t2 and stores the result into register $t3?
e.g. add $t3, $t1, $t2
000000 01001 01010 01011 XXXXX 100000
Operating on Registers
The add instruction is one of many operations that process two register values and store the result in a third.
This is called an R-type instruction.
Any operations whose inputs and outputs are all registers are called R-type, even if they involve fewer than three registers.
e.g. the jr instruction, which we get to later.
In order to encode R-type instructions, we need to know the 5-bit codes used to refer to our input and output registers.
Machine code + registers
MIPS is register-to-register.
Almost all operations rely on register data.
MIPS provides 32 registers (with numerical and label names).
Some have special values:
Register $0 ($zero): value 0 — always.
Register $1 ($at): reserved for the assembler.
Registers $28-$31 ($gp, $sp, $fp, $ra): memory and function support
Registers $26-$27: reserved for OS kernel
Some are used by programs as function parameters:
Registers $2-$3 ($v0, $v1): return values
Registers $4-$7 ($a0-$a3): function arguments
Some are used by programs to store values:
Registers $8-$15, $24-$25 ($t0-$t9): temporaries
Registers $16-$23 ($s0-$s7): saved temporaries
Also three special registers (PC, HI, LO) that are not directly accessible.
HI and LO are used in multiplication and division, and have special instructions for accessing them.
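The register list above can be sketched as a lookup table. This is a minimal illustration, not part of any assembler: the names REGISTER_NAMES, register_number and register_bits are hypothetical, and the names k0 and k1 for registers $26-$27 are the conventional ones rather than something stated on the slide.

```python
# Conventional MIPS register names, indexed by register number (0-31).
# k0/k1 for $26-$27 are an assumption (the slide only says "reserved for OS kernel").
REGISTER_NAMES = (
    ["zero", "at", "v0", "v1"]           # $0-$3
    + [f"a{i}" for i in range(4)]        # $4-$7: function arguments
    + [f"t{i}" for i in range(8)]        # $8-$15: temporaries
    + [f"s{i}" for i in range(8)]        # $16-$23: saved temporaries
    + ["t8", "t9", "k0", "k1"]           # $24-$27
    + ["gp", "sp", "fp", "ra"]           # $28-$31
)


def register_number(name):
    """Return the register number for a name such as '$t1' or '$9'."""
    bare = name.lstrip("$")
    if bare.isdigit():
        number = int(bare)
        if not 0 <= number <= 31:
            raise ValueError(f"no such register: {name}")
        return number
    return REGISTER_NAMES.index(bare)  # raises ValueError for unknown names


def register_bits(name):
    """Return the 5-bit code used for this register in machine code."""
    return format(register_number(name), "05b")


print(register_bits("$t1"), register_bits("$t2"), register_bits("$t3"))
```

Looking up $t1, $t2 and $t3 this way gives the 5-bit codes used in the add example.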
Filling in the blanks
In our previous example, registers $t1 and $t2 are registers 9 and 10 respectively, and register $t3 is register 11.
The registers for this instruction are encoded as 01001 ($t1), 01010 ($t2) and 01011 ($t3).
add $t3, $t1, $t2
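As a rough sketch of how these fields are packed into 32 bits, the following Python helper builds the R-type word for add $t3, $t1, $t2. The function name encode_r_type and the field names rs, rt, rd are assumptions for illustration (first source, second source, destination), and the "don't care" shift bits are set to 0 here.

```python
def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack R-type fields into a 32-bit word (the opcode is always 000000)."""
    for value, width in ((rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t3, $t1, $t2: sources $t1 (9) and $t2 (10), destination $t3 (11)
word = encode_r_type(rs=9, rt=10, rd=11, shamt=0, funct=0b100000)
bits = format(word, "032b")
fields = " ".join([bits[0:6], bits[6:11], bits[11:16],
                   bits[16:21], bits[21:26], bits[26:32]])
print(fields)
print(hex(word))
```

Note that the destination register comes first in the assembly syntax but third among the register fields of the machine code.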
I-type instructions
I-type instructions also operate on registers, but involve a constant value as well.
This constant is encoded in the last 16 bits of the instruction.
e.g. addi $t2, $t1, 42
The opcode is 001000 and the constant 42 is 0000000000101010 in 16 bits:
001000 01001 01010 0000000000101010
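The same packing idea applies to I-type instructions. The sketch below is illustrative only: encode_i_type is a hypothetical helper, and it assumes negative constants are stored in 16-bit two's complement.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack I-type fields: 6-bit opcode, two 5-bit registers, 16-bit constant."""
    if not -(1 << 15) <= immediate < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


# addi $t2, $t1, 42: source $t1 (9), destination $t2 (10)
word = encode_i_type(0b001000, rs=9, rt=10, immediate=42)
bits = format(word, "032b")
print(" ".join([bits[0:6], bits[6:11], bits[11:16], bits[16:32]]))

# addi $t5, $zero, -1: the constant -1 is stored as sixteen 1 bits
negative = encode_i_type(0b001000, rs=0, rt=13, immediate=-1)
print(hex(negative))
```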
J-type instructions
J-type instructions jump to a location in memory encoded by the last 26 bits of the instruction (everything but the opcode).
In assembly, this location is written as a label, which is resolved to an address when the program is assembled.
More later on how these 26 bits store jump addresses.
e.g. a j instruction (opcode 000010) with the 26-bit address field 00000000000000011000101010:
000010 00000000000000011000101010
Review: MIPS instruction types
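R-type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I-type: opcode (6) | rs (5) | rt (5) | immediate (16)
J-type: opcode (6) | address (26)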
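Going the other way, a 32-bit word can be split back into its fields once the opcode is known. This is a simplified sketch: decode is a hypothetical helper, and it assumes every opcode other than 000000 (R-type) and 000010/000011 (j and jal) is I-type, which ignores instruction classes not covered in these slides.

```python
def decode(word):
    """Split a 32-bit instruction word into named fields."""
    opcode = (word >> 26) & 0x3F
    if opcode == 0:                          # R-type: 6 5 5 5 5 6
        return {"type": "R", "opcode": opcode,
                "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
                "rd": (word >> 11) & 0x1F, "shamt": (word >> 6) & 0x1F,
                "funct": word & 0x3F}
    if opcode in (0b000010, 0b000011):       # J-type: 6 26
        return {"type": "J", "opcode": opcode, "address": word & 0x03FFFFFF}
    return {"type": "I", "opcode": opcode,   # I-type: 6 5 5 16
            "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
            "immediate": word & 0xFFFF}


print(decode(0x012A5820))   # add $t3, $t1, $t2
print(decode(0x212A002A))   # addi $t2, $t1, 42
print(decode(0x0800062A))   # j with a 26-bit address field
```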
Machine code details
Things to note about machine code:
R-type instructions have an opcode of 000000,
with a 6-bit function listed at the end.
Although we specify “don’t care” bits as X values, the assembler always assigns them to some value (like 0).
It’s possible to program your processor with machine code, but it makes more sense to use an equivalent language that is more natural (for humans, that is).
Assembly Language Overview
Assembly language
Assembly language is the lowest-level language that you’ll ever program in.
Many compilers translate their high-level program commands into assembly commands, which are then converted into machine code and used by the processor.
Note: There are multiple types of assembly language, especially for different architectures!
A little about MIPS
Short for Microprocessor without Interlocked Pipeline Stages
A type of RISC (Reduced Instruction Set Computer) architecture.
Provides a set of simple and fast instructions
Compiler translates high-level statements into 32-bit instructions for instruction memory.
Complex instructions (e.g. multiplication) are built out of simple ones by the compiler and assembler.
MIPS Instructions
Things to note about MIPS instructions:
Instructions are written as an operation name followed by its parameters, e.g. add $t3, $t1, $t2
All instructions are 32 bits (4 bytes) long
Instruction addresses are measured in bytes, starting from the instruction at address 0.
Therefore, all instruction addresses are divisible by 4.
The following tables show the most common MIPS instructions, the syntax for their parameters, and what operation they perform.
Frequency of instructions
Instruction type (MIPS examples): high-level language correspondence
Arithmetic: operations in assignment statements
Data transfer (lw, sw, lb, lbu, lh, lhu, sb, lui): references to data structures, such as arrays
Logical (and, or, nor, andi, ori): operations in assignment statements
Conditional branch (beq, bne, slt, slti, sltiu): if statements and loops
Jump (j, jr, jal): procedure calls, returns, and case/switch statements
Integer and floating point frequencies for each type: see the original source.
Original source: Computer Organization And Design: The Hardware/Software Interface, 5th Edition, Patterson & Hennessy, 2014, p163
Assembly Language Instructions
Arithmetic instructions
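Instruction   Opcode/Function   Syntax        Operation
add           100000            $d, $s, $t    $d = $s + $t
addu          100001            $d, $s, $t    $d = $s + $t
addi          001000            $t, $s, i     $t = $s + SE(i)
addiu         001001            $t, $s, i     $t = $s + SE(i)
div           011010            $s, $t        lo = $s / $t; hi = $s % $t
divu          011011            $s, $t        lo = $s / $t; hi = $s % $t
mult          011000            $s, $t        hi:lo = $s * $t
multu         011001            $s, $t        hi:lo = $s * $t
sub           100010            $d, $s, $t    $d = $s - $t
subu          100011            $d, $s, $t    $d = $s - $t
Note: “hi” and “lo” refer to the HI and LO registers described on the register slide. “SE” = “sign extend”.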
Assembly Machine Code
$t3 = $t1 + $t2
add $t3, $t1, $t2
Instruction   Opcode/Function   Syntax        Operation
add           100000            $d, $s, $t    $d = $s + $t
opcode   rs      rt      rd      shamt   funct
000000   01001   01010   01011   XXXXX   100000
R-type vs I-type arithmetic
In general, some instructions are R-type (meaning all operands are registers) and some are I-type (meaning they use an immediate/constant value in their operation).
Can you recognize which of the following are R-type and which are I-type instructions?
add, addu
addi, addiu
div, divu
mult, multu
sub, subu
Assembly Machine Code II
$t2 = $t1 + 42
addi $t2, $t1, 42
Instruction   Opcode/Function   Syntax       Operation
addi          001000            $t, $s, i    $t = $s + SE(i)
opcode   rs      rt      immediate
001000   01001   01010   0000000000101010
Logical instructions
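Instruction   Opcode/Function   Syntax        Operation
and           100100            $d, $s, $t    $d = $s & $t
andi          001100            $t, $s, i     $t = $s & ZE(i)
nor           100111            $d, $s, $t    $d = ~($s | $t)
or            100101            $d, $s, $t    $d = $s | $t
ori           001101            $t, $s, i     $t = $s | ZE(i)
xor           100110            $d, $s, $t    $d = $s ^ $t
xori          001110            $t, $s, i     $t = $s ^ ZE(i)
Note: ZE = zero extend (pad upper bits with 0 value).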
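The difference between SE and ZE only shows up when the top bit of the 16-bit immediate is 1. A small sketch, assuming 32-bit registers modelled as Python integers (sign_extend and zero_extend are hypothetical helper names):

```python
def zero_extend(imm16):
    """ZE: pad the upper 16 bits with 0."""
    return imm16 & 0xFFFF


def sign_extend(imm16):
    """SE: copy bit 15 of the immediate into the upper 16 bits."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


s = 5
# addi-style: $t = $s + SE(i), with i = 0xFFFF (that is, -1), kept to 32 bits
print(hex((s + sign_extend(0xFFFF)) & 0xFFFFFFFF))
# ori-style: $t = $s | ZE(i), with the same 16 immediate bits
print(hex(s | zero_extend(0xFFFF)))
```

With the same 16 bits in the instruction, the arithmetic form treats the constant as -1 while the logical form treats it as 65535.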
Shift instructions
Instruction   Opcode/Function   Syntax        Operation
sll           000000            $d, $t, a     $d = $t << a
sllv          000100            $d, $t, $s    $d = $t << $s
sra           000011            $d, $t, a     $d = $t >> a
srav          000111            $d, $t, $s    $d = $t >> $s
srl           000010            $d, $t, a     $d = $t >>> a
srlv          000110            $d, $t, $s    $d = $t >>> $s
Note: srl = “shift right logical”, and sra = “shift right arithmetic”. The “v” denotes a variable number of bits, specified by $s. a is a shift amount, and is stored in shamt when encoding the R-type machine code instructions.
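To see how the logical and arithmetic right shifts differ, here is a minimal Python model of the three shift directions on 32-bit values. The helper names mirror the instruction names but are otherwise hypothetical, and shift amounts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def sll(value, amount):
    """Shift left logical: zeros enter from the right."""
    return (value << amount) & MASK32


def srl(value, amount):
    """Shift right logical (>>> in the table): zeros enter from the left."""
    return (value & MASK32) >> amount


def sra(value, amount):
    """Shift right arithmetic (>> in the table): the sign bit is copied in."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret the bit pattern as negative
    return (value >> amount) & MASK32


x = 0xFFFFFFF0                    # -16 as a 32-bit two's complement value
print(hex(srl(x, 4)))             # upper bits filled with 0
print(hex(sra(x, 4)))             # upper bits filled with the sign bit
```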
Data movement instructions
Instruction
Opcode/Function
These are R-type instructions for operating on the HI and LO registers described earlier.
ALU instructions
Note that most ALU instructions are R-type instructions.
The six-digit codes in the tables are therefore the function codes (opcodes are 000000).
Exceptions are the I-type instructions (addi, andi, ori, etc.)
Not all R-type instructions have an I-type equivalent.
RISC architectures dictate that an operation doesn’t need an instruction if it can be performed through multiple existing operations.
Example: addi + div can do the work of a divi instruction, so no divi is provided.
Example program
Fibonacci sequence:
How would you convert this into assembly?
(ignoring function arguments, return call for now)
int fib(void) {
    int n = 10;
    int f1 = 1, f2 = -1;
    while (n != 0) {
        f1 = f1 + f2;
        f2 = f1 - f2;
        n = n - 1;
    }
    return f1;
}
Assembly code example
Fibonacci sequence in assembly code:
# register usage: $t3=n, $t4=f1, $t5=f2
FIB:   addi $t3, $zero, 10    # initialize n=10
       addi $t4, $zero, 1     # initialize f1=1
       addi $t5, $zero, -1    # initialize f2=-1
LOOP:  beq  $t3, $zero, END   # done loop if n==0
       add  $t4, $t4, $t5     # f1 = f1 + f2
       sub  $t5, $t4, $t5     # f2 = f1 - f2
       addi $t3, $t3, -1      # n = n - 1
       j    LOOP              # repeat until done
END:   sb   $t4, 0($sp)       # store result
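One way to check the register-level logic before running it in a simulator is to trace it by hand or with a few lines of Python. The sketch below is only a model of the loop, using a dictionary in place of the register file; it is not a MIPS simulator and ignores 32-bit overflow, which does not occur for n = 10.

```python
def run_fib():
    """Trace the assembly loop using the same register assignments."""
    reg = {"$t3": 10, "$t4": 1, "$t5": -1}         # FIB: the three addi lines
    while reg["$t3"] != 0:                         # LOOP: beq $t3, $zero, END
        reg["$t4"] = reg["$t4"] + reg["$t5"]       # add  $t4, $t4, $t5
        reg["$t5"] = reg["$t4"] - reg["$t5"]       # sub  $t5, $t4, $t5
        reg["$t3"] = reg["$t3"] - 1                # addi $t3, $t3, -1
    return reg["$t4"]                              # END: value that gets stored


print(run_fib())
```

The sub line reads the already updated $t4, which matches the C statement f2 = f1 - f2 coming after f1 has been reassigned.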
Making an assembly program
Assembly language programs typically have structure similar to simple Python or C programs:
They set aside registers to store data.
They have sections of instructions that manipulate this data.
It is always good to decide at the beginning which registers will be used for what purpose!
More on this later
Simulating MIPS
• Link to download:
http://courses.missouristate.edu/KenVollmar/mars/
• Tutorial links available on Quercus!
More instructions!
Control flow in assembly
Not all programs follow a linear set of instructions.
Some operations require the code to branch to one section of code or another (if/else).
Some require the code to jump back and repeat a section of code again (for/while).
For this, we have labels on the left-hand side that indicate the points that the program flow might need to jump to.
References to these points in the assembly code are resolved at compile time to offset values for the program counter.
Jump instructions
Instruction   Opcode/Function   Syntax   Operation
j             000010            label    pc = (pc & 0xF0000000) | (i<<2)
jal           000011            label    $31 = pc+4; pc = (pc & 0xF0000000) | (i<<2)
jalr          001001            $s       $31 = pc+4; pc = $s
jr            001000            $s       pc = $s
jal = “jump and link”.
Register $31 (aka $ra) stores the address that’s used when returning from a subroutine (i.e. the next instruction to run).
Note: jr and jalr are jumps, but not J-type instructions.
jr and jalr (jump to register)
For instructions such as the following:
jr $ra
jalr $t0
The processor moves the address stored in the named register ($ra or $t0) into the program counter.
The next instruction to be fetched will be at this new address, and the program will continue from there.
What happens in the other cases, when the destination address is stored in the instruction?
j and jal (jump to label)
For j and jal instructions, the address is supplied by the instruction. This is a potential problem.
If the first 6 bits are occupied by the opcode, the remaining 26 bits aren’t enough for a full 32-bit address!
How do we get around this?
Solution #1: Trailing zeros
Since jump instructions load new addresses into the program counter, the values being loaded must be divisible by 4.
Therefore...the binary values of these addresses will always end in “00”.
Therefore...there’s no point in storing the last two bits of the address in the instruction.
Solution: Use the 26 bits in the J-type instructions to store the new PC address, minus the last two zeros at the end.
Solution #2: Leading bits
This still leaves us with a 28-bit address (26 bits from the instruction + “00” at the end).
What should we use for the first 4 bits?
Several solutions exist, but the one that MIPS uses is to keep the first 4 bits of the previous PC value.
This is where the formula in the table comes from:
pc = (pc & 0xF0000000) | (i<<2)
(pc & 0xF0000000): take the first four bits from the previous PC
(i<<2): add “00” to the end of the instruction bits
|: join the two parts together
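The formula can be tried out directly. In this sketch, jump_target and jump_field are hypothetical helper names, the PC values are made-up example addresses, and the 26-bit field is the one from the earlier J-type example.

```python
def jump_target(pc, i):
    """New PC for j/jal: top 4 bits of the old PC, then the 26 bits, then '00'."""
    return (pc & 0xF0000000) | ((i & 0x03FFFFFF) << 2)


def jump_field(target):
    """The 26 bits stored in the instruction for a given target address."""
    return (target >> 2) & 0x03FFFFFF


i = 0b00000000000000011000101010          # address field from the J-type example
print(hex(jump_target(0x00400000, i)))    # old PC starts with 0000
print(hex(jump_target(0x10000008, i)))    # old PC starts with 0001

# Round trip: store a target in 26 bits, then rebuild it from the old PC
print(hex(jump_target(0x00400000, jump_field(0x00400024))))
```

Because the top four bits come from the old PC, a j instruction can only reach targets within the same 256 MB region of memory.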
Branch instructions
Instruction   Opcode/Function   Syntax           Operation
beq           000100            $s, $t, label    if ($s == $t) pc += i << 2
bgtz          000111            $s, label        if ($s > 0) pc += i << 2
blez          000110            $s, label        if ($s <= 0) pc += i << 2
bne           000101            $s, $t, label    if ($s != $t) pc += i << 2
Branch operations are key when implementing if statements and while loops.
The labels are (addresses of) memory locations, assigned to each label at compile time.
Branch instructions
How does a branch instruction work?
.text
main: beq $t0, $t1, end   # check if $t0 == $t1
      ...                 # if $t0 != $t1, then
      ...                 # execute these lines
end:  ...                 # if $t0 == $t1, then
      ...                 # execute these lines
Branch instructions
Alternate implementation using bne:
.text
main: bne $t0, $t1, end   # check if $t0 == $t1
      ...                 # if $t0 == $t1, then
      ...                 # execute these lines
end:  ...                 # if $t0 != $t1, then
      ...                 # execute these lines
Used to produce if statement behaviour.
Branch’s immediate (i) value
opcode (6) | rs (5) | rt (5) | immediate (16)
Branch statements are I-type instructions.
The immediate value (i) is a 16-bit offset (i.e. a relative address) to add to the PC if the branch condition is satisfied (not the absolute address, like with jumps).
Calculated as the difference between the current PC value and the address of the instruction you’re branching to.
Stored here as # of instructions (and not # of bytes)
Again, not storing the trailing “00” if it’s not necessary.
The i value can be positive (if you’re jumping i instructions forward) or negative (if you’re jumping i instructions backward).
Calculating the i value
The offset is computed differently, depending on the implementation (i.e. whether the PC is incremented by 4 before or after the branch offset calculation).
If relative to the current PC:
i = (label location - (current PC)) >> 2
If relative to the incremented PC:
i = (label location - (current PC + 4)) >> 2
For this course, we assume i is computed as:
i = (label - (current PC)) >> 2
This corresponds to the simulator we use for this course (MARS); more on that later.
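Both conventions can be written as one small function. This is a sketch under the formulas stated above, with hypothetical names (branch_offset, offset_field) and made-up byte addresses; it makes no claim about what any particular tool emits.

```python
def branch_offset(label_address, branch_pc, relative_to_incremented_pc=False):
    """Offset i in instructions, under either convention described above."""
    base = branch_pc + 4 if relative_to_incremented_pc else branch_pc
    return (label_address - base) >> 2


def offset_field(i):
    """The 16 bits stored in the instruction (two's complement if negative)."""
    if not -(1 << 15) <= i < (1 << 15):
        raise ValueError("branch target is out of range")
    return format(i & 0xFFFF, "016b")


# Hypothetical layout: branch at byte address 4, label at byte address 12
print(branch_offset(12, 4))                                    # course convention
print(branch_offset(12, 4, relative_to_incremented_pc=True))   # PC+4 convention
print(offset_field(branch_offset(12, 4)))

# Backward branch: branch at byte address 16, label at byte address 4
print(offset_field(branch_offset(4, 16)))
```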
i in simulation
Use a simple program in MARS to confirm this.
main: addi $t0, $zero, 1
      beq  $t0, $zero, END
      addi $t1, $zero, 1
END:  addi $t3, $zero, 1
What will i be for beq?
In MARS, the 16 least significant bits of the machine code instruction are 0000000000000010.
END is 2 instructions down from the branch instruction.
Conditional Branch Terms
When the branch condition is met, we say the branch is taken.
When the branch condition is not met, we say the branch is not taken.
What is the next PC in this case? It’s the usual PC+4
How far can a processor branch? Are there any constraints?
Comparison instructions
Instruction   Opcode/Function   Syntax        Operation
slt           101010            $d, $s, $t    $d = ($s < $t)
sltu          101011            $d, $s, $t    $d = ($s < $t)
slti          001010            $t, $s, i     $t = ($s < SE(i))
sltiu         001011            $t, $s, i     $t = ($s < ZE(i))
Note: Comparison operations store a 1 in the destination register if the less-than comparison is true, and store a 0 in that location otherwise. They are not used too often, but are useful in combination with branch instructions that only depend on one register (e.g., bgtz).
Using branches and jumps
if, else, while & for
If statements
if statements test a condition and then execute lines of code if the condition is true.
For instance:
if ( i == j ) {
    i++;
}
j = j + i;
Testing conditions is done using either a beq instruction or a bne instruction.
Translated if statement
if ( i == j ) {
    i++;
}
j = j + i;
Use the bne instruction to skip the i++ step and proceed straight to the j=j+i step:
# $t1 = i, $t2 = j
main: bne  $t1, $t2, END   # branch if (i != j)
      addi $t1, $t1, 1     # i++
END:  add  $t2, $t2, $t1   # j = j + i
if/else statements
if ( i == j )
    i++;
else
    i--;
j += i;
Possible approach to if/else statements:
Test condition, and jump to if logic block whenever condition is true.
Otherwise, perform else logic block, and jump to first line after if logic block.
Translated if/else statements
# $t1 = i, $t2 = j
main: beq  $t1, $t2, IF    # branch if ( i == j )
      addi $t1, $t1, -1    # i--
      j    END             # jump over IF
IF:   addi $t1, $t1, 1     # i++
END:  add  $t2, $t2, $t1   # j += i
Or branch on the else condition first:
# $t1 = i, $t2 = j
main: bne  $t1, $t2, ELSE  # branch if ! ( i == j )
      addi $t1, $t1, 1     # i++
      j    END             # jump over ELSE
ELSE: addi $t1, $t1, -1    # i--
END:  add  $t2, $t2, $t1   # j += i
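A trick with if statements
Use flow charts to help you sort out the control flow of the code.
[Flowchart: a decision node for i == j with true and false edges leading to the if block and the else block, which rejoin at END.]
# $t1 = i, $t2 = j
main: beq  $t1, $t2, IF
      addi $t1, $t1, -1
      j    END
IF:   addi $t1, $t1, 1
END:  add  $t2, $t2, $t1
if/else statement flowcharts
[Flowchart: the same i = j? decision, drawn for the version that branches to the else block first.]
# $t1 = i, $t2 = j
main: bne  $t1, $t2, ELSE
      addi $t1, $t1, 1
      j    END
ELSE: addi $t1, $t1, -1
END:  add  $t2, $t2, $t1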
Multiple if conditions
if ( i == j || i == k )
    i++ ;   // if-body
else
    i-- ;   // else-body
j = i + k;
Branch statement for each condition:
# $t1 = i, $t2 = j, $t3 = k
main: beq  $t1, $t2, IF    # cond1: branch if ( i == j )
      bne  $t1, $t3, ELSE  # cond2: branch if ( i != k )
IF:   addi $t1, $t1, 1     # if-body: i++
      j    END
ELSE: addi $t1, $t1, -1    # else-body: i--
END:  add  $t2, $t1, $t3   # j = i + k