N19 – Assembler Basic Concepts
Credit: Jiayao (Amy) Li ’18 for typewriting my handwritten notes
As noted several times before, computers understand only the language of 1s and 0s. We made a ‘feeble’ attempt to write code in the HACK machine language of 1s and 0s, but quickly made way for a higher level of abstraction in the form of the HACK assembly language. Why? Because we were concerned about the readability of the code, the difficulty of writing carefully articulated sequences of 1s and 0s that express A- and C-instructions, and (of course) the difficulty of debugging code expressed in 1s and 0s.
So if we write code at the higher level of assembly mnemonics, someone or something has to step in and translate that code into machine-level 1s and 0s, right? That’s the job of an assembler: it takes code written in assembly language and converts it into machine language. The assembler itself can be written in any language. Every machine architecture has its own machine language and therefore its own assembler. In other words, HACK has an assembler, the x86 (Intel) architecture has an assembler, ARM machines have an assembler for their architecture, and so on.
Our discussion of the assembler will be based on the HACK computer. It is easier to describe both the principles and the application of an assembler with the aid of an example architecture.
First, let’s have a quick refresher of the HACK computer from the perspective of its machine language and high-level block diagram of its architecture.
HACK MACHINE LANGUAGE
HACK has two instructions in its repertoire:
(1) A-Instruction with mnemonic @
– Example: @27
– Bit format: 0 n14 n13 …… n1 n0
– The A-instruction is so named largely because it serves the purpose of holding the address of a memory location, although it can also hold data for typical arithmetic and logical computation.
→ Notice the leading 0 followed by the 15-bit number
→ Also note that we can represent a number as low as 0 and as high as 2^15 - 1 (i.e. 32767)
(2) C-Instruction with mnemonics such as D=D+A, M=D, or D; JLE. The complete reference for the HACK C-instruction can be found in the textbook and in the lecture slides for Machine Language.
Generically, a C-instruction can be written as: DEST=COMP; JUMP
where
– DEST represents the destination of the computation COMP
– COMP is the computation performed by the instruction
– JUMP represents the class of JUMP instructions in the broad category of flow control operations
→ Note: the DEST and JUMP fields are each optional, but they cannot both be omitted
Examples
→ A=D+A shows that DEST is the A register and COMP is D+A, the sum of the contents of the A and D registers (the HACK mnemonic lists D first)
→ D; JLE shows that the instruction has no destination; the operation tests the value of D and jumps to the location pointed to by the A register if D≤0
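The DEST=COMP; JUMP form can be split into its three fields with a few lines of code. What follows is only a minimal Python sketch, not the course's assembler; the function name parse_c_instruction is hypothetical and chosen purely for illustration.

```python
def parse_c_instruction(text):
    """Split a C-instruction such as 'M=D' or 'D; JLE' into (dest, comp, jump).

    A missing DEST or JUMP field is returned as an empty string.
    """
    instr = "".join(text.split())      # whitespace is not significant
    dest, comp, jump = "", instr, ""
    if "=" in comp:                    # DEST is everything before '='
        dest, comp = comp.split("=", 1)
    if ";" in comp:                    # JUMP is everything after ';'
        comp, jump = comp.split(";", 1)
    return dest, comp, jump


print(parse_c_instruction("M=D"))      # ('M', 'D', '')
print(parse_c_instruction("D; JLE"))   # ('', 'D', 'JLE')
```

Returning an empty string for an absent field keeps the later table lookup simple, since "no destination" and "no jump" each become one ordinary key.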
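The C-instruction bit pattern is as shown. Bits a and c5…c0 together form the COMP field, d2 d1 d0 form the DEST field, and j2 j1 j0 form the JUMP field; the two x bits are unused and are conventionally set to 1:
1 x x a c5 c4 c3 c2 c1 c0 d2 d1 d0 j2 j1 j0
The A-instruction has only one bit pattern, as shown:
0 n14 n13 …… n1 n0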
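The A-instruction pattern is simple enough to compute directly: a leading 0 followed by the constant written in 15 binary digits. Here is a small Python sketch under that reading; the names encode_a_instruction and is_a_instruction are hypothetical, and only decimal constants (no symbols) are handled.

```python
def encode_a_instruction(text):
    """Translate '@n' (n a decimal constant) into its 16-bit pattern."""
    value = int(text.strip()[1:])            # drop the leading '@'
    if not 0 <= value <= 2**15 - 1:
        raise ValueError("constant does not fit in 15 bits")
    return "0" + format(value, "015b")       # leading 0 marks an A-instruction


def is_a_instruction(bits):
    """A 16-bit word is an A-instruction exactly when its first bit is 0."""
    return bits[0] == "0"


print(encode_a_instruction("@27"))      # 0000000000011011
print(encode_a_instruction("@32767"))   # 0111111111111111
```

The range check mirrors the limit stated above: the largest constant an A-instruction can carry is 2^15 - 1.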
On your right is a very high-level block diagram of the HACK computer (sans the I/O).
– The Instruction Memory holds instructions. Which instruction is executed next is determined by the address loaded in the program counter, PC
– The Data Memory holds data that can be read or written by the CPU
– The CPU has two registers, A and D
– All memory blocks and registers are 16 bits wide
The HACK computer has two registers, A and D. The D register serves only to hold data. If the D register happens to hold a memory address, its contents must first be moved to the A register before they can be used as a memory address (Example 4 below).
Example 1:
@9
D=A      // loads the number 9 into register D
Example 2:
D=D+A    // updates D with the sum of the contents of A and D
Example 3:
@9
D=M      // loads the contents of data memory M[9] into register D
[Figure: HACK computer at high level]
Example 4:
A=D
M=M+1    // implements M[A]=M[A]+1, using the address that was held in D
The A register, on the other hand, can serve multiple purposes.
(a) The A register can hold data (see Example 1 above)
(b) The A register acts as a pointer to a data memory location (see Examples 3 and 4 above)
(c) The A register also acts as a pointer to an instruction memory location to implement flow control
Example 5:
@9
0; JMP    // jumps unconditionally to Instruction Memory (ROM) location 9
Example 6:
@127
D; JLE    // jumps to location 127 in Instruction Memory if the content of the D register is ≤0
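The role of the A register in flow control can be modeled as a function that picks the next program-counter value. This is a hedged Python sketch: next_pc is a hypothetical helper, the condition table comes from the Hack language specification (the notes above use only JLE and the unconditional jump), and the computed value is treated as an ordinary signed Python integer rather than a 16-bit word.

```python
# Jump mnemonic -> condition on the value computed by the instruction.
JUMP_CONDITIONS = {
    "":    lambda v: False,    # no jump field: fall through
    "JGT": lambda v: v > 0,
    "JEQ": lambda v: v == 0,
    "JGE": lambda v: v >= 0,
    "JLT": lambda v: v < 0,
    "JNE": lambda v: v != 0,
    "JLE": lambda v: v <= 0,
    "JMP": lambda v: True,     # unconditional jump
}


def next_pc(pc, a, value, jump):
    """Return the address of the next instruction to execute."""
    if JUMP_CONDITIONS[jump](value):
        return a               # jump target is whatever the A register holds
    return pc + 1              # otherwise continue with the following instruction


print(next_pc(pc=1, a=9, value=0, jump="JMP"))      # 9
print(next_pc(pc=1, a=127, value=-3, jump="JLE"))   # 127
print(next_pc(pc=1, a=127, value=5, jump="JLE"))    # 2
```

The jump target never appears inside the C-instruction itself, which is why a jump is always preceded by an A-instruction that loads the target address.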
With this basic know-how of HACK instructions, let us write a basic HACK program and ask some fundamental questions about how the translation software called an assembler might work.
// This simple assembly program adds the contents of
// RAM locations 0 and 1 and places the result in RAM location 2,
// i.e. M[2] = M[0] + M[1]
@0      // A=0
D=M     // D = M[0]
@1      // A=1
D=D+M   // D = D + M[1]
@2      // A=2
M=D     // M[2] = D = M[0]+M[1]
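To see what this program does to the registers and the data memory, it can be stepped through with a toy model of the CPU. The Python sketch below is an assumption-laden illustration, not the nand2tetris CPU emulator: the names COMPS and run are hypothetical, only a handful of computations are supported, and jumps and symbols are not handled.

```python
# Computations needed by the examples in these notes; a real emulator supports more.
COMPS = {
    "A":   lambda a, d, m: a,
    "D":   lambda a, d, m: d,
    "M":   lambda a, d, m: m,
    "D+A": lambda a, d, m: d + a,
    "D+M": lambda a, d, m: d + m,
    "M+1": lambda a, d, m: m + 1,
}


def run(program, ram):
    """Execute a jump-free, symbol-free program; ram maps address -> value."""
    a = d = 0
    for line in program:
        instr = "".join(line.split("//")[0].split())   # drop comment and spaces
        if not instr:
            continue
        if instr.startswith("@"):                      # A-instruction: load A
            a = int(instr[1:])
            continue
        dest, comp = instr.split("=")                  # C-instruction without jump
        value = COMPS[comp](a, d, ram.get(a, 0)) & 0xFFFF   # keep 16 bits
        if "M" in dest:
            ram[a] = value                             # A still holds the old address
        if "A" in dest:
            a = value
        if "D" in dest:
            d = value
    return ram


program = ["@0", "D=M", "@1", "D=D+M", "@2", "M=D"]
ram = run(program, {0: 5, 1: 7})
print(ram[2])   # 12
```

With M[0]=5 and M[1]=7 the model leaves 12 in M[2], which is the M[2] = M[0] + M[1] behavior described in the program's comments.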
The picture next to the assembly program shows the code loaded in our HACK CPU emulator, the same one you used in Project 4 to run your code. Before we go any further, let’s take a quick aside on the term emulator and its context in HACK. An emulator is typically an executable piece of software, sometimes assisted by programmable hardware, that realizes a design (e.g. a computing device) so that its functionality can be developed and tested before the design is committed to volume production. For HACK, we have a CPU emulator written entirely in software, which behaves just as if the HACK CPU had actually been realized with physical switches. This allows us to “play” with its functionality without spending physical resources to build it.
Once the assembly program is loaded into the CPU emulator, notice how the comments seem to have disappeared! How did that happen? And now look at the translated code that we obtained via the Assembler tool available in the nand2tetris software suite:
Each instruction line is translated, one for one, into its equivalent machine-level (0s and 1s) code. Let’s examine this a little more closely:
1. Every line that starts with a comment “//” is discarded. Just as in high-level language programs, comments are there only to enhance readability and debuggability. Computers do not need them.
2. Empty lines are also discarded. Again, they are useful only for readability.
3. Now the fun begins. Keep an eye on the table below, which shows the encodings for C-instructions. The encoding for the A-instruction is rather simple: a 0 followed by 15 bits representing the non-negative number declared in the instruction.
a. Instruction 1: @0 translates to 0 000000000000000
b. Instruction 2: D=M has destination D and computation (COMP) M. Looking at the table below, destination D corresponds to the 3-bit code 010. A COMP of M corresponds to a=1 and c1:c6 = 110000. Since this instruction has no jump, the jump bits are 000. Let’s combine them in the 16-bit format of the C-instruction, which goes as follows:
111 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3. The resulting instruction is therefore 111 1 110000 010 000.
Written compactly, that is 1111110000010000. Check the translated code for D=M shown earlier and convince yourself that it matches. Voila! That’s your simple assembly process.
c. To satisfy your curiosity, you may ‘manually assemble’ the remaining instructions as well.
4. So what’s left to do?
a. As you experienced in Project 4, to write readable and portable code, we need to write code with variable symbols (e.g. @x, D=M, @y, D=D+M, @z, M=D). How do we deal with variable symbols when it comes to program translation?
b. We also know from experience that the other kind of symbols that appear in an assembly program are predefined symbols like R0, R1, KBD, etc.
c. Finally, programs also have loop structures implemented with jumps! Labels are needed to show in a simple fashion how flow control takes place. How do we deal with labels when it comes to program translation?
d. All of a through c above need to be handled in software! In other words, we need to write code that actually captures the assembly process.
e. We will do all of this in our next module.
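As a preview, steps 1 through 3 above (discard comments, discard empty lines, translate each A- and C-instruction) can be sketched in a few dozen lines of Python. This is only an illustration under stated assumptions, not the nand2tetris Assembler: the function name assemble is hypothetical, the COMP, DEST and JUMP tables are taken from the Hack language specification in the textbook, and symbols and labels (item 4) are deliberately not handled.

```python
COMP = {  # mnemonic -> bits a c1 c2 c3 c4 c5 c6
    "0": "0101010", "1": "0111111", "-1": "0111010",
    "D": "0001100", "A": "0110000", "M": "1110000",
    "!D": "0001101", "!A": "0110001", "!M": "1110001",
    "-D": "0001111", "-A": "0110011", "-M": "1110011",
    "D+1": "0011111", "A+1": "0110111", "M+1": "1110111",
    "D-1": "0001110", "A-1": "0110010", "M-1": "1110010",
    "D+A": "0000010", "D+M": "1000010",
    "D-A": "0010011", "D-M": "1010011",
    "A-D": "0000111", "M-D": "1000111",
    "D&A": "0000000", "D&M": "1000000",
    "D|A": "0010101", "D|M": "1010101",
}
DEST = {"": "000", "M": "001", "D": "010", "MD": "011",
        "A": "100", "AM": "101", "AD": "110", "AMD": "111"}
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}


def assemble(source):
    """Translate symbol-free Hack assembly text into a list of 16-bit strings."""
    words = []
    for line in source.splitlines():
        instr = "".join(line.split("//")[0].split())   # steps 1-2: strip comments, spaces
        if not instr:
            continue                                   # comment-only or empty line
        if instr.startswith("@"):                      # A-instruction: 0 + 15-bit value
            words.append("0" + format(int(instr[1:]), "015b"))
            continue
        dest, comp, jump = "", instr, ""               # C-instruction: DEST=COMP;JUMP
        if "=" in comp:
            dest, comp = comp.split("=", 1)
        if ";" in comp:
            comp, jump = comp.split(";", 1)
        words.append("111" + COMP[comp] + DEST[dest] + JUMP[jump])
    return words


source = """
// M[2] = M[0] + M[1]
@0
D=M      // D = M[0]
@1
D = D+M
@2
M=D
"""
for word in assemble(source):
    print(word)
```

For the D=M line this produces 1111110000010000, the same word obtained by hand in item 3b.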