N20 – Building an Assembler
Credit: Jiayao (Amy) Li ’18 for typewriting my handwritten notes
In the previous module, we talked about the basic concept of assemblers and performed a ‘manual’ assembly of simple HACK code to demonstrate the process at an introductory level. In building an assembler, we get first-hand experience with compiler technology, where one must know the syntax and semantics of both the source and target languages.
In this module, we will explore additional (realistic) complexities that come with translating assembly code to machine code. For instance, we will explore dealing with variables and symbols of various forms in assembly code.
Let’s dust off our example from module 16 on swapping contents of two RAM locations.
Example 1 Hack Assembly Code
// This program swaps the content of RAM[0] with RAM[1]
// We follow the procedure temp=x, x=y, y=temp
// Stash the value of M[0] in a temp location
@0
D=M
@temp
M=D
// Now update M[0] with value of M[1]
@1
D=M
@0
M=D
// Now update M[1] with original value of M[0]
@temp
D=M
@1
M=D
Notice that we are able to use variables like temp in HACK A-instructions instead of writing non-negative numbers. The above program is fairly easy to read and understand: a named variable is more readable than a hard-coded address, and the code no longer depends on where temp actually lives in RAM.
The flexibility of using variables instead of actual memory addresses requires us to implement a lookup table: a data structure that holds the corresponding address for each variable. This lookup table in the assembler is called a symbol table (ST). For example:
SYMBOL TABLE
Symbol     Value
temp       16
foo        17
…etc…      …etc…
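A minimal sketch of how such a symbol table might be kept, assuming a Python dictionary and assuming that new variables receive consecutive addresses starting at 16, as the table above suggests. The names SymbolTable and address_of are illustrative only and do not come from any HACK tool.

```python
class SymbolTable:
    """Maps variable names to RAM addresses (illustrative sketch)."""

    def __init__(self, first_free=16):
        self.table = {}              # symbol -> address
        self.next_free = first_free  # assumption: variables start at RAM[16]

    def address_of(self, symbol):
        """Return the symbol's address, allocating a new one on first use."""
        if symbol not in self.table:
            self.table[symbol] = self.next_free
            self.next_free += 1
        return self.table[symbol]


st = SymbolTable()
print(st.address_of("temp"))  # first new variable -> 16
print(st.address_of("foo"))   # second new variable -> 17
print(st.address_of("temp"))  # already known, so still 16
```

Looking a symbol up and allocating it on a miss are folded into one call here, so the assembler never has to ask separately whether a variable has been seen before.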
Armed with this concept of a symbol table, one can implement the assembler with a flowchart like this:
The above flowchart works fine for programs without any predefined variables or LABEL symbols. Let’s next look at a program with flow control:
If X>Y then Z=X-Y else Z=X+Y
Flowchart for ASSEMBLER OF PROGRAM WITH VAR SYMBOLS
Example 2: Flow Control
// This program performs the following function
// If X>Y then Z=X-Y else Z=X+Y
// Here assume X&Y are memory locations
// that have predefined content
@X
D=M
@Y
D=D-M // X-Y
@PASS
D;JGT // GOTO PASS if X>Y
@Y
D=M // D=Y
@X
D=D+M // D=X+Y
(PASS)
@Z
M=D
(END)
@END
0;JMP
What is new in the symbols above? Aside from our familiar comment lines starting with “//” and instructions that use variables like X, Y, Z, we also see LABEL symbols, e.g. (PASS) and (END)! Why do we need labels in assembly code? To answer that, there are two questions to be asked:
(i) If writing ASM code is supposed to be easier, then wouldn’t remembering line numbers while writing code run counter to that simplicity?
(ii) Even if you could overcome (i) above, how unproductive would it be if, every time you added or deleted instructions, you had to manually recalculate all line #s?
Two important points about HACK ASM program syntax:
(i) Variables can be lower or upper case but must start with a letter.
(ii) Labels must be all UPPER CASE, and enclosed in parentheses () at declaration.
Also note that HACK already has some predefined symbols: memory locations 0…15 are named R0…R15.
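One way to account for these predefined symbols is to load them into the symbol table before any program line is read. The sketch below covers only R0…R15 as mentioned here; the full HACK specification defines a few more predefined symbols (such as SCREEN and KBD), which would be added the same way. The function name predefined_symbols is an assumption made for illustration.

```python
def predefined_symbols():
    """Return a table with R0..R15 mapped to RAM addresses 0..15."""
    return {f"R{i}": i for i in range(16)}


st = predefined_symbols()
print(st["R0"], st["R15"])  # 0 15
```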
OK, now that we can also have LABELS in ASM code, we need to rethink our assembler flowchart a bit.
(i) Need to account for predefined symbols in the ST
(ii) Need to find and convert all labels to create corresponding entries in the ST
Line # (ROM) for the code of Example 2: 0 to 13, one per instruction; comment lines and LABEL declarations are not numbered.
A Fresh look at the Assembler
HACK ASM SPECIFIC REQUIREMENTS
Why Labels??
This now requires TWO passes over the ASM code! Why? Just take another look at Example 2. On line 4 we encounter @PASS, but we have no idea yet which line # PASS corresponds to, and if we are going to jump to this address, we had better know the “GO TO” address! We will only know this after we have gone through one pass of the entire code and recorded the line # of the instruction that follows each LABEL declaration.
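The first pass can be sketched as follows, assuming that comments and blank lines are discarded and that a LABEL declaration takes the line # of the next real instruction. The helper names strip_line and first_pass are hypothetical; the input is the code of Example 2.

```python
def strip_line(line):
    """Drop any // comment and surrounding whitespace."""
    return line.split("//")[0].strip()


def first_pass(lines):
    """Return {label: ROM line #} for every (LABEL) declaration."""
    labels = {}
    rom_line = 0  # line # the next real instruction will get
    for raw in lines:
        line = strip_line(raw)
        if not line:
            continue  # blank or comment-only line: no ROM line used
        if line.startswith("(") and line.endswith(")"):
            labels[line[1:-1]] = rom_line  # label points at next instruction
        else:
            rom_line += 1  # an A- or C-instruction occupies one ROM line
    return labels


example2 = """
@X
D=M
@Y
D=D-M   // X-Y
@PASS
D;JGT   // GOTO PASS if X>Y
@Y
D=M     // D=Y
@X
D=D+M   // D=X+Y
(PASS)
@Z
M=D
(END)
@END
0;JMP
""".splitlines()

print(first_pass(example2))  # {'PASS': 10, 'END': 12}
```

With PASS known to be line 10, the @PASS on line 4 can be translated on the second pass even though the declaration appears later in the file.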
Full Flowchart for the HACK ASSEMBLER
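One possible reading of the overall flow, offered as a sketch rather than a transcription of the flowchart: seed the table with R0…R15, record LABEL line #s in a first pass, then in a second pass replace every symbol in an A-instruction, treating any symbol not yet in the table as a new variable allocated from RAM[16] upward. The function name resolve_symbols is hypothetical, and translating the resolved instructions to binary is left out.

```python
def resolve_symbols(lines):
    """Two-pass sketch: return instructions with every @symbol made numeric."""
    # Start with the predefined symbols R0..R15.
    table = {f"R{i}": i for i in range(16)}

    # Strip comments and blank lines once, up front.
    cleaned = [line.split("//")[0].strip() for line in lines]
    cleaned = [line for line in cleaned if line]

    # Pass 1: record the ROM line # of each (LABEL) declaration.
    rom_line = 0
    for line in cleaned:
        if line.startswith("("):
            table[line[1:-1]] = rom_line
        else:
            rom_line += 1

    # Pass 2: replace symbols; unseen ones are new variables from RAM[16] up.
    next_free = 16
    out = []
    for line in cleaned:
        if line.startswith("("):
            continue  # labels produce no machine instruction
        if line.startswith("@") and not line[1:].isdigit():
            symbol = line[1:]
            if symbol not in table:
                table[symbol] = next_free
                next_free += 1
            line = f"@{table[symbol]}"
        out.append(line)
    return out


# A shortened, illustrative program using the same symbol kinds as Example 2.
program = ["@X", "D=M", "@PASS", "D;JGT", "(PASS)", "@Z", "M=D",
           "(END)", "@END", "0;JMP"]
print(resolve_symbols(program))
# ['@16', 'D=M', '@4', 'D;JGT', '@17', 'M=D', '@6', '0;JMP']
```

Labels must be entered before variables are allocated; otherwise @PASS on the second pass would be mistaken for a new variable and given a RAM address instead of a ROM line #.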