CSCE-312 Day 1 Introduction
L20 – Building an Assembler
Program Translation Process
2
Machine Language
Courtesy: nand2tetris.org
if ((x+width)>511) {
let x=511-width;
}
High Level Language code
Compiler
2
Assembly process
0000000000010000
1110111111001000
0000000000010001
1110101010001000
0000000000010000
1111110000010000
0000000000000000
1111010011010000
0000000000010010
1110001100000001
0000000000010000
1111110000010000
0000000000010001
…
Machine Language
assembler
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0]
D=M // GOTP WRITE
@R0
D=D-M
@WRITE
D;JGT
… // Etc.
Assembly Language
assemble
mario.asm
into
mario.bin
run
3
Credit: www.nand2tetris.org
Assembler: lecture plan
The assembly process
The Hack assembly language
The assembly process: instructions
The assembly process: symbols
Developing an assembler
Project 6 overview
Credit: www.nand2tetris.org
4
Hack Assembler
0000000000010000
1110111111001000
0000000000010001
1110101010001000
0000000000010000
1111110000010000
0000000000000000
1111010011010000
0000000000010010
1110001100000001
0000000000010000
1111110000010000
0000000000010001
1111000010001000
0000000000010000
1111110111001000
0000000000000100
1110101010000111
0000000000010001
1111110000010000
0000000000000001
1110001100001000
0000000000010110
1110101010000111
Hack machine code
Challenges:
Handling…
White space
Instructions
Symbols
Assembly program
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembler
Credit: www.nand2tetris.org
5
5
Handling symbols
Assembly program
Pre-defined symbols:
represent special memory locations
label symbols:
represent destinations of
goto instructions
variable symbols:
represent memory locations where the programmer wants to maintain values
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Credit: www.nand2tetris.org
6
Handling pre-defined symbols
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembly program
symbol value
R0 0
R1 1
R2 2
… …
R15 15
SCREEN 16384
KBD 24576
symbol value
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
The Hack language specification describes 23 pre-defined symbols:
Translating @preDefinedSymbol :
Replace preDefinedSymbol with its value.
Credit: www.nand2tetris.org
7
Handling symbols that denote labels
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembly program
Label symbols
Used to label destinations of goto commands
Declared by the pseudo-command (XXX)
This directive defines the symbol XXX
to refer to the memory location holding the next instruction in the program
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
symbol value
LOOP 4
STOP 18
END 22
Translating @labelSymbol :
Replace labelSymbol with its value
Credit: www.nand2tetris.org
8
Handling symbols that denote variables
Assembly program
Variable symbols
Any symbol XXX appearing in an assembly program which is not pre-defined and is not defined elsewhere using the (XXX) directive is treated as a variable
Each variable is assigned a unique memory address, starting at 16
symbol value
i 16
sum 17
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Translating @variableSymbol :
If seen for the first time,
assign a unique memory address
Replace variableSymbol with this address
Credit: www.nand2tetris.org
9
Symbol table
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembly program
Credit: www.nand2tetris.org
10
Symbol table
symbol value
Symbol table
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembly program
Credit: www.nand2tetris.org
11
Symbol table
symbol value
R0 0
R1 1
R2 2
… …
R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
Symbol table
Initialization:
Add the pre-defined symbols
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembly program
Credit: www.nand2tetris.org
12
Symbol table
symbol value
R0 0
R1 1
R2 2
… …
R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
Symbol table
Initialization:
Add the pre-defined symbols
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembly program
Credit: www.nand2tetris.org
13
Symbol table
Assembly program
symbol value
R0 0
R1 1
R2 2
… …
R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
LOOP 4
STOP 18
END 22
Symbol table
Initialization:
Add the pre-defined symbols
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
First pass:
Add the label symbols
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Credit: www.nand2tetris.org
14
Symbol table
Assembly program
symbol value
R0 0
R1 1
R2 2
… …
R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
LOOP 4
STOP 18
END 22
Symbol table
Initialization:
Add the pre-defined symbols
First pass:
Add the label symbols
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Credit: www.nand2tetris.org
15
Symbol table
Assembly program
symbol value
R0 0
R1 1
R2 2
… …
R15 15
SCREEN 16384
KBD 24576
SP 0
LCL 1
ARG 2
THIS 3
THAT 4
LOOP 4
STOP 18
END 22
i 16
sum 17
Symbol table
Usage:
To resolve a symbol, look up its value in the symbol table
Initialization:
Add the pre-defined symbols
First pass:
Add the label symbols
Second pass:
Add the var. symbols
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Credit: www.nand2tetris.org
16
The assembly process
Initialization:
Construct an empty symbol table
Add the pre-defined symbols to the symbol table
First pass:
Scan the entire program;
For each “instruction” of the form (xxx):
Add the pair (xxx, address) to the symbol table,
where address is the number of the instruction following (xxx)
Second pass:
Set n to 16
Scan the entire program again; for each instruction:
If the instruction is @symbol, look up symbol in the symbol table;
If (symbol, value) is found, use value to complete the instruction’s translation;
If not found:
Add (symbol, n) to the symbol table,
Use n to complete the instruction’s translation,
n++
If the instruction is a C-instruction, complete the instruction’s translation
Write the translated instruction to the output file.
Credit: www.nand2tetris.org
17
Hack Assembler
0000000000010000
1110111111001000
0000000000010001
1110101010001000
0000000000010000
1111110000010000
0000000000000000
1111010011010000
0000000000010010
1110001100000001
0000000000010000
1111110000010000
0000000000010001
1111000010001000
0000000000010000
1111110111001000
0000000000000100
1110101010000111
0000000000010001
1111110000010000
0000000000000001
1110001100001000
0000000000010110
1110101010000111
Hack machine code
Challenges:
Handling…
White space
Instructions
Symbols
Assembly program
// Computes RAM[1] = 1 + … + RAM[0]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto STOP
D=M
@R0
D=D-M
@STOP
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(STOP)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP
Assembler
Credit: www.nand2tetris.org
18
18
Assembler: lecture plan
Assembler logic (basic)
The Hack assembly language
The assembly process: instructions
The assembly process: symbols
Developing an assembler
Project 6 overview
Credit: www.nand2tetris.org
19
Sub-tasks that need to be done
Reading and parsing commands
Converting mnemonics code
Handling symbols
Credit: www.nand2tetris.org
20
Reading and Parsing Commands
No need to understand the meaning of anything
Credit: www.nand2tetris.org
21
Reading and Parsing Commands
Start reading a file with a given name
Credit: www.nand2tetris.org
22
Reading and Parsing Commands
Start reading a file with a given name
E.g. Constructor for a Parser object that accepts a string specifying a file name.
Need to know how to read text files
Credit: www.nand2tetris.org
23
Reading and Parsing Commands
Start reading a file with a given name
Move to the next command in the file
Credit: www.nand2tetris.org
24
Reading and Parsing Commands
Start reading a file with a given name
Move to the next command in the file
Are we finished? boolean hasMoreCommands()
Get the next command: void advance()
Need to read one line at a time
Need to skip whitespace including comments
Credit: www.nand2tetris.org
25
Reading and Parsing Commands
Start reading a file with a given name
Move to the next command in the file
Get the fields of the current command
Credit: www.nand2tetris.org
26
Reading and Parsing Commands
Start reading a file with a given name
Move to the next command in the file
Get the fields of the current command
Type of current command (A-Command, C-Command, or Label)
Credit: www.nand2tetris.org
27
Reading and Parsing Commands
Start reading a file with a given name
Move to the next command in the file
Get the fields of the current command
Type of current command (A-Command, C-Command, or Label)
Easy access to the fields:
D=M+1; JGT @sum
D
M + 1
J G T
s u m
Credit: www.nand2tetris.org
28
Reading and Parsing Commands
Start reading a file with a given name
Move to the next command in the file
Get the fields of the current command
Type of current command (A-Command, C-Command, or Label)
Easy access to the fields:
D=M+1; JGT @sum
D
M + 1
J G T
s u m
String dest(); String comp(); String jump(); String label();
Credit: www.nand2tetris.org
29
Translating Mnemonic to Code: overview
Symbolic syntax:
1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
Binary syntax:
dest = comp ; jump
Credit: www.nand2tetris.org
30
Translating Mnemonic to Code: overview
No need to worry about how the mnemonic fields were obtained
Credit: www.nand2tetris.org
31
Translating Mnemonic to Code: destination
Symbolic syntax:
1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
Binary syntax:
dest = comp ; jump
dest d1 d2 d3
null 0 0 0
M 0 0 1
D 0 1 0
MD 0 1 1
A 1 0 0
AM 1 0 1
AD 1 1 0
AMD 1 1 1
Credit: www.nand2tetris.org
32
Translating Mnemonic to Code: jump
Symbolic syntax:
1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
Binary syntax:
dest = comp ; jump
jump j1 j2 j3
null 0 0 0
JGT 0 0 1
JEQ 0 1 0
JGE 0 1 1
JLT 1 0 0
JNE 1 0 1
JLE 1 1 0
JMP 1 1 1
Credit: www.nand2tetris.org
33
Translating Mnemonic to Code: computation
Symbolic syntax:
1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
Binary syntax:
dest = comp ; jump
comp c1 c2 c3 c4 c5 c6
0 1 0 1 0 1 0
1 1 1 1 1 1 1
-1 1 1 1 0 1 0
D 0 0 1 1 0 0
A M 1 1 0 0 0 0
!D 0 0 1 1 0 1
!A !M 1 1 0 0 0 1
-D 0 0 1 1 1 1
-A -M 1 1 0 0 1 1
D+1 0 1 1 1 1 1
A+1 M+1 1 1 0 1 1 1
D-1 0 0 1 1 1 0
A-1 M-1 1 1 0 0 1 0
D+A D+M 0 0 0 0 1 0
D-A D-M 0 1 0 0 1 1
A-D M-D 0 0 0 1 1 1
D&A D&M 0 0 0 0 0 0
D|A D|M 0 1 0 1 0 1
a=0 a=1
Credit: www.nand2tetris.org
34
Recap: Parsing + Translating
Symbolic syntax:
1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
Binary syntax:
dest = comp ; jump
// Assume that current command is
// D = M+1; JGT
String c=parser.comp(); // “M+1”
String d=parser.dest(); // “D”
String j=parser.jump(); // “JGT”
String cc = Code.comp(c); // “1110111”
String dd = Code.dest(d); // “010”
String jj = Code.jump(j); // “001”
String out = “111” + cc + dd + jj;
Credit: www.nand2tetris.org
35
The Symbol Table
Symbol Address
loop 73
sum 12
No need to worry about what these symbols mean
Credit: www.nand2tetris.org
36
The Symbol Table
Create an new empty table
Add a (symbol , address) pair to the table
Does the table contain a given symbol?
What is the address associated with a given symbol?
Symbol Address
loop 73
sum 12
Credit: www.nand2tetris.org
37
Using the Symbol Table
Create a new empty table
Add all the pre-defined symbols to the table
While reading the input, add labels and new variables to the table
Whenever you see a “@xxx” command, where xxx is not a number,
consult the table to replace the symbol xxx with its address.
Credit: www.nand2tetris.org
38
Using the Symbol Table: adding symbols
…
…
While reading the input, add labels and new variables to the table
Labels: when you see a “(xxx)” command, add the symbol xxx and the address of the next machine language command
Comment 1: this requires maintaining this running address
Comment 2: this may need to be done in a first pass
Variables: when you see an “@xxx” command, where xxx is not a number and not already in the table, add the symbol xxx and the next free address for variable allocation
Credit: www.nand2tetris.org
39
Overall logic
Initialization
Of Parser
Of Symbol Table
First Pass: Read all commands, only paying attention to labels and updating the symbol table
Restart reading and translating commands
Main Loop:
Get the next Assembly Language Command and parse it
For A-commands: Translate symbols to binary addresses
For C-commands: get code for each part and put them together
Output the resulting machine language command
Credit: www.nand2tetris.org
40
Parser module: proposed API
Credit: www.nand2tetris.org
41
Code module: proposed API
Credit: www.nand2tetris.org
42
SymbolTable module: proposed API
Credit: www.nand2tetris.org
43
Assembler: lecture plan
The assembly process
The Hack assembly language
The assembly process: instructions
The assembly process: symbols
Developing an assembler
Project 6 overview
Credit: www.nand2tetris.org
44
Developing a Hack Assembler
Contract
Develop an assembler that translates Hack assembly programs into executable Hack binary code
The source program is supplied in a text file named Xxx.asm
The generated code is written into a text file named Xxx.hack
Assumption: Xxx.asm is error-free
Usage
prompt> java HackAssembler Xxx.asm
This command should create a new Xxx.hack file that can be executed as-is on the Hack computer.
Credit: www.nand2tetris.org
45
Proposed design
The assembler can be implemented in any high-level language
Proposed software design
Parser: unpacks each instruction into its underlying fields
Code: translates each field into its corresponding binary value
SymbolTable: manages the symbol table
Main: initializes I/O files and drives the process.
Credit: www.nand2tetris.org
46
Proposed Implementation
Staged development
Develop a basic assembler that can translate assembly programs without symbols
Develop an ability to handle symbols
Morph the basic assembler into an assembler that can translate any assembly program
Supplied test programs
Add.asm
Max.asm MaxL.asm
Rectangle.asm RectangleL.asm
Pong.asm PongL.asm
Credit: www.nand2tetris.org
47
Test program: Add
Basic test of handling:
White space
Instructions
// Computes RAM[0] = 2 + 3
@2
D=A
@3
D=D+A
@0
M=D
Add.asm
Credit: www.nand2tetris.org
48
48
Test program: Max
// Computes RAM[2] = max(RAM[0],RAM[1])
@R0
D=M // D = RAM[0]
@R1
D=D-M // D = RAM[0] – RAM[1]
@OUTPUT_RAM0
D;JGT // if D>0 goto output RAM[0]
// Output RAM[1]
@R1
D=M
@R2
M=D // RAM[2] = RAM[1]
@END
0;JMP
(OUTPUT_RAM0)
@R0
D=M
@R2
M=D // RAM[2] = RAM[0]
(END)
@END
0;JMP
Max.asm
// Symbol-less version
@0
D=M // D = RAM[0]
@1
D=D-M // D = RAM[0] – RAM[1]
@12
D;JGT // if D>0 goto output RAM[0]
// Output RAM[1]
@1
D=M
@2
M=D // RAM[2] = RAM[1]
@16
0;JMP
@0
D=M
@2
M=D // RAM[2] = RAM[0]
@16
0;JMP
MaxL.asm
with labels
without labels
Credit: www.nand2tetris.org
49
49
Test program: Rectangle
Credit: www.nand2tetris.org
50
// Rectangle.asm
@R0
D=M
@n
M=D // n = RAM[0]
@i
M=0 // i = 0
@SCREEN
D=A
@address
M=D // base address of the Hack screen
(LOOP)
@i
D=M
@n
D=D-M
@END
D;JGT // if i>n goto END
…
Rectangle.asm
Test program: Rectangle
// Symbol-less version
@0
D=M
@16
M=D // n = RAM[0]
@17
M=0 // i = 0
@16384
D=A
@18
M=D // base address of the Hack screen
@17
D=M
@16
D=D-M
@27
D;JGT // if i>n goto END
…
RectangleL.asm
with symbols
without symbols
Credit: www.nand2tetris.org
51
51
// Pong.asm
@256
D=A
@SP
M=D
@133
0;JMP
@R15
M=D
@SP
AM=M-1
D=M
A=A-1
D=M-D
M=0
@END_EQ
D;JNE
@SP
A=M-1
M=-1
(END_EQ)
@R15
A=M
0;JMP
@R15
M=D
…
Pong.asm
Test program: Pong
Observations:
Source code originally written in the Jack language
The Hack code was generated by the Jack compiler and the Hack assembler
The resulting code is 28,374 instructions long
(includes the Jack OS)
Machine generated code:
No white space
“Strange” addresses
“Strange” labels
“Strange” pre-defined symbols
Credit: www.nand2tetris.org
52
52
Testing options
Use your assembler to translate Xxx.asm,
generating the executable file Xxx.hack
Hardware simulator:
load Xxx.hack into the Hack Computer chip, then execute it
CPU Emulator:
load Xxx.hack into the supplied CPUEmulator, then execute it
Assembler:
use the supplied Assembler to translate Xxx.asm;
Compare the resulting code to the binary code generated by your assembler.
Credit: www.nand2tetris.org
53
53
Testing your assembler using the supplied assembler
Source Xxx.asm file
Xxx.hack file, translated by your
assembler
Xxx.hack file, translated by the supplied
assembler
Credit: www.nand2tetris.org
54
Assembler: lecture plan
The assembly process
The Hack assembly language
The assembly process: instructions
The assembly process: symbols
Developing an assembler
Project 6 overview
Credit: www.nand2tetris.org
55
…
push x
push width
add
push 511
gt
if-goto L1
goto L2
L1:
push 511
push width
sub
pop x
L2:
…
// push 511
@511
D=A // D=511
@SP
A=M
M=D // *SP=D
@SP
M=M+1 // SP++
Virtual machine program
Assembly program
0000000000000000
1110110010001000
Executable
VM
translator
Assembler
push 511
@SP
M=M+1 // SP++
/docProps/thumbnail.jpeg