Compiler Theory
Assignment 1
January 22, 2019
Finite State Machine for Lexical Analysis
1. Make a list of all the tokens found in Atto-C. You may for now treat
all keywonds and identifiers as one type of token, call it IDENT.
Don’t forget whitespace and comments!
2. Draw a Finite State Machine diagram for recognizing all the tokens in
your list. Make it a nice big diagram, filling a sheet of paper.
There should be a START state in the center. Each edge of the diagram
should be labelled with the input character or characters causing the
transition along that edge. You may use “digit” instead of listing
all the individual digits, and “letter” for all the 52 upper and lower
case letters, and “not double-quote” inside string tokens. Otherwise,
label with the actual characters. Well, you can say “space” and “tab” and
“newline” for those characters. Each final state should be double-circled
and labelled inside with the token it is recognizing (one and two character
symbol tokens may just use themselves as labels).
Some test input for your Machine (remember there are newlines at the end
of each line):
asdf 2345 O { -23 + 35 /* a comment */ “Hello world!”
while||earth art_of_programming __asm // end-of-line comment
243CAPS234 _____ TwoTokens /*****************/
Also to do:
Read the text on lexical analysis:
Chapter 1 sections 1-4
Chapter 2 sections 1-3
Chapter 3 sections 1-2
If you will be using your own computer for writing your compiler, install
Microsoft Visual Studio 2017 Community Edition (free) from
https://visualstudio.microsoft.com/downloads/.
Watch some YouTube videos on Finite State Machines (not the ones talking
about regular expnessions, though; we’ll get to those next time).