Tokenizing
19 March 2019 OSU CSE 1
BL Compiler Structure
string of characters (source code) → Tokenizer → string of tokens (“words”) → Parser → abstract program → Code Generator → integers (object code)
• The tokenizer is relatively easy.
Aside: Characters vs. Tokens
• In the examples of CFGs, we dealt with languages over the alphabet of individual characters (e.g., Java’s char values), that is, Σ = character
• Now, we deal with languages over an alphabet of tokens, each of which is a unit that you want to consider as a single entity in the language
– Choice of tokens is a design decision
Example: Expression CFG
expr → expr add-op term | term
term → term mult-op factor | factor
factor → ( expr ) | digit-seq
add-op → + | -
mult-op → * | DIV | REM
digit-seq → digit digit-seq | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Appropriate tokens for this CFG are “words” consisting of strings of consecutive terminal symbols (characters) that “belong together”, e.g., “+”, “DIV”, “5”.
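As a rough illustration of this token choice, the sketch below checks whether a string is one of the "words" of the expression CFG. The class and method names (`ExprTokens`, `isDigitSeq`, `isToken`) are hypothetical, and treating a whole digit-seq such as "57" as a single token is an assumption made here; it is one possible design decision, not the only one.

```java
import java.util.Set;

/** Sketch only: decides whether a string is a token of the expression CFG. */
final class ExprTokens {

    // Tokens that the grammar spells out literally (operators and parentheses).
    private static final Set<String> FIXED =
            Set.of("+", "-", "*", "DIV", "REM", "(", ")");

    /** Returns true if s is a non-empty string made only of the digits 0-9. */
    static boolean isDigitSeq(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** Returns true if s is a fixed token or a digit-seq (assumed token choice). */
    static boolean isToken(String s) {
        return FIXED.contains(s) || isDigitSeq(s);
    }

    public static void main(String[] args) {
        System.out.println(isToken("DIV"));
        System.out.println(isToken("57"));
        System.out.println(isToken("4+")); // two tokens run together, not one
    }
}
```

Under this choice, "4+" is not a token: it is two tokens, "4" and "+", written without a separator.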
• The job of the tokenizer is to transform a string of characters into a string of tokens
• Example:
– Input: "4 + (7 DIV 3) REM 5"
– The non-whitespace characters in the input are characters used as terminal symbols of the language
– The spaces in the input are whitespace characters
– Mathematically, the input is a string of character
– Output: <"4", "+", "(", "7", "DIV",
"3", ")", "REM", "5">
– Mathematically, the output is a string of string of character
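The transformation in this example can be sketched in Java as follows. This is a minimal sketch, not the course's tokenizer: the class name `ExprTokenizer` and method `tokens` are hypothetical, and it assumes that a run of digits or a run of letters forms one token and that any other non-whitespace character is a token by itself. It does not check that the resulting tokens are legal in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: turns a string of characters into a string (list) of tokens. */
final class ExprTokenizer {

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns the tokens of input, in order, with whitespace discarded. */
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace only separates tokens; it is never part of one.
                i++;
            } else if (isAsciiDigit(c)) {
                // Group consecutive digits into one token (assumed choice).
                int start = i;
                while (i < input.length() && isAsciiDigit(input.charAt(i))) {
                    i++;
                }
                result.add(input.substring(start, i));
            } else if (Character.isLetter(c)) {
                // Group consecutive letters into one token, e.g. DIV or REM.
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
                result.add(input.substring(start, i));
            } else {
                // Any other character, e.g. + or (, is a one-character token.
                result.add(String.valueOf(c));
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("4 + (7 DIV 3) REM 5"));
    }
}
```

On the example input, this sketch produces the same nine tokens as the output shown on the slide. The result type `List<String>` mirrors the mathematical type: a string of string of character.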
Another Example: BL
• In BL, tokens can be the “words” such as “IF”, “next-is-empty”, etc.
• A BL tokenizer is then easy: it can simply treat strings of consecutive whitespace characters as separators between tokens
– This makes it easy for the language to allow line separators, extra spaces and tabs used for indentation, etc., to have no impact on the legality of a program
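The whitespace-as-separator idea can be sketched with a single call to `String.split`. The class name `WhitespaceTokenizer` and method `tokens` are hypothetical, and the sample BL text in `main` is only an illustration; a real BL tokenizer may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: tokens are the maximal runs of non-whitespace characters. */
final class WhitespaceTokenizer {

    /** Returns the whitespace-separated "words" of source, in order. */
    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        // "\\s+" matches one or more consecutive whitespace characters,
        // so spaces, tabs, and line separators all act as one separator.
        for (String piece : source.split("\\s+")) {
            // split yields an empty first piece when source starts with
            // whitespace (or is empty); skip it.
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String program = "IF next-is-empty THEN\n\tmove\nEND IF";
        System.out.println(tokens(program));
    }
}
```

Because every run of whitespace collapses into a single separator, reformatting the same program with different indentation or line breaks yields the same token list.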
Resources
• Wikipedia: Lexical Analysis
– http://en.wikipedia.org/wiki/Lexical_analysis