Overview of Compilation Readings: EAC2 Chapter 1
6 CHAPTE
EECS4302 M: Compilers and Interpreters Winter 2020
CHEN-WEI WANG
What is a Compiler? (2)
● The idea about a compiler is extremely powerful: rview of Compilation
You can turn anything to anything else,
as long as the following are clear about them:
○ SYNTAX [ specifiable as CFGs ] A traditional compiler improves the input program by making it directly
○ SEMANTICS [ programmable as mapping functions ] executable on some target machine. Other “compilers” improve their input
● Construction of a compiler should conform to good
in different ways. For example, tpic is a program that takes the specifica-
tion for a drawing written in the graphic
software engineering principles
○ Modularity & Information Hiding [ interacting components ] A source-to-source translator for c must produce code that is, in some mea-
LTEX; the “improvement” lies in LTEX’s greater availability and generality.
AA
○ Single Choice Principle
sure, better than the input program; if it is not, why would anyone invoke it?
○ Design Patterns (e.g., composite, visitor)
○ Regression Testing at different levels: e.g., Unit & Acceptance
1.2 COMPILER STRUCTURE
s language pic and converts it into .
3 of 18
A compiler is a large, complex software system. The community has been building compilers since 1955, and over the years, we have learned many lessons about how to structure a compiler. Earlier, we depicted a compiler as a simple box that translates a source program into a target program. Reality,
R 1 Ove
of course, is more complex than that simple picture.
As the single-box model suggests, a compiler must both understand the
What is a Compiler? (1)
A software system that automatically translates/transforms input/source programs (written in one language) to output/target programs (written in another language).
○
R
: context with its own vocabulary and meanings
e.g., OO, database, predicates
○ Source and target may be in different semantic domains.
e.g., Java programs to SQL relational database schemas/queries 2 of 18 e.g., C procedural programs to MISP assembly instructions
A compiler uses so
input
semantic domain
Input/Source
Language
Input/Source
Program
encoded into
passed to
encoded into
output
semantic domain
Output/Target
Language
Output/Target
Program
Compiler
generates
I
Semantic Domain
represent the code called an intermed
ata structures to
● bTercaonmsefsotrhme cso:mfrpoilmer’tshdefiInRititvoe trheperetsaerngtaetiton for the code it is translating.
source program that it takes as input and map its functionality to the target
machine. The distinct nature of these two tasks suggests a division of labor and leads to a design that decomposes compilation into two major pieces: a
Compiler: Typical Infrastructure (1)
front end and a back end.
Source Target
Program
Front End
○ FRTOhNe frEonNtDen:d focuses on understanding the source-language program. The back end focuses on mapping programs to the target machine. This sep-
IR
Back End
Compiler
Program
● Encodes:knowledgeofthesourcelanguage
aration of concerns has several important implications for the design and
● Transforms:fromthesourcetosomeIR(intermediaterepresentation) implementatiers.
● Principle: of the source must be preserved in the IR. ○ BATCheKfEronNtDe:nd must encode its knowledge of the source program in some
● sEtrunctourdeefsorklantoerwulseedbgyethoefbtahcekteanrdg. Tehtilsaintgeurmaegdeiate representation (ir)
cesses. That form is
ntation, or IR. Q. How many IRs needed for building a number of compilers:
At each point in compilation, the compiler will have a definitive represen-
tation. It may, in fact, use several different irs as compilation progresses, JAVA-TO-C, EIFFEL-TO-C, JAVA-TO-PYTHON, EIFFEL-TO-PYTHON?
but at each point, one representation will be the definitive ir. We think of A. Two IRs suffice: One for OO; one for procedural.
the definitive ir as the version of the program passed between independent 4of18⇒pIhRasseshofuthldecboempailser,laiknegthueairgpea-sisnedefrpomenthdeefnrotntaensdptotshesibalcek.end
in the preceding drawing.
on of compil
meaning
mesetofd that it pro iate represe
In a two-phase compiler, the front end must ensure that the source program is well formed, and it must map that code into the ir. The back end must map
Introducing an ir makes it possible to add more phases to compilation. The compiler writer can insert a third phase between the front end and the back end. This middle section, or optimizer, takes an ir program as its input and
called an
produces a semantically equivalent ir program as its output. By using the ir
stheIRto
as an interface, the compiler writer can insert this third phase with minimal
disruption to the front end and back end. This leads to the following compiler Comsptruiclteurre,:terTmyedpaithcreae-lphIansefcromapsiletrr.ucture(2)
Source Target
Program Program
Front End
IR IR
Optimizer
The optimizer is an ir-to-ir transformer that tries to improve the ir program OPTIMIZER:
○ An IR-to-IR transformer that aims at “improving” the output of according to our definition in Section 1.1.) The optimizer can make one or
in some way. (Notice that these transformers are, themselves, compilers
front end, before passing it as input of the back end.
more passes over the ir, analyze the ir, and rewrite the ir. The optimizer
○ Think of this transformer as attempting to discover an “optimal” may rewrite the ir in a way that is likely to produce a faster target program
solution to some computational problem.
from the back end or a smaller target program from the back end. It may
e.g., runtime performance, static design
have other objectives, such as a program that produces fewer page faults or
Q. Behaviour of the target program predicated upon? uses less energy.
1. Meaning of the source preserved in IR?
Conceptually, the three-phase structure represents the classic optimizing
2. IR-to-IR transformation of the optimizer semantics-preserving? compiler. In practice, each phase is divided internally into a series of passes.
3. Meaning of IR preserved in the generated target?
Thefrontendconsistsoftwoorthreepassesedetailsof
(1) – (3) necessary & sufficient for the of a compiler.
5 of 18 recognizing valid source-language programs and producing the initial ir form of the program. The middle section contains passes that perform dif-
Back End
Compiler
that handle th
soundness
Example Compiler One:
Scanner vs. Parser vs. Optimizer
Lexical Analysis Source Program
(seq. of characters)
Scanner
Syntactic Analysis seq. of tokens
Parser
Semantic Analysis
AST1 … ASTn
pretty printed
Target Program
● The same input program may be treated differently:
1. As a character sequence [ subject to lexical analysis ] 2. As a token sequence [ subject to syntactic analysis ] 3. As a abstract syntax tree (AST) [ subject to semantic analysis ]
● (1) & (2) are routine tasks of lexical/grammar rule specification. ● (3) is where the is about writing a compiler:
most fun
A series of semantics-preserving AST-to-AST transformations. 7 of 18
f a compiler, nd transform
ferent optimizations. The number and purpose of these passes vary from compiler to compiler. The back end consists of a series of passes, each of
which takes the ir program one step closer to the target machine’s instruc-
tion set. The three phases and their individual passes share a common infrastructure. This structure is shown in Figure 1.1.
Example Compiler One
In practice, the conceptual division of a compiler into three phases, a front end, a middle section or optimizer, and a back end, is useful. The problems
● Consider a conventional compiler which turns
addressed by these phases are different. The front end is concerned with
a C-like program into executable machine instructions. understanding the source program and recording the results of its analy-
● The source (C-like program) and target (machine instructions) sis into ir form. The optimizer section focuses on improving the ir form.
are at different levels of abstraction :
○ C-like program is like “high-level” specification.
○ Macine instructions are the low-level, efficient implementation.
1.3 Overvie
⌥⌥⌥⌅⌥⌥⌥⌅⌥⌥⌥⌅ —-
Infrastructure
⌅⌅
—
⌅⌅
⌅⌅
—
Front End
Optimizer
Back End
—
…
⌃⇧⌃⇧⌃⇧⌃⇧⌃ ⇧
6666 6 ⌥? ? ? ? ?⌅
6 ?
⇧⌃
6 ?
⇧⌃
6 ?
⇧⌃
6 ?
⇧⌃
6 of 18
⌃⇧
nFIGURE1.1 StructureofaTypicalCompiler.
Example Compiler One: Scanner
● The source program is treated as a sequence of characters.
● A scanner performs lexical analysis on the input character
lation 9
sequence and produces a sequence of tokens.
● ANALOGY: Tokens are like individual words in an essay.
⇒ Invalid tokens ≈ Misspelt words
e.g., a token for a useless delimiter: e.g., space, tab, new line e.g., a token for a useful delimiter: e.g., (, ), {, }, ,
e.g., a token for an identifier (for e.g., a variable, a function) e.g., a token for a keyword (e.g,. int, char, if, for, while) e.g., a token for a number (for e.g., 1.23, 2.46)
Q. How to specify such pattern pattern of characters?
A. Regular Expressions (REs)
e.g., RE for keyword while
e.g., RE for an identifier
e.g., RE for a white space
8 of 18
[ while ] [ [a-zA-Z][a-zA-Z0-9_]* ] [ [ \t\r]+ ]
The back end must map the transformed ir program onto the bounded resources of the target machine in a way that leads to efficient use of those
w of Trans
o a
Scanner Parser Elaboration Optimization 1 Optimization 2 Optimization n Inst Selection Inst Scheduling Reg Allocation
Example Compiler One: Parser
● A parser’s input is a sequence of tokens (by some scanner). ● A parser performs syntactic analysis on the input token
sequence and produces an abstract syntax tree (AST).
● ANALOGY: ASTs are like individual sentences in an essay.
⇒ Tokens not parseable into a valid AST ≈ Grammatical errors
Q. An essay with no speling and grammatical errors good enough? A. No, it may talk about non-sense (sentences in wrong contexts).
⇒ An input program with no lexical/syntactic errors should still be subject to semantic analysis (e.g., type checking, code optimization).
Q.: How to specify such pattern pattern of tokens? A.: Context-Free Grammars (CFGs)
e.g., CFG (with terminals and non-terminals) for a while-loop:
9 of 18
WhileLoop ∶∶= WHILE LPAREN BoolExpr RPAREN LCBRAC Impl RCBRAC Impl ∶∶= Instruction SEMICOL Impl
Example Compiler Two
e.g., data model of a Hotel Reservation System:
● Consider a compiler which turns a Domain-Specific Language (DSL) of classes & predicates into a SQL database.
● The input/source contains 2 parts:
○ DATA MODEL: classes and associations (client-supplier relations)
mentor 0..1
reservations *seq
mentee 0..1
employees *seq
consultants *seq
account owner 0..1 1
Staff
reservations host *seq 1
License
permit 1*
licensee 1
reglist **
employers *seq
clients
room 0..1
room 0..1
Hotel
Account
registered
Traveller
Reservation
Allocation
host allocations 1*
host 1
rooms *seq
allocations *
Room
○ BEHAVIOURAL MODEL: update methods specified as predicates 11 of 18
Example Compiler One: Optimizer
● Consider an input which has the pretty printing:
AST
b := … ; c := … ; a := … across i |..| n is i
loop
read d
a := a * 2 * b * c * d end
Q. AST of above program optimized for performance?
A. No ∵ values of 2, b, c stay invariant within the loop. ● An optimizer may AST like above into:
transform
b := … ; c := … ; a := … temp := 2 * b * c
across i |..| n is i
loop
read d
a := a * d end
10 of 18
Example Compiler Two: Mapping Data
● Each class is turned into a class table:
○ Column oid stores the object reference. [ PRIMARY KEY ] ○ Implementation strategy for attributes:
class A { attributes
s: string
as: set(A . b) [*] }
class B { attributes
is: set (int) b: B . as }
SINGLE-VALUED
MULTI-VALUED
PRIMITIVE-TYPED
column in class table
collection table
REFERENCE-TYPED
association table
● Each collection table:
○ Column oid stores the context object.
○ 1 column stores the corresponding primitive value or oid.
● Each association table:
○ Column oid stores the association reference.
○ 2 columns store oid’s of both association ends. [ FOREIGN KEY ] 12 of 18
Example Compiler Two: Input/Source
● Consider a valid input/source program:
class Account { attributes
owner: Traveller . account
balance: int }
class Traveller { attributes
name: string
reglist: set(Hotel . registered)[*] }
class Hotel { attributes
name: string
registered: set(Traveller . reglist)[*] methods
register {
t? : extent(Traveller)
& t? /: registered ==>
registered := registered \/ {t?} || t?.reglist := t?.reglist \/ {this}
} }
● How do you specify the scanner and parser accordingly? 13 of 18
Example Compiler Two: Mapping Behaviour
● Challenge: Transform the OO dot notation into table queries. e.g., The AST corresponding to the following dot notation
(in context of class Account, retrieving the owner’s list of registrations)
may be transformed into the following (nested) table lookups:
this.owner.reglist
SELECT (VAR ‘reglist‘)
(TABLE ‘Hotel_registered_Traveller_reglist‘) (VAR ‘registered‘ = (SELECT (VAR ‘owner‘)
(TABLE ‘Account_owner_Traveller_account‘) (VAR ‘owner‘ = VAR ‘this‘)))
● At the database level:
○ Maintaining a large amount of data is efficient
○ Specifying data and updates is tedious & error-prone. ○ RESOLUTIONS:
● DefineaDSLsupportingtherightlevelofabstractionforspecification 15of18 ● ImplementaDSL-TO-SQLcompiler.
Example Compiler Two: Output/Target
● Class associations are compiled into database schemas.
● Predicate methods are compiled into stored procedures. 14 of 18
CREATE TABLE ‘Account‘(
‘oid‘ INTEGER AUTO_INCREMENT,‘balance‘ INTEGER, PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Traveller‘(
‘oid‘ INTEGER AUTO_INCREMENT,‘name‘ CHAR(30), PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Hotel‘(
‘oid‘ INTEGER AUTO_INCREMENT,‘name‘ CHAR(30), PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Account_owner_Traveller_account‘(
‘oid‘ INTEGER AUTO_INCREMENT, ‘owner‘ INTEGER, ‘account‘ INTEGER, PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Traveller_reglist_Hotel_registered‘(
‘oid‘ INTEGER AUTO_INCREMENT, ‘reglist‘ INTEGER, ‘registered‘ INTEGER, PRIMARY KEY (‘oid‘));
CREATE PROCEDURE ‘Hotel_register‘(IN ‘this?‘ INTEGER, IN ‘t?‘ INTEGER) BEGIN
…
END
Beyond this lecture . . .
● Read Chapter 1 of EAC2 to find out more about Example Compiler One
● Read this paper to find out more about Example Compiler Two: 16 of 18
http://dx.doi.org/10.4204/EPTCS.105.8
Index (1)
What is a Compiler? (1)
What is a Compiler? (2)
Compiler: Typical Infrastructure (1)
Compiler: Typical Infrastructure (2)
Example Compiler One
Example Compiler One:
Scanner vs. Parser vs. Optimizer
Example Compiler One: Scanner
Example Compiler One: Parser
Example Compiler One: Optimizer
Example Compiler Two
17 of 18
Index (2)
Example Compiler Two: Mapping Data
Example Compiler Two: Input/Source
Example Compiler Two: Output/Target
Example Compiler Two: Mapping Behaviour
Beyond this lecture. . .
18 of 18