Overview of Compilation Readings: EAC2 Chapter 1
EECS4302 M: Compilers and Interpreters Winter 2020
CHEN-WEI WANG
What is a Compiler? (1)
A software system that automatically translates/transforms
input/source programs (written in one language) to output/target programs (written in another language).
2 of 18
input
semantic domain
Input/Source
Language
Input/Source
Program
encoded into
passed to
encoded into
output
semantic domain
Output/Target
Language
Output/Target
Program
Compiler
generates
X : context with its own vocabulary and meanings e.g., OO, database, predicates
X Source and target may be in different semantic domains.
e.g., Java programs to SQL relational database schemas/queries
e.g., C procedural programs to MISP assembly instructions
Semantic Domain
What is a Compiler? (2)
The idea about a compiler is extremely powerful: You can turn anything to anything else,
as long as the following are clear about them:
X SYNTAX [ specifiable as CFGs ] X SEMANTICS [ programmable as mapping functions ]
Construction of a compiler should conform to good software engineering principles .
X Modularity & Information Hiding [ interacting components ] X Single Choice Principle
X Design Patterns (e.g., composite, visitor)
X Regression Testing at different levels: e.g., Unit & Acceptance
3 of 18
As the single-box model suggests, a compiler must both understand the
ata structures to
● sEtrunctourdeefsorklantoerwulseedbgyethoefbtahcekteanrdg. Tehtilsaintgeurmaegdeiate representation (ir)
● bTercaonmsefsotrhme cso:mfrpoilmer’tshdefiInRititvoe trheperetsaerngtaetiton for the code it is translating.
machine. The distinct nature of these two tasks suggests a division of labor and leads to a design that decomposes compilation into two major pieces: a
Compiler: Typical Infrastructure (1)
front end and a back end.
Source Target
Program Program
Front End
IR
Back End
X FRTOhNe frEonNtDen:d focuses on understanding the source-language program. The back end focuses on mapping programs to the target machine. This sep-
● Encodes:knowledgeofthesourcelanguage
aration of concerns has several important implications for the design and
● Transforms:fromthesourcetosomeIR(intermediaterepresentation) implementatiers.
● Principle: of the source must be preserved in the IR.
Compiler
on of compil
meaning
X BATCheKfEronNtDe:nd must encode its knowledge of the source program in some
cesses. That form is
ntation, or IR. Q. How many IRs needed for building a number of compilers:
At each point in compilation, the compiler will have a definitive represen-
tation. It may, in fact, use several different irs as compilation progresses, JAVA-TO-C, EIFFEL-TO-C, JAVA-TO-PYTHON, EIFFEL-TO-PYTHON?
but at each point, one representation will be the definitive ir. We think of A. Two IRs suffice: One for OO; one for procedural.
the definitive ir as the version of the program passed between independent pIhRasseshofuthlde cboempailser,laiknegthueairgpea-sisnedefrpomenthdeefnrotntaensdptotshesibalcek.end
4 of 18
in the preceding drawing.
source program that it takes as input and map its functionality to the target
called an
produces a semantically equivalent ir program as its output. By using the ir
stheIRto
as an interface, the compiler writer can insert this third phase with minimal
disruption to the front end and back end. This leads to the following compiler Comsptruiclteurre,:terTmyedpaithcreae-lphIansefcromapsiletrr.ucture(2)
Source Target
Program Program
Front End
IR IR
Optimizer
The optimizer is an ir-to-ir transformer that tries to improve the ir program OPTIMIZER:
1. 2. 3.
5 of 18
uses less energy.
in some way. (Notice that these transformers are, themselves, compilers
X An IR-to-IR transformer that aims at “improving” the output of according to our definition in Section 1.1.) The optimizer can make one or
front end, before passing it as input of the back end.
more passes over the ir, analyze the ir, and rewrite the ir. The optimizer
X Think of this transformer as attempting to discover an “optimal” may rewrite the ir in a way that is likely to produce a faster target program
solution to some computational problem.
from the back end or a smaller target program from the back end. It may
e.g., runtime performance, static design
have other objectives, such as a program that produces fewer page faults or
Q. Behaviour of the target program predicated upon?
Meaning of the source preserved in IR?
Conceptually, the three-phase structure represents the classic optimizing
IR-to-IR transformation of the optimizer semantics-preserving? compiler. In practice, each phase is divided internally into a series of passes.
Meaning of IR preserved in the generated target?
Thefrontendconsistsoftwoorthreepassesedetailsof
(1) – (3) necessary & sufficient for the of a compiler.
recognizing valid source-language programs and producing the initial ir form of the program. The middle section contains passes that perform dif-
Back End
Compiler
that handle th
soundness
m
Example Compiler One
Consider a conventional compiler which turns
a C-like program into executable machine instructions.
The source (C-like program) and target (machine instructions)
are at different levels of abstraction :
X C-like program is like “high-level” specification.
X Macine instructions are the low-level, efficient implementation.
EE
EE
Front End
Optimizer
Back End
EE
…
6 of 18
EEEE
TTTT T c c c c c
T c
T c
T c
T c
1.3 Overvie
Infrastructure
sFIGURE1.1 StructureofaTypicalCompiler.
w
Scanner Parser Elaboration Optimization 1 Optimization 2 Optimization n Inst Selection Inst Scheduling Reg Allocation
Example Compiler One:
Scanner vs. Parser vs. Optimizer
Lexical Analysis Source Program
(seq. of characters)
Scanner
Syntactic Analysis seq. of tokens
Parser
Semantic Analysis
AST1 … ASTn
pretty printed
Target Program
The same input program may be treated differently:
1. As a character sequence
2. As a token sequence
3. As a abstract syntax tree (AST)
(1) & (2) are routine tasks of lexical/grammar rule specification.
(3) is where the is about writing a compiler:
A series of semantics-preserving AST-to-AST transformations. 7 of 18
most fun
[ subject to lexical analysis ] [ subject to syntactic analysis ] [ subject to semantic analysis ]
Example Compiler One: Scanner
The source program is treated as a sequence of characters. A scanner performs lexical analysis on the input character
sequence and produces a sequence of tokens. ANALOGY: Tokens are like individual words in an essay.
Invalid tokens Misspelt words
e.g., a token for a useless delimiter: e.g., space, tab, new line e.g., a token for a useful delimiter: e.g., (, ), {, }, ,
e.g., a token for an identifier (for e.g., a variable, a function) e.g., a token for a keyword (e.g,. int, char, if, for, while) e.g., a token for a number (for e.g., 1.23, 2.46)
Q. How to specify such pattern pattern of characters?
A. Regular Expressions (REs)
e.g., RE for keyword while
e.g., RE for an identifier
e.g., RE for a white space
8 of 18
[ while ] [ [a-zA-Z][a-zA-Z0-9_]* ] [ [ \t\r]+ ]
Example Compiler One: Parser
A parser’s input is a sequence of tokens (by some scanner). A parser performs syntactic analysis on the input token
sequence and produces an abstract syntax tree (AST). ANALOGY: ASTs are like individual sentences in an essay.
Tokens not parseable into a valid AST Grammatical errors
Q. An essay with no speling and grammatical errors good enough? A. No, it may talk about non-sense (sentences in wrong contexts).
An input program with no lexical/syntactic errors should still be subject to semantic analysis (e.g., type checking, code optimization).
Q.: How to specify such pattern pattern of tokens? A.: Context-Free Grammars (CFGs)
e.g., CFG (with terminals and non-terminals) for a while-loop:
9 of 18
WhileLoop ∶∶= WHILE LPAREN BoolExpr RPAREN LCBRAC Impl RCBRAC Impl ∶∶=
∣ Instruction SEMICOL Impl
Example Compiler One: Optimizer
Consider an input which has the pretty printing:
AST
b := … ; c := … ; a := … across i |..| n is i
loop
read d
a := a * 2 * b * c * d end
Q. AST of above program optimized for performance?
A. No ì values of 2, b, c stay invariant within the loop. An optimizer may AST like above into:
10 of 18
transform
b := … ; c := … ; a := … temp := 2 * b * c
across i |..| n is i
loop
read d
a := a * d end
Example Compiler Two
Consider a compiler which turns a Domain-Specific Language (DSL) of classes & predicates into a SQL database.
The input/source contains 2 parts:
X DATA MODEL: classes and associations (client-supplier relations) e.g., data model of a Hotel Reservation System:
mentor 0..1
mentee 0..1
employees *seq
consultants *seq
reservations *seq
allocations *
Staff
reservations host *seq 1
License
permit 1*
licensee 1
reglist **
employers *seq
clients
room 0..1
room 0..1
Hotel
Account
account owner 0..1 1
Traveller
registered
Reservation
Allocation
host allocations 1*
host 1
rooms *seq
Room
X BEHAVIOURAL MODEL: update methods specified as predicates 11 of 18
Example Compiler Two: Mapping Data
class A { attributes
s: string
as: set(A . b) [*] }
Each class is turned into a class table:
X Column oid stores the object reference. [ PRIMARY KEY ] X Implementation strategy for attributes:
class B { attributes
is: set (int) b: B . as }
SINGLE-VALUED
MULTI-VALUED
PRIMITIVE-TYPED
column in class table
collection table
REFERENCE-TYPED
association table
Each collection table:
X Column oid stores the context object.
X 1 column stores the corresponding primitive value or oid.
Each association table:
X Column oid stores the association reference.
X 2 columns store oid’s of both association ends. [ FOREIGN KEY ] 12 of 18
Example Compiler Two: Input/Source Consider a valid input/source program:
class Account { attributes
owner: Traveller . account
balance: int }
class Traveller { attributes
name: string
reglist: set(Hotel . registered)[*] }
class Hotel { attributes
name: string
registered: set(Traveller . reglist)[*] methods
register {
t? : extent(Traveller)
& t? /: registered ==>
registered := registered \/ {t?} || t?.reglist := t?.reglist \/ {this}
} }
How do you specify the scanner and parser accordingly? 13 of 18
Example Compiler Two: Output/Target
Class associations are compiled into database schemas.
CREATE TABLE ‘Account‘(
‘oid‘ INTEGER AUTO_INCREMENT,‘balance‘ INTEGER, PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Traveller‘(
‘oid‘ INTEGER AUTO_INCREMENT,‘name‘ CHAR(30), PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Hotel‘(
‘oid‘ INTEGER AUTO_INCREMENT,‘name‘ CHAR(30), PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Account_owner_Traveller_account‘(
‘oid‘ INTEGER AUTO_INCREMENT, ‘owner‘ INTEGER, ‘account‘ INTEGER, PRIMARY KEY (‘oid‘));
CREATE TABLE ‘Traveller_reglist_Hotel_registered‘(
‘oid‘ INTEGER AUTO_INCREMENT, ‘reglist‘ INTEGER, ‘registered‘ INTEGER, PRIMARY KEY (‘oid‘));
Predicate methods are compiled into stored procedures.
14 of 18
CREATE PROCEDURE ‘Hotel_register‘(IN ‘this?‘ INTEGER, IN ‘t?‘ INTEGER) BEGIN
…
END
Example Compiler Two: Mapping Behaviour
Challenge: Transform the OO dot notation into table queries. e.g., The AST corresponding to the following dot notation
(in context of class Account, retrieving the owner’s list of registrations)
may be transformed into the following (nested) table lookups:
this.owner.reglist
SELECT (VAR ‘reglist‘)
(TABLE ‘Hotel_registered_Traveller_reglist‘) (VAR ‘registered‘ = (SELECT (VAR ‘owner‘)
(TABLE ‘Account_owner_Traveller_account‘) (VAR ‘owner‘ = VAR ‘this‘)))
At the database level:
X Maintaining a large amount of data is efficient
X Specifying data and updates is tedious & error-prone. X RESOLUTIONS:
● DefineaDSLsupportingtherightlevelofabstractionforspecification
● Implement a DSL-TO-SQL compiler. 15 of 18
Beyond this lecture . . .
Read Chapter 1 of EAC2 to find out more about Example Compiler One
Read this paper to find out more about Example Compiler Two:
http://dx.doi.org/10.4204/EPTCS.105.8
16 of 18
Index (1)
What is a Compiler? (1)
What is a Compiler? (2)
Compiler: Typical Infrastructure (1)
Compiler: Typical Infrastructure (2)
Example Compiler One
Example Compiler One:
Scanner vs. Parser vs. Optimizer
Example Compiler One: Scanner Example Compiler One: Parser Example Compiler One: Optimizer
Example Compiler Two
17 of 18
Index (2)
Example Compiler Two: Mapping Data Example Compiler Two: Input/Source Example Compiler Two: Output/Target Example Compiler Two: Mapping Behaviour Beyond this lecture. . .
18 of 18