
COMP90045 Programming Language Implementation
Introduction and Welcome
Harald Søndergaard Lecture 1
Semester 1, 2019
PLI (Sem 1, 2019) Introduction and Welcome © University of Melbourne 1 / 36

Welcome to COMP90045
Lecturer: Harald Søndergaard
Tutors: Alice Johnson and Mak Nazecic-Andrlon
Prerequisites:
• Algorithms and Complexity
• Declarative Programming (or equivalent), for Haskell proficiency
• Knowledge of C, Java or similar programming language
• Discrete mathematics (sets, functions, relations)
• Plenty of curiosity and enthusiasm, openness to collaborative work

Prescribed Text
We will use this e-book, available through the University Library:
T. Æ. Mogensen: Introduction to Compiler Design (ICD). Springer, 2011.
Even though the lecture slides are quite detailed, you may need to consult books that have the space to cover some topics more thoroughly.
In a few cases the lecture slides depart from notation and algorithms used in those books, and sometimes we go further than those books.

Other Texts
These are also freely available through the University Library:
• Wilhelm et al.: Compiler Design: Syntactic and Semantic Analysis (CD-SSA)
• Seidl et al.: Compiler Design: Analysis and Transformation (CD-AT)
One (classic and detailed) text is Aho, Lam, Sethi and Ullman: Compilers: Principles, Techniques, and Tools, Addison-Wesley, also known as the dragon book. The second edition is from 2007.
You may be able to find second-hand copies of that book.

Other Readings and Tools
Other materials have been made available through the LMS.
Under “Readings Online” there are chapters on regular and context-free languages. If you are new to finite-state automata and context-free grammars then the lectures may be too fast, and you will need to read through these chapters first.
Under “Other Resources” you will find a link to JFLAP, a visualization tool that may help you understand finite-state automata and related concepts.
You will also find various Haskell documentation there.

The Project
This subject is also, indirectly, a functional programming subject.
Probably the most important activity in this subject is you writing a compiler, as part of a small team.
This is where Haskell comes in: we will use it as the implementation language in the project.

The Timetable
There are two weekly lectures:
• Mondays 9–10
• Tuesdays 3:15–4:15
There are four tutorial time slots:
• Thursdays 3:15–4:15
• Thursdays 4:15–5:15
• Fridays 10–11
• Fridays 11–12

Time Commitment
For the 12 weeks of semester, expect roughly
• 36 hours of class time,
• 44 hours of reading and tute preparation, and
• 40 hours spent on the project.
That is an average of 10 hours per week.
As with all subjects, there will be peaks and troughs, but the project, being a single, staged project, should allow you to plan ahead.

Assessment
A 3-hour final exam (70% of the final mark) at the end of the semester
A staged programming project (30% of the final mark) including peer reviewing
To pass the subject you must obtain at least 50/100 overall and at least 35/70 in the exam.

Getting Started
We want a flying start, so the first two weeks are full of reading! Tutorials start in Week 1.
In the first week, read ICD chapter 1 and chapter 2 up to 2.5.
If you are not familiar with finite-state automata and context-free grammars, read sections 1.1–1.7 and 2.1–2.4 carefully and be prepared to work extra hard in the first three weeks.

How to be Successful
Understand the material, don’t just memorize it.
If you fall behind, try to catch up as fast as possible.
Attempt the tutorial questions every week, before you attend the tutorial, if at all possible.
Support the learning of your fellow students and expect their support, in class and through the LMS discussion board.
Participate in the discussions on the subject’s LMS site and check regularly for announcements.

Syllabus
We study the main components of conventional compilers, mostly in the order in which they are usually executed.
• overview
• lexical analysis (scanning)
• syntax analysis (parsing)
• semantic analysis
• intermediate code generation
• final code generation
• optimization
• runtime systems

Why Study Compilers?
Very few of you will end up implementing a full programming language, so why should you study compilers?
Answer 1: Programming language technology is interesting:
Compiler theory is full of useful concepts and ideas, with a scope that goes well beyond PLI.
The theories of automata, parsing, program analysis, and so on, are beautiful in their own right, but they have also proven incredibly powerful in their ability to improve practice.
In fact, it is hard to think of better examples of strong synergy between theory and practice than the development of programming language technologies (parser generators, for example).

Why Study Compilers?
Answer 2: Programming language technology is central and important:
Programming languages (and the associated technologies) are the machine tools of our discipline. They are the tools we use to build tools! The quality of every tool that we build depends on the quality of these machine tools.
Good craftsmen know their tools, and programming languages and compilers are among programmers’ most basic tools. Having an idea of what your high-level program looks like when compiled can only help you write more efficient code.

Why Study Compilers?
Answer 3: Many “compiler ideas” are versatile:
Many of the ideas we cover are very useful in a wide variety of domains, not just compilers.
Most programs have inputs whose structure is non-trivial to recognize. These inputs may be from users, from files (e.g., configuration files) or from other programs.
Some tasks are best accomplished by generating commands for an existing program that provides expertise for that type of task.

Why Study Compilers?
Answer 4: Building a compiler is a great learning experience.
In this subject, learning-by-doing is king, but what you will learn is more than compiler construction.
The project is your chance to practice serious functional programming.
The project is your chance to improve your team-working skills, learn better coding, debugging and testing practices, improve your ability for precise technical communication, conduct code reviews, and so on.
The project is an exercise in software-engineering-in-the-small.

Compilation = Analysis + Synthesis
Many programs share their basic structure with compilers.
The program front end reads and analyses a description of a problem, to understand what it means and to check its validity.
The program back end then synthesizes a solution for that problem by generating a sequence of commands that accomplishes the requested task.
A conventional compiler generates commands for a CPU.
Other programs may generate commands for a printer, a display screen, or some other device connected to a computer (e.g., a robot making cars).
Still others may generate commands for other programs.

Examples of Analysis + Synthesis
Application programs:
• Text formatters like LaTeX
• Viewers for page description languages like PostScript and PDF
• Syntax-aware editors, e.g., Emacs C-mode
• Silicon compilers (Verilog, VHDL)
• Web browsers (HTML, JavaScript)
• Query interpreters, e.g., SQL database interfaces
• Domain-specific languages: make, sh, awk, perl, …
• Preprocessors for text formatters, e.g., eqn, tbl, pic, chem

Examples of Analysis + Synthesis
Program generators:
• Compiler generators, like flex, bison, alex, happy, …
• GUI generators
Unconventional compilers:
• Java to JVM
• C++ to C
• Haskell to C

Compiler Phases
source language
  ↓ lexical analysis
  ↓ syntax analysis
  ↓ semantic analysis
  ↓ intermediate code generator
  ↓ optimizer
  ↓ code generator
target language

The symbol table and the error handler are shared by all phases.

Phases and Passes
Each phase performs a specific task.
Each pass processes the input (a program or a module) from beginning to end.
Since each pass requires traversing a large data structure, reducing the number of passes speeds up the compiler.
Therefore designers often group more than one phase into one pass.
One of the phases is in control; it invokes the other phases to give them input or to consume their output.
This interleaves the execution of the various phases in the pass.

Analysis Phases
Lexical analysis (scanning) groups the characters of the input into lexemes, the smallest meaningful units of the program. It produces a sequence of tokens, one per lexeme.
It also discards irrelevant parts of the input (white space, comments).
Syntax analysis (parsing) groups tokens into hierarchies of bigger and bigger units, with the biggest unit being the whole program.
Semantic analysis checks whether the program violates any of the rules of the programming language, for example, by using undeclared variables.

Lexical Analysis
mid = (hi + lo) / 2;
String | Token       | Token Value
-------+-------------+------------
mid    | identifier  | "mid"
=      | assign op   |
(      | left paren  |
hi     | identifier  | "hi"
+      | binop       | ADD
lo     | identifier  | "lo"
)      | right paren |
/      | binop       | DIV
2      | int const   | 2
;      | semicolon   |
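To make this concrete in Haskell (the project's implementation language), here is a minimal, illustrative scanner for statements of this shape. The Token constructors and the function name are assumptions made for the sketch, not the subject's actual project code.

```haskell
-- A minimal, illustrative scanner for statements like "mid = (hi + lo) / 2;".
-- Token names are hypothetical; a real scanner would also track positions.
import Data.Char (isAlpha, isDigit, isSpace)

data Token
  = Identifier String | IntConst Int
  | AssignOp | LeftParen | RightParen | Semicolon
  | Binop Char
  deriving (Eq, Show)

tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
  | isSpace c = tokenize cs                           -- discard white space
  | isAlpha c = let (name, rest) = span isAlpha (c:cs)
                in Identifier name : tokenize rest
  | isDigit c = let (num, rest) = span isDigit (c:cs)
                in IntConst (read num) : tokenize rest
  | c == '='  = AssignOp   : tokenize cs
  | c == '('  = LeftParen  : tokenize cs
  | c == ')'  = RightParen : tokenize cs
  | c == ';'  = Semicolon  : tokenize cs
  | c `elem` "+-*/" = Binop c : tokenize cs
  | otherwise = error ("unexpected character: " ++ [c])
```

Applied to the statement above, `tokenize` produces the token sequence shown in the table, with the white space discarded.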

Syntax
The structure of the input is usually described using context-free grammar rules, like these:
assignment → lvalue assign op expr semicolon
lvalue     → identifier
expr       → left paren expr right paren
expr       → expr binop expr
expr       → identifier
expr       → const
const      → int const

Syntax Analysis
assignment
├─ lvalue
│  └─ id (mid)
├─ assign op
├─ expr
│  ├─ expr
│  │  ├─ left paren
│  │  ├─ expr
│  │  │  ├─ expr ── id (hi)
│  │  │  ├─ binop (ADD)
│  │  │  └─ expr ── id (lo)
│  │  └─ right paren
│  ├─ binop (DIV)
│  └─ expr
│     └─ const
│        └─ int const (2)
└─ semicolon
The leaves of this parse tree are the original tokens, in order.

Abstract Syntax Tree
The parse tree contains superficial details that have no effect on the rest of the compiler.
Deleting them yields the abstract syntax tree:
assignment
├─ id (mid)
└─ expr: binop (DIV)
   ├─ expr: binop (ADD)
   │  ├─ id (hi)
   │  └─ id (lo)
   └─ int const (2)
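In Haskell, an abstract syntax tree like this maps directly onto algebraic data types. The types below are illustrative, not the project's actual definitions:

```haskell
-- Illustrative AST types for simple assignment statements.
data Op = Add | Sub | Mul | Div
  deriving (Eq, Show)

data Expr
  = Id String              -- variable reference
  | IntConst Int           -- integer literal
  | Binop Op Expr Expr     -- binary operation
  deriving (Eq, Show)

data Stmt = Assign String Expr
  deriving (Eq, Show)

-- The tree drawn above, for "mid = (hi + lo) / 2;":
example :: Stmt
example =
  Assign "mid" (Binop Div (Binop Add (Id "hi") (Id "lo")) (IntConst 2))
```

Note how the superficial details dropped from the parse tree (parentheses, the semicolon) simply have no place in these types.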

Semantic Analysis
A syntactically valid program may still be meaningless if, say, it uses undefined variables or passes the wrong number of arguments to functions.
Semantic analysis (1) checks the program for such semantic errors and (2) gathers information for code generation.
These two tasks are synergistic, because a semantic error often manifests itself as
• the absence of information needed for code generation (e.g., the absence of a type declaration for a variable), or
• an inconsistency between two sources of this information (e.g., a mismatch between the types of formal and actual parameters).

Attribute Grammars
The information gathered by semantic analysis can be stored in the abstract syntax tree in the form of attributes attached to nodes.
Attributes can be, for example,
• the type of an expression, or
• a table mapping the local variables of a function to their current locations.
An attribute grammar extends a context-free grammar with
• rules for assigning values to attributes of nodes in a parse tree, and
• conditions on attribute values (violations represent errors).

Uses of Attributes
The language semantics determines, e.g., what type of result you get when multiplying an integer and a float, and whether the program may assign a float value to an integer variable.
degc = (degf - 32) * (5.0/9);

Abstract syntax tree with type attributes:

assignment
├─ id (degc): int
└─ expr: float ── binop (MUL)
   ├─ expr: int ── binop (SUB)
   │  ├─ id (degf): int
   │  └─ int const (32)
   └─ expr: float ── binop (DIV)
      ├─ float const (5.0)
      └─ int const (9)
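As a sketch of how such type attributes might be computed in Haskell, the function below infers an expression's type from a table of declared variable types, promoting to float when the operands mix int and float. All names here are hypothetical, not the subject's project code:

```haskell
import qualified Data.Map as Map

data Type = TInt | TFloat deriving (Eq, Show)

data Op = Add | Sub | Mul | Div deriving (Eq, Show)

data Expr
  = Id String
  | IntConst Int
  | FloatConst Double
  | Binop Op Expr Expr
  deriving (Eq, Show)

-- Compute the type attribute of an expression, given the declared
-- types of variables; Nothing signals a semantic error (use of an
-- undeclared variable).
typeOf :: Map.Map String Type -> Expr -> Maybe Type
typeOf env (Id x)         = Map.lookup x env
typeOf _   (IntConst _)   = Just TInt
typeOf _   (FloatConst _) = Just TFloat
typeOf env (Binop _ l r)  = do
  tl <- typeOf env l
  tr <- typeOf env r
  return (if TFloat `elem` [tl, tr] then TFloat else TInt)
```

With degf declared int, the whole right-hand side above gets attribute TFloat, which is exactly the information a later phase needs in order to insert a float-to-int conversion before assigning to degc.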

Intermediate Code Generation
Instead of generating assembly language or machine code directly, compilers usually generate code for a simple abstract machine. This intermediate representation (IR) is designed to be easier to optimize than the target language.
The process of generating the IR is influenced by attribute information. For example, we need type attributes to discover the need for the conversions between integers and floats.
itmp1 := int_sub(degf, 32)
ftmp1 := int_to_float(itmp1)
ftmp2 := int_to_float(9)
ftmp3 := float_div(5.0, ftmp2)
ftmp4 := float_mul(ftmp1, ftmp3)
degc  := float_to_int(ftmp4)
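One way to generate such three-address code in Haskell is to walk the tree while threading a counter for fresh temporaries. This is a sketch under assumed Expr and Instr types, ignoring types and conversions:

```haskell
-- Illustrative expression and IR instruction types.
data Expr
  = Id String
  | IntConst Int
  | Binop String Expr Expr   -- operation named by its IR opcode
  deriving (Eq, Show)

-- Op tmp op a b represents: tmp := op(a, b)
data Instr = Op String String String String
  deriving (Eq, Show)

-- Translate an expression; returns the next free temporary number,
-- the name holding the result, and the instructions generated.
genExpr :: Int -> Expr -> (Int, String, [Instr])
genExpr n (IntConst k) = (n, show k, [])
genExpr n (Id x)       = (n, x, [])
genExpr n (Binop op l r) =
  let (n1, a, is1) = genExpr n  l
      (n2, b, is2) = genExpr n1 r
      tmp          = "tmp" ++ show n2
  in (n2 + 1, tmp, is1 ++ is2 ++ [Op tmp op a b])
```

A real generator would consult the type attributes to choose between int and float opcodes and to insert conversion instructions, as in the sequence above.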

Code Optimization
The optimizer seeks to improve the program by making it execute faster or in less space (though it cannot guarantee actual optimality).
For example,
ftmp2 := int_to_float(9)
ftmp3 := float_div(5.0, ftmp2)

can be replaced by

ftmp3 := 0.55555555
and future occurrences of degf - 32 can be replaced by itmp1, up to the next assignment to degf.

Code Generation
IRs are simple to work with because they ignore concerns that must be addressed when generating code for a real machine.
• Real machines have a limited number of registers.
• Real machines often have more than one way to perform an operation.
• Real machines often require some operands to be in particular places for particular operations.
Some compilers translate IR to assembly code, while others convert it directly to machine code.
The former is easier to implement and to debug, the latter speeds up compilation.

The Compiler’s Context
Original source program
  ↓ Preprocessor (for some languages)
Preprocessed source program
  ↓ Compiler
Target assembly program
  ↓ Assembler
Relocatable machine code
  ↓ Loader/linker (+ libraries)
Absolute machine code

The Symbol Table
The data structures where compilers keep information about identifiers are collectively known as the symbol table.
In typical compilers the tables for local scopes (e.g., functions) are separate from the table(s) for the global scope.
There may also be different tables for different kinds of things, since the set of attributes attached to variables need not be closely related to the set of attributes attached to functions.
Each phase of the compiler may add information to the symbol table, and a subset of its contents may be preserved in the object file for use by the debugger.
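A common representation, sketched here in Haskell with illustrative names, is a stack of scopes, each a map from identifiers to whatever information the compiler has gathered:

```haskell
import qualified Data.Map as Map

-- A symbol table as a stack of scopes, innermost scope first.
type Scope info    = Map.Map String info
type SymTable info = [Scope info]

-- Entering a function pushes an empty local scope.
enterScope :: SymTable info -> SymTable info
enterScope tbl = Map.empty : tbl

-- Declarations go into the innermost scope.
insertSym :: String -> info -> SymTable info -> SymTable info
insertSym name info (s : ss) = Map.insert name info s : ss
insertSym name info []       = [Map.singleton name info]

-- Lookup searches outwards, so locals shadow globals.
lookupSym :: String -> SymTable info -> Maybe info
lookupSym _    []       = Nothing
lookupSym name (s : ss) =
  case Map.lookup name s of
    Just i  -> Just i
    Nothing -> lookupSym name ss
```

Keeping the info type a parameter reflects the point above: variables, functions, and other kinds of symbols can each use a table with their own attribute record.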

Writing the Symbol Table
The symbol table may be updated by any part of the compiler, for example:
• The lexical analyser may enter an identifier.
• The semantic analyser may record that it is a variable and what its type is.
• The code generator may enter information about the storage assigned.
• The error handler may set a flag to avoid repeating some error message.

Next Up …
We look in more detail at the lexical analysis phase.
If you haven’t done so already, read (or at least skim) ICD Chapter 1!