Compilers and computer architecture: introduction
Martin Berger
September 2015
Administrative matters: lecturer
Name: Martin Berger
Email: M.F.Berger@sussex.ac.uk
Web: www.sussex.ac.uk/Users/mfb21/compilers
Lecture notes etc.: www.sussex.ac.uk/Users/mfb21/compilers/material.html
Office hours: after the lectures, and on request (please arrange by email)
My room: Chichester II, 312
Administrative matters: Dates, times and assessment
Lectures: Two lectures per week, Mondays 16:00-17:00 in Fulton B and Fridays 15:00-16:00 in Arts A A02.
Tutorials: Thursdays 14:00-15:00 and 15:00-16:00 in Chichester 1 CI204/5, and Fridays 11:00-12:00 and 12:00-13:00, also in Chichester 1 CI204/5. The TA is Alex Jeffery, email: apj21@sussex.ac.uk.
There will (probably) be PAL sessions; more soon.
Assessment: coursework (50%) and unseen examination (50%). The coursework consists of two parts (weighted 30% and 70% respectively). Both involve writing parts of a compiler. Due dates TBC.
I will use the course’s Study Direct page frequently. If you have questions, you are free to ask me or the TA in person, but please consider asking them on Study Direct or in class, because then the answers can benefit other students too. I can set up Twitter / Facebook if you want.
Prerequisites
Good Java programming skills are indispensable.
It helps if you have already seen e.g. regular expressions, FSMs etc. But we will cover all this from scratch.
It helps if you have already seen a CPU, e.g. know what a register or a stack pointer is.
Course content
I’m planning to give a fairly orthodox compilers course that shows you all parts of a compiler. At the end of this course you should be able to write a fully fledged compiler yourself and implement programming languages.
We will also look at computer architecture, although more superficially.
This will take approximately 9 weeks, so we have time at the end for some advanced material. I’m happy to tailor the course to your interests, so please let me know what you want to hear about.
Coursework
There will be two assessed courseworks (dates TBC). Evaluation of courseworks will (largely) be by automated tests. This is quite different from what you’ve seen so far. The reason for this new approach is threefold.
Compilers are complicated algorithms, and it is beyond human capabilities to find subtle bugs by inspection.
Realism. In industry you don’t get paid for being nice, or for having code that “almost” works.
Fairness. Automatic testing removes the subjective element.
Note that if you make a basic error in your compiler then it is quite likely that every test fails and you will get 0 points. So it is really important that you test your code thoroughly before submission. I encourage you to share tests and testing frameworks with other students: as tests are not part of the deliverable, you may share them. Of course the compiler must be written by yourself.
Plan for today’s lecture
Whirlwind overview of the course.
Basic material:
Why study compilers?
What is a compiler?
Compiler structure
Lexical analysis
Syntax analysis
Semantic analysis, type-checking
Code generation
Advanced material:
Optimisation
Garbage collection
Exceptions
Caches
Just-in-time compilers
Why study compilers?
To become a good programmer, you need to understand what happens ‘under the hood’ when you write programs in a high-level language.
To understand low-level languages (assembler, C/C++ and Rust) better. Those languages are of prime importance, e.g. for writing operating systems, embedded code and generally code that needs to be really fast (e.g. computer games, big data).
Most large programs have a tendency to embed a programming language. The skill to write an interpreter or compiler quickly for such embedded languages is invaluable.
But most of all: compilers are extremely amazing, beautiful and one of the all-time great examples of human ingenuity. After 70 years of refinement, compilers are a paradigm case of beautiful software structure (modularisation). I hope it inspires you.
Overview: what is a compiler?
A compiler is a program that translates programs from one programming language to programs in another programming language. The translation should preserve meaning (what does “preserve” and “meaning” mean in this context?).

  Source program → Compiler → Target program (plus error messages)

Typically, the input language (called source language) is more high-level than the output language (called target language).
Examples:
Source: Java, target: JVM bytecode.
Source: Objective C, target: ARM machine code.
Source: JVM bytecode, target: ARM/x86 machine code.
Overview: Compiler is a bridge
Example translation: source program
Here is a little program. (What does it do?)

  int testfun( int n ){
      int res = 1;
      while( n > 0 ){
          n--;
          res *= 2; }
      return res; }

Using gcc -S this translates to the following x86 machine code …
Example translation: target program

  Leh_func_begin1:
          pushq   %rbp
  Ltmp0:
          movq    %rsp, %rbp
  Ltmp1:
          movl    %edi, -4(%rbp)
          movl    $1, -16(%rbp)
          jmp     LBB1_2
  LBB1_1:
          movl    -4(%rbp), %eax
          subl    $1, %eax
          movl    %eax, -4(%rbp)
          movl    -16(%rbp), %eax
          imull   $2, %eax, %eax
          movl    %eax, -16(%rbp)
  LBB1_2:
          movl    -4(%rbp), %eax
          cmpl    $0, %eax
          jg      LBB1_1
          movl    -16(%rbp), %eax
          movl    %eax, -12(%rbp)
          movl    -12(%rbp), %eax
          movl    %eax, -8(%rbp)
          movl    -8(%rbp), %eax
          popq    %rbp
          ret
  Leh_func_end1:
          .globl  _main
          .align  4, 0x90
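To answer the slide’s question: testfun returns 2^n for n ≥ 0, since res starts at 1 and is doubled once per loop iteration. A direct Java port (our own sketch, in the course’s prerequisite language) makes this easy to check:

```java
// Java port of the example C function: res starts at 1 and is doubled
// n times, so testfun(n) returns 2^n for n >= 0.
class TestFun {
    static int testfun(int n) {
        int res = 1;
        while (n > 0) {
            n--;
            res *= 2;
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(testfun(5)); // prints 32
    }
}
```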
Compilers have a beautifully simple structure

  Source program → Analysis phase → Code generation → Generated program

In the analysis phase two things happen:
Creating a convenient (for a computer) representation of the source program structure for further processing (abstract syntax tree (AST), symbol table).
Analysing if the program is well-formed (e.g. checking for syntax and type errors).
The executable program is then generated from the AST in the code generation phase.
Let’s refine this.
Compiler structure
Compilers have a beautifully simple structure. This structure was arrived at by breaking a hard problem (compilation) into several smaller problems and solving them separately. This has the added advantage of allowing us to retarget compilers (changing source or target language) quite easily.
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Compiler structure
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Interesting question: when do these phases happen?
In the past, all of these happened at compile-time. Now some happen at run-time in just-in-time compilers (JITs). This has a profound influence on the choice of algorithms and on performance.
Compiler structure
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Another interesting question: do you notice something about all these phases?
The phases are purely functional, in that each takes one input and returns one output. Functional programming languages like Haskell, Ocaml, F#, or Scala are ideal for writing compilers.
Phases: Overview
Lexical analysis
Syntactic analysis (parsing)
Semantic analysis (type-checking)
Intermediate code generation
Optimisation
Code generation
Phases: Lexical analysis
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Phases: Lexical analysis
What is the input to a compiler?
A (often long) string, i.e. a sequence of characters.
Strings are not an efficient data structure for a compiler to work with (= generate code from). Instead, compilers generate code from a more convenient data structure called “abstract syntax trees” (ASTs). We construct the AST of a program in two phases:
Lexical analysis, where the input string is converted into a list of tokens.
Parsing, where the AST is constructed from the token list.
Phases: Lexical analysis
In the lexical analysis, a string is converted into a list of tokens. Example: The program
int testfun( int n ){
int res = 1;
while( n > 0 ){
n--;
res *= 2; }
return res; }
Is (could be) represented as the list
T_int, T_ident ( “testfun” ), T_left_brack,
T_int, T_ident ( “n” ), T_rightbrack,
T_left_curly_brack, T_int, T_ident ( “res” ),
T_eq, T_num ( 1 ), T_semicolon, T_while, …
Phases: Lexical analysis
T_int, T_ident ( “testfun” ), T_left_brack,
T_int, T_ident ( “n” ), T_rightbrack,
T_left_curly_brack, T_int, T_ident ( “res” ),
T_eq, T_num ( 1 ), T_semicolon, T_while, …
Why is this interesting?
Saves memory (not particularly relevant now, but very important in the past).
Abstracts from irrelevant detail (e.g. syntax of keywords, whitespace).
Makes the next phase (parsing) much easier.
(Advanced: makes the grammar defining program syntax easier.)
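The token list above can be produced by a small hand-written lexer. The sketch below is our own illustration, not the course’s lexer: the token names mirror the slide’s T_... list, and a real lexer would handle many more token kinds. It shows the core loop: skip whitespace, then group characters into keywords, identifiers, numbers, and punctuation.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hand-written lexer sketch. Token names follow the slide's
// T_... convention; only a handful of token kinds are covered.
class Lexer {
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                       // skip whitespace
            } else if (Character.isLetter(c)) {            // keyword or identifier
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String word = input.substring(start, i);
                tokens.add(word.equals("int") ? "T_int"
                         : word.equals("while") ? "T_while"
                         : "T_ident(" + word + ")");
            } else if (Character.isDigit(c)) {             // number literal
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("T_num(" + input.substring(start, i) + ")");
            } else {                                       // single-character tokens
                switch (c) {
                    case '(': tokens.add("T_left_brack"); break;
                    case ')': tokens.add("T_right_brack"); break;
                    case '=': tokens.add("T_eq"); break;
                    case ';': tokens.add("T_semicolon"); break;
                    default:  tokens.add("T_unknown(" + c + ")");
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("int res = 1;"));
        // [T_int, T_ident(res), T_eq, T_num(1), T_semicolon]
    }
}
```

Note that keywords are recognised by first lexing a whole word and then checking it against the keyword table, a common trick that keeps identifiers and keywords from interfering.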
Phases: syntax analysis (parsing)
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Phases: syntax analysis (parsing)
This phase converts the program (list of tokens) into a tree, the AST of the program (compare to the DOM of a webpage). This is a very convenient data structure because syntax-checking (type-checking) and code-generation can be done by walking the AST (cf visitor pattern). But how is a program a tree?
while( n > 0 ){
n--;
res *= 2; }
T_while
├── T_greater
│   ├── T_var ( n )
│   └── T_num ( 0 )
└── T_semicolon
    ├── T_decrement
    │   └── T_var ( n )
    └── T_update
        ├── T_var ( res )
        └── T_mult
            ├── T_var ( res )
            └── T_num ( 2 )
Phases: syntax analysis (parsing)
T_while
├── T_greater
│   ├── T_var ( n )
│   └── T_num ( 0 )
└── T_semicolon
    ├── T_decrement
    │   └── T_var ( n )
    └── T_update
        ├── T_var ( res )
        └── T_mult
            ├── T_var ( res )
            └── T_num ( 2 )
The AST is often implemented as a tree of linked objects.
The compiler writer must design the AST data structure carefully so that it is easy to build (during syntax analysis) and easy to walk (during code generation).
The performance of the compiler strongly depends on the AST, so a lot of optimisation goes into it for industrial-strength compilers.
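As a concrete illustration of “a tree of linked objects”, here is a minimal Java sketch of node classes for the while-loop on the slide. The class and field names are our own invention, not the course’s:

```java
// Sketch of an AST as linked objects. Each node class holds references
// to its children; the names below are illustrative only.
abstract class Node {}
class Num extends Node { int value; Num(int v) { value = v; } }
class Var extends Node { String name; Var(String n) { name = n; } }
class Greater extends Node {
    Node left, right;
    Greater(Node l, Node r) { left = l; right = r; }
}
class Decrement extends Node { Var target; Decrement(Var t) { target = t; } }
class MulUpdate extends Node {                 // res *= 2
    Var target; Node factor;
    MulUpdate(Var t, Node f) { target = t; factor = f; }
}
class While extends Node {
    Node cond; Node[] body;
    While(Node c, Node[] b) { cond = c; body = b; }
}

class AstDemo {
    public static void main(String[] args) {
        // Build the tree for: while (n > 0) { n--; res *= 2; }
        Node loop = new While(
            new Greater(new Var("n"), new Num(0)),
            new Node[] { new Decrement(new Var("n")),
                         new MulUpdate(new Var("res"), new Num(2)) });
        System.out.println(loop instanceof While); // prints true
    }
}
```

Code generation and type-checking would then walk such a tree, e.g. with the visitor pattern mentioned on the slide.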
Phases: syntax analysis (parsing)
T_while
├── T_greater
│   ├── T_var ( n )
│   └── T_num ( 0 )
└── T_semicolon
    ├── T_decrement
    │   └── T_var ( n )
    └── T_update
        ├── T_var ( res )
        └── T_mult
            ├── T_var ( res )
            └── T_num ( 2 )
The construction of the AST has another important role: syntax checking, i.e. checking if the program is syntactically valid!
This dual role is because the rules for constructing the AST are essentially exactly the rules that determine the set of syntactically valid programs. Here the theory of formal languages (context free, context sensitive, and finite automata) is of prime importance. We will study this in detail.
Phases: syntax analysis (parsing)
T_while
├── T_greater
│   ├── T_var ( n )
│   └── T_num ( 0 )
└── T_semicolon
    ├── T_decrement
    │   └── T_var ( n )
    └── T_update
        ├── T_var ( res )
        └── T_mult
            ├── T_var ( res )
            └── T_num ( 2 )
Great news: the generation of lexical analysers and parsers can be automated by using parser generators (e.g. lex, yacc). Decades of research have gone into parser generators, and in practice they generate better lexers and parsers than most programmers would be able to write by hand. Alas, parser generators are quite complicated beasts, and in order to understand them, it is helpful to understand formal languages and lexing/parsing. The best way to understand this is to write a toy lexer and parser.
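Before meeting lex and yacc, it is instructive to hand-write a toy parser, as the slide suggests. The sketch below is a recursive-descent parser for a tiny invented expression grammar (single-digit numbers, + and *); it evaluates as it parses to keep the code short, but the same structure could build an AST instead.

```java
// Toy recursive-descent parser for the (hypothetical) grammar:
//   expr   -> term ('+' term)*
//   term   -> factor ('*' factor)*
//   factor -> single digit
// Each nonterminal becomes one method; precedence falls out of the
// grammar's structure (term binds tighter than expr).
class ToyParser {
    private final String src;
    private int pos = 0;

    ToyParser(String src) { this.src = src; }

    int parseExpr() {                        // expr -> term ('+' term)*
        int v = parseTerm();
        while (pos < src.length() && src.charAt(pos) == '+') { pos++; v += parseTerm(); }
        return v;
    }

    int parseTerm() {                        // term -> factor ('*' factor)*
        int v = parseFactor();
        while (pos < src.length() && src.charAt(pos) == '*') { pos++; v *= parseFactor(); }
        return v;
    }

    int parseFactor() {                      // factor -> single digit
        char c = src.charAt(pos++);
        if (!Character.isDigit(c)) throw new RuntimeException("syntax error at " + (pos - 1));
        return c - '0';
    }

    public static void main(String[] args) {
        System.out.println(new ToyParser("2+3*4").parseExpr()); // prints 14
    }
}
```

Note how the syntax error case doubles as syntax checking: an invalid input makes the parser throw, just as parsing rejects syntactically invalid programs.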
Phases: semantic analysis
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Phases: semantic analysis
While parsing can reject syntactically invalid programs, it cannot reject semantically invalid programs: more complicated ‘semantic’ mistakes are harder to catch. Examples:

  void main() {
      i = 7
      int i = 7 …

  if ( 3 + true ) > “hello” then …
They are caught by semantic analysis. The key technology is types. Modern languages like Scala, Haskell, Ocaml, F# employ type inference.
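A minimal sketch of how such errors are caught: a type checker walks the expression tree and computes a type for each subexpression, rejecting ill-typed combinations such as 3 + true. All class and method names here are illustrative, not from the course.

```java
// Tiny expression language with int and bool literals plus addition.
abstract class Expr {}
class IntLit extends Expr { int v; IntLit(int v) { this.v = v; } }
class BoolLit extends Expr { boolean v; BoolLit(boolean v) { this.v = v; } }
class Add extends Expr {
    Expr l, r;
    Add(Expr l, Expr r) { this.l = l; this.r = r; }
}

// The type checker computes "int" or "bool" for an expression,
// or throws when the expression is ill-typed.
class TypeChecker {
    static String check(Expr e) {
        if (e instanceof IntLit) return "int";
        if (e instanceof BoolLit) return "bool";
        Add a = (Add) e;                      // only remaining case in this sketch
        if (!check(a.l).equals("int") || !check(a.r).equals("int"))
            throw new RuntimeException("type error: + expects int operands");
        return "int";
    }

    public static void main(String[] args) {
        System.out.println(check(new Add(new IntLit(3), new IntLit(4)))); // prints int
        try {
            check(new Add(new IntLit(3), new BoolLit(true)));             // 3 + true
        } catch (RuntimeException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```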
Phases: intermediate code generation
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Phases: intermediate code generation
There are many different CPUs, each with its own machine language, and machine languages often change subtly from one CPU version to the next. It would be tedious to rewrite large parts of the compiler for each target. Fortunately, most machine languages are rather similar, which lets us insulate almost the whole compiler from the details of the target language. The way we do this is by using, in essence, two compilers:
Develop an intermediate language that captures the essence of almost all machine languages.
Compile to this intermediate language.
Do compiler optimisations in the intermediate language.
Translate the intermediate representation to the target machine language. This step can be seen as a mini-compiler.
If we want to retarget the compiler to a new machine language, only this last step needs to be rewritten. Nice data abstraction.
Phases: optimiser
Source program → Lexical analysis → Syntax analysis → Semantic analysis (e.g. type checking) → Intermediate code generation → Optimisation → Code generation → Translated program
Translating a program often introduces inefficiencies that make the program, for example, run slowly, use a lot of memory, or use a lot of power (important for mobile phones). Optimisers try to remove these inefficiencies by replacing the inefficient program with a more efficient version (without changing the meaning of the program).
Most code optimisation problems are hard (NP-complete or even undecidable), so optimisers are expensive to run and often (but not always) lead to only modest improvements. They are also algorithmically difficult. These difficulties are exacerbated for JITs, because they are executed at program run-time.
However, some optimisations are easy, e.g. inlining of functions: if a function is short (e.g. computing the sum of two numbers), replacing a call to the function with its body can lead to faster code. (What is the disadvantage of this?)
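To illustrate inlining, here is a hypothetical before/after pair; the function names are made up, and the "after" version shows by hand what an optimiser would produce mechanically:

```python
# Before inlining: every loop iteration pays function-call overhead.
def add(x, y):
    return x + y

def sum_before(nums):
    total = 0
    for n in nums:
        total = add(total, n)    # one call per element
    return total

# After inlining: the call is replaced by the body of add.
def sum_after(nums):
    total = 0
    for n in nums:
        total = total + n        # call overhead gone, same meaning
    return total

print(sum_before([1, 2, 3]) == sum_after([1, 2, 3]))  # True
```

Note the meaning is unchanged, only the cost. One well-known disadvantage: inlining duplicates the function body at every call site, so the generated code grows.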
Phases: code generation
Source program → Lexical analysis → Syntax analysis → Semantic analysis (e.g. type checking) → Intermediate code generation → Optimisation → Code generation → Translated program
This straightforward phase translates the generated intermediate code to machine code. As machine code and intermediate code are much alike, this 'mini-compiler' is simple and fast.
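As a hedged sketch of how simple this final step can be, here is a toy translator from three-address instructions to MIPS-flavoured assembly. The IR syntax is invented, and register allocation is naively given as a fixed table; a real code generator would compute it:

```python
# Hypothetical sketch: translating three-address IR of the form
# "dst = a OP b" into MIPS-style assembly, one instruction per
# IR instruction. 'regs' naively maps each IR name to a register.

OPS = {"+": "add", "-": "sub", "*": "mul"}

def codegen(ir, regs):
    asm = []
    for instr in ir:
        dst, _, a, op, b = instr.split()     # e.g. "t1 = a + t0"
        asm.append(f"{OPS[op]} {regs[dst]}, {regs[a]}, {regs[b]}")
    return asm

ir = ["t0 = b * c", "t1 = a + t0"]
regs = {"a": "$t0", "b": "$t1", "c": "$t2", "t0": "$t3", "t1": "$t4"}
for line in codegen(ir, regs):
    print(line)
# mul $t3, $t1, $t2
# add $t4, $t0, $t3
```

Because the IR was designed to be close to machine code, the translation is little more than a table lookup per instruction.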
Compilers vs interpreters
Interpreters are a second way to run programs.
The advantage of compilers is that the generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type checking, optimisation), and the results of this work are shared by every execution. An interpreter has to redo this work every time.
The advantage of interpreters is that they are much simpler than compilers.
We won't say much more about interpreters in this course.
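To give a feel for why interpreters are simpler: a complete interpreter for a toy expression language fits in a few lines, because it evaluates the program directly rather than generating code. The AST encoding below is made up for this sketch:

```python
# A complete interpreter for a toy expression language. No code is
# generated: the AST is walked and evaluated directly, so all the
# work is redone on every run (the trade-off described above).

def interpret(ast, env):
    if isinstance(ast, int):
        return ast                            # literal
    if isinstance(ast, str):
        return env[ast]                       # variable lookup
    op, left, right = ast
    l, r = interpret(left, env), interpret(right, env)
    return {"+": l + r, "-": l - r, "*": l * r}[op]

# a + b * c with a=1, b=2, c=3
print(interpret(("+", "a", ("*", "b", "c")), {"a": 1, "b": 2, "c": 3}))  # 7
```

A compiler for the same language would additionally need a code generator and a target machine, which is exactly the extra work that makes compiled code fast.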
[Diagram: a compiler translates the source program into an executable, which is later run on data to produce output; an interpreter takes the source program and data directly and produces output at run-time.]
Literature
Compilers are among the most studied and best understood parts of informatics. Many good books exist. Here are some of my favourites, although I won't follow any of them closely.
Modern Compiler Implementation in Java (second edition) by Andrew Appel and Jens Palsberg. Probably closest to our course. Moves quite fast.
Compilers: Principles, Techniques and Tools (second edition) by Alfred V. Aho, Monica Lam, Ravi Sethi, and Jeffrey D. Ullman. The first edition of this book is the classic text on compilers, known as the "Dragon Book", but it is a bit obsolete. The second edition is substantially expanded and goes well beyond the scope of our course. For my liking, the book is a tad long.
Literature
Some other material:
Engineering a Compiler, by Keith Cooper and Linda Torczon.
The Coursera online course on compilers, by Alex Aiken. This course covers similar ground to ours, but goes more in-depth. I was quite influenced by Aiken's course when I designed ours.
Computer Architecture: A Quantitative Approach (fifth edition) by John Hennessy and David Patterson. This is the 'bible' for computer architecture. It goes way beyond what is required for our course, but is very well written by some of the world's leading experts on computer architecture. Well worth studying. This book contains a great deal of information about the MIPS architecture used in this course.
High-Level Languages and Their Compilers by Des Watson, who taught this course at Sussex in past years. The text of the book is available to students as a PDF via Study Direct.
How to enjoy and benefit from this course
Assessed coursework is designed to reinforce and integrate the lecture material; it's designed to help you pass the exam.
Go look at the past papers - now.
Use the tutorials to get feedback on your solutions.
The substantial lab exercise should bring it all together.
Ask questions, in the lectures, in the labs, on Study Direct, or in person!
Design your own mini-languages and write compilers for them.
Have a look at real compilers. There are many free, open-source compilers, e.g. GCC, LLVM, TCC, MiniML, OCaml, the Scala compiler, and GHC, the Haskell compiler.
Questions?