
Compilers and computer architecture: introduction
Martin Berger
September 2015

Administrative matters: lecturer
- Name: Martin Berger
- Email: M.F.Berger@sussex.ac.uk
- Web: www.sussex.ac.uk/Users/mfb21/compilers
- Lecture notes etc.: www.sussex.ac.uk/Users/mfb21/compilers/material.html
- Office hours: after the lectures, and on request (please arrange by email)
- My room: Chichester II, 312

Administrative matters: Dates, times and assessment
- Lectures: Two lectures per week, Mondays 16:00-17:00 in Fulton B and Fridays 15:00-16:00 in Arts A A02.
- Tutorials: Thursdays 14:00-15:00 and 15:00-16:00 in Chichester 1 CI204/5, and Fridays 11:00-12:00 and 12:00-13:00, also in Chichester 1 CI204/5. The TA is Alex Jeffery, email: apj21@sussex.ac.uk.
- There will (probably) be PAL sessions, more soon.
- Assessment: by coursework (50%) and unseen examination (50%). The coursework consists of two parts (weighted 30% and 70% respectively). Both involve writing parts of a compiler. Due dates TBC.
- I will use the course's Study Direct page frequently. If you have questions, you are free to ask me or the TA in person, but please consider asking them on Study Direct or in class, because then the answers can benefit other students too. I can set up Twitter / Facebook if you want.

Prerequisites
Good Java programming skills are indispensable.
It helps if you have already seen e.g. regular expressions and FSMs, but we will cover all this from scratch.
It also helps if you have already seen a CPU, e.g. know what a register or a stack pointer is.

Course content
I'm planning to give a fairly orthodox compilers course that shows you all parts of a compiler. At the end of this course you should be able to write a fully fledged compiler yourself and implement programming languages.
We will also look at computer architecture, although more superficially.
This will take approximately 9 weeks, so we have time at the end for some advanced material. I'm happy to tailor the course to your interests, so please let me know what you want to hear about.

Coursework
There will be two assessed courseworks (dates TBC). Evaluation of the courseworks will (largely) be by automated tests. This is quite different from what you've seen so far. The reason for this new approach is threefold.
- Compilers are complicated algorithms, and it is beyond human capabilities to find subtle bugs by inspection.
- Realism. In industry you don't get paid for being nice, or for having code that "almost" works.
- Fairness. Automatic testing removes the subjective element.
Note that if you make a basic error in your compiler, it is quite likely that every test fails and you will get 0 points. So it is really important that you test your code thoroughly before submission. I encourage you to share tests and testing frameworks with other students: as tests are not part of the deliverable, you may share them. Of course the compiler itself must be written by you.

Plan for today's lecture
Whirlwind overview of the course.
- Basic material
  - Why study compilers?
  - What is a compiler?
  - Compiler structure
  - Lexical analysis
  - Syntax analysis
  - Semantic analysis, type-checking
  - Code generation
- Advanced material
  - Optimisation
  - Garbage collection
  - Exceptions
  - Caches
  - Just-in-time compilers

Why study compilers?
To become a good programmer, you need to understand what happens 'under the hood' when you write programs in a high-level language.
To understand low-level languages (assembler, C/C++ and Rust) better. Those languages are of prime importance, e.g. for writing operating systems, embedded code, and generally code that needs to be really fast (e.g. computer games, big data).
Most large programs have a tendency to embed a programming language. The skill to quickly write an interpreter or compiler for such embedded languages is invaluable.
But most of all: compilers are extremely amazing, beautiful, and one of the all-time great examples of human ingenuity. After 70 years of refinement, compilers are a paradigm case of beautiful software structure (modularisation). I hope it inspires you.

Overview: what is a compiler?
A compiler is a program that translates programs from one programming language to programs in another programming language. The translation should preserve meaning (what do "preserve" and "meaning" mean in this context?).

    Source program --> [Compiler] --> Target program
                           |
                           +--> Error messages

Typically, the input language (called the source language) is more high-level than the output language (called the target language). Examples:
- Source: Java, target: JVM bytecode.
- Source: Objective C, target: ARM machine code.
- Source: JVM bytecode, target: ARM/x86 machine code.

Overview: Compiler is a bridge

Example translation: source program
Here is a little program. (What does it do?)

int testfun( int n ){
   int res = 1;
   while( n > 0 ){
      n--;
      res *= 2;
   }
   return res;
}

Using gcc -S this translates to the following x86 machine code ...

Example translation: target program

Leh_func_begin1:
        pushq   %rbp
Ltmp0:
        movq    %rsp, %rbp
Ltmp1:
        movl    %edi, -4(%rbp)
        movl    $1, -16(%rbp)
        jmp     LBB1_2
LBB1_1:
        movl    -4(%rbp), %eax
        subl    $1, %eax
        movl    %eax, -4(%rbp)
        movl    -16(%rbp), %eax
        imull   $2, %eax, %eax
        movl    %eax, -16(%rbp)
LBB1_2:
        movl    -4(%rbp), %eax
        cmpl    $0, %eax
        jg      LBB1_1
        movl    -16(%rbp), %eax
        movl    %eax, -12(%rbp)
        movl    -12(%rbp), %eax
        movl    %eax, -8(%rbp)
        movl    -8(%rbp), %eax
        popq    %rbp
        ret
Leh_func_end1:
        .globl  _main
        .align  4, 0x90

Compilers have a beautifully simple structure

    Source program --> Analysis phase --> Code generation --> Generated program

In the analysis phase two things happen:
- Creating a convenient (for a computer) representation of the source program's structure for further processing (abstract syntax tree (AST), symbol table).
- Analysing whether the program is well-formed (e.g. checking for syntax and type errors).
The executable program is then generated from the AST in the code generation phase.
Let's refine this.

Compiler structure
Compilers have a beautifully simple structure. This structure was arrived at by breaking a hard problem (compilation) into several smaller problems and solving them separately. This has the added advantage of allowing compilers to be retargeted (changing source or target language) quite easily.

    Source program --> Lexical analysis --> Syntax analysis
      --> Semantic analysis (e.g. type checking) --> Intermediate code generation
      --> Optimisation --> Code generation --> Translated program

Compiler structure
Interesting question: when do these phases happen?
In the past, they all happened at compile-time. Now some happen at run-time in just-in-time compilers (JITs). This has a profound influence on the choice of algorithms and on performance.

Compiler structure
Another interesting question: do you notice something about all these phases?
The phases are purely functional, in that each takes one input and returns one output. Functional programming languages like Haskell, OCaml, F#, or Scala are ideal for writing compilers.

Phases: Overview
- Lexical analysis
- Syntactic analysis (parsing)
- Semantic analysis (type-checking)
- Intermediate code generation
- Optimisation
- Code generation

Phases: Lexical analysis

    Source program --> [Lexical analysis] --> Syntax analysis
      --> Semantic analysis (e.g. type checking) --> Intermediate code generation
      --> Optimisation --> Code generation --> Translated program

Phases: Lexical analysis
What is the input to a compiler?
A (often long) string, i.e. a sequence of characters.
Strings are not an efficient data structure for a compiler to work with (= generate code from). Instead, compilers generate code from a more convenient data structure called an "abstract syntax tree" (AST). We construct the AST of a program in two phases:
- Lexical analysis, where the input string is converted into a list of tokens.
- Parsing, where the AST is constructed from the token list.

Phases: Lexical analysis
In lexical analysis, a string is converted into a list of tokens. Example: the program

int testfun( int n ){
   int res = 1;
   while( n > 0 ){
      n--;
      res *= 2;
   }
   return res;
}

is (could be) represented as the list

T_int, T_ident( "testfun" ), T_left_brack,
T_int, T_ident( "n" ), T_right_brack,
T_left_curly_brack, T_int, T_ident( "res" ),
T_eq, T_num( 1 ), T_semicolon, T_while, ...

Phases: Lexical analysis

T_int, T_ident( "testfun" ), T_left_brack,
T_int, T_ident( "n" ), T_right_brack,
T_left_curly_brack, T_int, T_ident( "res" ),
T_eq, T_num( 1 ), T_semicolon, T_while, ...

Why is this interesting?
- Saves memory (not particularly relevant now, but very important in the past).
- Abstracts from irrelevant detail (e.g. syntax of keywords, whitespace).
- Makes the next phase (parsing) much easier.
- (Advanced: makes the grammar defining the program syntax easier.)
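To make this concrete, here is a minimal hand-written lexer for a tiny fragment of the language above (a sketch only; the Token and T_... names are hypothetical, mirroring the token names used on these slides, and only a few token kinds are handled):

```java
import java.util.ArrayList;
import java.util.List;

// A token has a kind (e.g. "T_int", "T_ident", "T_num") and the matched text.
class Token {
    final String kind;
    final String text;
    Token(String kind, String text) { this.kind = kind; this.text = text; }
}

class Lexer {
    // Convert the input string into a list of tokens.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }   // skip whitespace
            if (Character.isLetter(c)) {                        // keywords and identifiers
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String word = input.substring(start, i);
                switch (word) {
                    case "int":   tokens.add(new Token("T_int", word)); break;
                    case "while": tokens.add(new Token("T_while", word)); break;
                    default:      tokens.add(new Token("T_ident", word));
                }
            } else if (Character.isDigit(c)) {                  // number literals
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token("T_num", input.substring(start, i)));
            } else {                                            // single-character tokens
                switch (c) {
                    case '(': tokens.add(new Token("T_left_brack", "(")); break;
                    case ')': tokens.add(new Token("T_right_brack", ")")); break;
                    case '{': tokens.add(new Token("T_left_curly_brack", "{")); break;
                    case '}': tokens.add(new Token("T_right_curly_brack", "}")); break;
                    case '=': tokens.add(new Token("T_eq", "=")); break;
                    case ';': tokens.add(new Token("T_semicolon", ";")); break;
                    default: throw new RuntimeException("unexpected character: " + c);
                }
                i++;
            }
        }
        return tokens;
    }
}
```

Lexing "int res = 1;" with this sketch yields the list T_int, T_ident("res"), T_eq, T_num(1), T_semicolon.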

Phases: syntax analysis (parsing)

    Source program --> Lexical analysis --> [Syntax analysis]
      --> Semantic analysis (e.g. type checking) --> Intermediate code generation
      --> Optimisation --> Code generation --> Translated program

Phases: syntax analysis (parsing)
This phase converts the program (a list of tokens) into a tree, the AST of the program (compare with the DOM of a webpage). This is a very convenient data structure because syntax checking (type-checking) and code generation can be done by walking the AST (cf. the visitor pattern). But how is a program a tree?

while( n > 0 ){
   n--;
   res *= 2;
}

T_while
├── T_greater
│   ├── T_var( n )
│   └── T_num( 0 )
└── T_semicolon
    ├── T_decrement
    │   └── T_var( n )
    └── T_update
        ├── T_var( res )
        └── T_mult
            ├── T_var( res )
            └── T_num( 2 )

Phases: syntax analysis (parsing)
- The AST is often implemented as a tree of linked objects.
- The compiler writer must design the AST data structure carefully so that it is easy to build (during syntax analysis) and easy to walk (during code generation).
- The performance of the compiler strongly depends on the AST, so a lot of optimisation goes in here for industrial-strength compilers.
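One common way to realise such an AST (a sketch only; the class and node names are hypothetical) is a small hierarchy of node classes, one per construct, so that the while loop above becomes a little object graph:

```java
// A sketch of an AST as a tree of linked objects; node names are hypothetical.
abstract class Exp {}
class Var extends Exp { final String name; Var(String name) { this.name = name; } }
class Num extends Exp { final int value; Num(int value) { this.value = value; } }
class BinOp extends Exp {                       // e.g. ">" or "*"
    final String op; final Exp left, right;
    BinOp(String op, Exp left, Exp right) { this.op = op; this.left = left; this.right = right; }
}

abstract class Stm {}
class Seq extends Stm {                         // first; second
    final Stm first, second;
    Seq(Stm first, Stm second) { this.first = first; this.second = second; }
}
class Decrement extends Stm {                   // var--
    final Var var;
    Decrement(Var var) { this.var = var; }
}
class Update extends Stm {                      // var = exp
    final Var var; final Exp exp;
    Update(Var var, Exp exp) { this.var = var; this.exp = exp; }
}
class While extends Stm {                       // while (cond) { body }
    final Exp cond; final Stm body;
    While(Exp cond, Stm body) { this.cond = cond; this.body = body; }
}

class AstExample {
    // Builds the AST of:  while (n > 0) { n--; res = res * 2; }
    static Stm loop() {
        return new While(
            new BinOp(">", new Var("n"), new Num(0)),
            new Seq(new Decrement(new Var("n")),
                    new Update(new Var("res"),
                               new BinOp("*", new Var("res"), new Num(2)))));
    }
}
```

Each phase that follows (type-checking, code generation) can then be written as a recursive walk over these node classes.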

Phases: syntax analysis (parsing)
The construction of the AST has another important role: syntax checking, i.e. checking whether the program is syntactically valid!
This dual role arises because the rules for constructing the AST are essentially exactly the rules that determine the set of syntactically valid programs. Here the theory of formal languages (context-free and context-sensitive languages, and finite automata) is of prime importance. We will study this in detail.

Phases: syntax analysis (parsing)
Great news: the generation of lexical analysers and parsers can be automated by using parser generators (e.g. lex, yacc). Decades of research have gone into parser generators, and in practice they generate better lexers and parsers than most programmers would be able to. Alas, parser generators are quite complicated beasts, and in order to understand them, it is helpful to understand formal languages and lexing/parsing. The best way to understand this is to write a toy lexer and parser.
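As a taste of what such a toy parser looks like, here is a minimal recursive-descent parser for arithmetic expressions over single digits, +, * and parentheses (a sketch under simplifying assumptions: it works on raw characters rather than a token list, and it evaluates while parsing instead of building an AST, to keep it short):

```java
// A minimal recursive-descent parser for the grammar:
//   exp    ::= term ('+' term)*
//   term   ::= factor ('*' factor)*
//   factor ::= digit | '(' exp ')'
// One method per grammar rule; each method consumes the input it recognises.
class ExpParser {
    private final String input;
    private int pos = 0;
    ExpParser(String input) { this.input = input; }

    static int eval(String s) { return new ExpParser(s).exp(); }

    private int exp() {                       // exp ::= term ('+' term)*
        int v = term();
        while (peek() == '+') { pos++; v += term(); }
        return v;
    }
    private int term() {                      // term ::= factor ('*' factor)*
        int v = factor();
        while (peek() == '*') { pos++; v *= factor(); }
        return v;
    }
    private int factor() {                    // factor ::= digit | '(' exp ')'
        char c = peek();
        if (c == '(') {
            pos++;                            // consume '('
            int v = exp();
            if (peek() != ')') throw new RuntimeException("expected ')'");
            pos++;                            // consume ')'
            return v;
        }
        if (Character.isDigit(c)) { pos++; return c - '0'; }
        throw new RuntimeException("unexpected character: " + c);
    }
    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }
}
```

Note how the grammar's layering (exp over term over factor) makes * bind tighter than +, so "1+2*3" parses as 1+(2*3).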

Phases: semantic analysis

    Source program --> Lexical analysis --> Syntax analysis
      --> [Semantic analysis (e.g. type checking)] --> Intermediate code generation
      --> Optimisation --> Code generation --> Translated program

Phases: semantic analysis
While parsing can reject syntactically invalid programs, it cannot reject semantically invalid programs: programs with more complicated 'semantic' mistakes are harder to catch. Examples:

void main() {
   i = 7;
   int i = 7;
   ...

if ( 3 + true ) > "hello" then ...

They are caught with semantic analysis. The key technology is types. Modern languages like Scala, Haskell, OCaml, and F# employ type inference.
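The core of a type checker is a recursive walk over the AST that computes a type for each expression and rejects ill-typed combinations such as 3 + true. A minimal sketch (the node classes and type names here are hypothetical, for illustration only):

```java
// A minimal type checker over a toy expression AST.
abstract class TExp {}
class TNum extends TExp { final int value; TNum(int value) { this.value = value; } }
class TBool extends TExp { final boolean value; TBool(boolean value) { this.value = value; } }
class TAdd extends TExp {                      // left + right
    final TExp left, right;
    TAdd(TExp left, TExp right) { this.left = left; this.right = right; }
}

class TypeChecker {
    // Returns the type of the expression ("int" or "bool"),
    // or throws on a type error such as 3 + true.
    static String typeOf(TExp e) {
        if (e instanceof TNum) return "int";
        if (e instanceof TBool) return "bool";
        TAdd add = (TAdd) e;
        String l = typeOf(add.left), r = typeOf(add.right);
        if (l.equals("int") && r.equals("int")) return "int";
        throw new RuntimeException("type error: cannot add " + l + " and " + r);
    }
}
```

With this sketch, typeOf of 3 + 4 returns "int", while typeOf of 3 + true raises a type error, just like the ill-typed example above.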

Phases: intermediate code generation

    Source program --> Lexical analysis --> Syntax analysis
      --> Semantic analysis (e.g. type checking) --> [Intermediate code generation]
      --> Optimisation --> Code generation --> Translated program

Phases: intermediate code generation

Phases: intermediate code generation
There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is by using in essence two compilers.

Phases: intermediate code generation
Phases: intermediate code generation
There are many different CPUs, each with its own machine language, and the machine language often changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler for each. Fortunately, most machine languages are rather similar, which lets us abstract almost the whole compiler from the details of the target language. The way we do this is, in essence, to use two compilers.
▶ Develop an intermediate language that captures the essence of almost all machine languages.
▶ Compile to this intermediate language.
▶ Do compiler optimisations in the intermediate language.
▶ Translate the intermediate representation to the target machine language. This step can be seen as a mini-compiler.
▶ If we want to retarget the compiler to a new machine language, only this last step needs to be rewritten. Nice data abstraction.

Phases: optimiser
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program

Phases: optimiser
Translating a program often introduces inefficiencies: the translated program may, for example, run slowly, use a lot of memory, or use a lot of power (important for mobile phones). Optimisers try to remove these inefficiencies by replacing the inefficient program with a more efficient version, without changing the meaning of the program.
Most code optimisation problems are hard (NP-complete or even undecidable), so optimisers are expensive to run, algorithmically involved, and often (but not always) yield only modest improvements. These difficulties are exacerbated for JIT compilers, because they are executed at program run-time.
However, some optimisations are easy, e.g. inlining of functions: if a function is short (e.g. computing the sum of two numbers), replacing a call to the function with the function's body can lead to faster code. (What is the disadvantage of this?)
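A minimal illustration of inlining, with invented names: the "after" version is what an optimiser might produce by replacing the call to `add` with its body.

```python
# Hypothetical inlining example. Both functions compute the same
# result; the second avoids the run-time cost of a function call.

def add(x, y):
    return x + y

def before(a, b, c):
    return add(a, b) * c       # call made at run time

def after(a, b, c):
    return (a + b) * c         # body of add() inlined by the optimiser

assert before(2, 3, 4) == after(2, 3, 4) == 20
```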

Phases: code generation
This straightforward phase translates the generated intermediate code to machine code. As machine code and intermediate code are much alike, this 'mini-compiler' is simple and fast.
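The near one-to-one nature of this phase can be sketched as follows. Both the three-address input and the load/store 'machine' instructions below are invented for illustration, not a real instruction set:

```python
# Toy 'mini-compiler': each three-address instruction maps to a fixed
# short pattern of (invented) machine instructions, so this phase is
# little more than a table lookup.

OPCODES = {'+': 'ADD', '-': 'SUB', '*': 'MUL'}

def codegen(three_address):
    """Translate 't = x OP y' strings into toy load/op/store assembly."""
    asm = []
    for instr in three_address:
        dest, _, x, op, y = instr.split()
        asm.append(f"LOAD  r1, {x}")
        asm.append(f"LOAD  r2, {y}")
        asm.append(f"{OPCODES[op]:5} r1, r1, r2")
        asm.append(f"STORE r1, {dest}")
    return asm

for line in codegen(["t0 = a + b", "t1 = t0 * c"]):
    print(line)
```

A real code generator must also choose registers and instruction orderings well, but the basic translation is this simple.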

Compilers vs interpreters
Interpreters are a second way to run programs.
▶ The advantage of compilers is that the generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type-checking, optimisation), and the results of this work are shared by every execution. An interpreter has to redo this work every time.
▶ The advantage of interpreters is that they are much simpler than compilers.
We won't say much more about interpreters in this course.
Compiler: source program → compiler → executable; at run-time, executable + data → output.
Interpreter: at run-time, source program + data → interpreter → output.
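The work-sharing point can be sketched with a tiny expression evaluator (a Python toy, not a real compiler): the interpreter walks the source tree on every run, while the 'compiled' version analyses the tree once and reuses the result.

```python
# Expressions are nested tuples, e.g. ('+', 1, ('*', 2, 3)).
import operator

OPS = {'+': operator.add, '*': operator.mul}

def interpret(expr):
    """Walk the tree on every run: the analysis cost is paid each time."""
    if isinstance(expr, (int, float)):
        return expr
    op, l, r = expr
    return OPS[op](interpret(l), interpret(r))

def compile_expr(expr):
    """Analyse the tree once, producing a closure (our stand-in for
    machine code); repeated runs then skip the tree walk entirely."""
    if isinstance(expr, (int, float)):
        return lambda: expr
    op, l, r = expr
    f, cl, cr = OPS[op], compile_expr(l), compile_expr(r)
    return lambda: f(cl(), cr())

e = ('+', 1, ('*', 2, 3))
run = compile_expr(e)          # done once
print(interpret(e), run())     # 7 7
```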

Literature
Compilers are among the most studied and best understood parts of informatics. Many good books exist. Here are some of my favourites, although I won't follow any of them closely.
▶ Modern Compiler Implementation in Java (second edition) by Andrew Appel and Jens Palsberg. Probably closest to our course. Moves quite fast.
▶ Compilers - Principles, Techniques and Tools (second edition) by Alfred V. Aho, Monica Lam, Ravi Sethi, and Jeffrey D. Ullman. The first edition is the classic text on compilers, known as the "Dragon Book", but it is by now a bit obsolete. The second edition is substantially expanded and goes well beyond the scope of our course. For my liking, the book is a tad long.

Literature
Some other material:
▶ Engineering a Compiler, by Keith Cooper and Linda Torczon.
▶ The Coursera online course on compilers, by Alex Aiken. This course covers similar ground to ours, but goes into more depth. I was quite influenced by Aiken's course when I designed ours.
▶ Computer Architecture - A Quantitative Approach (fifth edition) by John Hennessy and David Patterson. This is the 'bible' for computer architecture. It goes way beyond what is required for our course, but it is very well written by some of the world's leading experts on computer architecture. Well worth studying. This book contains a great deal of information about the MIPS architecture used in this course.
▶ High-Level Languages and Their Compilers by Des Watson, who taught this course at Sussex in past years. The text of the book is available to students as a PDF via Study Direct.

How to enjoy and benefit from this course
▶ Assessed coursework is designed to reinforce and integrate the lecture material; it's designed to help you pass the exam.
▶ Go look at the past papers - now.
▶ Use the tutorials to get feedback on your solutions.
▶ The substantial lab exercise should bring it all together.
▶ Ask questions: in the lectures, in the labs, on Study Direct, or in person!
▶ Design your own mini-languages and write compilers for them.
▶ Have a look at real compilers. There are many free, open-source compilers, e.g. GCC, LLVM, TCC, MiniML, OCaml, the Scala compiler, and GHC, the Haskell compiler.

Questions?