程序代写代做代考 interpreter html assembler computer architecture Java ocaml compiler algorithm c/c++ Haskell F# data structure game x86 Compilers and computer architecture: introduction

Compilers and computer architecture: introduction
Martin Berger 1 September 2019
1Email: M.F.Berger@sussex.ac.uk, Office hours: Wed 12-13 in Chi-2R312
1/41

Administrative matters: lecturer
􏰉 Name:MartinBerger
􏰉 Email: M.F.Berger@sussex.ac.uk
􏰉 Web: http://users.sussex.ac.uk/~mfb21/compilers
􏰉 Lecturenotesetc:http://users.sussex.ac.uk/ ~mfb21/compilers/material.html Linked from Canvas
􏰉 Officehour:aftertheWednesdayslectures,andon request (please arrange by email, see http://users.sussex.ac.uk/~mfb21/cal for available time-slots)
􏰉 Myroom:ChichesterII,312
2/41

Administrative matters: dates, times and assessment
􏰉 Lectures:Twolecturesperweek, Wednesday: 11-12 Lec PEV1-1A7 Friday: 17-18 RICH-AS3
􏰉 Tutorials:pleaseseeyourtimetables.TheTAisShaun Ring sr410@sussex.ac.uk
􏰉 Therewill(probably)bePALsessions,moresoon.
􏰉 Assessment:coursework(50%)andbyunseen examination (50%). Both courseworks involve writing parts of a compiler. Due dates for courseworks: Fri, XXX Nov 2019, and Fri, XXX Dec 2019, 18:00, both 18:00.
3/41

Questions welcome!
Please, ask questions …
􏰉 duringthelesson
􏰉 attheendofthelesson
􏰉 inmyofficehours(see http://users.sussex.ac.uk/~mfb21/cal for available time-slots)
􏰉 byemailM.F.Berger@sussex.ac.uk
􏰉 onCanvas
􏰉 inthetutorials
􏰉 inthecourse’sDiscordchannel(inviteisonCanvas)
􏰉 anyotherchannels(e.g.Telegram,TikTok…)?
Please, don’t wait until the end of the course to tell me about any problems you may encounter.
4/41

Prerequisites
Good Java programming skills are indispensable.This course is not about teaching you how to program. “Good” in this context means you can do most questions on e.g.
https://leetcode.com/
classified as “Easy” without problems (= without looking up the answer, and in 1 hour or less).
It helps if you have already seen e.g. regular expressions, FSMs etc. But we will cover all this from scratch.
It helps if you have already seen a CPU, e.g. know what a register is or a stack pointer.
5/41

Course content
I’m planning to give a fairly orthodox compilers course that show you all parts of a compiler. At the end of this course you should be able to write a fully blown compiler yourself and implement programming languages.
We will also look at computer architecture, although more superficially.
This will take approximately 9 weeks, so we have time at the end for some advanced material. I’m happy to tailor the course to your interest, so please let me know what you want to hear about.
6/41

Coursework
Evaluation of assessed courseworks will (largely) be by automated tests. This is quite different from what you’ve seen so far. The reason for this new approach is threefold.
􏰉 Compilersarecomplicatedalgorithmsandit’sbeyond human capabilities to find subtle bugs.
􏰉 Realism.Inindustryyoudon’tgetpaidforbeingnice,orfor having code that “almost” works.
􏰉 Fairness.Automatictestingremovessubjectiveelement.
Note that if you make a basic error in your compiler then it is quite likely that every test fails and you will get 0 points. So it is really important that you test your code before submission thoroughly. I encourage you to share tests and testing frameworks with other students: as tests are not part of the deliverable, you make share them. Of course the compiler must be written by yourself.
7/41

Plan for today’s lecture
Whirlwind overview of the course.
􏰉 Whystudycompilers? 􏰉 Whatisacompiler?
􏰉 Compilerstructure
􏰉 Lexicalanalysis
􏰉 Syntaxanalysis
􏰉 Semanticanalysis,type-checking 􏰉 Codegeneration
8/41

Why study compilers?
To become a good programmer, you need to understand what happens ’under the hood’ when you write programs in a high-level language.
To understand low-level languages (assembler, C/C++, Rust, Go) better. Those languages are of prime importance, e.g. for writing operating systems, embedded code and generally code that needs to be fast (e.g. computer games, ML e.g. TensorFlow).
Most large programs have a tendency to embed a programming language. The skill quickly to write an interpreter or compiler for such embedded languages is invaluable.
But most of all: compilers are extremely amazing, beautiful and one of the all time great examples of human ingenuity. After 70 years of refinement compilers are a paradigm case of beautiful software structure (modularisation). I hope it inspires you.
9/41

Overview: what is a compiler?
10/41

Overview: what is a compiler?
A compiler is a program that translates programs from one programming language to programs in another programming language. The translation should preserve meaning (what does “preserve” and “meaning” mean in this context?).
Source program Compiler Target program
Error messages
Typically, the input language (called source language) is more high-level than the output language (called target language) Examples
􏰉 Source:Java,target:JVMbytecode.
􏰉 Source:JVMbytecode,target:ARM/x86machinecode 􏰉 Source:TensorFlow,target:GPU/TPUmachinecode.
11/41

Example translation: source program
Here is a little program. (What does it do?)
int testfun( int n ){
int res = 1;
while( n > 0 ){
n–;
res *= 2; }
return res; }
Using clang -S this translates to the following x86 machine code …
12/41

Example translation: target program
_testfun: .cfi_startproc
pushq %rbp Ltmp0:
.cfi_def_cfa_offset 16 Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp Ltmp2:
.cfi_def_cfa_register %rbp movl %edi, -4(%rbp) movl $1, -8(%rbp)
LBB0_1:
cmpl $0, -4(%rbp)
jle LBB0_3
movl -4(%rbp), %eax addl $4294967295, %eax movl %eax, -4(%rbp) movl -8(%rbp), %eax shll $1, %eax
movl %eax, -8(%rbp) jmp LBB0_1
LBB0_3:
movl -8(%rbp), %eax
popq %rbp retq .cfi_endproc
## @testfun ## BB#0:
## =>This Inner Loop Header: Depth=1
## BB#2:
## in Loop: Header=BB0_1 Depth=1
## imm = 0xFFFFFFFF
13/41

Compilers have a beautifully simple structure
Source program
Analysis phase
Code generation
Generated program
In the analysis phase two things happen:
􏰉 Analysingiftheprogramiswell-formed(e.g. checking for syntax and type errors).
􏰉 Creatingaconvenient(foracomputer) representation of the source program structure for further processing. (Abstract syntax tree (AST), symbol table).
The executable program is then generated from the AST in the code generation phase.
Let’s refine this.
14/41

Compiler structure
Compilers have a beautifully simple structure. This structure was arrived at by breaking a hard problem (compilation) into several smaller problems and solving them separately. This has the added advantage of allowing to retarget compilers (changing source or target language) quite easily.
15/41
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program

Compiler structure
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Interesting question: when do these phases happen?
In the past, all happend at … compile-time. Now some happen at run-time in Just-in-time compilers (JITs). This has profound influences on choice of algorithms and performance.
16/41

Compiler structure
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
Another interesting question: do you note some thing about all these phases?
The phases are purely functional, in that they take one input, and return one output. Modern programming languages like Haskell, Ocaml, F#, Rust or Scala are ideal for writing compilers.
17 / 41

Phases: Overview
􏰉 Lexicalanalysis
􏰉 Syntacticanalysis(parsing)
􏰉 Semanticanalysis(type-checking) 􏰉 Intermediatecodegeneration
􏰉 Optimisation
􏰉 Codegeneration
18/41

Phases: Lexical analysis
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
19/41

Phases: Lexical analysis
What is the input to a compiler?
A (often long) string, i.e. a sequence of characters.
Strings are not an efficient data-structure for a compiler to work with (= generate code from). Instead, compilers generate code from a more convenient data structure called “abstract syntax trees” (ASTs). We construct the AST of a program in two phases:
􏰉 Lexicalanlysis.Wheretheinputstringisconvertedintoa list of tokens.
􏰉 Parsing.WheretheASTisconstructedfromatokenlist.
20/41

Phases: Lexical analysis
In the lexical analysis, a string is converted into a list of tokens. Example: The program
int testfun( int n ){
int res = 1;
while( n > 0 ){
n–;
res *= 2; }
return res; }
Is (could be) represented as the list
T_int, T_ident ( “testfun” ), T_left_brack,
T_int, T_ident ( “n” ), T_rightbrack,
T_left_curly_brack, T_int, T_ident ( “res” ),
T_eq, T_num ( 1 ), T_semicolon, T_while, …
21/41

Phases: Lexical analysis
T_int, T_ident ( “testfun” ), T_left_brack,
T_int, T_ident ( “n” ), T_rightbrack,
T_left_curly_brack, T_int, T_ident ( “res” ),
T_eq, T_num ( 1 ), T_semicolon, T_while, …
Why is this interesting?
􏰉 Abstractsfromirrelevantdetail(e.g.syntaxofkeywords, whitespace, comments).
􏰉 Makesthenextphase(parsing)mucheasier.
22/41

Phases: syntax analysis (parsing)
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
23/41

Phases: syntax analysis (parsing)
This phase converts the program (list of tokens) into a tree, the AST of the program (compare to the DOM of a webpage). This is a very convenient data structure because syntax-checking (type-checking) and code-generation can be done by walking the AST (cf visitor pattern). But how is a program a tree?
while( n > 0 ){
n–;
res *= 2; }
T_while
T_greater
T_semicolon
T_var ( n )
T_num ( 0 )
T_decrement
T_var ( n )
T_var ( res )
T_update
T_mult
T_num ( 2 )
T_var ( res )
24/41

Phases: syntax analysis (parsing)
T_while
T_greater
T_var ( n ) T_num ( 0 )
T_semicolon
T_decrement
T_var ( n )
T_var ( res )
T_var ( res )
T_update
T_mult
T_num ( 2 )
􏰉 TheASTisoftenimplementedasatreeoflinkedobjects.
􏰉 ThecompilerwritermustdesigntheASTdatastructure
carefully so that it is easy to build (during syntax analysis),
and easy to walk (during code generation).
􏰉 Theperformanceofthecompilerstronglydependsonthe
AST, so a lot of optimisation goes here for instustrial strength compilers.
25/41

Phases: syntax analysis (parsing)
T_while
T_greater
T_var ( n ) T_num ( 0 )
T_semicolon
T_decrement
T_var ( n )
T_var ( res )
T_var ( res )
T_update
T_mult
T_num ( 2 )
The construction of the AST has another important role: syntax checking, i.e. checking if the program is syntactically valid!
This dual role is because the rules for constructing the AST are essentially exactly the rules that determine the set of syntactically valid programs. Here the theory of formal languages (context free, context sensitive, and finite automata) is of prime importance. We will study this in detail.
26 / 41

Phases: syntax analysis (parsing)
T_while
T_greater
T_var ( n ) T_num ( 0 )
T_semicolon
T_decrement
T_var ( n )
T_var ( res )
T_var ( res )
T_update
T_mult
T_num ( 2 )
Great news: the generation of lexical analysers and parsers can be automated by using parser generators (e.g. lex, yacc). Decades of research have gone into parser generators, and in practise they generate better lexers and parsers than most programmers would be able to. Alas, parser generators are quite complicated beasts, and in order to understand them, it is helpful to understand formal languages and lexing/parsing. The best way to understand this is to write a toy lexer and parser.
27/41

Phases: semantic analysis
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
28/41

Phases: semantic analysis
While parsing can reject syntactically invalid programs, it cannot reject semantically invalid programs, e.g. programs with more complicated ’semantic’ mistakes are harder to catch. Examples.
void main() { i=7
int i = 7 …
if ( 3 + true ) > “hello” then …
They are caught with semantic analysis. The key technology are types. Modern languages like Scala, Rust, Haskell, Ocaml, F# employ type inference.
29/41

Phases: intermediate code generation
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
30/41

Phases: intermediate code generation
There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is by using in essence two compilers.
􏰉 Developanintermediatelanguagethatcapturesthe essence of almost all machine languages.
􏰉 Compiletothisintermediatelanguage.
􏰉 Docompileroptimisationsintheintermediatelanguage.
􏰉 Translatetheintermediaterepresentationtothetarget
machine language. This step can be seen as a
mini-compiler.
􏰉 Ifwewanttoretargetthecompilertoanewmachine
language, only this last step needs to be rewritten. Nice data abstraction.
31/41

Phases: optimiser
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
32/41

Phases: optimiser
Translating a program often introduces various inefficiencies, make the program e.g. run slow, or use a lot of memories, or use a lot of power (important for mobile phones). Optimisers try to remove these inefficiencies, by replacing the inefficient program with a more efficient version (without changing the meaning of the program).
Most code optimisations are problems are difficult (NP complete or undecidable), so optimisers are expensive to run, often (but not always) lead to modest improvements only. They are also difficult algorithmically. These difficulties are exacerbate for JITs because the are executed at program run-time.
However, some optimisations are easy, e.g. inlining of functions: if a function is short (e.g. computing sum of two numbers), replacing the call to the function with its code, can lead to faster code. (What is the disadvantage of this?)
33/41

Phases: code generation
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
34/41

Phases: code generation
This straighforward phase translates the generated intermediate code to machine code. As machine code and intermediate code are much alike, this ’mini-compiler’ is simple and fast.
35/41

Compilers vs interpreters
Interpreters are a second way to run programs.
􏰉 Theadvantageofcompilersis that generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type-checking, optimisation). And the results of this work are shared in every execution. The interpreter has to redo this work everytime.
􏰉 Theadvantageofinterpreters is that they are much simpler than compilers.
We won’t say much more about interpreters in this course.
Source program
Compiler
Syntax error?
Data
Executable
Output
At runtime. Data
Source program
Interpreter
Syntax error?
Output
36/41

Literature
Compilers are among the most studied and most well understood parts of informatics. Many good books exist. Here are some of my favourites, although I won’t follow any of them closely.
􏰉 ModernCompilerImplementationinJava(second edition) by Andrew Appel and Jens Palsberg. Probably closest to our course. Moves quite fast.
􏰉 Compilers-Principles,TechniquesandTools(second edition) by Alfred V. Aho, Monica Lam, Ravi Sethi, and Jeffrey D. Ullman. The first edition of this book is is the classic text on compilers, known as the “Dragon Book”, but its first edition is a bit obsolete. The second edition is substantially expanded and goes well beyond the scope of our course. For my liking, the book is a tad long.
37/41

Literature
Some other material:
􏰉 EngineeringaCompiler,byKeithCooper,LindaTorczon.
􏰉 TheAlexAiken’sStanfordUniversityonlinecourseon compilers. This course coveres similar ground as ours, but goes more in-depth. I was quite influenced by Aiken’s course when I designed our’s.
􏰉 ComputerArchitecture-AQuantitativeApproach(sixth edition) by John Hennessey and David Patterson. This is the ’bible’ for computer architecture. It goes way beyond what is required for our course, but very well written by some of the world’s leading experts on computer architecture. Well worth studying.
38/41

How to enjoy and benefit from this course
􏰉 Assessedcourseworkisdesignedtoreinforceand integrate lecture material; it’s designed to help you pass the exam
􏰉 Golookatthepastpapers-now.
􏰉 Usethetutorialstogetfeedbackonyoursolutions
􏰉 Substantiallabexerciseshouldbringitalltogether
􏰉 Askquestions,inthelectures,inthelabs,onCanvasorin person!
􏰉 Designyourownmini-languagesandwritecompilersfor them.
􏰉 Havealookatrealcompilers.Therearemanyfree, open-source compilers, g.g. GCC, LLVM, TCC, MiniML, Ocaml, the Scala compiler, GHC, the Haskell compiler.
39/41

Feedback
In this module, you will receive feedback through:
􏰉 Themarkandcommentsonyourassessment
􏰉 Feedbacktothewholeclassonassessmentandexams 􏰉 Feedbacktothewholeclassonlectureunderstanding
􏰉 Modelsolutions
􏰉 Workedexamplesinclassandlecture
􏰉 Verbalcommentsanddiscussionswithtutorsinclass
􏰉 Discussionswithyourpeersonproblems
􏰉 Onlinediscussionforums
􏰉 Onetoonesessionswiththetutors
The more questions you ask, the more you participate in discussions, the more you engage with the course, the more feedback you get.
40/41

Questions?
41/41