CS计算机代考程序代写 scheme python data structure compiler Java c++ Fortran assembly assembler algorithm interpreter Corporate PowerPoint Template

Corporate PowerPoint Template

Compiler Design

Week 1

20/07/2021

2

Detailed content

Weekly program

 Week 1 – Introduction to Compiler Design

 Week 2 – CD Programming Language

 Week 3 – Lexical Analysis

 Week 4 – Syntax Analysis

 Week 5 – Top Down Parsing

 Week 6 – Symbol Table and Error Recovery

 Week 7 – Bottom-Up Parsing

 Week 8 – Semantic Analysis

 Week 9 – Runtime Environment: Stack Machine (SM) Architecture

 Week 10 – Code Generation

 Week 11 – Code Optimisation

 Week 12 – Revision

 Week 13 – Extra revision (if needed)

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

Week 01 Lecture Outline
Introduction to Compiler Design

 Compilers and Programming languages

 Basic Structure of a Compiler

 Lexical Analysis

 Syntactic Analysis

 Semantic Analysis

 Code Generation

 Code Optimisation

 Analysis-Synthesis model of compiler

 Compiler Construction tools

 Complier Design for COMP3290

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

3

20/07/2021

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

4What is a compiler ?

• “A compiler is a program that reads a program written in a given language and

translates it into an equivalent program in another language“, it is a translator.

Source program Compiler Target program

error messages

Source

Language

Translator Target

Language

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

5Examples

Java Compiler Java Byte Code

C++ Compiler RISC object code

Polish
Translator

at the UN
Estonian

All compilers basically does the same thing. Difference lies in the choice of source

and target languages.

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

6Programming Languages

• Programming languages are notations to specify algorithms

– to human

– to machine

• Programming languages are very young, not even a century has

passed since the dawn of electro-mechanical computing era.

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

7Programming Languages

Machine Language:

• Simple machine code (all 0’s and 1’s)

• Instructions very closely resemble the operations performed in the

underlying hardware

• Writing complex algorithm is severely challenging.

Low-level Programming Languages:

• Assembly languages (mnemonics like ADD A, B, C)

• Painfully huge number of lines even for really, small algorithms

High-level Programming Languages:

• High-level languages – e.g., Fortran, C, C++, Java, Python

• Programmers were happier to code, but compilers got complex

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

8Type of Machine Codes

Compilers may generate three types of machine code

1. Pure Machine Code

– Generates code in a particular machine instruction set

– The code runs on bare hardware – No support for operating system of library

routines

– Not common. Used for implementing operating systems or embedded

applications

2. Augmented Machine Code

– Generates code for a machine architecture that is augmented with OS

routines and runtime language support routines

– The code needs a particular OS in the target machine and language specific

runtime support routines (for I/O, memory allocation, mathematical functions

etc.)

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

9Type of Machine Codes

3. Virtual Machine Code

– Generates virtual instructions for a VM

– Attractive for better portability (via an interpreter for the VM)

• Can run on any machine if the VM interpreter is available

– Can make the compiler itself easy to port. How?

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

10HLL to LLL – CD to SM

• Hardware operations are usually quite limited

• One statement in a HLL = many lines of code in a low-level language =

humongous amount of 0’s and 1’s in machine code

Compilers and Interpreters

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

11

• Share some of the functionalities of compilers

• Executes the programs with explicitly performing much translation

Compilers and Interpreters

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

12

• Special capabilities in interpreters

– Programs can be easily modified during execution – interactive

debugging capability

– Can easily support languages in which type of objects change

dynamically (though re-examination of the program as execution

proceeds)

– Significant degree of machine independence

• What is the cost?

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

13Compiler Writing – How difficult?

• A compiler is a very complex program the requires:

– Knowledge of programming languages

– Machine architecture

– Language theory

– Algorithms and Data Structure

– Software engineering

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

14Compiler Writing – How difficult?

• A compiler is a very complex program that involves a lot of task:

– The first FORTRAN compiler took 18 staff-years to implement

(1957)

– Requirements for implementation of C++ standard:

27 chapters, 5 annexes.

COMP3290: ~ 140 hours* – roughly 4 weeks full-time

Compiler Writing – How difficult?

• A compiler is a very complex program, but:

– Since the first compilers, systematic methods for writing compilers

have been developed

– There is a traditional compiler structure: the compiler is

implemented as a number of different phases that work together

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

15

Cousins of Compiler

Preprocessor

Compiler

Assembler

Linker

Source Program

Modified Source Program

Target Assembly Program

Relocatable Object Code

Absolute Machine Code

Libraries and

Relocatable Object Files

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

16

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

17Phases of a Compiler

• Compilation is generally divided into at least 4 simpler processes, often

referred to as Phases

– Lexical Analysis (The Scanner)

– Syntactic Analysis (The Parser)

– Semantic Analysis

– Code Generation

• And another added towards the end called

– Optimisation

• Where each phase begins its function and where it passes duties over

to the next phase can vary significantly according to the languages

involved.

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

18Lexical Analysis

• The first thing we do with “Time flies like an arrow” is think of it as 5

words rather than a string of 20 characters. Before we group words into

syntactic phrases we group letters into words.

• The lexical analyser reads the stream of characters making up the

source program and groups the characters into meaningful sequences

called lexemes.

• For example:

position = initial + rate * 60

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

19

Token – a token is a tuple of the form

Token_Classification is what is needed to test the syntax. E.g. all id’s are

syntactically equivalent. So in order to check syntax, all we need to know is that

the lexeme is an identifier, there is no need to know which id it actually is.

Extra_Data for most lexemes is “NULL”, once we have found a keyword such as if

or then or an operator such as +, we don’t need to know anything more about the

lexeme. However, for lexemes such as id’s, integer constants, etc. we need to

carry forward information about “which” id or constant this lexeme actually is, so

that it is available for semantic analysis and code generation in later phases.

Lexemes and Tokens

Syntactic Analysis

• Syntactic Analysis of language structure is obviously important in both

natural language and computer languages.

• Syntax analyser checks if the program is grammatically correct.

• A complete set of rules covering all the structures possible within the

language is needed.

• It is generally easier with programming languages than with natural

language as its unlikely that a PL will use the same word as both a

noun and a verb (e.g. flies), or as both a verb and a preposition (e.g.

like).

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

20

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

21Basic Structure of a Compiler

Lexical

Analyser

Syntax

Analyser

Code

Gen.

Symbol

Table

Source

Code

Tokens Syntax

Tree

Object

Code

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

22Syntax Analyser and Syntax Trees

• The syntax analyser takes tokens from the lexical analyser and
determines how they should be structured in to expressions, statements
and other “phrases” of the programming language.

• Parser creates a tree-like intermediate representation that depicts the
grammatical structure of the token stream.

• Syntax Trees – are one of many different schemes for describing the
grammatical structure of a program.

• Syntax Trees are a good method because they encode structural
information in a nice way – easily understood, yet containing only
necessary information.

• Note (on following examples) that the only items which appear in syntax
trees are identifiers and constants. Keywords and delimiters never
appear as leaves but determine the structure of the tree. In English
ALL words would be leaves of the tree.

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

23Syntax Tree Examples

A * ( B + C ) has a self-evident

structure
if x <= 0 then flag := 1 * + CB A if then stmt asgn stmt<= x 0 flag 1 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 24Semantic Analysis • Knowing the lexical and syntactic structuring that should be associated with a program’s parts does not imply that the meaning of the program is known. • Eg. x := 8 is an assignment statement with target x. The meaning of the statement cannot be determined until the meaning of x is known. If x was never declared or was declared to be of type boolean, then the statement would be meaningless. • Determining the meaning is Semantic Analysis. With compilers the goal is not just to understand a program but to translate it. Semantic Analysis and Code Generation are tightly integrated. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 25Semantic Analysis • Deciding what as actually meant by the particular example of the source language. • Some meaning can be gained syntactically, e.g. in Pascal, x=3 in the context : const x=3 … means declare a constant x and give it the value 3. On the other hand in the context if x=3 then … means do the … if the value of x is currently 3. However there are aspects of meaning that cannot be decided solely on syntax. • What does const x=3 mean if x has already been declared? • What does if x=3 then … mean if x has never been declared or has been declared to be an array? 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 26Semantic Analysis • Semantic analysis involves the further inspection of the syntax tree, using information from the Symbol Table to determine if the source HAS A SINGLE USEFUL MEANING. • Semantic analysis also gather type information and stores in syntax tree or symbol table for later phases • Type-checking is an important part of semantic analysis. – E.g. is the array index an integer? • Coersion – another task often performed by semantic analyser. – E.g if a binary arithmetic operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 27Code Generation • Takes as input an intermediate representation and maps it into the target language. • The intermediate instructions are translated into sequences of machine instructions that perform the same task. • If the target language is machine code, registers or memory locations are selected for each of the variables. • A crucial aspect of code generation is the judicious assignment of registers to hold variables. Code Generation • Requires an intricate knowledge of the target language. • If you wish to translate Polish into Estonian for the UN then you must know both languages VERY well. – Knowledge of idiom in the target language (as well as the source language in the case of natural language) can help with providing the closest meaning possible for the target. – Such knowledge can make the translation much more effective (optimised). • The same is true in code generation – intricacies of the architecture and particular syntactic structures within the source language can make a big difference to the performance of the target code. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 28 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 29Code Generation • With compilers, the requirement of Semantics is to have one EXACT meaning – Idiom still plays a part in code generation in the form of more efficient ways of translating for the target machine, in the form of Code Optimisation. Phases of a Compiler Lexical Analyzer Syntax Analyzer Semantic Analyzer Intermediate Code Generator Machine Independent Code Optimiser Code Generator Machine Dependent Code Optimiser character stream token stream syntax tree syntax tree intermediate representation intermediate representation target machine code target machine code Symbol Table 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 30 Code Optimisation • Machine-independent code-optimisation phase attempts to improve the intermediate code so that better target code will result. – Usually better means faster – Other objectives may be desired, such as shorter code, or target code that consumes less power. • Usually is a time consuming process • The problem of generating the optimal target code from a source program is undecidable in general – Some simple algorithms can help to improve a lot 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 31 Code Optimisation • Suppressing code generation of unreachable code segments, • Ridding of unused variables, • Eliminating multiplication by 1 and addition by 0, • Loop optimisation (e.g., remove statements that are not modified in the loop), • Common subexpression elimination 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 32 Code Optimisation Example: 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 33 _t1 = b * c (eliminate add 0) (eliminate common subexp, _t1 is same as _t3) _t2 = _t1 + _t1 a = _t2 _t1 = b * c _t2 = _t1 + 0 _t3 = b * c _t4 = _t2 + _t3 a = _t4 Code Optimisation • Compiler optimisations must meet the following design objectives: – The optimisation must be correct, that is, preserve the meaning of the compiled program, – The optimisation must improve the performance of many programs, – The compilation time must be kept reasonable, and – The engineering effort required must be manageable. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 34 Code Optimisation • Idioms in natural language translation - equivalent to knowledge of architecture of the target language • Recognise sets of instructions that can be replaced by shorter or faster sets of instructions, based on the target machine architecture. • Often instructions are pipelined, yielding out-of-order execution and decreasing total time of execution 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 35 Code Optimisation • Redundant-instruction elimination • Flow-of-control optimisations • Algebraic simplifications • Use of machine idioms • These optimisations are often coined as Peephole optimisation techniques, since they consider a small window/segment of object codes for optimisation. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 36 Important Sub-process Symbol Table Generation • A symbol table contains information about all the identifiers in the program along with important attributes such as type and scope. • An entry for each identifier is created during syntax analysis. • During semantic analysis, type and scope are added for each identifier. • During code generation, different instructions are often required to be emitted based on the type of the identifier. • Different instructions are used for integer and floating point arithmetic 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 37 Important Sub-process Error Handling • Lexical analysis - bad tokens • Syntax Analysis - bad combination of valid tokens • Semantic Analysis - declare before use, type mismatch in assignments 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 38 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 39 The Analysis-Synthesis Model of Compilation • There are two parts to compilation: – Analysis (Front End): High level language to an intermediate representation • Analysis determines the operations implied by the source program which are recorded in a tree structure • Checks for syntax and semantic errors • Creates an intermediate representation & symbol table – Synthesis (Back End): From the intermediate representation, synthesise the target machine code • Synthesis takes the tree structure and translates the operations therein into the target program • Benefit? 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 40 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 41Benefit of Modularising of a Compiler • Problem becomes easier to handle • Decouples analysis from synthesis • Re-use the same intermediate representation as an input to different code generators for different target machine architectures Compiler-Construction Tools • Software development tools are available to implement one or more compiler phases – Scanner generators – Parser generators – Syntax-directed translation engines – Automatic code generators – Data-flow engines 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 42 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 43Benefit of Building a Compiler • How complicated real-world problems are solved by abstracting the essence of the problem mathematically. – Identify the problem – Formulate a mathematical abstraction – Solve using mathematical techniques. • We will see application on every phase of compiler • We will learn the general methodology of solving complex and open- ended problems Impact on the design of Compiler • Design of programming languages and compilers are intimately related • Advances in programming languages placed new demands on compiler writers – to devise algorithms and representations to translate and support the new language features • Compilers are critical in making high-performance computer architectures effective – to devise translation algorithms that would take maximal advantage of the new hardware capabilities. – Compilers are also critical in making high-performance computer architectures effective on users' applications. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 44 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 45 COMP3290: CD to SM • In order to translate we need to know the language we are translating (CD). • In order to translate into machine code, we need to look quite closely at the architecture of our target machine (SM). 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 46COMP3290: CD • CD has features that are similar to Ada/Algol60. • It is a free format language & strongly typed language. • It contains a set of features that’s simple to translate to the target SM code. • Any language requires a complete and unambiguous language definition - both for the programmer and the compiler writer. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 47CD Example CD Ex1 main i : int, j : int begin i = 10; j = i / 2 ; print j ; end CD Ex1 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 48COMP3290: Lexical Analysis • Scan a CD program character by character • Check for juxtaposed characters forming words • Split them as meaningful words in the language or so-called tokens • Report errors if they are not allowed in CD. • Such a tool is called lexer/scanner. 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 49COMP3290: Lexcial Analysis TCD TIDEN Ex1 TMAIN TIDEN i TCOLN TINTG TCOMA TIDEN j TCOLN TINTG TBEGN TIDEN i TEQUL TILIT 10 TSEMI TIDEN j TEQUL TIDEN i TDIVD TILIT 2 TSEMI TPRIN TIDEN j TSEMI TEND TCD TIDEN Ex1 TEOF CD Ex1 main i : int, j : int begin i = 10; j = i / 2 ; print j ; end CD Ex1 20/07/2021 COMP3290 - Semester 2 - 2021| www.newcastle.edu.au 50COMP3290: Parsing Fragment of CD Language Syntax ::= + | |

::= * | / | % |

::= ^ |

::= | | | | true | false

::= ( )

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

51COMP3290: Semantic Analysis

For example,

j = i / 2 ;

• Have all variables been declared?

• Is there a type mismatch in assignments?

• Is it a valid operation on that identifier?

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

52COMP3290: Code Generation

SM:

• Stack machine (SM) hardware is used in a small hand-held device

designed to work in extreme conditions.

• Most of its instructions do not carry any operand information within the

instruction itself but operate on a stack of register. Assumes that

operands will be from the stack, and results placed in the stack.

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

53COMP3290: Code Generation

• Generate assembly code and object code for the SM simulator from

any given CD program

LA 1, j

LV 1, i

LB 2

DIV

/– divide i by 2

ST

/– store the result at j

20/07/2021

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

54COMP3290: Code Generation

• Generate assembly code and object code for the SM simulator from

any given CD program

5

41 2 52 91 0 0 0 8

41 10 43 91 0 0 0 0

81 0 0 0 8 41 2 14

43 81 0 0 0 0 64 67

0 0 0 0 0 0 0 0

3

32

10

2

1

2.500000

0

References

• Compilers: Principles, Techniques, and Tools

(2nd Edition)

• By A.V. Aho, Monica S Lam, R. Sethi, Jeffrey D.

Ullman

• Chapter 1

COMP3290 – Semester 2 – 2021| www.newcastle.edu.au

55

20/07/2021