CS计算机代考程序代写 jvm compiler Java Fortran c# cache assembly L16-CompilerArchitecture

L16-CompilerArchitecture

HPC ARCHITECTURES
Compiler Architecture

What is a compiler?
• A compiler is a program which translates source code to

machine code.
• source code is (usually) human readable, and mainly human

written.
• machine code can be executed on a particular instruction set

architecture (ISA).
• machine code will normally require linking with libraries before in

can be executed.

Compiler Linkersourcecode
object code

executable

What makes a good compiler?

• Compiler should be bug-free.
• It should generate correct machine code.
• It should generate fast machine code.
• Optimisations should be consistent and predictable
• Machine code should be debuggable.
• Time to compile should be short.
• Compiler should give good diagnostics and error

messages.
• Compiler should be modular, if it needs to cope with

multiple ISAs and/or multiple source languages..

Intermediate representations
• Most compilers don’t translate directly from source code

to machine code.
• Translation is done first to an intermediate representation

(IR), then to machine code.
• helps modularity (one IR, many ISAs, many source languages)
• better representation for optimising transformations.

• The part of the compiler which translates source code to
IR is called the front end.

• The part which translates IR to machine code is called the
back end.

• In practise, both ends are composed of several stages.

Front end
• Front end consists of several stages…

Lexical analyzer

source code

parse tree

Parser

Semantic analyzer

IR generator

token stream

parse tree

IR

Lexical analyzer

source code

parse tree

Parser

Semantic analyzer

IR generator

token stream

parse tree

IR

Lexical analyzer
• Turns ASCII text into a stream of tokens.

Example:

x = x*(b+1);

id(x)
=
id(x)
*
(
id(b)
+
num(1)
)
;

Lexical analyzer
• Tokens can be

• identifiers (e.g. variables)
• numbers (e.g. 3.0)
• strings
• special characters/character groups (e.g. *, ==)
• keywords (e.g. for)

• Whitespace and comments are stripped out
• Tokens specified by regular expressions, e.g.

• Implemented as deterministic finite state automata

letter = [a-zA-Z]
digit = [0-9]

identifier = letter (letter|digit)*

Lexical analyzer

source code

parse tree

Parser

Semantic analyzer

IR generator

token stream

parse tree

IR

Parser
• Parser converts the token stream into a parse tree (or

abstract syntax tree (AST)).
id(x)
=
id(x)
(
id(b)
+
num(1)
)

id(x)

*

id(b)

+

=

num(1)

id(x)

Parser (cont.)
• Parser determines whether the token stream is

meaningful in the language.

• To do this it uses a grammar
• set of rules describing the allowed combinations of tokens which

constitute a meaningful program in the source language.

• Parser also builds a symbol table
• lists of identifiers, with their type and program context

• Parser detects syntax errors

Lexical analyzer

source code

parse tree

Parser

Semantic analyzer

IR generator

token stream

parse tree

IR

Semantic analyzer
• Uses the parse tree/AST and symbol table to check that

identifiers are being used correctly

• Mainly concerned with type checking.

• Catches semantic errors
e.g.

a=a+b

is syntactically correct in Fortran, but it is an error
if a is a string and b an array, for example.

Lexical analyzer

source code

parse tree

Parser

Semantic analyzer

IR generator

token stream

parse tree

IR

IR generator
• IR is a language which is at a level which can represent

many high level languages, but is machine independent.
• can represent source variables, temporaries and registers (no

arrays).
• control flow consists of calls, returns, gotos and branches (no

loops)

id(x)

*

id(b)

+

=

num(1)

id(x) t1 := b+1
x := x*t1

IR generator

• AST to IR translation is quite straightforward.

• Some compilers also use a higher level IR (with loops and
arrays)
• useful for some cache optimisations.

• Some compilers may use a lower level IR
• closer to assembly language
• no variables, only registers and addresses.

Back end
• Back end also contains multiple phases:

IR Optimiser

IR

assembly code

Code generator

Low-level optimiser

IR

object code

IR Optimiser

IR

assembly code

Code generator

Low-level optimiser

IR

object code

IR optimiser
• Performs optimisations which are generic (non machine-

dependent).

• Optimisations are transformations of IR code.

• Choosing a suitable order of transformations is an
important part of compiler design.
• some transformations may need to be done more than once

Optimisations
• Induction variable analysis
• Loop fusion
• Loop fission
• Loop inversion
• Loop interchange
• Loop-invariant code motion
• Common sub-expression elimination
• Constant folding and propagation
• Etc….

Optimisation?
• Note that the term optimisation is something of a

misnomer.
• The resulting code is (almost) never optimal.
• There is seldom any iterative process which corresponds

to classical optimisation.
• There is seldom even much attempt to quantify the

effectiveness of transformations.
• Compiler optimisation is often just a predetermined

sequence of transformations which is heuristically known
to produce some performance gains….

IR Optimiser

IR

assembly code

Code generator

Low-level optimizer

IR

object code

Code generator
• Turns IR into assembly code

• In practice this is not trivial
• finite number of registers
• scheduling and resource constraints in ISA

t1 := b+1
x := x*t1

ld [%o1],%i1
add %i1,1,%i2
ld [%o2],%i3
mul %i2,%i3,%i4
st [%o2],%i4

IR Optimiser

IR

assembly code

Code generator

Low-level optimiser

IR

object code

Low-level optimiser

• Performs machine specific optimisations
• register allocation
• instruction scheduling

• Again need to choose sensible ordering of
transformations

• Final product is optimised object code.

Alternative strategy
• Some compilers do all optimisation on assembly language

or low-level IR.

• This can reduce the problems of transformation ordering.

• Address computations are visible to all optimisations.

• Less modular
• harder to target multiple ISAs

Compiler architecture for Java

• Java is a platform-neutral language
• Java source is compiled to Java byte code
• Byte code is similar to an IR

• uses a stack-based model of computation
• More compact, lower-level than source code

• better distribution medium than source code
• Byte code can be executed on any platform running a

Java Virtual Machine (JVM).
• C# works much the same way

• This can be entirely done in software (interpretation)
• highly portable
• rather slow

• Part (or all) of the byte code can be compiled at execution
time to native machine code.
• so called just-in-time (JIT) compilation.
• machine code is then executed instead of interpreting byte code.
• compilation time cannot be too long.

Source to byte code
• The source to byte code stage (javac) is very similar to

traditional front end:

Lexical analyzer

Java source code

parse tree

Parser

Semantic analyzer

Byte code generator

token stream

parse tree

Java byte code

Just-in-time compilation
• JIT has functionality of a traditional back end:

IR Optimiser

Java byte code

assembly code

Code generator

Low-level optimiser

IR

object code

IR generator
IR

Adaptive compilation
• Many JVMs use an adaptive compilation strategy.
• First time a method is executed, either use interpretation

or compilation with minimal optimisation.
• JVM keeps a running profile of executing code
• If a method is consuming significant portion of execution

time, it may be profitable to recompile with more
optimisation.
• trade off between increased compile time and reduced execution

time
• can have hierarchy of optimisation levels