Computer Systems Week 3 (part 1)
Compilation, Interpretation & Overview of Java Virtual Machine
Lecture Objective
To introduce the basic concepts of compilation, interpretation and Java Virtual Machine.
Slide #2 of 38
Lecture Outline
u Levels of Programming Languages u High Level to Low Level Translation u High Level Program Execution
u Compilation vs. Interpretation
u Combined Compilation & Interpretation
u Compilation and Execution on Virtual Machines
Slide #3 of 38
Levels of Programming Languages
u High level languages
ne.g. Java, C/C++/C#, Fortran, Cobol, Pascal, etc nEasier for humans
u Low level languages
nMachine code – instructions stored in memory (opcodes) nHard to read and write by humans
u Next level up: Assembly code
nCan be written or read by humans (using mnemonics)
Watch on Youtube:
Most Popular Programming Languages 1965 – 2019
Slide #4 of 38
Levels of Programming Languages
Slide #5 of 38
Language Processing Systems
u How a program using C compiler is executed on a host machine ?
uThe high-level language is converted into binary language in various phases.
Slide #6 of 38
Converting High Level to Low Level
u To execute on a computer we must have machine code! u Assembly code is translated to machine code to run
nAssembler does this (e.g. works out the relative addresses for jumps etc.). Relocatable Code.
nLinker: combines different assembled parts into a Whole nLoader: loads into memory at a given location
Slide #7 of 38
Executing High Level Programs
u A program written in a high level language can be run in two different ways:
nCompiled into a program in the native machine language and then run on the target machine
nDirectly interpreted and the execution is simulated within an interpreter
uWhich approach is more efficient? nThink of C++ vs. JavaScript
Slide #8 of 38
Compilation
uCompiler: converts source code (text of a program) into object code – e.g. machine code – that does the same thing as the original program
uUsually object code is relocatable, so can be later linked and loaded into memory.
u Advantages:
nDone once for each program
nWith clever tricks to optimize object code (by exploiting hardware features) so that it will run fast
u Disadvantages:
nHarder than interpreting
nHardware dependent i.e. cannot run of different platforms
Slide #9 of 38
Compilation
uTake a stream of input and convert it into instructions a machine can execute.
uThese instructions will vary depending on the machine in use.
uHigh-level languages isolate you from the details.
Slide #10 of 38
Compilation
uCompiler runs on the same platform X as the target code
Slide #11 of 38
Cross Compilation
uCompiler runs on platform X, target code runs on platform Y
Slide #12 of 38
You compile for x86
Slide #13 of 38
..or for SPARC
Slide #14 of 38
Anatomy of a compiler
uA compiler can be divided into 2 parts (Front-End and Back-End).
uBoth the front end and the back end perform their operations in a sequence of phases.
uEach phase generates a particular data structure from another data structure emitted by the phase before it.
• Front-end includes lexical analysis, syntax analysis, semantic analysis and intermediate code generation.
• Back-End includes optimization and code generation.
Slide #15 of 38
Anatomy of a compiler
Slide #16 of 38
First phase, Lexical analysis
Lets explain how a compiler understands these with an analogy to how humans understand English!.
First step: recognize words. “Hello world, I’m a human.”
“Hello world, I’m a human.”
How about
“im hello,worldahuman.”
Slide #17 of 38
First phase, Lexical analysis
The compiler breaks the submitted source code into meaningful words called lexemes and generates a sequence of tokens from the lexemes.
Slide #18 of 38
First phase, Lexical analysis
• A token is an object describing a lexeme.
• Along with the value of the lexeme, it contains information such as its type (is it a
keyword? an identifier? an operator? …)
•
Slide #19 of 38
Second phase, Syntax Analysis (parsing)
The next step is to understand the structure of the sentence, and this is called parsing.
“My dog also likes eating sausage.”
Possessive pronoun Noun Adverb Verb gerund
Noun
The actual work of parsing is to group these words together into higher level constructs.
Slide #20 of 38
Second phase, Syntax Analysis (parsing)
• The next step is to understand the structure of the sentence, and this is called parsing.
“My dog Possessive pronoun Noun
Subject
also likes Adverb Verb
eating gerund
sausage.” Noun
Object
Verb
entire sentence
Slide #21 of 38
Second phase, Syntax Analysis (parsing)
• The next step is to understand the structure of the sentence, and this is called parsing.
“My dog also likes eating sausage.”
Slide #22 of 38
Syntax Analysis (parsing)
Abstract Syntax Tree generated (AST) after syntax analysis.
Syntax analysis is also the phase where eventual syntax errors are detected.
If no error is found during this phase, the compiler moves to the semantic analysis phase.
Slide #23 of 38
Third phase, Semantic Analysis
Next step is to try to understand the meaning of what has been parsed (AST).
Compiler checks if the program is consistent with all the rules of the source programming language.
Letting the compiler to understand the meaning is a complex matter.
We don’t know how this works for humans.
Slide #24 of 38
Third phase, Semantic Analysis
“Sarah gave a bath to her dog wearing a pink t-shirt” Is the dog wearing the pink t-shirt?
How does compile deal with ambiguity ?
Type inference: the compiler will annotate the corresponding node in the AST with the inferred type information.
Type checking: compiler checks that all values being assigned to variables and all arguments involved in an operation have the correct type.
Symbol management: The compiler uses the symbol table to answer questions such as:
Is this variable declared before use?, Are there 2 variables with the same name in the same scope?
Slide #25 of 38
Third phase, Semantic Analysis
The semantic analyser produces an annotated syntax tree as an output.
Slide #26 of 38
Fourth phase, Intermediate Code Generation
It is in between the high-level language and the machine language.
Slide #27 of 38
Fifth phase, Optimization
In natural language it’s like professional editing or proofreading.
For the source code, removes unnecessary code lines. Run faster or use less memory/resources.
Slide #28 of 38
Final phase, Code Generation
The code generator takes the optimized representation of the intermediate code and maps it to the target machine language.
Slide #29 of 38
… Playback
Slide #30 of 38
Compilation is a Compute Intensive process!
https://xkcd.com/303/
Slide #31 of 38
Interpretation
uInterpreter = another program that follows the source code (text of program) and does appropriate actions
uSame principle as:
nHumans running through instructions of a program
nA processor (CPU) can be viewed as a hardware implementation of an interpreter for machine code
u Advantages:
nFacilitates interactive debugging & testing
nUser can modify the values of variables; can invoke procedures from the command line
u Disadvantages:
nSlow Execution (as compared to compilation)
Slide #32 of 38
Interpretation
uRunning high-level code by an interpreter
Watch on Youtube:
Compiled vs. Interpreted Languages
Slide #33 of 38
Interpreters
• Runs slowly
• Starts right a way
• Lets you see the results
• Online
Compilers
• •
• •
Extra preparation time
The program runs very quickly.
Offline
To compile = To pile together
Slide #34 of 38
Combined Compilation & Interpretation
Executing high level programs
uCompile to an intermediate level (between high and
low) language that can be efficiently interpreted nSlower than pure compilation
nFaster than pure interpretation
nA single compiler, independent of CPU
nSeparate task for each CPU is to interpret the intermediate language
Slide #35 of 38
Combined Compilation & Interpretation
Slide #36 of 38
Source Code
Example: Java .java files Java bytecode .class files
Executing high level programs
uCompile to an intermediate level (between high and
low) language that can be efficiently interpreted
nSlower than pure compilation
nFaster than pure interpretation
nA single compiler, independent of CPU
javac
nSeparate task for each CPU is to interpret the intermediate
language
Java Runtime Environment (JRE) using Java Virtual Machine (JVM)
The command “java” calls the JRE
Slide #37 of 38
Combined Compilation & Interpretation
Slide #38 of 38
Virtual Machines
uA virtual machine executes an instruction stream in software (instead of hardware)
uAdopted by Pascal, Java, Smalltalk-80, C#, functional and logic languages, and some scripting languages
uPascal compilers generate P-code that can be interpreted or compiled into object code (https://en.wikipedia.org/wiki/P-
code_machine)
uJava compilers generate bytecode that is interpreted by the Java Virtual Machine (JVM)
uThe JVM may translate bytecode into machine code by Just-In-Time (JIT) compilation
Slide #39 of 38
Compilation and Execution on Virtual Machines
uCompiler generates intermediate program (language) uVirtual machine interprets the intermediate program
nWe need to have virtual machine on each platform
Slide #40 of 38
Summary
uCompilation vs. Interpretation uInterpreted languages
nexecute with the help of a layer of software, not directly on a CPU
nusually translated into intermediate code u Java
nconceived as an interpreted language
Slide #41 of 38