CS计算机代考程序代写 data structure c/c++ compiler c++ assembly assembler interpreter 1_introduction

1_introduction

INTRODUCTION TO COMPILER
Baishakhi Ray

Programming Languages & Translators

These slides are motivated from Prof. Calvin Lin, UT Austin

What is a Compiler?

Compiler

#include

int main() {
std::cout << "Hello World!"; return 0; } 001100110011 001001001100 110000000111 001000110000 000110110011 Source Program Target Program Input Output What is a Compiler? CompilerSource Program Target Program Input Output Interpreter Source Program Input Output A Hybrid Compiler Translator Source Program Interpreter Intermediate Program Input Output A language processing system Preprocessor Compiler Assembler Linker/Loader Source Program Modified Source Program Target Assembly Program Relocatable Machine Code Target Machine CodeLibrary files Relocatable object files A source code typically have many modules written in different files. Collect source program from different files, expand macros, etc. Assembly code is easier to produce and debug. A linker resolves external memory addresses, where one file’s code refers another file’s location. Loader puts all the executive object files to memory for execution. What is a Compiler? Compiler #include

int main() {
std::cout << "Hello World!"; return 0; } 001100110011 001001001100 110000000111 001000110000 000110110011 Intermediate Code Generation optimization Code Generation Lexical Analysis Syntactic Analysis Semantic Analysis Interpreter Character stream Token stream Syntax trees Syntax trees IR IR Target Language Structure of a Typical Compiler Intermediate Code Generation optimization Code Generation Lexical Analysis Syntactic Analysis Semantic Analysis Interpreter Character stream Token stream Syntax trees Syntax trees IR IR Target Language Analysis Phase Front End Synthesis Phase Back End Input to Compiler for i = 1 to 10 do a[i] = x * 5; f o r i = 1 t o 1 0 d o a [ i ] = x * 5 ; Intermediate Code Generation optimization Code Generation Lexical Analysis Syntactic Analysis Semantic Analysis Interpreter Character stream Token stream Syntax trees Syntax trees IR IR Target Language Lexical Analysis Intermediate Code Generation optimization Code Generation Lexical Analysis Syntactic Analysis Semantic Analysis Interpreter Character stream Token stream Syntax trees Syntax trees IR IR Target Language for i = 1 to 10 do a[i] = x * 5; Break character stream into tokens (“words”) for id(i) <=> number(1) to number(10) do
id(a) <[> id(i) <]> <=> id(x) <*> number(5) <;>

Compiler Data Structure

▪ Symbol Tables
▪ Compile-time data structures
▪ Hold names, type information, and scope information for variables

▪ Scopes
▪ A name space
e.g., In C/C++, each set of curly braces defines a new scope
▪ Can create a separate symbol table for each scope

Lexical Analysis

Intermediate Code
Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 do
a[i] = x * 5;

Break character stream into tokens (“words”)

for id(i) <=> number(1) to number(10) do
id(a) <[> id(i) <]> <=> id(x) <*> number(5) <;>

for <=> number(1) to number(10) do
<[> <]> <=> <*> number(5) <;>

1 i …
2 a …
3 x …

Symbol Table

Syntactic Analysis (Parsing)

Intermediate Code
Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 do
a[i] = x * 5;

Impose Structure to Token Stream

for

i 1 10 assign

a i x 5

array times
In a typical syntax tree, intermediate nodes represent operations and
Leaf node represent the arguments of the operations.

Semantic Analysis

Intermediate Code
Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 do
a[i] = x * 5;

Determine whether source is meaningful

▪ Check for semantic errors
▪ Check for type errors
▪ Gather type information for subsequent stages

▪ Relate variable uses to their declarations

Usage of Symbol Tables

▪ For each variable declaration:
▪ Check for symbol table entry
▪ Add new entry (parsing)
▪ add type info (semantic analysis)

▪ For each variable use:
▪ Check symbol table entry (semantic analysis)

Intermediate Code Generation

Intermediate Code
Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Transform AST into low-level intermediate representation (IR)

Simplifies the IR

• Removes high-level control structures:
– for, while, do, switch
• Removes high-level data structures:
– arrays, structs, unions, enums

Intermediate Code Generation

Intermediate Code
Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Transform AST into low-level intermediate representation (IR)

One possible result is assembly-like code

• Semantic lowering
• Control-flow expressed in terms of “gotos”
• Each expression is very simple
(three-address code)
e.g., x := a * b * c

t := a * b
x := t * c

Intermediate Code Generation

Intermediate Code
Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 do
a[i] = x * 5;

i := 1
loop1:
t1 := x * 5
t2 := &a
t3 := sizeof(int)
t4 := t3 * i
t5 := t2 + t4
*t5 := t1
i := i + 1
if i <= 10 goto loop1 Optimization Intermediate Code Generation optimization Code Generation Lexical Analysis Syntactic Analysis Semantic Analysis Interpreter Character stream Token stream Syntax trees Syntax trees IR IR Target Language Mostly machine independent optimization Phase aims to generate better code. Better can be • Faster • Shorter • Energy efficient • … Optimization Intermediate Code Generation optimization Code Generation Lexical Analysis Syntactic Analysis Semantic Analysis Interpreter Character stream Token stream Syntax trees Syntax trees IR IR Target Language for i = 1 to 10 do a[i] = x * 5; i := 1 t3 := sizeof(int) loop1: t1 := x * 5 t2 := &a t3 := sizeof(int) t4 := t3 * i t5 := t2 + t4 *t5 := t1 i := i + 1 if i <= 10 goto loop1 Low Level Code Generation Intermediate Code Generation optimization Code Generation Lexical Analysis Syntactic Analysis Semantic Analysis Interpreter Character stream Token stream Syntax trees Syntax trees IR IR Target Language Register Transfer Language (RTL) – Linear representation – Typically language-independent – Nearly corresponds to machine instructions Example operations • Assignment x := y • Unary op x := op y • Binary op x := y op z • Call x := f() • Cbranch if (x==3) goto L • Address of p := & y • Load x := *(p+4) • Store *(p+4) := y Exercise: a = b + c * 5 Compiler vs Interpreter Compiler Interpreter Optimization Compiler sees the entire program. Thus optimization is easy. Interpreter sees program line by line. Thus, optimization is not robust. Running time Compiled code runs faster Interpreted code runs slower Program Generation Compiler generates output program, which can be run independently at a later point of time. Do not generate output program. Evaluate each line one by one during program execution. Error Execution Emits compilation errors after the whole compilation process. Reads each line and shows errors, if any. Example C, C++ Command Lines Why studying compiler? Isn’t it a solved problem? ▪ Machines keep changing ▪ New features present new problems (e.g., GPU) ▪ Changing costs lead to different concerns ▪ Languages keep changing ▪ New ideas (e.g., OOP and GC) have gone mainstream ▪ Applications keep changing ▪ Interactive, real-time, mobile, machine-learning based applications Why studying compiler? ▪ Values keep changing ▪ We used to just care about run-time performance ▪ Now? ▪ Compile-time performance ▪ Code size ▪ Correctness ▪ Energy consumption ▪ Security ▪ Fault tolerance Value added compilation ▪ The more we rely on software, the more we demand more of it ▪ Compilers can help– treat code as data ▪ Analyze the code ▪ Correctness ▪ Security Correctness and Security ▪ Can we check whether pointers and addresses are valid? ▪ Can we detect when untrusted code accesses a sensitive part of a system? ▪ Can we detect whether locks are used properly? ▪ Can we use compilers to certify that code is correct? ▪ Can we use compilers to verify that a given compiler transformation is correct? Value-added Compilation ▪ The more we rely on software, the more we demand more of it ▪ Compilers can help– treat code as data ▪ Analyze the code ▪ Correctness ▪ Security ▪ Reliability ▪ Program understanding ▪ Program evolution ▪ Software testing ▪ Reverse engineering ▪ Program obfuscation ▪ Code compaction ▪ Energy efficiency Computation important understanding computation important Why studying compiler? ▪ Compilers are a fundamental building block of modern systems ▪ We need to understand their power and limitations ▪ Computer architects ▪ Language designers ▪ Software engineers ▪ OS/Runtime system researchers ▪ Security researchers ▪ Formal methods researchers (model checking, automated theorem proving) Due • Programming assignment 0 is due next Wednesday. • Set up Git environment