Fundamentals
Mitchell Chapter 4
slides copyright 2002-2020 Author John Mitchell, updated by Amy Felty
Syntax and Semantics of Programs
“…theoretical frameworks have had an impact on the design of programming languages and can be used to identify problem areas in programming languages.”
• Syntax
– Thesymbolsusedtowriteaprogram
• Semantics
– Theactionsthatoccurwhenaprogramisexecuted
• Programming language implementation
– Syntax®Semantics
– Transformprogramsyntaxintomachineinstructionsthatcanbe executed to cause the correct sequence of actions to occur
2
Input
Interpreters vs. Compilers
Source Program
Interpreter Output
Source Program
Compiler
Target Program Output
Input
Interpreters vs. Compilers
Source Program
Input Interpreter Output
Source Program
Compiler
A compiler translates the
entire program into
machine code before the
Target Program Output
program is run
Input
Interpreters vs. Compilers
Input
Source Program
Interpreter Output
An interpreter combines translation and program execution
Source Program
Compiler
A compiler translates the
entire program into
machine code before the
Target Program Output
program is run
Input
Interpreters vs. Compilers
Studying compilers makes it easier to separate the main issues and discuss them in a given order.
Source Program
Input Interpreter Output
An interpreter combines translation and program execution
Source Program
Compiler
A compiler translates the
entire program into
machine code before the
Target Program Output
program is run
Input
Source Program
A Typical Compiler
Lexical Analyzer Syntax Analyzer
Semantic Analyzer
Code Optimizer
Code Generator
Intermediate Code Generator
Target Program
7
Compiler Phases
– Inputsymbolsarescannedfromlefttorightandgroupedinto
meaningful units called tokens.
– Distinguishesnumbers,identifiers,symbolsandkeywords.
– Example:temp:=x+1 Tokens are: temp, :=, x, +, 1
• Syntax Analysis
– Parsing:tokensaregroupedintosyntacticunitssuchasexpressions, statements, and declarations that must conform to the grammatical rules of the programming language.
– Iftheprogramdoesnotmeetthesyntacticrequirementtobeawell- formed program, an error message is reported, and the compiler terminates.
– The result is a parse tree.
– Tobediscussedinmoredetail.
• Lexical Analysis
8
Compiler Phases
• Semantic Analysis
– Context information is used to augment the parse tree, i.e., type
information (from type inference)
– Note the difference between semantic analysis and program semantics (i.e. program meaning)
• Intermediate Code Generation
– It is difficult to generate efficient code in one phase.
– It is important to use an intermediate representation that is easy to produce and easy to translate into the target language.
• Code Optimization
– Different techniques are applied over and over to the
intermediate representation. (See next page.)
• Code Generation
– Converts the intermediate code into a target machine code.
– Involves choosing memory locations and registers for variables. – Efficiency is important.
9
Some Standard Code Optimizations
• Common Subexpression Elimination
– If a program calculates the same value more than once, then
calculate only once and store for later use.
• Copy Propagation
– If a program contains an assignment x=y then it may be possible to change later statements to refer to y instead of to x and remove the assignment.
• Dead-Code Elimination
– Eliminate sequences of code that can never be reached.
• Loop Optimizations
– Move expressions that occur inside a loop to outside the loop if
they don’t change value.
• In-lining Function Calls
– Substitute function calls with the body of the function when possible. This often allows further optimizations to be performed by removing jumps.
10
Syntax: Grammars and Parse Trees
• Grammar
e ::= n | e+e | e-e
n ::= d | nd
d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Expressions in language generated by derivations, e.g.,
e ® e-e
® e-e+e ® n-n+n ® nd-d+d ® dd-d+d ®… ®27-4+3
Grammar defines a language
Expressions in language derived by sequence of productions
11
Syntax: Grammars and Parse Trees
• Grammar
e ::= n | e+e | e-e
n ::= d | nd
d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• A Grammar includes:
– Astartsymbol(einthiscase)
– Asetofnonterminals
– A set of terminals (which appear in the expressions of the language generated by the grammar)
• In this example:
– Nonterminals:e,n,d – Terminals:0,…9,+,-
• Examples:
– 0,1+3+5,2+4-6-8
Nonterminals keep track as a valid expression is being formed. They must
eventually be replaced.
12
Parse Trees (Derivation Trees)
• Derivation represented by tree
e ® e-e ® e-e+e ® n-n+n ® nd-d+d ® dd-d+d
®… ®27-4+3
e
e-e
e+e 43
27
Tree shows parenthesization of expression. A grammar is ambiguous if some expression has more than one parse tree.
13
Parse Trees (Derivation Trees) • Exercise: draw 2 parse trees for 10—15 + 12
• Grammar
s ::= v:=e | s;s | ifbthens| ifbthenselses v ::= x | y| z
e ::= v | 0 | 1 | 2 | 3 | 4
b ::= e=e
• Exercise: draw 2 parse trees for
if b1 then if b2 then s1 else s2
What happens when b1=true and b2=false? Mitchell, pp. 54-55
14
• Parsing
Parsing
– Given a language L defined by a grammar G, and a string of symbols s, an algorithm that decides whether s is in L, and constructs a parse tree if it is, is called a parsing algorithm for G.
• Ambiguity
– Expression 27 – 4 + 3 can be parsed two ways – Problem: 27-(4+3) 1 (27-4)+3
• Ways to resolve ambiguity
– Precedence
• By convention * has higher precedence than + or — • For example, parse 3*4 + 2 as (3*4) + 2
– Associativity
• Parenthesize operators of equal precedence to left (or right) • Parse 3-4+5as(3-4)+5
15