Compilers and computer architecture: Semantic analysis
Martin Berger Alex Jeffery
October 2015
Recall the function of compilers
Recall the structure of compilers
Source program
Lexical analysis
Intermediate code generation
Optimisation
Syntax analysis
Semantic analysis, e.g. type checking
Code generation
Translated program
Semantic analysis
One of the jobs of the compiler front-end is to reject ill-formed inputs. This is usually done in three stages.
Lexicalanalysis:detectsinputswithillegallexicalsyntax.
Parsing:detectsinputswithill-formedsyntax(no
parse-tree).
Semanticanalysis:catch’all’remainingerrors,
e.g. variable used before declared. ’Last line of defense’.
Semantic analysis
One of the jobs of the compiler front-end is to reject ill-formed inputs. This is usually done in three stages.
Lexicalanalysis:detectsinputswithillegallexicalsyntax.
Parsing:detectsinputswithill-formedsyntax(no
parse-tree).
Semanticanalysis:catch’all’remainingerrors,
e.g. variable used before declared. ’Last line of defense’.
Why do we need a separate semantic analysis phase at all?
Semantic analysis
One of the jobs of the compiler front-end is to reject ill-formed inputs. This is usually done in three stages.
Lexicalanalysis:detectsinputswithillegallexicalsyntax.
Parsing:detectsinputswithill-formedsyntax(no
parse-tree).
Semanticanalysis:catch’all’remainingerrors,
e.g. variable used before declared. ’Last line of defense’.
Why do we need a separate semantic analysis phase at all?
Answer: Some language constraints are not expressible using CFGs (too complicated).
The situation is similar to the split between lexing and parsing: not everything about syntactic well-formedness can be expressed by regular expressions & FSAs, so we use CFGs later.
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
Allidentifiersdeclaredbeforeuse?
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
Allidentifiersdeclaredbeforeuse? Arealltypescorrectlydeclared?
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
Allidentifiersdeclaredbeforeuse?
Arealltypescorrectlydeclared?
Dotheinheritancerelationshipsmakesense?
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
Allidentifiersdeclaredbeforeuse?
Arealltypescorrectlydeclared?
Dotheinheritancerelationshipsmakesense? Areclassesandvariablesdefinedonlyonce?
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
Allidentifiersdeclaredbeforeuse?
Arealltypescorrectlydeclared?
Dotheinheritancerelationshipsmakesense? Areclassesandvariablesdefinedonlyonce? Methodsdefinedonlyonce?
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
Allidentifiersdeclaredbeforeuse?
Arealltypescorrectlydeclared?
Dotheinheritancerelationshipsmakesense?
Areclassesandvariablesdefinedonlyonce?
Methodsdefinedonlyonce?
Areprivatemethodsandmembersonlyusedwithinthe defining class?
What kinds of checks does semantic analysis do?
Some examples. The precise requirements depend on the language.
Allidentifiersdeclaredbeforeuse?
Arealltypescorrectlydeclared?
Dotheinheritancerelationshipsmakesense?
Areclassesandvariablesdefinedonlyonce?
Methodsdefinedonlyonce?
Areprivatemethodsandmembersonlyusedwithinthe defining class?
Stupidoperationslike3∗trueor”hello”/7?.
Caveat
When we say that semantic analysis catches ’all’ remaining errors, that does not include programmer errors. It means catching errors that violate the well-formedness constraints that the language imposes.
Naturally, those constraints are chosen by the language designers with a view towards efficient checkability by the compiler.
Rice’s Theorem and undecidability
Computer science suffers from a strong restriction: Rice’s theorem. No interesting property of programs is decidable.
Rice’s Theorem and undecidability
Computer science suffers from a strong restriction: Rice’s theorem. No interesting property of programs is decidable.
That means for essentially any property that program mighty have (e.g. does not crash, terminates, loops forever, uses more than 1782349 Bytes of memory) there cannot be a perfect checker, ie. a program that determines with perfect accuracy whether the chosen property holds of any input program or not.
Informally, one may summarise Rice’s theorem as follows: to work out with 100% certainty what programs do, you have to run them (with the possibility of non-termination), there is no shortcut.
Rice’s Theorem and undecidability
Computer science suffers from a strong restriction: Rice’s theorem. No interesting property of programs is decidable.
That means for essentially any property that program mighty have (e.g. does not crash, terminates, loops forever, uses more than 1782349 Bytes of memory) there cannot be a perfect checker, ie. a program that determines with perfect accuracy whether the chosen property holds of any input program or not.
Informally, one may summarise Rice’s theorem as follows: to work out with 100% certainty what programs do, you have to run them (with the possibility of non-termination), there is no shortcut.
All we can do is approximate a property of interest, and our approximation will have either false positives or false negatives (or both).
Rice’s Theorem and undecidability
So our semantic analysis must also approximate. This is always done is a conservative way: every program the semantic analysis accepts is guaranteed to have to properties that the semantic analysis check for, but the semantic analysis will reject a lot of safe programs (having the required property).
Rice’s Theorem and undecidability
So our semantic analysis must also approximate. This is always done is a conservative way: every program the semantic analysis accepts is guaranteed to have to properties that the semantic analysis check for, but the semantic analysis will reject a lot of safe programs (having the required property).
Example: Our semantic analysis guarantees that programs never try to add an integer and a string like 3 * “hello”. In this sense, the following program is safe (why?):
Rice’s Theorem and undecidability
So our semantic analysis must also approximate. This is always done is a conservative way: every program the semantic analysis accepts is guaranteed to have to properties that the semantic analysis check for, but the semantic analysis will reject a lot of safe programs (having the required property).
Example: Our semantic analysis guarantees that programs never try to add an integer and a string like 3 * “hello”. In this sense, the following program is safe (why?):
if ( x*x == -1 ) {
y = 3 * “hello” }
else
y = 3 * 43110 }
Yet any typing system in practical use will reject it.
Plan
For lexing and parsing we proceeded in two steps.
1. Specify constraint (RE for lexing, CFGs for parsing)
2. Invented algorithm to check constraints given in (1): FSA to decide REs, (top-down) parser to decide CFGs.
Plan
For lexing and parsing we proceeded in two steps.
1. Specify constraint (RE for lexing, CFGs for parsing)
2. Invented algorithm to check constraints given in (1): FSA to decide REs, (top-down) parser to decide CFGs.
For semantic analysis, we also proceed in two steps (specifying constraints and then giving an algorithm for checking the constraints), but we will be vague w.r.t. the former, and focus on the latter.
This is because the constraints we are going to check are somewhat hard to make precise other than by way of giving a program that checks for them.
We explain informally the bad things the semantic analysis will rule out, and then give an algorithm that does that.
Semantic analysis by (post-order) traversal of the AST
Just like code generation, semantic analysis will happen by walking the AST:
Semantic analysis by (post-order) traversal of the AST
Just like code generation, semantic analysis will happen by walking the AST:
AnalysesanodeNintheAST.
Semantic analysis by (post-order) traversal of the AST
Just like code generation, semantic analysis will happen by walking the AST:
AnalysesanodeNintheAST.
Recursively process the children of N.
Semantic analysis by (post-order) traversal of the AST
Just like code generation, semantic analysis will happen by walking the AST:
AnalysesanodeNintheAST.
Recursively process the children of N.
Aftertherecursionterminates,finishtheanalysisofNwith the information obtained from the children.
Examples of the board.
Semantic analysis by (post-order) traversal of the AST
Sometimes we will have to walk the AST multiple times (e.g. some kinds of type-inference).
Semantic analysis by (post-order) traversal of the AST
Sometimes we will have to walk the AST multiple times (e.g. some kinds of type-inference).
We want to be sure that every identifier is declared. With non-recursive definitions this is no problem, everything is declared before use, which is easy to check with one recursive descent of the AST.
Semantic analysis by (post-order) traversal of the AST
Sometimes we will have to walk the AST multiple times (e.g. some kinds of type-inference).
We want to be sure that every identifier is declared. With non-recursive definitions this is no problem, everything is declared before use, which is easy to check with one recursive descent of the AST.
With recursive definitions sometimes identifiers are declared after use. What do we do?
Semantic analysis by (post-order) traversal of the AST
Sometimes we will have to walk the AST multiple times (e.g. some kinds of type-inference). With recursive definitions sometimes identifiers are declared after use. What do we do?
abstract class A () { Bb}
abstract class B () { Aa}
Semantic analysis by (post-order) traversal of the AST
Sometimes we will have to walk the AST multiple times (e.g. some kinds of type-inference). With recursive definitions sometimes identifiers are declared after use. What do we do?
abstract class A () { Bb}
abstract class B () { Aa}
Answer:
WalktheASToncecollectingclassdefinitions.
WalktheASTasecondtimecheckingifeachuseofa class (type) identifier has a definition somewhere.
Alternatively,propagateinformationaboutneeded definitions up in a clever way and check if everything is OK.
Types
Types and typing systems are the key tool for semantic analysis.
What are types?
Types
Types and typing systems are the key tool for semantic analysis.
What are types?
Types are a rough classification of programs that rule out certain errors, e.g. 3 * “hello”!
Types
Types and typing systems are the key tool for semantic analysis.
What are types?
Types are a rough classification of programs that rule out certain errors, e.g. 3 * “hello”!
With each type t we have operators that we can apply to values of type t.
For example to values of type string, we can apply operations such as println, but we cannot multiply two strings.
Types
Types are weak (= not saying anything complicated) specifications of programs.
In languages such as Java, programs are annotated with types. This can be seen as a weak form of two-version programming: the programmer specifies twice what the program should do, once by the actual code, and a second time through the types.
By saying something twice, but in somewhat different languages (Java vs types) the probability that we make the same mistake in both expressions is lower than if we state our intention only once.
Types
Note that types can only prevent very stupid mistakes like “hello” * “world”.
They cannot (usually) prevent more complicated problems, like out-of-bounds indexing of arrays.
int [] a = new int [ 10 ]
a [ 20 ] = 3
Type-checking vs type-inference
An important distinction is that between type-checking and type-inference.
Type-checking vs type-inference
An important distinction is that between type-checking and type-inference.
Intype-checking(e.g.Java)weverifythatthe programmer-written type-annotations are consistent with the program code. E.g.
def f ( x : String ) : Int = {
if ( x == “Moon” )
true else
false }
is easy to see as inconsistent.
Type-checking vs type-inference
An important distinction is that between type-checking and type-inference.
Intype-checking(e.g.Java)weverifythatthe programmer-written type-annotations are consistent with the program code. E.g.
def f ( x : String ) : Int = {
if ( x == “Moon” )
true else
false }
is easy to see as inconsistent.
Intype-inference(e.g.Haskell,Ocaml,Scala,Go,Rust,…) we analyse the program code to see if it is possible to find types that we could assign to a program to make it type-check. (So we let the computer do most of the work.)
Type-checking vs type-inference
An important distinction is that between type-checking and type-inference.
def f ( y : ??? ) : ??? = {
if ( x == y )
y else
x+1 }
What types could you give to x,y?
Type-checking vs type-inference
An important distinction is that between type-checking and type-inference.
def f ( x : ??? ) : ??? = {
if ( x == y )
y else
x+1 }
What types could you give to x,y?
Clearly x has type integer, y has type integer.
Type-checking vs type-inference
That was easy. What about this program
def f ( x : ??? ) : ??? = {
while ( x.g ( y ) ) {
y = y+1 }
if ( y > z )
z = z+1 else
println ( “hello” ) }
What types could you give to x,y,z,g and f?
Type-checking vs type-inference
That was easy. What about this program
def f ( x : ??? ) : ??? = {
while ( x.g ( y ) ) {
y = y+1 }
if ( y > z )
z = z+1 else
println ( “hello” ) }
What types could you give to x,y,z,g and f?
y andz areintegers,x mustbeaclassAsuchthatAhasa method (should it be public or private?) g which takes and integer and returns a boolean.
Finally, f returns nothing, so should be of type Unit, or void (which is the same thing in many contexts).
Polymorphism
What about this program?
def f ( x : ??? ) : ??? = { return x }
Polymorphism
What about this program?
def f ( x : ??? ) : ??? = { return x }
We can use any type t, as long as the input and output both have t:
def f ( x : t ) : t = { return x }
This is called (parametric) polymorphism.
Polymorphism
But which concrete type t should we use for
def f ( x : t ) : t = { return x }
We want to be able to do f(17) and f(true) in the same program. Any concrete choice of t would prevent this. In order to deal with this we assign a type variable which can be instantiated with any type.
Advanced question (not exam relevant)
What types should/could we infer for e.g.
def f ( x : ??? ) : ??? = {
if ( x > 3.1415 ) then
true else
throw new Exception ( “…!” ) }
Dynamic vs static typing
An important distinction is that between dynamically typed languages, e.g. Python, Javascript, and statically typed languages such as Java, C, C++. The difference is when type-checking happens.
Indynamicallytypedlanguages,type-checkinghappensat run-time, at the last possible moment, e.g. just before we execute x+2 we check that x contains an integer.
Instaticallytypedlanguages,type-checkinghappensat compile-time
Dynamic vs static typing
Dynamic vs static typing
Disadvantagesfordynamicallytypedlanguages:
Slow, because of constant type-checking at run-time.
Errors only caught at run-time, when it is too late.
Dynamic vs static typing
Disadvantagesfordynamicallytypedlanguages:
Slow, because of constant type-checking at run-time.
Errors only caught at run-time, when it is too late.
Keyadvantagefordynamicallytypedlanguages:more flexibility. There are many programs that are safe, but cannot be typed at compile time, e.g.
x = 100
x = x*x
println ( x+1 )
x = “hello”
x = contact( x, “world” )
println ( x )
Moreover, vis-a-vis languages like Java, we don’t have to write type annotations. Less work …
Dynamic vs static typing
The compilation of dynamically typed languages (and how to make them fast) using JIT compilers is very interesting and a hot research topic.
Type inference for advanced programming languages is also very interesting and a hot research topic.
However, from now on we will only look at statically typed languages with type-checking (except maybe later in the advanced topics).
Type-checking for a simple imperative language
Let us look at a very simple programming language. Here’s it’s CFG:
P ::= |
| | |
x|0|1|…|true|false|P=Q|P P .append(P )| P .get(P ) whileP{Q}|ifPthenQelseQ′ x:=P|letx:α=PinQ|P;Q
Here x ranges over variables and α over types (see below).
Type-checking for a simple imperative language
Now we need to define types. Here they are.
α ::= Unit | Int | Bool | List<α>
The type Unit is the type of statements (like void in Java). Clearly Int is the type of integers, and Bool that of booleans. Finally List<α> is the type of lists storing things of type α.
Type-checking for a simple imperative language
Now we need to define types. Here they are.
α ::= Unit | Int | Bool | List<α>
The type Unit is the type of statements (like void in Java). Clearly Int is the type of integers, and Bool that of booleans. Finally List<α> is the type of lists storing things of type α.
The Java program
int x = 3;
x = x+1;
in our language is
let x : Int = 3 in x := x + 1
Type-checking for a simple imperative language
Likewise, if our mini language had objects and classes and methods, the Java program
List
a.append(10);
would translate to
let a : List
Type-checking for a simple imperative language
Let’s look at some examples.
Type-checking for a simple imperative language
Let’s look at some examples.
Theprogram3hastypeInt
Type-checking for a simple imperative language
Let’s look at some examples.
Theprogram3hastypeInt
TheprogramtruehastypeBool
Type-checking for a simple imperative language
Let’s look at some examples.
Theprogram3hastypeInt
TheprogramtruehastypeBool Whatabout3+4?
Type-checking for a simple imperative language
Let’s look at some examples.
Theprogram3hastypeInt
TheprogramtruehastypeBool Whatabout3+4?
Whatabout3+x?
Type-checking for a simple imperative language
The type of programs like 3 + x depends on our assumptions about the type of x: if we assume that x has type Int then
3 + x has type Int.
Type-checking for a simple imperative language
The type of programs like 3 + x depends on our assumptions about the type of x: if we assume that x has type Int then
3 + x has type Int.
If we assume that x has type Bool then 3 + x has no type!
Type-checking for a simple imperative language
The type of programs like 3 + x depends on our assumptions about the type of x: if we assume that x has type Int then
3 + x has type Int.
If we assume that x has type Bool then 3 + x has no type!
An executable program has no free variables, unlike 3 + x, since all variables have to be declared before use.
Type-checking for a simple imperative language
The type of programs like 3 + x depends on our assumptions about the type of x: if we assume that x has type Int then
3 + x has type Int.
If we assume that x has type Bool then 3 + x has no type! An executable program has no free variables, unlike 3 + x,
since all variables have to be declared before use.
So why do we have to worry about programs with free variables at all when the programs we run don’t have free variables?
Type-checking for a simple imperative language
We want to type-check in a compositional way, that means, determining types of a program from the types of its components.
The key construct here is
let x : α = P in Q.
To type-check Q we have to add the assumption that x stores something of type α to the assumptions we use to type-check thewholephraseletx :α=P inQ.
Type-checking for a simple imperative language
We want to type-check in a compositional way, that means, determining types of a program from the types of its components.
The key construct here is
let x : α = P in Q.
To type-check Q we have to add the assumption that x stores something of type α to the assumptions we use to type-check thewholephraseletx :α=P inQ.
Assume y stores something of type Int. Under this assumption, the program
let x : Int = y + 1 in x := x + 2 is well-typed and has type Unit.
Type-checking for a simple imperative language
In other words, we type-check using assumptions about the types of free variables. We can split this insight into parts (divide-and-conquer):
Type-checking for a simple imperative language
In other words, we type-check using assumptions about the types of free variables. We can split this insight into parts (divide-and-conquer):
Weneedtostore,andbeabletoaccesstheassumptions quickly.
Type-checking for a simple imperative language
In other words, we type-check using assumptions about the types of free variables. We can split this insight into parts (divide-and-conquer):
Weneedtostore,andbeabletoaccesstheassumptions quickly.
Weneedtobeabletogettheassumptions(thisistheαin let x : α = P in Q).
Type-checking for a simple imperative language
In other words, we type-check using assumptions about the types of free variables. We can split this insight into parts (divide-and-conquer):
Weneedtostore,andbeabletoaccesstheassumptions quickly.
Weneedtobeabletogettheassumptions(thisistheαin let x : α = P in Q).
We need a suitable data structure, an ’assumption store’.
Type-checking for a simple imperative language
To describe the type-checking algorithm concisely, let’s introduce some notation.
We write Γ ⊢ P : α, meaning that program P has type α under the assumptions as given by Γ. This Γ is our ’assumption store’. The ’assumption store’ is usually called symbol table or typing environment. You can think of Γ as a function: you pass a variable name to Γ and get either an error (if Γ has no assumptions on that variable) or a type α (if Γ stores the assumption that the variable has type α).
Type-checking for a simple imperative language
To describe the type-checking algorithm concisely, let’s introduce some notation.
We write Γ ⊢ P : α, meaning that program P has type α under the assumptions as given by Γ. This Γ is our ’assumption store’. The ’assumption store’ is usually called symbol table or typing environment. You can think of Γ as a function: you pass a variable name to Γ and get either an error (if Γ has no assumptions on that variable) or a type α (if Γ stores the assumption that the variable has type α).
We write Γ(x) = α to indicate that Γ stores the assumption that x has type α.
Type-checking for a simple imperative language
To describe the type-checking algorithm concisely, let’s introduce some notation.
We write Γ ⊢ P : α, meaning that program P has type α under the assumptions as given by Γ. This Γ is our ’assumption store’. The ’assumption store’ is usually called symbol table or typing environment. You can think of Γ as a function: you pass a variable name to Γ and get either an error (if Γ has no assumptions on that variable) or a type α (if Γ stores the assumption that the variable has type α).
We write Γ(x) = α to indicate that Γ stores the assumption that x has type α.
We sometimes want to add and assumption x : α to the assumptions already in Γ. We write Γ, x : α in this case, assuming that Γ does not already store assumptions about x.
Type-checking for a simple imperative language
To describe the type-checking algorithm concisely, let’s introduce some notation.
We write Γ ⊢ P : α, meaning that program P has type α under the assumptions as given by Γ. This Γ is our ’assumption store’. The ’assumption store’ is usually called symbol table or typing environment. You can think of Γ as a function: you pass a variable name to Γ and get either an error (if Γ has no assumptions on that variable) or a type α (if Γ stores the assumption that the variable has type α).
We write Γ(x) = α to indicate that Γ stores the assumption that x has type α.
We sometimes want to add and assumption x : α to the assumptions already in Γ. We write Γ, x : α in this case, assuming that Γ does not already store assumptions about x.
We’ll translate this into code a bit later.
Type-checking for a simple imperative language
Γ⊢true:Bool
Type-checking for a simple imperative language
Γ⊢true:Bool Γ⊢false:Bool
Type-checking for a simple imperative language
Γ⊢true:Bool Γ⊢false:Bool Γ⊢7:Int
These type can be derived without assumptions on the types of variables, and other program parts.
Type-checking for a simple imperative language
IfΓ(x)=αthenΓ⊢x :α
Type-checking for a simple imperative language
IfΓ(x)=αthenΓ⊢x :α
IfΓ⊢P:IntandalsoΓ⊢Q:IntthenΓ⊢P+Q:Int.
Type-checking for a simple imperative language
IfΓ(x)=αthenΓ⊢x :α
IfΓ⊢P:IntandalsoΓ⊢Q:IntthenΓ⊢P+Q:Int.
IfΓ⊢P:IntandalsoΓ⊢Q:Intthen Γ ⊢ P == Q : Bool.
Type-checking for a simple imperative language
IfΓ(x)=αthenΓ⊢x :α
IfΓ⊢P:IntandalsoΓ⊢Q:IntthenΓ⊢P+Q:Int.
IfΓ⊢P:IntandalsoΓ⊢Q:Intthen Γ ⊢ P == Q : Bool.
Writing out “if” etc explicitly gets unwieldy quickly, so let’s introduce a new form of notation.
Type-checking for a simple imperative language
We write
for
Γ⊢P:Int Γ⊢Q:Int Γ ⊢ P + Q : Int
Type-checking for a simple imperative language
We write
Γ⊢P:Int Γ⊢Q:Int Γ ⊢ P + Q : Int
for If Γ ⊢ P : Int and also Γ ⊢ Q : Int then Γ ⊢ P + Q : Int.
Type-checking for a simple imperative language
Γ⊢true:Bool Γ⊢false:Bool Γ⊢7:Int
Γ(x) = α
Γ ⊢ x : α Γ ⊢ P + Q : Int
Γ⊢P:Int Γ⊢Q:Int
Γ⊢P:Int Γ⊢Q:Int Γ ⊢ P == Q : Bool
Type-checking for a simple imperative language
Γ⊢P:Unit Γ⊢Q:Unit Γ ⊢ P;Q : Unit
Type-checking for a simple imperative language
Γ⊢P:Unit Γ⊢Q:Unit Γ ⊢ P;Q : Unit
Γ⊢C:Bool Γ⊢Q:Unit Γ⊢whileC {Q}:Unit
Type-checking for a simple imperative language
Γ⊢P:Unit Γ⊢Q:Unit Γ ⊢ P;Q : Unit
Γ⊢C:Bool Γ⊢Q:Unit Γ⊢whileC {Q}:Unit
Γ⊢C:Bool Γ⊢Q:Unit Γ⊢R:Unit Γ ⊢ if C then Q else R : Unit
Type-checking for a simple imperative language
Γ⊢P:Unit Γ⊢Q:Unit Γ ⊢ P;Q : Unit
Γ⊢C:Bool Γ⊢Q:Unit Γ⊢whileC {Q}:Unit
Γ⊢C:Bool Γ⊢Q:Unit Γ⊢R:Unit Γ ⊢ if C then Q else R : Unit
Γ⊢P:Int Γ⊢Q:Int inotdefinΓ Γ,i:Int⊢R:Unit Γ⊢fori = PtoQ{R}:Unit
Recall that Γ, i : Int means that we add the assumption that i has type Int to our environment. Show test.java.
Type-checking for a simple imperative language
Γ⊢x:α Γ⊢P:α Γ⊢x :=P :Unit
Type-checking for a simple imperative language
Γ⊢x:α Γ⊢P:α Γ⊢x :=P :Unit
Γ⊢P:α xnotdefinedinΓ Γ,x:α⊢Q:β Γ ⊢ let x : α = P in Q : β
Type-checking for a simple imperative language
Γ⊢x:α Γ⊢P:α Γ⊢x :=P :Unit
Γ⊢P:α xnotdefinedinΓ Γ,x:α⊢Q:β Γ ⊢ let x : α = P in Q : β
Γ ⊢ new List<α> : List<α>
Type-checking for a simple imperative language
Γ⊢x:α Γ⊢P:α Γ⊢x :=P :Unit
Γ⊢P:α xnotdefinedinΓ Γ,x:α⊢Q:β Γ ⊢ let x : α = P in Q : β
Γ ⊢ new List<α> : List<α>
Γ⊢P:List<α> Γ⊢Q:α Γ ⊢ P.append(Q) : List<α>
Type-checking for a simple imperative language
Γ⊢x:α Γ⊢P:α Γ⊢x :=P :Unit
Γ⊢P:α xnotdefinedinΓ Γ,x:α⊢Q:β Γ ⊢ let x : α = P in Q : β
Γ ⊢ new List<α> : List<α> Γ⊢P:List<α> Γ⊢Q:α
Γ ⊢ P.append(Q) : List<α>
Γ⊢P:List<α> Γ⊢Q:Int Γ ⊢ P.get(Q) : α
Let’s do type-checking in (pseudo-)code!
AST in pseudo-code
interface Prog
class Ident ( s : String ) implements Prog
class IntLiteral ( i : Int ) implements Prog
class BoolLiteral ( b : Boolean ) implements Prog
class Equal ( lhs : Prog, rhs : Prog ) implements Prog
class Plus ( lhs : Prog, rhs : Prog ) implements Prog
class For ( i : String,
from : Prog,
to : Prog, body : Prog ) implements Prog
class While ( cond : Prog, body : Prog ) implements Pr
class If ( cond : Prog, sThen : Prog, sElse : Prog ) i
class Assign ( i : String, e : Prog ) implements Prog
class Let ( x : String, t : Type, p : Prog, q : Prog )
class NewList ( t: Type ) implements Prog
class Append ( list: Prog, elem: Prog ) implements Pro
class Get ( list: Prog, index: Prog ) implements Prog
AST in pseudo-code
interface Type
class Int_T () implements Type
class Bool_T () implements Type
class Unit_T () implements Type
class List_T ( ty : Type ) implements Type
Symbol-tables in (pseudo-) code
A key data structure in semantic analysis is the symbol table, or environment, which we wrote Γ above.
Symbol-tables in (pseudo-) code
A key data structure in semantic analysis is the symbol table, or environment, which we wrote Γ above.
The symbol table maps identifiers (names, variables) to their types (here we think of a class signature as the type of a class).
Symbol-tables in (pseudo-) code
A key data structure in semantic analysis is the symbol table, or environment, which we wrote Γ above.
The symbol table maps identifiers (names, variables) to their types (here we think of a class signature as the type of a class).
We use the symbol table to track the following.
Iseveryusedvariabledefined(exactlyonce)?
Iseveryvariableusedaccordingtoitstype?E.g.ifxis declared to be a string, we should not try x + 3.
Symbol-tables in (pseudo-) code
HashMap
new HashMap
Symbol-tables in (pseudo-) code
HashMap
new HashMap
// Returns the type associated with x.
env.get(x)
Symbol-tables in (pseudo-) code
HashMap
new HashMap
// Returns the type associated with x.
env.get(x)
// Adds the association of x with type t.
// Removes any previous association for x.
env.put(x, t)
Symbol-tables in (pseudo-) code
HashMap
new HashMap
// Returns the type associated with x.
env.get(x)
// Adds the association of x with type t.
// Removes any previous association for x.
env.put(x, t)
// Returns true if there exists an
// association for x.
env.containsKey(x)
Symbol-tables in (pseudo-) code
HashMap
new HashMap
// Returns the type associated with x.
env.get(x)
// Adds the association of x with type t.
// Removes any previous association for x.
env.put(x, t)
// Returns true if there exists an
// association for x.
env.containsKey(x)
Note that: get should throw an exception if x is not in the symbol table.
Symbol-tables in (pseudo-) code
HashMap
new HashMap
// Returns the type associated with x.
env.get(x)
// Adds the association of x with type t.
// Removes any previous association for x.
env.put(x, t)
// Returns true if there exists an
// association for x.
env.containsKey(x)
Note that: get should throw an exception if x is not in the symbol table. Important: get always returns the most recently added binding.
Type-checking for a simple imperative language
env.put ( x, Int_T )
println ( env.get ( x ) // prints Int_T
env.put ( x, Bool_T )
println ( env.get ( x ) // prints Bool_T
Type-checking for a simple imperative language
env.put ( x, Int_T )
println ( env.get ( x ) // prints Int_T
env.put ( x, Bool_T )
println ( env.get ( x ) // prints Bool_T
Alternatively we could throw an exception when adding information about a variable more than once. Various different policies are possible, depending on the details of the language to be typed.
Type-checking for a simple imperative language
env.put ( x, Int_T )
println ( env.get ( x ) // prints Int_T
env.put ( x, Bool_T )
println ( env.get ( x ) // prints Bool_T
Alternatively we could throw an exception when adding information about a variable more than once. Various different policies are possible, depending on the details of the language to be typed.
If we don’t throw an exception, we can define variables more than once in our language. If we do, we have a language where we can only define variables once.
Type-checking for a simple imperative language
We want to write the following method:
Type check ( HashMap
…
}
It returns the type of p under the assumptions (on p’s free variables) in env if p is typeable under these assumptions, otherwise an error should be returned.
Type-checking for a simple imperative language
Translation of
is simple:
Γ⊢true:Bool Γ⊢7:Int
Type check ( HashMap
if p is of form
BoolLiteral ( b ) then return Bool_T
IntLiteral ( n ) then return Int_T
…
}
Type-checking for a simple imperative language
We want to translate
Γ⊢P:Int Γ⊢Q:Int Γ ⊢ P == Q : Bool
to code.
Type check ( HashMap
if p is of form
…
Equal ( l, r ) then
… }
if (check ( env, l ) != Int_T or
check ( env, r ) != Int_T )
throw Exception ( “typing error” )
else return Bool_T
Type-checking for a simple imperative language
Translation of is as follows:
Γ(x) = α Γ⊢x:α
Type check ( HashMap
if p of form
…
Ident ( x ) then return env.get ( x )
…
}
Type-checking for a simple imperative language
Translation of
Γ⊢P:Unit Γ⊢Q:Unit Γ ⊢ P;Q : Unit
is as follows:
Type check ( HashMap
if p of form
…
Seq ( l, r ) then {
if(check(env,l)!=Unit_T or check ( env, r ) != Unit_T )
throw Exception ( “type error: …” )
return Unit_T
else
}
… }
Type-checking for a simple imperative language
Translation of
Γ⊢P:Bool Γ⊢Q:Unit Γ⊢whileP {Q}:Unit
is as follows:
Type check ( HashMap
if p is of form
…
While ( cond, body ) then {
if ( check ( env, cond ) != Bool_T or
check ( env, body ) != Unit_T )
throw Exception ( “type error: …” )
return Unit_T
else
}
… }
Type-checking for a simple imperative language
Translation of
Γ⊢P:Bool Γ⊢Q:Unit Γ⊢R:Unit Γ ⊢ if P then Q else R : Unit
is as follows:
Type check ( HashMap
if p of form
…
If ( cond, thenBody, elseBody ) then {
if ( check ( env, cond ) != Bool_T or
check ( env, thenBody ) != Unit_T or
check ( env, elseBody ) != Unit_T )
throw Exception ( “type error: …” )
return Unit_T
else
}
… }
Type-checking for a simple imperative language
Γ⊢P:Int Γ⊢Q:Int inotdefinedinΓ Γ,i:Int⊢R:Unit Γ⊢fori = PtoQ{R}:Unit
translates as follows:
Type check ( HashMap
if p is of form
…
For ( i, from, to, body ) then {
if ( env.containsKey(i) ) throw Exception(…)
if ( check ( env, from ) != Int_T or
check ( env, to ) != Int_T )
throw Exception ( “…” )
env.put ( i, Int_T )
if(check(env,body)!=Unit_T ) throw Exception ( “…” )
else return Unit_T
}
… }
Type-checking for a simple imperative language
Translation of is as follows:
Γ⊢P:α Γ⊢x:α Γ⊢x :=P :Unit
Type check ( HashMap
if p is of form
…
Assign ( x, p ) then
… }
if ( check ( env, p ) != env.get ( x ) ) then
throw Exception ( “…” )
else
return Unit_T
Type-checking for a simple imperative language
Translation of
Γ⊢P:α xnotdefinedinΓ Γ,x:α⊢Q:β
Γ ⊢ let x : α = P in Q : β
is as follows:
Type check ( HashMap
if p of form
…
Let ( x, t, p, q ) then {
if ( env.containsKey(x) ) throw …
if ( check ( env, p ) != t ) throw …
env.put ( x, t )
return check ( env, q )
}
… }
Type-checking for a simple imperative language
Translation of
is as follows:
Γ ⊢ new List<α> : List<α>
Type check ( HashMap
if p of form
…
NewList ( t ) then {
return List_T( t )
}
… }
Type-checking for a simple imperative language
Translation of
is as follows:
Γ⊢P:List<α> Γ⊢Q:α Γ ⊢ P.append(Q) : List<α>
Type check ( HashMap
if p of form
…
Append ( p, q ) then {
Type t = check ( env, q )
if ( check ( env, p ) != List_T( t ) ) throw …
return List_T( t )
}
… }
Type-checking for a simple imperative language
Translation of
is as follows:
Γ⊢P:List<α> Γ⊢Q:Int Γ ⊢ P.get(Q) : α
Type check ( HashMap
if p of form
…
Get ( p, q ) then {
if ( check ( env, q ) != Int_T ) throw … Type t = check ( env, p )
if(t!= List_T(_))throw…
return ((List_T) t).ty
}
… }
A lot more could be said about type-checking
Typingobjectsandclasses,subtyping(structuralvs nominal subtyping)
Typingmethods
Inheritance
Typesthatcatchmorenon-trivalbugs,e.g.specifying“this is a sorting function” as types.
Fastertype-checkingalgorithms
Type-inferencealgorithms
…
Conclusion
Types are weak specifications of programs.
Conclusion
Types are weak specifications of programs. We can check them by walking the AST.
Conclusion
Types are weak specifications of programs. We can check them by walking the AST.
The key data-structure is the symbol table which holds assumptions about the types of free variables.