#I
CMPSC 461: Programming Language Concepts
Instructor: Danfeng Zhang W369 Westgate Building
TAs and LAs Teaching Assistants:
• Zeyu Ding (Head TA, dxd437@psu.edu) • Ashish Kumar (azk640@psu.edu)
• Madhav Deshpande (mzd574@psu.edu) • Namitha Nambiar (nmn5265@psu.edu)
Learning Assistants:
• Liang Leo (hql5432@psu.edu)
• Jianyu He (jvh6056@psu.edu)
Office hours will be announced before next week
Course Mode: Remote Synchronize
https://psu.zoom.us/j/91404124375?pwd=WWRDSG 16SXR1VFJUMy9qTHE3dmtmQT09
Attendance Policy
• Regular attendance is strongly recommended
• However, to accommodate special needs:
• All lectures will be recorded and and uploaded to Canvas
• Physical attendance will not be used as part of course assessment
Important Learning Resources
• Course Homepage http://www.cse.psu.edu/~dbz5017/cmpsc461fa20
Tentative Course schedule and reading materials
Important Learning Resources • Canvas
Find slides and recorded lecture here
Important Learning Resources
• Office hours (via Zoom) throughout the week
• To be announced before next week • Piazza
https://www.piazza.com/psu/fall2020/cmpsc461
Use Piazza for ALL course communication. Emails will typically be filtered/ignored.
Important Learning Resources • Required Textbook
Programming Language Pragmatics
3rd Edition
4th Edition
OR
Administration
Assignments
• Written and programming • No final projects
Late policy
• 1 day late: -20%
• 2 days late: -50%
• >2 days late: -100%
Exams: 2 midterms and 1 final (TBD)
Administration
Grading policy
• 30% assignments
• 40% midterm exams • 30% final exam
Letter grades
A: [93-100] A-: [90-93) B+: [87-90) B: [83-87)
B-: [80-83) C+: [77-80) C: [70-77) D: [60-70) F: [0-60)
Honors Option
• Schreyer scholars are welcome to take the honors option of this course
• Extra reading materials
• More challenging assignments
• Send me emails to sign up for the honors option
Academic Integrity
• Any form of sharing, plagiarism, copying, cheating will receive academic sanctions
• Any submitted work should be work of your own
• We will follow the guideline of CSE department:
• 0 for the submission that violates AI, AND
• a reduction of one letter grade for the final course grade
(students with prior AI violation will receive an F as the final course grade)
Questions?
Overview of CMPSC 461
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
CMPSC 461 IS NOT
• C/C++/Java/Python/Scheme/… programming • Compiler conststruction (471)
• Object-oriented programming (221)
• Data structures
CMPSC 461
Explores fundamental principles and
paradigms of programming languages.
Studies features found in many different languages and examine how they work and how they interact with each other.
Why CMPSC 461? Programming languages facilitate
communication of ideas
• Between people
• Between people and computers
This course explores the fundamental principles of programming languages, to facilitate communication in your future study/career
Why CMPSC 461?
Programming languages is a powerful tool once you master the principles
Example: use type systems to build secure software
Int i =0;
boolean b=true;
b = i; // incompatible types: int
// cannot be converted to boolean
Why CMPSC 461?
Example: use type systems to build secure software
Secret s =0;
Public p=1;
p = s; // type error: cannot assign
// secret value to public variable
Designing new languages for building secure software is an active research area
Ways to Fail
• Ignore assignments
• Never attend lectures & office hours
• Skim through slides
• Memorize assignments and practice questions before exams
• Miss exams (and their make-ups)
Ways to get an A? avoid all of the above!
Imperative Functional OOP
Programming Paradigms
Course Coverage
Imperative Language
Input
Control Arithmetic/logic Output
Program
Variable
Memory
The von Neumann-Eckert Model, 1945
Imperative: program is updates to variables
Functional Language
Output = ! (Input)
Functional: program is a mathematical function
Object-Oriented Programming Object
Imperative Functional OOP
Programming Paradigms
Course Coverage
Syntax Names Types Semantics Functions
Correctness
Memory Management
Programming Principals
Language Implementation
Levels
Focus of this course
Machine independent, e.g., y = x+1
Higher-order language
Compiler Assembler
Processor instructions, e.g., MOV EAX, 1
Assembly language
Machine language
Patterns of bits, e.g., 00000101
Source Code to Target Code
Source Code to Target Code
(1.0+2)+x
Token
( 1.0 + 2 ) + x
+ + id
Num Num 1.0 2
+ only take numbers x is in scope (visible)
Cmpsc461
x
Abstract Syntax Tree
Cmpsc471
Register the Piazza discussion group (https://www.piazza.com/psu/fall2020/cmpsc461)
#2
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Functional Language
Output = ! (Input)
Functional: program is a mathematical function
History
What is computation?
• Combinatory logic (1920’s) • Lambda calculus (1930’s) • Turing machine (1930’s) •…
First electronic general-purpose computer • ENIAC (1946)
Mathematical Function
A mathematical function is a mapping of members of one set, called the domain set, to another set, called the range set (image).
Notations:
Domain Range
+: Z,Z ⟼Z isNegtive: Z ⟼ {True, False}
Infix form
Function application
+12 = 3 (same as 1+2=3)
isNegtive 1 = False
The λ-Calculus
Alonzo Church, 1930s
t: xlxx.tl twtr examples X.y XXX XX- Ay. ?
non- examples a.x x.y xty
A pure λ-term is defined inductively as follows:
• Any variable ! is a λ-t6erm#
Parameter function def
• If ” is a λ-term, so is λ!. ” (abstraction) . • If “!, “” are λ-terms, so!is “!”” (application)
landa
tiwtz r
apply function -4 to tz
return value
Analogy in C: Abstraction: int f Application: f(2)
separator
like comma t
.
(int x) {return x+1}
λ-Term Examples Identity function: λ!. !
Application: (λ!. !) % How to parse a λ-term
value
Xxixwy (IX . x) w y
I
1
[apply identity function to parameter!]
ta
t.
t,
6 xx. Huy)
1. λbindingextendstotherightmostpart λ!.l!λ%.%& isparsedas λ!.(!(λ%.(%&)))
–
.k!
2. Applications are left-associative
Xwyw Z 1Huy)w Z
t, ti
2Xw (Yuz)
“! “” “#
is parsed as (“! “”) “#
t,
tz
λ-Term Examples Identity function: λ!. !
Application: (λ!. !) % [apply identity function to parameter !]
How to parse a λ-term
1. λbindingextendstotherightmostpart
λ!.! λ%.% & is parsed as λ!.(! (λ%.(% &))) 2. Applications are left-associative
“! “” “# is parsed as (“! “”) “#
Examples
XX. X (Xy. y) Z 1 AX. lxlxy.ly) Z
” can’t have another one AX. XxHy. ly))z)
t, ti t3 Xw XX.YuXuXZ. Xuy
t3
inside (optional)
2
(t, tz)
1 xuix! !)
Number of Parameters
In the pure λ-calculus, λ only bind one parameter
For convenience, we write
λ!%.” asashorthandforλ!.(λ%.”)
This process of removing parameters is called currying, which we will cover later in this course.
#3
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
ZOOM Link https://psu.zoom.us/j/91404124375?pwd=WWRDSG 6
p 6
c
Instructor: Professor Danfeng Zhang, W369 Westgate Bldg, 814-8 Teaching Assistants:
Office hours
Zeyu Ding (Head TA, dxd437@psu.edu) Ashish Kumar (azk640@ Madhav Deshpande (mzd574@psu.edu) Namitha Nambiar (nmn52
Learning Assistants: https://psu.zoom.us/j/93599703460?pwd=TXEycTc2
Leo Liang (hql5432@psu.edu) Jianyu He (jvh6056@psu.edu)
VmNRelZrb1pOYTh0YVRVUT09
Office Hours https://psu.zoom.us/j/93599703460?pwd=TXEycT
Mon
Tue
Wed
Thu
Fri
8:00-9:00
Jianyu
Namitha
Madhav
9:00-10:00
Jianyu
Namitha
Madhav
Leo
10:00-11:00
Ashish
11:00-12:00
Ashish
Leo
1:00-3:00
Zeyu
2:30-3:30
Prof. Zhang
Required Textbook:
Programming Language Pragmatics ( 3
edition), Michael Scott
rd
2
The λ-Calculus
A pure λ-term t is defined inductively as follows: • Any variable ! is a λ-term
• If ” is a λ-term, so is λ!. ” (abstraction)
• If “! , “” are λ-terms, so is “! “” (application)
We use !, %, &, … for variables
The definition above defines an infinite set, named t
3
Syntax vs. Semantics
Syntax: the structure/form of lambda terms
Semantics: the meaning of lambda terms • Lambda calculus defines computation
• What computation is defined by ” ?
4
Capture-Avoiding Substitution In λ-calculus, computation is defined by
capture-avoiding substitution
“{%/!} means substitute all free ! in ” with % #
same x?
replace all xintwlthy (λ!.!)!ygylx) : (XX- Huy bound free
Analogy in C: – int x;
…
o
OOO
int f (int x) {return x}
5
Bound vs. Free Variables
In (λ!. “), the variable ! in ! is bound to λ! A variable is free if it is not bound to any λ A variable is bound to the closest λ
Examples
λ”. λ”n. “o is a function that takes a parameter, and returns the identity function (i.e., the inner-most ” is
bound to the second λ) bound
7
(λ”. “) ” applies the identity function to x (i.e., the ” after dot is bound to λ)
to two X
6
More Formally …
In (λ!. “), all free variables ! in ! is bound to λ”
A variable is free if it is not bound to any λ Systematically, we define free variables as follows:
• FV % = {%}
•FV)!)” =FV)! ∪FV()”) • FV -%.) =FV ) − %
Let Var(!) be all variables used in !, the bound variables in ! (written BD(!)) are
BD! =Var! −FV(!)
7
Examples
Systematically, we define free variables as follows:
• FV % = {%}
•FV)!)” =FV)! ∪FV()”)
• FV -%.) =FV ) − %
t, tr free t
free
f
FV((λ!. !) !)£ FV(λy. (λ!. (! %)) !) I2l)
,
. l a # = f V l x x . X D v f V / X 4a
–
sfvccxx.lx.yDXD.ly/:fVllXil-lxDulXr )
sffvlxx.lx.yyvfvlxzll.ly/:0ulxd
:(fvlx, y) – Hulk))- ly
: 1kt
X. is bound
sfgxiluly} –
)
1×3101×231-141
🙁 gylvlxr)) – ly) :{ Xz)
8
!-Reduction (Informal)
G a-y.li
I- 3
. Cz ‘Bs
Ez • .BIJ:
Identity function: λ!. !
is the same as λ%.% and λ&.& etc.
Observation: the name of a parameter is irrelevant in λ-calculus
i:
‘
.
‘ .
Analogy in C:
int f (int x) {return x} Is same as int f (int y) {return y} Is same as int f (int z) {return z}
9
!-Reduction
Observation: the name of a parameter is
irrelevant in λ-calculus
(λ!.! !) =? (λ#.# #) ‘
” ”
(λ!.’! !) =? (λ#.’! #’) ! =? #
—
(λ!. λ!. !) =? (λ#. λ!. #)
—
—
10
!-Reduction (fIormal) λ!.O” = λ%.0″{%/!} when % ∉ FV(“)
•.
where “{%/!} means substitute all free ! in ” with % Fvlxiuxz)
: .lv/XDvfVlk) t.xy.mx/lylx(lλl:!x.y!.ly!y
)ly=tf(Vλ#lx.x#l:l#k).xi3 (λ!. ! !) ≠ (λ#.O! #)
¥
!≠”t (λ!. (λ!. !
) ≠ (λ#. λ!. #)
FVIXXX)
— fvlx) – Ix)
— 1×3-1×3 — 0
D
–
sxylcptxxllylx)) say. (xxx)
when
ytfvlxx.HU
11
#4
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Free vs. Bound Variable
•FV# ={#}
•FV’!'” =FV’! ∪FV(‘”) •FV +#.’ =FV ‘ − #
Example: For the following term, connect all bound variables to their definitions with lines
”
tt.4.tp.lt?D
” “t'”” fvlxy.ly ” ))
Htt t.VKxzllxy.ly
!”.”$ !%.%”
tf
xD) Ix)
– –
–
t’
‘- –
fvfxuz. )
fvlx.zlufvlxy.ly xD
Ix) .fr/zj–Frlyxd-ly3:lx..zlvlXd-lxl
.mx,
.
1×142.3
‘- ))- ffvlylufvlxz
ly
)
2
:{
z)
:
– 114,2) : ly,X. ) – ly)
: Hr)
Capture-Avoiding Substitution ‘{./#}: substitute all free # in ‘ with . when . is
not captured in ‘ (i.e., there is no λy in t) captured variable
check
Examples
!{./#} = ”
(λ!. !){./#}= (λ!. !)
FV (XXX)
(! !) {./#} = (” “) (λ.. #){./#} ≠ (λ.. .)
–
w captured =,
x) j
free
–
sfvlx ) — Ix ) –
:O
Ix) Ix )
Hy
–
–
!-Reduction
Replacing all bound variables with their definition
gives the same term
(λ!.!) = (λ”.”) (λ!.! !) = (λ”.” “)
(λ!.!!)≠(λ”.!”) ! ≠” ill
More forOmally:
-λ#. ‘ = λ.. ‘{./#} when . ∉ FV(‘)
y
Subtle point: what if % is captured in &? Use ‘-reduction to
rename the captured % in & ↳ ay et rename xy as Az
4
“-Reduction (Informal) Identity function: λ#. #
P
input
tff
(λ#. #) . = .
T
output
Observation: we can substitute the formal parameter with the true parameter
Analogy in C:
Abstraction: int f (int x) {return x}
O
Application: f(y) evaluates to y
lll
5
“-Reduction
The key reduction rule in λ calculus
(λ!.# )# = # {# /!} D! ” ! ”
body
Capture-avoiding substitution
6
“-ReducItion Example body
ti tz Example 1: (λ#. # #) .
.
(x x) lylx) (p- reduction)
2-Reduction
(λ#. ‘!) ‘” = ‘!{‘”/#}
=
:(y y)
since y is not captured in CK x)
fvlx, xD : fvfx.) u
FV(Xd :{ H) ul Xa)
c- Hnk)
7
“-Reduction Example
tr
2-Reduction
(λ#. ‘!) ‘” = ‘!{‘”/#}
ti
Example2:(λ#. λ..#.).
:p lay. (x g)Dlylx) ← captured variable check here
check y is not captured in the body
/
HI. lxztlylxl b
not captured :X -2. ly z)
failed b/c there is a lxyllzly )
” free :(y z)
ay. Hy) y
Nlt7.(X Z)) =.—
— I x )
9
Useful Rule
We say a term t is closed=if it contains no free variable
Examples: (λ#. #) (λ#. λ.. # . ) –
variable
Rule: for a capture avoiding substitution
‘!{‘”/#}
the subtle captured-variable check is vacuously
true when ‘” is closed ⇒ skip captured variable
no
free
check
11
Another Useful Rule
A
A
.tk .HN.tltilts) –
)
(λ#1 #2 … #4. ‘) &! &” … &# = ‘{‘!/#1} …{‘#/#4} when ‘! ‘” … ‘# are all closed terms
.. Maxi . . .tn
12
Evaluation
An evaluation of a λ term is a sequence
#! = #” = ## = …
where each step is either an α-reduction or a 2-reduction
13
Evaluation Order
No reduction order is specified in classical λ-Calculus
If evaluation terminates, any order gives same result
‘
(λ”. (λ%. “) $)
+ht (λ”. (λ%. “) $) + =(λ”. “)+
=(λ%. +)$
=+ =+
14
¥5
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Is the λ-Calculus Turing complete?
2
Encoding: Boolean
Booleans
Shorthand for
λ”. (λ%. “)
TRUE ≜ λ’ (.’ FALSE ≜ λ’ (.(
Encoding of “if”? 0 when / is TRUE Goal: IF / 0 1 = 31 when / is FALSE
IF ≜ λ/01. (/01)
Definition
3
Encoding: Boolean Booleans
TRUE ≜ λ’ (.’ FALSE ≜ λ’ (.(
Encoding of “and”? TRUE when /!, /” are both TRUE
Goal: AND /! /” = 3 FALSE otherwise
AND ≜ λ/! /”. (/! (/” TRUE FALSE) FALSE) Check that AND TRUE FALSE = FALSE (Note 2)
Definition
4
§ = defined as
Is lambda calculus Turing complete?
xyxy ‘f4+f
bt It
Boolean Encoding true !xxy. x
4
t
b
‘
else
s FALSE =
tt
.
Xx y y
x4 Y
AND ¥ Abi bi. lb, Ibe TRUE FALSE)FALSE)
:*:::::i:÷:c.. .. ..” 2?Ab,bi. (b,TRUEbi)
Ex: IF TRUE TRUE FALSE
closed
= (Ab t f. b t f) TRUE TRUE FALSE (by def of If)
=p(b t f){TRUElb){Truett)IFALSEH)
‘
– TRUE TRUE FAIS – f
closed
e- (XX y. x)TRUE FALSE (bydefofTRUE)
=P (x) { TRUE1×31 FALSELY)
‘ TRUE
X n y infix
AND x y
↳
prefix i s a function
1
Ex: If b, (IF bz TRUE FALSE)FALSE
=
.–.
1= bi (be TRUE FALSE) FALSE
3 :b, be FALSE
b2 TRUE FALSE
“””””””
b’ I i either TRUE o r FALSE
“””””””””””
÷! Hille”lxYYI%
-. TRUE
/
if bi Ii FALSE, . . .
= FALSE
E x:
AND TRUE FALSE
= (Xb, bi. (bi (be TRUE FALSE)FALSE) TRUE FALSE
=P (b, (be TRUE FALSE)FALSE) {TRUElb,){FALSElbs
Natural Numbers Ixfz. z
function composition
fog! Xx. flglx))
n- fold composition
I xfz.
f z
Svu D= I
suaI= 2- :
‘-
:
tf z
‘ .
‘
(Tnf: F)
: true lutist that Faist) false closed
:(XX y. x)(FALSE TRUE FALSE) FALSE
=P (x){(FALSE TRUE f-Atsf)1×31FALSELY) ” FALSE TRUE FALSE
(def of TRUE)
closed :(Xx yay) TRUE FALSE
😛 ly) {THEhillFALSELY) = FALSE
(clef of FALSE)
fff z)
I xfz fff If z)) Xfz Tz
f’ f’ f
l- fold
L- fold
–
Hx) flffx, ))
fff
n time
2- af z
.
.
…
H
church # s
…
…
fflx)))
sua b : htt nhaitopowerofrepeatngfonzforntimesIfsvubxn.info
flnwfwz) t
:
.
Platz
.
ffnfz )) In)
.
.
→ natural # f.(f z)
.
= xfzf-(IfZ) .
.
input output:
church #
=(xnifzf-(nfz)! (defsve)
ntl
free = Xfz f-((Xfz fz)f z)
.
=P Af z. f. ( (f z ) Hlf) {Hz))
= Xfz . f. (f z )
:2 ,
.
closed
: Afzfr”z
closed
(def of AND)
n Natural #’s n f it –‘-
…
.
…
-R2
tl
…
Itn
PLUS ? Ani Ne
In .
, Ncc nz)
Ai’t lt)t H-n
Nitti ltli
….
XM M
.
(m sull ni)
for times on 4
:(It)Clt)
…
Clt)th
repeat 4
4
–
hi
pl
us I 3
=:
=3 (see in note 2)
#to
Functional Programming
and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Church Encoding Natural numbers
Church numerals: n ≜ λ# $. #! $ Note: n is the encoding of number n
0≜ λ#$.#”$=λ#$.$
1≜ λ#$.##$=λ#$.(#$) 2≜ λ#$.#$$=λ#$.(#(#$)) …
%-fold composition
2
Church Encoding Natural numbers
n≜ λ#$.#!$
Encoding of “+ 1”?
Goal:
Definition
SUCC 0 =λ#$.#!&#$ SUCC≜ λ0#$. (#(0#$))
3
Church Encoding Natural numbers
n≜ λ#$.#!$
Encoding of “+”?
Goal:
Definition
PLUS 0# 0$ =λ#$.#!!&!” $ PLUS ≜ λ0# 0$. (0# SUCC 02)
4
Church Encoding: Example Natural numbers
n≜ λ#$.#!$
PLUS ≜ λ0# 0$. (0# SUCC 02) Check that PLUS 1 2 =3 (Note 2)
Definition
5
#7
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Church Encoding Natural numbers
n≜ λ#$.#!$
us to
repeat f for n time, on Z
A
allows
p
church # Definition
fffl .kz))) -..
n → Natural H
PLUS ≜ λ*” *#. (*” SUCC *2)
?Ah, Nz. (n, f(Azf T
z )) PT
Nz f’s
(f If .
body
in
-fffffffl n,
2)) nz
…
N , t h e ‘ –
÷I t I 1 – I t . . . t h e
2
Church Encoding: Example
Natural numbers
n≜ λ#$.#!$
MULT ≜
3114€
MULTI4-: I
repeat
XM Nz . (A, t#
(th)
for ni times ( Plus Nz) Q)
b# take church
on
Definition
church
3×4=4-×4×4
3 times
Nat Az , –
Azt
O
– n xnz-
t n, times.
i. …
3
Multi-Argument Functions
SUCC≜λn#$. # *#$
1kt
an. af. xz . fflnfz)) . . – Ah, Mr. H, SVU m)
shorthand for PLUS ≜ λn1 *2. *1 SUCC *2
Currying: every function is treated as one that takes at most one parameter at one time
(12″2#…2!.4)isthesameas12″. 12#. … 12!.4 so
(12 5.6) 6″ 6# is the same as 12. 15.6 6″ 6#
4
Named Functions Usedefinition SUCC≜λn#$. # *#$
int addOne Intn)I
return nil; I
addone (addoneID)
in term (SUCC (SUCC 1 ))?
Syntax: let nam→e = def in body (or, let (na→me def) body)
E.g.,letSUCC=(λn$%. $ ‘$% )in(SUCC(SUCC 1 )) let name = def in body ⇒
is just a shorthand for
(λname. body) def
1)))in (Antz. . . ) b
( Asus b
(sua (sua .
boday → B
5
name
Pure vs. Applied λ-Calculus Pure λ-Calculus: the calculus discussed so far
t::-. ix.t It, thx
Applied λ-Calculus:
• Built-in values and data structures (e.g., 1, 2, 3, true, false, (1 2 3))
• Built-in functions (e.g., +, ∗, /, and, or)
• Named functions • Recursion
All features can be encoded in the pure λ-Calculus!
6
Functional Languages
Lisp
AI
Calculus
Common Lisp
* focus Scheme
→
Haskell
.
Applied A-
ML
SML
OCaml
7
Running Scheme Racket: a Scheme-based language
https://racket-lang.org
DrRacket: IDE for Racket
8
Scheme
We focus on the non-imperative subset of Scheme (no assignments, whiles)
Syntax:
• Atoms: e.g., 3, #t, #f, “abc”
• Lists: (3 #t “abc”)
• Functions: (sqrt 2), (+ 1 2 4)
• Comments: (+ 1 2) ; evaluate to 3
”
pts
+ false
Prefix form
Read-Eval-Print Interpreter
fun para ti
• (sqrt 2) is the same as (eval ‘(sqrt 2))
intuleval beforeIt
Inst
9
Parenthesis in Scheme
In Scheme, () is used for lists, and both code
and data are lists
Function is a first-class value
(t23)⇒ (eval(t23)) (234)⇒ (234)
• (+ 2 3) I
• (2 3 4)
Lists are evaluated as function calls
• (+ 2 3)
• ((+ 2 3))
• (2 3 4)
Evaluates to 5
5
–
Exception: 5 is not a procedure Exception: 2 is not a procedure
(eval(2 3 41)→ error Taj function
(5) → feral s) b
error
Lists w/o evaluation: use apostrophe (’)
• ’(234)
Returns (2 3 4)
10
#8
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Parenthesis in Scheme
In Scheme, () is used for lists, and both code
and data are lists • (+ 2 3)
• (2 3 4)
Lists are evaluated as function calls
• (+ 2 3) Evaluates to 5
• ((+ 2 3)) Exception: 5 is not a procedure • (2 3 4)
Exception: 2 is not a procedure
Lists w/o evaluation: use apostrophe (’)
• ’(234)
Returns (2 3 4)
2
Expressions Arithmetic:
Relational: Conditional: Type test:
(* 1 2 3 4)
(+ (* 1 2) (* 2 4))
(< (* 1 2) (+ 1 1))
(and (> 2 4) #f)
(if (< 2 3) 1 (+ 2 3))
(* 2 (if (< 3 5) 3 “abc”))
(number? 2) (number? #f)
(string? “a”) (string? 1)
(boolean? #t) (boolean? 1)
3
Environment
Environment binds identifiers to values
• +,*,-,/ are identifier bound to values • Functions are values in Scheme
Built-in identifiers in initial environment
• +, sqrt, positive?, max, ...
Add bindings to the environment • Key word: define
4
Definition Built-in identifiers
(max 1 2 3)
“define” introduce global identifiers
(define pi 3.14159)
(define radius 2)
(* pi (* radius radius))
(define circumference (* 2 pi radius))
circumference
5
Local Definition Let Expr.:
(let ((x 3)
(y (sqrt 7)))
(+ x y))
(let ((x1 exp1)
(x2 exp2)
...
(xn expn))
body_exp)
General form:
x1,x2,...,xn can be used in body_exp
6
Local Definition Let* Expr.:
(let* ((x 3)
(y x))
(+ x y))
(let* ((x1 exp1)
(x2 exp2)
...
(xn expn))
body_exp)
General form:
x1 can be used in exp2, exp3, ..., body_exp x2 can be used in exp3, exp4, ..., body_exp ...
7
Anonymous Function
Anonymous functions are introduced by lambda
Example
(lambda (x y z) expr)
parameters func. body
((lambda (x) x) 1)
((lambda (x y z) (+ x y z)) 1 2 3)
((lambda (f) (f 2)) (lambda (x) (* 2 x)))
8
Named Function
(define pi 3.14159)
In Scheme, function is a first-class value, so similarly, we can define named functions
(define f (lambda (x) (+ x 1)))
(f 1)
(let ((f (lambda (x) (+ x 1)))
(g (lambda (y) (* y 2))))
(+ (f 1) (g 2)))
9
Named Function
In Scheme, named functions can be introduced in a more concise way:
(define (f x) (+ x 1))
func. name parameter
is the same as
(define f (lambda (x) (+ x 1)))
10
Named Function
Common use:
Name Parameter
(define (sqr x) (* x x))
(sqr 5)
(define (area x y) (* x y))
(area 5 10)
(define (f x y z)
(let ((x1 exp)
(x2 exp))
f’s body))
11
Function Examples
(define (test x)
(cond ((number? x) "num")
((string? x) "str")
((list? x) "list")
(else "other")))
(define (abs x)
(cond ((< x 0) (- x))
(else x)))
(define (factorial n)
(if (= n 0) 1
(* n (factorial (- n 1)))))
12
Scheme: Key Points
Evaluation of expressions
(e0 e1 e2 e3)
define, if, cond, ‘
Key words
No static type system
(define (f x) (+ x “abc”))
(* 2 (if (< 3 5) 3 “abc”))
13
Scheme: Key Points
No assignments, no iterations (loops)
All variables are immutable (mathematical symbols)
14
¥9
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Recursion and Induction
Recursion
• Computationally: a procedure/function calls itself
• Semantically: defining a computation/value inductively
Induction
• Naturalnumber:i)0∈N ii)ifn∈N,then(n+1)∈N
• Inductive proof that property P holds on any n ∈ N :
• i) for 0, P holds
• ii)ifn∈NandPholdsonn,thenPholdson(n+1)∈N
2
Factorial
!! = !∗ !−1 ∗⋯∗1
Inductive definition
• Factorial:
• i)0!=1
• ii) n! = + ∗ + − 1 !
(define (factorial n)
(if (= n 0) 1
(* n (factorial (- n 1)))))
3
Fibonacci Number
Numbers in the following sequence: 1 1 2 3 5 8 13 21 ...
Inductive definition
• fib(1) = 1
• fib(2) = 1
• fib(n) = fib(n-1) + fib(n-2)
(define (fib n)
(cond ((= n 1) 1)
((= n 2) 1)
(else (+ (fib (- n 1)) (fib (- n 2))))))
4
List
A list is a sequence of expressions in parentheses
(1 2 3) (sqrt x) (x 1 “abc”)
Programs are just lists interpreted as code
A list is either
• Empty: ()
• or non-empty: it has a head and a tail, where
the head can have any type the tail is itself a list
Inductive definition!
5
List Construction
Inductive definition
• ()isalist
• Ifeis an expression,xsis a list,(cons e xs) is a list
“cons” is a built-in function for constructing lists
(1 2 3 4) is equivalent to
(cons 1 (cons 2 (cons 3 (cons 4 ‘()))))
head tail
6
List Destruction
tents of address register of data cops → contents
car and cdr are the destructors for lists: if xs is a non-empty list, then
(car xs) is the head
(cdr xs) is the tail
Algebraically:
(car (cons x xs)) = x (cdr (cons x xs)) = xs
register
7
List Destruction
(car ’(1 2 3)) = 1, which is the same as
(car (cons 1 ’(2 3)))) = 1
(cdr ’(1 2 3)) = (2 3), which is the same as
(cdr (cons 1 ’(2 3)))) = (2 3) X
8
List
(1 2 3) (sqrt x) (x 1 “abc”)
“cons” adds an element to a list “car” returns the head of a list
“cdr” returns the tail of a list “append” concatenates two lists
append ’(1 2 3) ’(4 5 6)
9
Additional List Functions list creates a list from its arguments
(list 1 2 3)
null? checks if a list is empty
list? checks whether something is a list
cadr the second item in a list caar, cdar, cddr ...
10
List Example
Define lstSum, which returns the sum of a num list
Inductive definition i) ‘(): 0
ii) (cons head tail): head + lstSum(tail)
(define (lstSum lst)
(if (null? lst) 0
(+ (car lst) (lstSum (cdr lst)))))
(1stSun' (123)):b (1stSun'(dy G 8D - 20
11
Recursion Over Lists
(define (f l)
(if (null? l) (base-case)
(recursive-case, using (car l),(cdr l))))
12
Equality Test
equal: return #t if two expressions have the same structure and content
(equal? 1 1)
(equal? ’(1 2) ’(1 2))
(equal? 5 ’(5))
(equal? ’(1 2 3) ’(1 (2 3)))
(equal? ’(1 2) ’(2 1))
13
List Example
Define subs, which takes a b l, and replaces a
with b in l
(define (subs a b l)
base case
(if (null? l) '()
(let ((rest (subs a b (cdr l))))
(if (equal? (car l) a)
(cons b rest)
(cons (car l) rest)))))
2'llI 2 2 33)
' '(22 2 2 33)
( subs I
14
# to
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Higher-Order Functions
In Scheme, function is a first-class value (Functions can go wherever expressions go)
• Higher Order Functions
• Function can be used as an argument • Function can be used as a return value
2
Function as Argument
Repeat any function twice? jT
(define (double x) (* x 2)) X * 2 (define (quad x) (double (double x)) (x #
(define (square x) (* x x))
(define (fourth x) (square (square x))
2)* 2
More reusable code
%be.ie
-
(define (twice f x) (f (f x))) 2.
pp (twice double 2)
(twice quad 2)
3
Examples w/o Higher-Order Functions Find a=ll numbers > 5 in a list
newlist = empty
foreach (elem in lst)
if (elem > 5)
add elem to newlist
Find all strings in a list
newlist = empty
foreach (elem in lst)
if (elem is String)
add elem to newlist
4
takes each element in 1st
either returnfor#t or Hf
filter f l: return elements for which f returns #t D9 O →
Filter
list
DDD fttb
#t#f#f
XX
.
#t D
…
My
…..
D
return
value
..–
5
Examples w Higher-Order Functions
filter f lf: return elements for which f returns #t
tf #t tf #t #f #f (filter string? ’(1 “1” 2 “2” 3 “3”))
f (filtery
”””” (rs
“”) 3
‘””‘ (lambda (x) (> x 5)) ’(1 4 7 10))
(: 7. 4 7 10″
*! tht#t ¥#t tho) Ht
6
Examples w/o Higher-Order Functions Multiple 2 to each element in a list
newlist = :empty foreach (elem in lst)
add elem*2 to newlist
Square each element in a list
newlist = empty
foreach (elem in lst)
add elem*elem to newlist
7
Map
element in 1
n e w
g)
take’
map f 0l: applies function f to each element of list l
element
in
result
D
ft D
—
→
ft D
I:
ft
D
….
.
D
list
D
return
:
…..
8
Examples w Higher-Order Functions
map f l: applies function f to each element of list l
(map CdIouble ’(1 2 3)) (map square ’(1 2 3))
‘ 124
6) 11149 )
I:D doublet
D23D 1: DBB doth doth pfifI
Dz
4£B DDyB
9
DaDd. . . Da Otta. . . to
(equals?x a)bx) In previous lecture, we had:
1st
Map
Da to
IX. (tf
(define (subs a b lst)
(if (null? lst) ’()
(let ((rest (subs a b (cdr lst))))
(if (equal? (car lst) a)
(cons b rest)
(cons (car lst) rest)))))
Using map, we have
(define (subs a b lst)
(map (lambda (c)
(if (equal? a c) b c))
lst))
10
Map on Multiple Lists
map f l1 l2 l3: applies function f to elements at
the same position of list l1,l2,l3 (with same length) left
(map + ’(1 2 3) ’(4 5 6) ’(7 8 9))
y
fight
(map (lambda (x y) (- x y)) ’(1 2 3) ’(4 *5 6))
0 .
11100230 :*# i 10
2
11
” ” ¥ ¥ ¥ ¥ ¥: t
.
Q
↳
11
7
7
4
4
4
#H
Functional Programming and Scheme
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Higher-Order Functions
In Scheme, function is a first-class value (Functions can go wherever expressions go)
• Higher Order Functions
• Function can be used as an argument • Function can be used as a return value
2
Examples w/o Higher-Order Functions Compute the sum of a list
O
init = 0
foreach (elem in lst)
init += elem
d'”” ” d'””” Compute the product of a list
i n i t = O1
foreach (elem in lst)
O
init *= elem
3
Foldl
foldl f a (e1 e2 en): returns
&
Pinilla valve
1st
f en(… (f e2(f e1 a)) …)
l ll!
Ci ez
ez
Cn
current ← element
:* :a w :*
l ‘fIlf la!fla
..
…
f
!
Al , a
ur:c: value
.
←
output
4
Examples with Higher-Order Functions foldl f a (e1 e2 en): returns
f en(… (f e2(f e1 a)) …)
%
(foldl + 0 ‘(1 2 3 4 5))
(foldl * 1 ‘(1 2 3 4 5))
I3 24
5:
÷ is
:*. ¥:* :*
:*.
5
Foldl
In previous lecture, we had:
(define (lstSum lst)
(if (null? lst) 0
( + (car lst) (lstSum (cdr lst)))))
Using foldl, we have
(define (lstSum lst) (foldl + 0 lst)
6
Examples with Higher-Order Functions foldl f a (e1 e2 en): returns
f en(… (f e2(f e1 a)) …)
current element result i → current
(foldl (lambda (x a) (+ a (* x x))) 0 ‘(1 2 3 4 5))
234S
tinsmiths
Ol
Il
ll
l
f lo “””
fdO
‘ 11.235
,,
,
7
Function as Return Value
In Scheme, function is a first-class value (Functions can go wherever expressions go)
• Functions as return values
(define (addN n) (lambda (m) (+ m n)))
((addN 10) 20) (+ 10 20)
(twice (addN 10) 20) (twice (+ 10) 20)
8
Currying and Partial Evaluation
9
Multi-Argument Functions
(define (add m n) (+ m n))
add: Int, Int → Int
Pass multiple parameters as a list?
(add ’(1 2))
(add 1 2)
add takes 2 parameters
10
Multi-Argument Functions
(define (add m n) (+ m n))
add: Int, Int → Int
Add 2 to each element in a list?
map f l: applies function f to each element of list l
(map (add 2) ’(1 2 3)) ?
11
Multi-Argument Functions
(define (add m n) (+ m n))
add: Int, Int → Int
Add 2 to each element in a list?
map f l: applies function f to each element of list l
(map (add 2) ’(1 2 3))
add takes two parameters
A multi-argument function can only
be used when all parameters are ready!
12
Currying: Every function is treated as taking at most one parameter
(define (add m n) (+ m n))
add: Int, Int → Int
Curried version
(define (addN n) (lambda (m) (+ m n))
addN: Int → (Int → Int)
also written as: Int → Int → Int (right associative)
13
Uncurried version Curried version
(define (add m n)
(+ m n))
(define (addN n)
(lambda (m) (+ m n))
add: Int, Int → Int
addN: Int → (Int → Int)
(add 2 3) ((addN 2) 3)
Curried form allows partial evaluation
(map (add 2)
’(1 2 3))
(map (addN 2)
’(1 2 3))
14
Currying
Haskell B. Curry Penn State 1929-1966
Outside of McAllister Building
15
Currying
In terms of lambda calculus,
the curried function of !”!”” … “#. % is !”!. (!””. … !”#. % )
(define (curry3 f)
(lambda (x)
(lambda (y)
(lambda (z)
(f x y z)))))
(define (curry2 f)
(lambda (x)
(lambda (y)
(f x y))))
16
Partial Evaluation
A function is evaluated with one or more of the leftmost actual parameters
((curry2 add) 2)is a partial evaluation of add
We can think it as a temporary result, in the form of a function
17
Partial Evaluation
A function is evaluated with one or more of the leftmost actual parameters
(map ((curry2 add) 2) ’(1 2 3))
add: 78 9. (+ 8 9) curry2add :78.79.(+89)
curry2add 2 :79.(+29)
map( curry2add 2) ’ 123 :’ 345
18
Uncurrying
In terms of lambda calculus,
the uncurried function of !”!. (!””. … !”#. % ) is !”!”” …”#.%
(define (uncurry2 f)
(lambda (x y)
((f x) y)))
(define (uncurry3 f)
(lambda (x y z)
(((f x) y) z)))
19
Uncurrying
In terms of lambda calculus,
the uncurried function of !”!. (!””. … !”#. % ) is !”!”” …”#.%
(define (uncurry2 f)
(lambda (x y)
((f x) y)))
(define (uncurry3 f)
(lambda (x y z)
(((f x) y) z)))
19
#12
Syntax
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Imperative Language Functional Language OO Language
C, Python Scheme
1+2
Java, Ctt
How does the Magic Box work?
3
2
t
Source Code to Target Code
”
(1.0+2)+x
Token
( 1.0 + 2 ) + x
“\\—
+ + id
Num Num 1.0 2
+ only take numbers x is in scope (visible)
,
Cmpsc461
x
Abstract Syntax Tree
Cmpsc471
3
Formal Languages
Language: a set of (legal) strings
Goal: a concise & precise notation for specifying a language
Four levels of languages [Chomsky]:
1. Regular
2. Context-Free
3. Context-Sensitive 4. Unrestricted
programming languages
4
Lexical Syntax
tokens
Rules for basic symbols, such as
identifier, literals (e.g., numbers), operators, keywords, punctuation
C language:
Identifier: letters, digits and underscore ‘_’ only. The first character must be an underscore or a letter
literals: digits, decimal point, suffix such as “l”, “u” operators: + – * / …
keywords : if, while, for, int, … punctuation:{ } [ ] ; …
5
(1.0+2)+x
Token
( 1.0 + 2 ) + x
6
Language of tokens (C)
Identifier: letters, digits and underscore ‘_’ only. The first character must be an underscore or a letter
literals: digits, decimal point, suffix such as “l”, “u” operators: + – * / …
keywords : if, while, for, int, …
punctuation:{ } [ ] ; …
How can we specify these tokens (sets of strings)?
7
Regular Expression Definition:
AB
ab
• A character
• Empty string (!)
• Concatenation of two RE (e.g., (ab))
• Alternation of two RE, separated by “|” (e.g., (a|b)) • Closure (Kleene star) (e.g.(a*))
. . . ,A9
….
8
Examples
RE Meaning
a “a”
ac “ac”
a|c “a” or “c”
a* “” or “a” or “aa” or … (a|b)(c|d) “ac” or “ad” or “bc” or “bd” (ab)|(cd) “ab” or “cd”
((a|b)c)* “” or “ac” or “bc” or “acac” or “bcbc” or …
9
More Formally…
Regular expression defines a set of strings (aka. a
language)
I takes Rf , and outputs a language
“($): the language defined by RE $
• A character &: “(&) = {“&”}
• Empty string !: “(!) = {“”}
•Alternation:”$, =”$ ∪”(,) •Kleenestar:” $∗ = “” ∪” $ ∪” $$ ∪⋯
• Concatenation: “($,) = {-. /|- ∈ “($), / ∈ “(,)}
On
String concatenation
a
10
((R) means apply L to R Examples
• Acharacter!:” ! = ”x”
• Emptystring&:” & =
• Concatenation:” () = *.,*∈” ( ,,∈” ) • Alternation:”() =”( ∪”) •Kleenestar:”(∗ =””∪”(∪”((∪⋯
RE
a
“”
“(RE) {“a”} {“ac”}
( Ca) :
Vac) : Er. s
{a)
ac
a|cHalo): Ha)uhc)=3″a”}u{“c”):{“a”,“c”}
a*
{“”, “a”, “aa”, …}
11
Examples
• Acharacter!:” ! = ”x”
• Emptystring&:” & =
• Concatenation:” () = *.,*∈” ( ,,∈” ) • Alternation:”() =”( ∪”) •Kleenestar:”(∗ =””∪”(∪”((∪⋯
RE “(RE)
(a|b)(c|d)= lr.slrellalbl.se Held)) {“ac”, “ad”, “bc”, “bd”}
(ab)|(cd):{” ” IvLaalb)c)ulllalblc){“ab”,“cd”} =A
((a|b)c)*
Halb)4)u
…
{“”, “ac”, “bc”, “acac”, “bcbc”, …}
12
Examples
• Acharacter!:” ! = ”x”
• Emptystring&:” & =
• Concatenation:” () = *.,*∈” ( ,,∈” ) • Alternation:”() =”( ∪”) •Kleenestar:”(∗ =””∪”(∪”((∪⋯
RE (a|b)(c|d) (ab)|(cd) ((a|b)c)*
“(RE)
{“ac”, “ad”, “bc”, “bd”} {“ab”, “cd”}
{“”, “ac”, “bc”, “acac”, “bcbc”, …}
12
#B
Syntax
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Regular Expression
Definition:
• A character
• Empty string (!)
• Concatenation of two RE (e.g., (ab))
• Alternation of two RE, separated by “|” (e.g., (a|b)) • Closure (Kleene star) (e.g.(a*))
2
Precedence of RE
The order is (high to low) • Closure (*), then
• Concatenation, then
• Alternation
Analogy in arithmetic: • Exponentiation
• Multiplication
• Addition
( a*b)tCc*a) ( Jl ) →
(a) + ( b*lcH*d ) (ab)|(cd)
(a)|(b(c*)d)
( a ) N b c ) ‘” d ) (a) tKb*cle*d )
ab|cd
C)() → a|b(c*d)
3
UNIX Extensions to RE
Extension
[abcd] – )
Core RE a|b|c|d
–
.
–
. (wild cast) →
a|b|c|…(all char. except new line) !|a
a(a*)
—
a? ” ” l”a” –
aaa ,
→ }→
‘””””
a+I’a –
“aa
…
,, –
[a-z]
\[ \] \.
→ a|b|c|…|z
→ [ ] .
—
( -2.7 ##
Go-20]x
a
4
Grammar
of languages name
in RE language
1! → # d e f . nonterminal Lang. definition in
symbol meta language
of
How to write down a RE grammar?
5
Language of tokens (C)
Identifier: letters, digits and underscore ‘_’ only. The first character must be an underscore or a letter
numbers: digits, decimal point, suffix such as “l”, “u” operators: + – * / …
keywords : if, while, for, int, … punctuation:{ } [ ] ; …
character
past
“#$%&”‘”$( → [ _a-zA-Z ][ _a-zA-Z0-9]O∗
Grammar
‘ O’ ‘ pl
%+,-$( → (0|[1-9][0-9]∗)(\. [0-9]+)? (l?|u?) ./$(0&.( →+|-|∗|/|…
1$23.(# →if|while|for|int|… /+%4&+0&”.% →{|}|\[|\]|;|…
lol
O
6
Is a RE grammar correct?
• Does it cover all legal strings?
• Does it allow any illegal string?
7
bit COD Examples
{“O” “l”) 21011):{“O” “l”) ,,
RE
–
bit
0|1 (0|1)(0|1)(0|1)(0|1) (0|1)+ ((0|1)(0|1))+
[0-9] | [1-9][0-9] |1 [0-9][0-9] |2 [0-4][0-9]
| 25[0-5]
Use the RE above ?
4 bits [01%131017101]
(non-empty) bits com.
(non-empty) Even # of bits
frag: Num bet. 0 and 255
161710131’T 10-931
( I -9710-911 110-9310-911
Ho- 4)(o- 931 Slo – S]
“”
IP Address
bits with equal 0’s and 1’s
frag.frag.frag.frag .
8
#14
Syntax
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Midterm 1 is rescheduled to Oct. 7th 6:15 to 7:30 HW2 is extended to this Friday midnight
1st Mideterm covers materials from HW1 to HW3 More details coming later
Scanning with RE
A scanner takes an ordered list of REs For each RE, read one character at a time and • output a token, or
• wait to see next character, or
• use next RE / report an error
3
(Not covered in this class)
4
Formal Languages
Language: a set of (legal) strings
Goal: a concise & precise notation for specifying a language
Four levels of languages [Chomsky]: 1. Regular
2. Context-Free
3. Context-Sensitive 4. Unrestricted
bits with equal 0’s and 1’s
Not a regular language!
5
Context-Free Grammar (CFG)
A CFG consists of
• A set of terminals T (basic alphabet) • A set of non-terminals N
• A start symbol S (a non-terminal)
• A set of production rules
!→#
nonterminal
string of terminals and nonterminals, separated by, ‘|’ meaning alternation
‘ ’ meaning concatenation
6
Context-Free Grammar (CFG)
Terminals
• Building block of CFG
• E.g., letters, digits, or tokens from scanner
Non-terminals
• Symbols defined by production rules
• E.g., if-stmt → IF LPAREN expr RPAREN stmt
terminal terminal
7
CFG Example
A pure λ-term is defined inductively as follows: • Any variable ! is a λ-term
• If ” is a λ-term, so is λ!. ” (abstraction)
• If “!, “” are λ-terms, so is “!”” (application)
term → var
| term term
|) var . term terminals
8
Define Tokens via CFG
term → var
| term term
|) var . term var → letter var
| letter letter→, – …|/
InRE:var→([a−z]|[A− Z])+
9
CFG Example
expr → id number − expr (expr) expropexpr
op →+−∗|/ …
10
Typical Use of RE and CFG
(1.0+2)+x
Token
( 1.0 + 2 ) + x
+ + id
Num Num 1.0 2
RE CFG
x
11
Define Tokens by Scanner
expr → id | number | − expr (expr) expropexpr
where id number op are tokens recognized by the scanner
12
Build CFG
All strings with . 0’s followed by . 1’s, . ≥ 1? > → 0>1 | 01
13
Build CFG
All strings with . 0’s and . 1’s, . ≥ 0? > → 0>1> | 1>0> | B
14
RE and CFG
CFG is strictly more expressive than RE
1. For any RE %, there is CFG &, such that ‘(%) = ‘(&) 2.ExistsCFG&,suchthatforallRE%,’ % ≠'(&)
1. Proof by induction.
2. A language with , 0’s followed by , 1’s, , ≥ 1
> → 0>1 | 01
15
RE
B
a
R1 R2 R1 |R2
R*
CFG
>→B
>→a
> → C1 C2
>→C1 |C2 >→>C|B
where !1, !2 and ! are the equivalent nonterminal for R1,R2 and R in RE respectively
16
#Is
Syntax
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Context-Free Grammar (CFG)
A CFG consists of
• A set of terminals T (basic alphabet) • A set of non-terminals N
• A start symbol S (a non-terminal)
• A set of production rules
!→#
nonterminal
What strings are
defined by a CFG?
string of terminals and nonterminals, separated by, ‘|’ meaning alternation
‘ ’ meaning concatenation
2
Context-Free Language
!(#): the language (set of strings) defined by #
Definition:
!(#) is the set of all terminal strings derivable from the start symbol of #
3
Derivation
A sequence from start symbol, each step a non- terminal is replaced by RHS of a production
pstart
var ⇒ letter var
⇒ letter letter var ⇒ letter letter letter ⇒ x letter letter ⇒ x y letter ⇒ x y z
A multi-step reduction var ⇒* xyz
symbo
Var → letter var | letter
letter→a|b|…|Z
A derivation of “xyz” with rule
4
Derivation
start symbol y
O
expr → id number − expr
(expr) expropexpr op →+−∗|/
…
expr ⇒ expropexpr ⇒ idopexpr ⇒ idopexpropexpr ⇒ idopnumberopexpr ⇒ idopnumberopid
⇒ x op number op id ⇒ x + number op id ⇒⋯⇒x+3∗y
Areduction expr⇒* x+3 ∗y
A derivation of “x+3*y” with rule
5
Order of Derivation Derivation can follow any order
non terminal
letter →a|b|…|Z
replace left most
Leftmost derivation of “xyz” only
2
⇒ x y var ⇒ x y letter ⇒ x y z
Rightmost derivation of “xyz”
var⇒lettervar
⇒ letter letter var ⇒ letter letter letter ⇒ letter letter z ⇒ letter yz ⇒ xyz
→
Writing derivation can be tedious →
]
var → letter var
var⇒lettervar 7 3
I1 I2 |letter
⇒ x var ⇒ x lettervar
1
t
→ ga* at
⇒§§
→ sat →gs*at
→
Sgt att a t Sat a*at
t
* aataa
6
Parse Tree
Derivation in graphical from
Root: the start symbol Leaf: terminal
parent
child1 child2 … childn
represents
parent ⇒ child! child” … child#
7
Parse Tree
Derivation in graphical from
Derivation Step Tree
var ⇒ letter var
var
letter var
8
Parse Tree Example Leftmost derivation of “xyz”
”
var ⇒ letter var O
⇒ x vOa r ⇒ x
⇒ x 0y v 0a r ⇒ x y l e t t e r ⇒ x y z
letter var
DO
var
letter var
x
y
letter var
letter z
9
Parse Tree Example
Rightmost derivation of “xyz”
°var ⇒ letter var
⇒ letter letter var ⇒ letter letter lOetter
OO
–
OO
⇒ letterletterz ⇒ letteryz ⇒ xyz
var letter avar
The parse tree is the same as leftmost derivation!
O O
vOar letter
x
y
letter
D
O
z
10
Parse Tree Features
\var letter var
x
y
•Root is the start symbol •Leaves are terminals
•Other nodes are nonterminals •Leaves (left to right) form the
parsed string
” xyz
letter var letter
z
J
”
11
Parse Tree Example
parse trees for “x+3*y”
expr expr
expr
expr op expr
id + number x3
expr → id number − expr (expr) expropexpr
op →+−∗|/ …
op expr expr op expr
∗
id OR id y x
+ expr op expr
number ∗ id
37
12
Ambiguity
Two different parse trees for “x+3*y”
expr
op expr
expr op expr
expr
op expr
+ expr op expr
expr
expr
∗
id
y
OR id x
+ x3
id
number
number ∗ 37
id
A grammar is ambiguous if its language contains at least one string with two or more parse trees
13
Ambiguity
Is the following language ambiguous?
expr → id number − expr (expr) expropexpr
op →+−∗|/
Yes, two parse trees (previous slide) for string x+3*y
14
Expression Grammar
Precedence: which operator should be evaluated sooner
Arithmetic: * and / have higher precedence; all operators are left-associative
Associativity: (binary) operator with same precedence evaluated left-to-right or right-to-left
15
Precedence
expr expr
expr
expr op expr
id + number x3
(x+3)*y
op expr expr op expr
∗
+ expr op expr y x number ∗ id
id OR id
OR x+(3*y)
37
The lower an operation is, the higher precedence it has
16
Defining Precedence in Grammar Define operations at different “levels”
expr → id number − expr (expr) expropexpr
op →+−∗|/
expr → expr + expr |expr − expr | term term → term ∗ term | term/term| factor factor → id number (expr)| − factor
Level3
Level1
Level2
The farther from start symbol, the higher precedence
17
expr
expr
term factor id
x
+ expr term
term ∗ term factor factor
number 37
id
x+(3*y)
Only one parse tree for x+3*y
18
# lb
Syntax
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Defining Precedence in Grammar Define operations at different “levels”
expr → id number − expr (expr) expropexpr
op →+−∗|/
expr → expr + expr |expr − expr | term term → term ∗ term | term/term| factor factor → id number (expr)| − factor
Level3
Level1
Level2
The farther from start symbol, the higher precedence
2
Ambiguity
expr expr
expr
expr op expr
id + number x3
(x+3)*y
String with different parse tree might have different meanings
op expr expr op expr
∗
+ expr op expr y x number ∗ id
id OR id
OR x+(3*y)
3$
3
Associativity of Binary Operators
A property of binary operations
… + a + …: sign “+” is left-associative since
a is associated with the left “+”
An operator with left (right) associativity is evaluated left-to-right (right-to-left)
Left: +,-,*,/
Right: = in C ( a=b=c same as a=(b=c))
4
Associativity in CFG
Left-recursion: LHS is the start of RHS in a derivation
E! →∗ E! op …
Defines left-associativity on op
*!
E! op E!
E! op
E! →∗ …op E!
Defines right-associativity on op
E!
Right-recursion: LHS is the end of RHS in a derivation
5
Recursion and Ambiguity
Left-recursion: E! →∗ E! op … Right-recursion: E! →∗ … op E!
A grammar with at least one operation that is both left and right recursive is ambiguous
*!
E! op E!
E! op E!
Two parse trees for
*!
E! op E!
E! op E!
E! op E! op E!
6
Direct Recursion in Grammar
Left-recursion: E! →∗ E! op … Right-recursion: E! →∗ … op E!
expr → expr + expr |expr − expr | term term → term ∗ term | term/term| factor factor → id number (expr)| − factor
The production rule of +,-,*,/ is both left- and right-recursive. So the grammar is ambiguous.
7
Different parse trees for string “6-5-4”
Left tree evaluates to -3; right tree evaluates to 5
expr expr −
term
term − term
factor factor number id
expr
term
expr
term
expr
−
factor factor
id id
expr term
term − term
factor
number id
factor
4 6 65584
Indirect Recursion in Grammar
Left-recursion: E! →∗ E! op … Right-recursion: E! →∗ … op E!
equals → var = equals1
equals1 → equals | var
Indirect recursion: equals →∗ var = equals
= is right-associative
9
Defining Associativity in Grammar
expr → expr + expr |expr − expr | term term → term ∗ term | term/term| factor factor → id number (expr)| − factor
Remove right-recursion on +, −,∗,/
expr → expr + term |expr − term | term term → term ∗ factor | term/factor| factor factor → id number (expr)| − factor
In this grammar: +, −,∗,/ are left-associative
10
expr
expr −
expr − term
term
factor number
6
(6-5)-4
Only one parse tree for 6-5-4
term
factor factor id
number
4 5
11
Dangling Else Ambiguity
if (b1)
if (b2) c1
else c2
OR
if (b1)
if (b2) c1
else c2
stmt
stmt stmt if (expr) stmt ORif (expr)stmtelse
if ( expr ) stmt elsestmt if ( expr ) stmt
Grammar:stmt→if expr stmt|if expr stmtelsestmt
12
Avoid Dangling Else
Approach 1: Change syntax (ALGOL, Ada)
Grammar:stmt→if expr stmtAi|if expr stmtelsestmtAi
if (b1)
if (b2) c1
else c2
fi
fi
if (b1)
if (b2) c1
fi
else c2 fi
13
Avoid Dangling Else Approach 2: Keep syntax, change grammar
Grammar: stmt → matched | unmatched matched→if expr matchedelsematched
unmatched→if expr stmt
|if expr matchedelseunmatched
if (b1)
if (b2) c1
else c2
if (b1)
if (b2) c1
else c2
14
Concrete Syntax Tree
expr
term factor id
x
expr
+ expr term
term ∗ term factor factor
number 3$
id
The parse tree is very verbose. It tracks information only useful for parsing (e.g., defining precedence and associativity)
15
Abstract Syntax Tree (AST)
expr
+ expr term
expr
term factor id
x
+
∗
x
y
3
term ∗ term
OR
BinOp: +
factor
number id
factor 3$
BinOp: ∗
Id: x
Number:3 Id: y
16
Abstract Syntax Tree (AST)
BinOp: +
AST doesn’t show the whole syntactic clutter,
e.g., parentheses, nonterminals added for unambiguity
AST keeps the same structure as parse tree
Each node usually corresponds to an object in impl. (more manageable for later compiler stages)
AST hides syntactical language differences
Id: x
BinOp:∗
Number: 3
Scheme: (* 3 y) BinOp: ∗
Id: y
C, Java: (3 * y) BinOp: ∗
Number: 3 Id: y Number: 3 Id: y 17
Expression Gramma
• Define precedence (grammar with different levels of operators)
• Define associativity (left-recursive vs. right-recursive derivations) to avoid ambiguity
Avoiding ambiguity in general is undecidable
Abstract Syntax Tree (AST) is a compact representation of parsing tree
18
Parsing
Assemble tokens (from scanner) to a syntax tree, according to a CFG
In practice, most programming languages falls in CFG with linear time parser (e.g., LL, LR)
Parser generator (e.g., Yacc, Bison) automatically generates parser from CFG
19
(1.0+2)+x
Token
( 1.0 + 2 ) + x
Abstract Syntax Tree
RE, FSA
CFG
Type check Scope check
+ only take numbers x is in scope (visible)
20
#
17
Names, Scopes, Bindings
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Plan
HW3 due this Friday midnight
1st Mideterm covers materials from lecture 1 to 16, i.e., HW1 to HW3
Next Monday: review of assignments and practice problems
Next Wednesday: evening exam (no lecture)
Midterm 1
Carefully read the exam instructions (see Canvas announcement) before the exam
Practice questions are posted on Gradescope (will not be graded)
Conflict exam: send me (dbz5017@psu.edu) and Zeyu (dxd437@psu.edu) with supporting documents, if any, to request a conflict exam by this Friday
(1.0+2)+x
Token
( 1.0 + 2 ) + x
Abstract Syntax Tree
RE, FSA
CFG
Type check Scope check
+ only take numbers x is in scope (visible)
4
Limitations of CFG Grammar for a simple language:
Need Context Sensitive Information
y defined?
b number?
Prog ::= Decl* Stmt*
Decl ::= int ID; | bool ID; Stmt ::= ID = Exp
int x; y=0
We need to add meanings to symbols when go beyond syntactic analysis
bool b; b=0
5
Names in Program
A mnemonic string in high-level languages
Identifiers in most languages
An abstraction of low-level representation, such as memory address and register
6
Variables
A binding of a name to a memory address (binding: symbol to object)
A variable usually has:
name, address, type, value, lifetime, scope
7
Names in Programs
Scope: visibility of names in source program
Static
Storage: memory space associated with names
Dynamic
Lifetime: the time interval a variable is allocated with memory
8
Objects and Lifetime
Run-time representation of a name Key events of objects:
• Creation of objects
• References to variables
• (temporary) deactivation of bindings • Reactivation of bindings
• Destruction of bindings
• Destruction of objects
source code
binding obj1
int x;
9
Typical Memory Layout
Higher Address
Lower Address
Stack
Free space
Heap
Static data
Code
10
Storage Allocation Mechanism and Object Lifetime
Static allocation: bound to memory cells before execution begins and remains bound to the same memory cell throughout execution
Stack allocation: storage bindings are created for variables when their declaration statements are elaborated
Heap allocation: Nameless memory cells that are allocated and deallocated by explicit directives “run-time instructions”, specified by the programmer (either explicitly or implicitly)
11
Static Allocation
Lifetime of Objects: throughout execution
E.g., global variables, constants Objects allocated in static data area
12
Stack Allocation
Function Function Call Return
An example of stack during program execution
Local variables are created/destroyed with function call/return 13
Lifetime of Local Variables
When does the lifetime of each variable begin & end?
int p(int p1) {
int px;
q(p1,px); }
int q(int q1,q2) {
int qx;
… }
int main () {
int x;
p(x);
q(x,x); }
main calls p B#
exit main I μ
91
start
e itofp
startofp #
start ofq
•
cxltofq
14
Heap Allocation Allocation Request (with size)
Free request
An example of heap during program execution
Typically used for user-requested storage
15
Lifetime of User-Requested Storage
When does the lifetime of dynamic storage begin & end?
String* x;
int main () {
p();
q(); }
int p() { …
x = new String[10];
… }
int q() { …
delete x
… }
callsp call q BIO
exit of main
start of main
.
X.new. – . -47,1
object created
d.de/.eexltofqnthheaPst:FTT.ubjeitdfeahbcnke4
gfastofp
.
off
from the heap
16
#18
Names, Scopes, Bindings
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Names in Programs
Scope: visibility of names in source program
Static
Storage: memory space associated with names
Dynamic
Lifetime: the time interval a variable is allocated with memory
2
Scope
The visibility or availability of names
• Whichcoderegionisanamevisible?
When is it determined? • Static or Dynamic?
• Lexical or Control-Flow?
How & when does it change?
3
Static (Lexical) Scoping Scopes can be determined by the compiler
All bindings for identifiers can be resolved by examining the program
Typically, use the closet binding
Used in most languages (C, Java, Scheme)
4
(let*(name def) (i– … ..)
Scope Rules in Scheme
Tname
(let ((x 0)) Scope of x
(+ x x))
plane
I) …….
(use names))
ram’ (let*) Scope of x T
(let* ((x 0) (y x))
(+ x y))
Scope of inner y
y scope (let ((x 0) (y 7))
(let ((y (+ x 1))
(x (+ y 1)))
(+ x y)))
(let ((x 0))
→ name
(let ((y (+ x 1)))
(+ x y)))
Scope of x
→ name
(define (f x)
(g x x))
which y?
Scope of g
Scope of *
(define (f x)
(let ((g *))
(g x x)))
Scope of x Scope of z
(let ((x 17))
(lambda (z) (+ z x)))
b
(define (f x)
(let ((* +))
(* x x)))
name
5
Nested Scope
In Scheme, each (let …) defines one scope
(let ((x 0))
(let ((y 1)) (+ x y)))
In Pascal, ALGOL, function can be nested
function E (int x) {
function F (int y) …
return F(1)+x; }
// F is not visible here
In C, block scopes can be nested
How are
scope rules being checked?
function E (int x) {
{ int y=3; x=x+y;}
// y is not visible here
return x; }
6
Scope Check
Symbol table: a table of names and their bindings
Single scope: a dictionary or hash map
Nested scope: a tree of symbol tables • Each scope has one symbol table
• Each symbol table may have a parent
7
Symbol Table
Each entry in the symbol table contains
• The name of an identifier
• Additional information: its kind, its type, if it is constant and so on (compiler dependent)
Name
Kind
Type
Constant?
x
id
int
0
8
Scope Check Example
each
scope
has o n e symbol table
(global) (first let)
Name
Type
Constant?
Name
y
Type
int
Constant?
x
int
0
7
(let (⇒(x 0) (y 7))
–
–
(let ((y (+ x 1))
(x (+ y 1)))
(+ x y)))
(second let)
Name
Type
Constant?
y
int
1
x
int
8
9
Name Collision: Identifier with Same Name
(global) (first let)
Name
Type
Constant?
Name
y
Type
int
Constant?
x
int
0
7
(let ((x 0) (y 7))
(let ((y (+ x 1))
(x (+ y 1)))
(+ x y)))
which y?
which y?
To find a binding:
1. Startfromtableofcurrentscope 2. Goupuntilthenameisfound
3. Failifnotablecontainsthename
(second let)
Name
Type
Constant?
y
int
1
x
int
8
Name Search
10
Scope Check Example
In scope?
In scope?
In scope?
11
CMPSC 461: Programming Language Concepts Midterm 1 Practice Questions
Problem 1 -calculus.
a) What are the free variables in the -term ( x .( y . y z) y) x.
b) Fully evaluate ( x .( y .(( y . y) x))) z.
c) Define a function doubleChurch, which takes a Church Numeral and doubles it. Try to build it from scratch, without using the encoding of + or ⇤ as defined in lecture.
Problem 2 Higher-order functions and currying.
a) Whatisthecurriedformof xy.(+xy).
b) Explain in one sentence, what’s the result of ( x y . (+ x y)) 1.
Problem 3 Scheme.
a) Define a recursive Scheme function that takes a list of numbers and returns the product of all the list
elements. Given the empty list, the product should be 1.
b) Define the same functions using “foldl”.
Problem 4 Scheme.
What do the following functions do?
(define (blue p l) (= (length l) (length (filter p l)))) (define (renmap f x ys) (map (lambda (y) (f x y)) ys))
(define (binmap f xs ys) (map (lambda (x) (renmap f x ys)) xs))
Problem 5 Regular Expressions.
a) Define a regular expression for all strings of odd length, over the alphabet of {0}.
b) Define a regular expression for identifiers over the alphabet of {A,B,C,a,b,c,0,1,2,3,4,5,6,7,8,9}, such that an identifier must begin with an alphabetic character and must contain at least one numeric character.
Problem 6 Context free grammar.
Consider the following context free grammar:
P ::=id|P _P |P ^P |¬P |(P) id ::= p | q | r | s
where the set of terminals symbols is {p, q, r, s, _, ^, ¬, (, )} and the start symbol is P . a) Give the leftmost derivations for the following expression:
• ¬(p^q)
b) Show that this language is ambiguous.
1/2
Problem 7 Context Free Grammar.
For each of the following grammars state whether it is ambiguous or unambiguous. If it is ambiguous give an equivalent unambiguous grammar. If it is unambiguous, state all the precedences and associativities enforced by the grammar. Assume Id generates identifiers.
a) E ::= E + F | E F | F F ::= F |Id | (E)
b) E ::= E _ F | F
F ::=G\F |G[F |G G ::= G ^ H | H
H ::= Id | (E)
Midterm 1 Practice Questions, Cmpsc 461
2020 Fall
two different y
^
I2
1 add parenthesis
( xx.HN/.lyzDyDx. 2 fUc×) = Ex}
FUCXX.tl > Fact) – Ex}
H tt , .dz/:FUCt.)vfUltz)
A fvkxy.ly# y)
Fvclxx. A)x ) f-VIXX A) v FV (x )
.
— fvlxy.ly#ufVlyL –
=
‘ Kvly,z) lying? :#vlylufvlz)) – lyDuly}
:(ftp.l-lxllulx) 🙁 Iyaz) – Ext) ulxl
= Huggy : 14%2-1
–
– 1%23
pp
=p layHayy)#1)Ezlx)
-[
free variable
#) .
–
– ay Clay.y)z
S) : Xy. -2
=p ay
Cy
input
IAfxz. tlf…. H2))
…
afx*z.tl (f ffz))
..-
output
TT
in f la f z ))
In. It. 77.4 f ( n f z))
G -71 → AM -7 AS ⇒ AAM
÷
“” G→s⇒AM→aEM- a a it
legal
G→s→AM→abAe→abaE⇒aba “aba” is
•→S→→→b”” AMbAA aface→ aa baais
legal
legal
pattern:
more a ‘s then b’s
A: one more a in B:onemorebin
f:Ita=#b
string
string
¥19
Names, Scopes, Bindings
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
tatic Scoping SI
1 build symbol table for each scope
2 build a tree : child is defined in the scope of parent
3 find
name
(global)
Name
‘
n
Kind
Type
Id
int
first
fun
void ->void
second
fun
Void->void
int n=2;
global
first
–
void first() { μ n=1
}
(first)
(second)
Name
Kind
Type
Name
n
Kind
Id
Type
int
μ,
}
void second() {
int n=0;
,y
/ first();
Symbol tables determined at compile time Search for the binding from the table
for the current scope
–
first();
second();
2
Dynamic Scoping
Static Scoping
• Bindings determined by lexical structure • Local renaming principle
Dynamic Scoping
• Bindings depend on flow of control
• Always use most recent, active binding • Names important!
• Meaning of a variable can change
3
Dynamic Scoping and Stack
Looking for active binding
4
1 build symbol table
Dynam/ int n=2;
void first n= 1
}
void second() {
int n=0;
first();
}
first();
second();
for each 2 build tree : parent is the caller
scope
of the child (global)
3find name ic Scoping
Name
Kind
Type
n
Id
int
first
fun
void ->void
second
fun
Void->void
() {
(first)
Name
Kind
Type
Set global n to be 1
5
Dynamic Scoping
(global)
Name
Kind
Type
n
Id
int
first
fun
void ->void
second
fun
Void->void
int n=2;
void first() { n= 1
}
void second() {
int n=0;
first();
}
first();
second();
(second)
Name
Kind
Type
n
Id
int
(first)
Name
Kind
Type
Set n in “second” to be 1
Symbol tables changes at run time!
Always use most recent, active binding
6
Static vs Dynamic Scoping
Similarity
• One symbol table per scope
• Names are resolved from bottom to top in symbol table tree
Difference
• Static: the symbol table tree is stable
• Dynamic: the symbol table tree changes during execution
7
If a subroutine is passed as a parameter, how do we resolve names?
function F(int x) {
function G(fx) {
int x = 13;
fx(); };
function H() {
print x;
};
G(H); };
What’s the parent of H’s symbol table?
which x?
8
Who’s the parent of H’s symbol table?
function F(int x) {
function G(fx) {
int x = 13;
fx(); };
Shallow Binding: when the subroutine is called
function H() {
print x;
};
G(H); };
Deep Binding: when reference is created
9
Shallow Binding: use the environment at call time
(F)
Name
Kind
Type
x
para
int
G
fun
fun ->void
H
fun
void->void
function F(int x) {
function G(fx) {
int x = 13;
fx(); };
function H() {
print x;
};
G(H); };
(G)
Name
fx
x
Kind
para
Type
fun
Id
int
Established
at call time (H)
Name
Kind
Type
10
Deep Binding: use the environment when ref. is created
function F(int x) {
function G(fx) {
int x = 13;
(F)
Name
Kind
Type
x
para
int
G
fun
fun ->void
H
fun
void->void
fx(); };
function H() {
print x;
};
G(H); };
Established when ref. is created
(H)
Name
Kind
Type
11
Impl. of Deep Binding: Function Closures
function F(int x) {
function G(fx) {
int x = 13;
fx(); };
function H() {
print x;
A function closure contains:
• A pointer to the function
• The current symbol table
(or all symbol tables, depending on implementation)
};
G(H); };
Pass in a function closure
12
Shallow, Deep Binding and Static, Dynamic Scoping
Dynamic scoping
• Both shallow and deep binding are implemented • Deep binding has a higher cost at run time
Static scoping
• Shallow binding has never been implemented
13
#
20
Names, Scopes, Bindings
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
void f (int i) {
int a;
fun. f :÷¥ :
F
lT G
t
H
void h (int j) {
a = j;
}
void g (int k) {
h(k); }
g(i+2); }
t
2
void f (int i) {
int a;
void h (int j) {
a = j;
}
void g (int k) {
h(k);
}g(i+2); ? }
Where is a? From symbol table of h, go up two hops
Name
a
i
h
g
Name
k
Kind
id
para
fun
fun
Kind
para
Name
Kind
j
para
memory
3
Establishing Addressability
How does the compiled code locate a variable used in a function at run time?
Value of z
Value of y
Value of x
Value of z
Value of y
Value of x
z in func f y in func f x in func g
Memory abstraction in PL
0xf0000008 0xf0000004 0xf0000000
Real Memory
4
Establishing Addressability
How does the compiled code locate a variable used in a function at run time?
• What is the start address?
• Start address = base address (of a scope) + offset (in the scope)
Global variables
Local variables (w/ or w/o surrounding scopes) Heap variables
5
Typical Code and Data Layout When executed, a program occupies memory
Higher Address
Stack
Free space
Heap
Static data
Code
Code section
• Stores source code • Read-only
Data section
• Static data area • Stack
• Heap
–
global
Lower Address
6
Global & Static Variables
Base Address: fixed throughout the run
Offset: known at compile time (length of each variable is determined by its type)
Value of z
Value of y
Value of x
Base+12 (z)
Base+4 (y) Base (x)
Static data
Global variables: int x;
float y;
int z;
7
Scope and Stack
Lower Address
Higher Address
8
Each instance of a subroutine has
a frame (activation record) at run time • Compiler generates code that setup
frame, call routine, and destroy frame
Frame pointer (fp): currently active frame Stack pointer (sp): free space on stack
Hardware Support
X86 Registers and instructions: • Stack pointer register: esp
• Frame pointer register: ebp
• Push instructions: push, pusha, etc. • Pop instructions: pop, popa, etc
• Call instruction: call
• Return instruction: ret
9
Typical Frame Layout
Temporaries.: register spill area, language-specific exception-handling context, and so on (not covered)
Bookkeeping info.: a reference to the stack frame of the caller (also called the dynamic link) and so on
(when the frame is active)
10
voidOP(inta,intb){
&int x, y; …
}
void Q () {
int u, v;
P (3, 4);
i:c::”
addr ofx
adds ofy fpt 4 TO
fpt Y –
base
‘-
Y
–
Local Variables
base
The offsets of objects within a frame usually can be statically determined
Local x Local y
Frame of P
Previous fp
}
(when the frame is active)
Frame of Q
Return Addr
Para. a = 3 Para. b = 4
Local u Local v
11
Nested Scopes
Languages with nested functions:
How do we access local variables from other frames?
Static Scoping:
Static link: a pointer to the frame of enclosing function
Dynamic Scoping:
Dynamic link: pointer to the previous frame in the current execution (the same as previous fp)
12
Static Link and Dynamic Link
void f (int i) {
int a;
void h (int j) {
a = j;
}
void g (int k) {
h(k); }
g(i+2); }
Dynamic links
Static links
Frame h
Frame g
Frame f
13
Name
Kind
a
id
}
i
para
h
fun
g
fun
Name
Kind
j
para
Name
Kind
k
para
Where is variable a in h?
Static Scoping
void f (int i) {
int a;
void h (int j) {
a = j;
}
void g (int k) {
h(k); }
g(i+2);
Symbol tables
Where is a? From frame of h,
go up one hop following static links
Where is a? From symbol
table of h, go up one hop 14
Where is variable a in h?
Dynamic Scoping
para fun
void f (int i) {
int a;
void h (int j) {
a = j;
}
void g (int k) {
h(k); }
g(i+2); }
Symbol tables
Where is a? From frame of h,
go up two hops following dynamic links
Name
a
i
h
g
Kind
para
Name
k
Kind
id
fun
Name
Kind
j
para
Where is a? From symbol table of h, go up two hops
15
Heap-Based Variable
Similar to a normal variable, except that the memory cell stores the address of heap space
Value of z
y: Address of A
Value of x
Base+8 (z)
Base+4 (y) Base (x)
Object A
int x;
A* y = new A(); int z;
Assume pointer has 4 bytes
16
Big Picture: Data Memory Layout
17
#21
Procedures and Functions
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Terminology
Function (Fortran, Ada): returns a value
Procedure (Ada), Subroutine (Fortran): returns no value C-like language: both are called function
Method (C++, Java): a function declared inside a class
2
Caller & Callee
When function A calls function B • A is the Caller
• B is the Callee
What needs to be done? Who is responsible?
3
Stack-Based Allocation
4
Stack Frame (Activation Record)
A stack frame contains: • Local variables
• Temporaries
• Arguments, return values • Bookkeeping info
When is it created? By whom? When is it destroyed? By whom?
arguments
temporaries
locals
misc bookkeeping
return addr
5
Registers
6
X86 Registers
General registers eax, ebx, ecx, edx
Indexes and pointers sp, pc, fp, …
And many more …
Registers are shared among function calls
7
Registers
How can registers be used?
When should they be used? Who saves registers?
8
Saving & Restoring Registers
Registers are shared, caller & callee need to save and restore registers
Ideally: save exactly registers used in caller & callee
In practice, registers fall into two categories: 1. Caller-saves
2. Callee-saves
9
Caller Save vs. Callee Save
Caller saves all caller-saves registers before call Callee saves all callee-saves registers before run Callee restores all callee-saves registers before exit Caller restores all caller-saves registers before resume
Separation of responsibility:
1. Caller assumes callee will not destroy callee-saves
2. Callee assumes nothing is valuable in caller-saves
10
C Calling Convention on x86 Caller-Saves: EAX, ECX, EDX
Callee-Saves: EBX, EDI (dest. idx), ESI (src. idx) Return value stored in EAX
11
What needs to be done before P calls Q? • Saving dynamic link (current fp)
• Saving caller-saves registers
• Changing stack and frame pointers
12
What needs to be done before Q returns to P? • Restoring saved registers
• Deallocating stack space of Q
13
Control Flows in HW
Program Counter (PC): a register that always contains the memory address
of an instruction
PC always points to the instruction following the one being executed
PC can be modified by jump, call, return instructions
pc
Stack
Free space
Heap
Static data
Code
14
What needs to be done before P calls Q? • Saving dynamic link (current fp)
• Saving other registers
• Changing stack and frame pointer
• Saving return address pc • Changing program counter
…
P’s code
…
Q’s code
…
15
What needs to be done before Q returns to P? • Restoring saved registers
• Deallocating stack space of Q
• Restoring program counter
…
P’s code
…
Q’s code
…
pc
16
Parameter Passing
Caller passes parameters to callee Callee passes values to caller
Formal vs. actual parameters
17
Parameter passing modes
Value or reference? Input or output?
Read or write?
How is parameter passing implemented? (covered later)
18
What needs to be done before P calls Q? • Saving dynamic link (current fp)
• Saving other registers
• Changing stack and frame pointer
…
P’s code
…
Q’s code
…
• Saving return address
• Changing program counter
• Passing parameters
pc
19
What needs to be done before Q returns to P? • Restoring saved registers
• Deallocating stack space of Q
• Restoring program counter
• Passing return values
…
P’s code
…
Q’s code
…
pc
20
One Example (implementation dependent)
• Caller processes actual parameters (evaluation, address calculation) and stores them
• Caller stores some control information (e.g., return address) and registers
• Control transferred from Caller to Callee • Callee allocates storage for locals
• Callee executes
• Callee deallocates storage for locals
• Callee stores return value (if function) • Control transferred back to Caller
• Caller deallocates storage used for control information and actual parameters
• Caller restores registers
21
A Typical Calling Sequence
Caller
saves registers
computes values of parameters, move them to stack or registers switch control to callee (call)
Callee Prologue
allocates frame (change sp)
saves previous frame pointer (dynamic link) and sets new one saves registers might be overwritten in the callee
22
A Typical Calling Sequence
Callee Epilogue
moves return value into stack or register restore callee-saves registers
restore fp and sp
transfer control to caller
Caller
moves return value where needed
restores caller-saves registers when needed
23
Function call is space and time consuming
Space: each active function requires one frame (activation record) on stack
Time: instructions are added by the compiler to set up and clean up function calls (calling sequence)
24
¥22
Procedures and Functions
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Review of Storage Layout
High address
Stack
Free space
Heap
Static data
Code
Low address
2
Support for Functions
int fact(int n) {
int tmp = 0;
if n==1 return 1;
else
{tmp = n-1;
return n*fact(tmp);}
}
fact
fact
fact
During execution, each function call has one frame (activation record) on the stack
3
Stack Frame (Activation Record)
A stack frame contains: • Local variables
• Temporaries
• Return values
• Bookkeeping info
• Arguments (to the callee)
arguments
temporaries
locals
misc bookkeeping
return addr
4
Stack Frame (Activation Record)
int fact(int n) {
int tmp = 0;
if n==1 return 1;
else
{tmp = n-1;
return n*fact(tmp);}
}
Local tmp
Previous fp
Return Addr
Para. n = 1
Local tmp
Previous fp
Return Addr
Para. n = 2
Frame of fact(1)
Calling fact(2) in main
Frame of fact(2)
Frame of main
5
int fact(int n) {
int tmp = 0;
if n==1 return 1;
else
{tmp = n-1;
return n*fact(tmp);}
}
Para. n = 98
Local tmp
Previous fp
Return Addr
Para. n = 99
Local tmp
Previous fp
Return Addr
Para. n = 100
Frame of fact(99)
What happens for fact(100)? Consumes memory
Wastes time on calling sequence
Frame
of fact(100)
Frame of main
6
…
int fact(int n) {
int tmp = 0;
if n==1 return 1;
else
{tmp = n-1;
return n*fact(tmp);}
}
Para. n = 98
Local tmp
Previous fp
Return Addr
Para. n = 99
Local tmp
Previous fp
Return Addr
Para. n = 100
Frame of fact(99)
Can we destroy the frame of fact(100) before going into fact(99)?
Frame
of fact(100)
Frame of main
7
…
int fact(int n, int prev ) {
if ( n == 1 ) return prev ;
else return fact(n-1, prev*n);
}
Para. n = 98
Para. prev = 9900
Previous fp
Return Addr
Para. n = 99
Para. prev= 100
Previous fp
Return Addr
Para. n = 100
A Different Implementation
Can we destroy the frame of fact(100) before going into fact(99)?
Frame
of fact(100)
Frame of main
Frame of fact(99)
Para. prev = 1
8
…
int fact(int n, int prev ) {
if ( n == 1 ) return prev ;
else return fact(n-1, prev*n);
}
Previous fp
Return Addr
Para. n
A Different Implementation
Can we destroy the frame of fact(100) before going into fact(99)?
What change enables the more efficient implementation?
Reused frame
Para. prev
9
Tail-Recursive Functions
Recursion only occur at the end of a function: no computation after recursive call
int fact(int n, int prev ) {
if ( n == 1 ) return prev ;
else return fact(n-1, prev*n);
}
The current activation record before recursive call is useless!
10
Tail-Recursive Functions Tail-recursive functions are equivalent to loops
int fact(int n, int prev ) {
if ( n == 1 ) return prev ;
else return fact(n-1, prev*n);
}
int fact(int n, int prev ) {
while (true) {
if ( n == 1 ) return prev ;
else {prev = prev*n; n–;);
}}
11
Is this function tail recursive?
int search(int[] a, int k, int fst, int lst) {
if ( fst == lst ) return (a[fst]==k?fst:-1);
int m = (fst + lst)/2;
if (k <= a[m]) return search(a, k, fst, m);
else return search(a, k, m+1, lst);
}
12
Loop version
int binSearch(int[] a, int k, int fst, int lst) {
while (true) {
if ( fst == lst ) return (a[fst]==k?fst:-1);
int m = (fst + lst)/2;
if (k <= a[m]) lst = m;
else fst = m+1;
} }
13
Tail-Recursive Functions to Loops
Tail-recursive
Loops
int fact(int n, int prev ) {
if ( n ==!1 ) retur"n prev ; !while (true) {
else return fact(n-1, prev*n); if ( n == 1 ) return prev ;
}m "÷:" :÷¥÷.
In C and Java, the loop version is more efficient, though
some compilers will automatically generate more efficient code
In most functional languages (e.g., Scheme, Ocaml),
the compiler automatically generate code as efficient as the loop version
int fact(int n, int prev ) {
else {prev = prev*n; n--;);
}}
⇒
14
# 23
Procedures and Functions
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Parameter Passing
How caller communicates with callee
Formal parameters: names in the declaration of a function
Actual parameters: variables/expressions passed to
a function
formal
foo (int x) {...}
foo (3+5);
actual
2
Parameter Modes
Call-By-Value (CBV) Call-By-Value-Return (CBVR) Call-By-Reference (CBR) Call-By-Name (CBN)
3
Call-By-Value
Calling mechanism
• Arguments are evaluated to their values
• Memory or registers allocated for arguments on AR • Argument values copied to AR
• AR destroyed when callee returns
4
a c t u a l a n d f o r m#a l p a r a m e t e r h a v e T h e ir values might be different
memory . parameter
...
s e p a r a t e
format
I
3. value from l is copied to
Memory
→
Call-By-Value
int x=1;
int foo (int a) {
→
value s
I. read
/
foo main
x --
" h'imation.
x = 2;
print (a);
→
I
Ia = 5;
stack
para a=1
YS
...
x=1
&
→
update
return x+a;
21-5=7
}
void main() {
foo(x); //result? 7 print(x); //result?
factual parameter Only updates
static area
}
formal parameters
5
Call-By-Value
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
print(x); //result?
}
...
foo
stack
main
para a=5
...
x=2
static area
6
Call-By-Value
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
print(x); //result?
}
...
stack
main
x=2
static area
7
Call-By-Value
Formal & Actual parameters have separate memory: their values may diverge
Characteristics
• Actual parameters may not directly be changed in callee (unless pass in pointers)
• Arguments can be complex expressions • Simple and intuitive (less error-prone)
8
→
copy
valve to actual to array
parameter
Call-By-Value: Performance
Primitive types: cost per parameter is small
Arrays, records, structures: copying value could be slow
C: programmer has to pass pointers/references to avoid copying cost
Java: only primitive types use call-by-value
9
Call-By-Value-Return (In-Out)
Calling mechanism
• Arguments are evaluated to their values
• Memory or registers allocated for arguments on AR
• Argument values stored in AR
• Before callee returns, AR values copied back to actual arguments
• AR destroyed when callee returns
10
1 actual and formal
still have
become identical after caller returns
...
As
...↳ led
parameter
separate
memory
Call-By-Value-Return2 they
int x=1;
int foo (int a) {
foo
stack
main
x = 2; print (a); a=5; return x+a;
}
void main() {
(
para a=1
( → 5 →
→
>
foo(x); //result? > print(x); //result? 2
static area
x=1 6 s I
}
((BV) 5
( ( BVR)
11
Call-By-Value-Return
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
print(x); //result?
}
…
foo
stack
main
para a=5
…
x=2
static area
12
Call-By-Value-Return
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
…
foo
stack
main
para a=5
…
x=5
print(x); //result?
}
static area
AR values copied back to actual arguments
13
Call-By-Value-Return
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
print(x); //result?
}
…
stack
main
x=5
static area
14
Call-By-Value-Return Characteristics
• Mostly identical to call-by-value, except an extra step of copying values back to actual parameter
15
Call-By-Reference
Calling mechanism
• Arguments are evaluated to their values
• Memory or registers allocated for arguments on AR • Argument address stored in AR
• AR destroyed when callee returns
16
1 actual and formal parameter refer
to
Call-By-Reference2 they always
same
memory
alias of X
have the same value
foo
stack
intx=1; → int foo (int a) {
→ x=2;
→ print (a);2
→ a=5; return x+a;
foo(x); //result? 10
print(x); //result? s }
…
(Bull)
when update a you updateX
main
para a ref …
* x=1 *
5
5+5=10
}
void main() {
static area
17
Call-By-Reference
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
print(x); //result?
}
…
foo
stack
main
para a
…
x=2
static area
18
Call-By-Reference
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
print(x); //result?
}
…
foo
stack
main
para a
…
x=5
static area
19
Call-By-Reference
int x=1;
int foo (int a) {
x = 2;
print (a);
a = 5;
return x+a;
}
void main() {
foo(x); //result?
print(x); //result?
}
…
stack
main
x=5
static area
20
Call-By-Reference
Formal parameter is an alias of actual parameter: their values are the same
Characteristics
• Actual parameters may directly be changed in callee
• Some language disallows complex expressions as arguments
• Programs are harder to understand (more error-prone) 21
Call-By-Reference: Performance Avoids the cost of memory copy
Indirect memory access: addressàvalue C: call-by-value is the default mode
Java: constructed types use call-by-reference
22
¥24
Procedures and Functions
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Parameter Modes
Outputs with
CBV? CBVR? CBR?
12,5, 2 12,4 , 8
( 😀 8
b=#4
.
p:5
2
t stack static
q:
area
2
Parameter Modes
Y
7
&
‘
Outputs with CBV? CBVR? CBR? 11, 4,7
”
>
p: T4 q:i7
t stack static
area
2
Key Points
Call-by-value: Formal & Actual parameters have separate memory: their values may diverge
Call-by-value-return: CBV+ copy final values of Formal parameters to Actual parameters
Call-by-reference: Formal parameter is an alias of Actual parameter; their values are the same
3
Call-By-Name (Lazy Evaluation)
Calling mechanism
• Arguments are not evaluated to their values (like macros)
• Actual parameters replace all formal parameters in body
P a li ] void swap (int a, int b) {
intt=a; htt.- i; ti-72 a=b; → i- Ali): it -74 b = t; Ali ) : t ; A. 4) → 2
Pi
A
d.3,4
doesnt exist exit
}
swap(i, A[i]) // value swapped?
ALLI
(2,
,
Still, a useful mode in functional programming, e.g., Haskell
4
Exceptions
So far, we assumed each function will run from start to its return points
But a program also needs to handle exceptions: • Out of memory
• Divide by zero • File not found •…
5
Workaround in C language
• Return a special value (e.g., NULL) when a real value cannot be computed
• Return an explicit ”status” to indicate if an exception happens
• Rely on the caller to pass in the exception handling code (e.g., via function pointer)
Exception handling is not enforced (error-prone) Clutter up the program, especially for normal cases
6
Exception Handling
A language feature that
• Isolates error-checking code
• Direct execution to a handler when appropriate
Define exceptions?
How does exception change control flow?
7
Defining Exceptions Pre-defined exceptions
• e.g., divide by zero, out-of-bound array access
User-defined exceptions
• Typically, defined as a special kind of class
public class myException extends Exception { private int i;
public myException(int x) {
this.i = x; }
public int getI() { return i; }
}
8
Throwing Exceptions Pre-defined exceptions
• Automatically generated
User-defined exceptions
• Generated by “throw” keyword
int foo () {
…
throw new myException(10);
…
}
9
Catching Exceptions
try {
.. code that might throw exceptions ..
}
catch (Exception1 e1) {.. handling code ..}
catch (Exception2 e2) {.. handling code ..}
¥
…
finally {.. this code always executes,
e.g. cleanup routine ..}
10
The “Replacement” Semantics
int foo () {
…
try { …
↳throw new myException(10); … // this is replaced wit)
} catch (myException e) {
←.. Handling code … …
}
}
h handling code
throws level
handling
11
Exception Propagation
current frame
*A B
C
D
Stack with Frames
If no handler is found
If no handler is found
control flow continues after the handler in C
The exception mechanism needs to properly follow calling sequence to properly “exit” functions (e.g., restore registers, destroy frame)
12
Exception Implementation
When encounter an exception at pc:
• Go through handlers associated with pc in sequence
• Each handler either executes (when exception matches) or re-throws the exception for next handler
• If exception not handled, use a function-level handler: exit the function properly and re-throw the exception
• Now, pc is in the caller, and repeat the first bullet in the caller
13
# 25
Types
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Values and Types
Values are type-less in hardware
0100 0000 0101 1000 0000 0000 0000 0000
Floating point number: 3.375 Two 16-bit integer: 16472 and 0
32-bit integer : 1,079,508,992 Four ASCII characters: @ X NUL NUL
Operations expect certain values • add take integers, fadd take floats
How to ensure operators get right type of values? 2
What Can Go Wrong?
Operation on wrong values will produce garbage • float addition on two integers
• assign a string to a float
• pass an int to a function expecting a string
All of these errors are type errors
3
Type
An abstraction of a set of values, and legal operations on these values
• int: -231 to 231-1, with operations +,-,*,/, …
Meaning of type
• Denotational: a set of value
• Constructive: set of primitive types, type constructors • Abstraction: an interface (set of operations)
4
Why Types?
• Identify/Prevent type errors
• Program organization/abstraction • Documentation
• Support optimization
5
Kinds of Types Primitive
Constructed • Products
• Unions
• Arrays
• Lists User-Defined
6
Type System
A method or specification for associating types with variables, expressions, etc.
Type equivalence: are two types equivalent Type compatibility: can int be used as float? Type safety: absence of type errors
7
Type Checking
Strongly typed: all type errors caught by type checking Weakly typed: type checking may miss type errors Static typing: type checking happens at compile time Dynamic typing: type checking happens at run time
Type inference: the type-checker infers types for variables
C is weakly typed, at compile time
union U {int a; float p} u;
float x = 1.0;
u.a = 1;
x = x + u.p;
8
Basic Types Numeric
• bytes, integers, floats Booleans
Characters
Data types available on contemporary machines
9
Integers
Length depends on language and compiler Representation: two’s complement format
n
15 -n -5
15-5?
Just add
binaries of
15 and -5
0
binary representation of n
0
000 0000 0000 0000 0000 0000 0000 1111
1
flip all bits of binary representation of n, and add 1
1
111 1111 1111 1111 1111 1111 1111 1011
0
000 0000 0000 0000 0000 0000 0000 1010
10
Floats
Single precision (float): 32 bits Double precision (double): 64bits
Due to the limited space, floats are estimations of the number they present
float z = 1.345+1.123;
printf(“%d\n”, z==2.468);
11
Floats
12
Boolean
Most languages: true or false
C: 0 means false, all other values mean true
In most implementations, a boolean value occupies more than one bit in memory (word is the basic unit of load/store)
13
Character
All languages support ASCII code (7-bit)
Most modern language support Unicode (e.g., Java char uses UTF-16, a 16-bit char set)
14
Enumeration Types
Provide names to a sequence of integral values C/C++
enum day {Monday, Tuesday, Wednesday,
Thursday, Friday, Saturday, Sunday};
enum day myDay = Friday;
Enumeration type improves readability
15
Enumeration Types
Provide names to a sequence of integral values C/C++
enum day {Monday, Tuesday, Wednesday,
Thursday, Friday, Saturday, Sunday};
enum day myDay = Friday;
Java
enum Day {Monday, Tuesday, Wednesday,
Thursday, Friday, Saturday, Sunday};
for (Day d: Day.values()) {
System.out.println(d);
}
16
# 26
Types
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Kinds of Types Primitive
Constructed • Products
• Unions
• Arrays
• Lists User-Defined
2
Records and Structures Usually laid out contiguously
Possible holes for alignment
Compilers may re-arrange fields to minimize holes
struct id {
int i;
double d;};
struct id x;
x.i, x.d
3
struct element {
char name[2];
int atomic_number;
double atomic_weight;
_Bool metallic;}
Possible Memory Layouts
4
Records and Structures Usually laid out contiguously
struct id {
int i;
double d;};
struct id x;
x.i, x.d
A possible memory layout:
x.i x.d
i
d (first 4 bytes)
d (last4 bytes)
Each field has a separate piece of memory
5
(Free) Union Types Laid out in shared memory
union id {
int i;
double d;};
union id x;
x.i = 1;
y = 1.0 + x.d;
A possible memory layout:
x.i x.d
4 bytes of i (or, first 4 bytes of d)
unused (or, last 4 bytes of d)
All fields share the same piece of memory
6
(Free) Union Types
Not type safe:
x.d will read the binary 0x00000001,???????? as a double
A possible memory layout:
x.i x.d
union id {
int i;
double d;};
union id x;
x.i = 1;
y = 1.0 + x.d;
4 bytes of i (or, first 4 bytes of d)
unused (or, last 4 bytes of d)
How can we make it type safe?
7
Discriminated Union Types
A combination of a tag (like an enum) and a payload per possibility (like a union).
enum Kind {isInt, isFloat}
struct intorreal {
enum Kind which;
union U {int a; float p} u;
} ir;
float x = 1.0;
if (ir.which == isInt) ir.u.a = 1;
if (ir.which == isFloat) x = x + ir.u.p;
Still not type safe: type system doesn’t enforce tag check8
Sum Types
Many functional programming languages support type-safe sum types
Haskell
(possibly empty) Tag payload type
data intorreal = isInt Int | isFloat Float
— given u has type intorreal
case u of
isInt i -> i + 1
isFloat f -> f + 1.0
Type safe: type is checked under each case statement (the only way to read from a value with the sum type)
9
Sum Types are General
Haskell
Tag
data day = Monday | Tuesday | Wednesday |
Thursday | Friday | Saturday | Sunday
— Given d has type day
case d of
Monday -> …
Tuesday -> …
…
A generalization of Enumeration type
10
Sum Types and Product Types
Sum Types: alternation of types
Product Types: concatenation of types (such as?)
(records and structures are product types)
enum Color {Red, Blue}
enum Shape {Circle, Rectangle}
struct ColoredShape {enum color c;enum shape s}
7¥
Analogy:
Values Values
of color of shape Values of
coloredShape
(Red + Blue) * (Circle + Rectangle)
= (Red*Circle) + (Red*Rectangle) + (Blue*Circle) + (Blue*Rectangle)
11
Array
Lifetime and array size
• Global lifetime, Static shape • Local lifetime, Static shape
• Local lifetime, Dynamic shape
int A[10];
int f() { int A[10]; …}
int f(int n) { int A[n]; …}
12
Bounds Checking Is it done?
How is it done?
int f() {
int A[10];
…
A[e]
…
}
int f(int n) {
int A[n];
…
A[e]
… }
When is it done?
13
C Array
Static/Stack/Heap allocated
Size statically/dynamically determined Array bounds not checked (buffer overflow)
14
Java Array
Heap allocated
Size dynamically determined
Array size is part of stored data (Dope Vector) Array bounds checked
15
¥27
Types
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
C Array
Static/Stack/Heap allocated
Size statically/dynamically determined Array bounds not checked (buffer overflow)
2
Java Array
Heap allocated
Size dynamically determined
Array size is part of stored data (Dope Vector) Array bounds checked
3
Dope Vectors (one example)
4
int
10
A[0]
A[1]
…
A[9]
elem. elem. array size type size
Pointer of A
Address of A[i]?
Bound check?
Benefit: the array may change dynamically
A + 4*I
0 <= i < 10
4
Stack vs Heap Allocation Size statically determined; fixed
• Allocation in the usual way
Size dynamically determined; fixed
• May still be allocated in the stack
• Need shape information at run time (dope vector)
Size may change • Heap allocated
5
Memory Layout One-dimensional arrays
Two-dimensional arrays
7
int[][] A = new int[10][100]
1,0
0,00,I0,2. . . . . - - - - 0,01,0 O
0,1
0,2
Memory Layout Row-Pointer Layout
int[][] B = new int[10][]
B[0] = new int[100]
B[1] = new int[50]
8
Address Calculation (Row major)
4
int
10
100
A[0][0]
A[0][1]
...
A[9][99]
elem. elem. rows colu- size type mns
Address of A[i][j]? Bound check?
Pointer of A
A + 4(i*100+j)
0 <= i < 10, 0 <= j < 100
9
Address Calculation (Column major)
4
int
10
100
A[0][0]
A[1][0]
...
A[9][99]
elem. elem. rows colu- size type mns
Address of A[i][j]? Bound check?
Pointer of A
A + 4(j*10+i)
0 <= i < 10, 0 <= j < 100
10
Pointers What are they?
• A set of memory addresses and operations on them
Values: legal addresses, and a special value, nil address
7084 p 7080 x
7080
20
int x=20;
int* p = &x;
11
Pointers
Operations: assignment, dereferencing, arithmetic
int x=20;
int* p = &x;
int y=*p;
int a[3] = {1,2,3};
int x = *(a+1) //same as a[1]
Uses
• Indirect addressing (access arbitrary address) • Manage dynamic storage (heap)
12
Pointers vs. References Pointers: int *p;
References: int &p;
Value Model vs. Reference Model: A = B
• Value model: the value of B is copied to A
• Reference model: A is an alias of B (same memory)
• Java: primitive types follow value model; objects follow reference model
13
References
Restricted pointers: cannot be used as value or operated in any way
Not directly visible to the programmer
No explicit data type
double r=2.3;
double& s=r;//s is an alias of r (share memory) double *p = &r; //p has value: address of r s += 1; *p += 1;
"'p"
s,r
14
References
Uses
• Indirect addressing (access arbitrary address) • Manage dynamic storage (heap)
Alias of existing variables
15
The Nil Pointer Problem
“I call it my billion-dollar mistake. It was the invention of the null reference in 1965. ... This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.”
How to avoid it?
C.A.R. Hoare, 2009
17
The Nil Pointer Problem
Solution: Use sum types Haskell
tpag
tag ppayload
payload
data Maybe Int = Nothing | Just Int
-- given p has type Maybe Int
case p of
Nothing -> .. nil value ..
Just i -> .. valid value ..
ptag ptg datatype int option = NONE | SOME of int
— given p has type int option
case p of
None => .. Nil value ..
| SOME i => .. Valid value ..
p
SML
18
‘
z Ot 3
:
Xx int. Htt
(true 3)
true it boolean value not function
¥28129
Simply Typed Lambda Calculus
CFG Terms:
Types:
example : lot 3 : int
)
Hint (xx) .
Ot Ix:int.×
b under
number
fPte, : r → r’ rtea: r
‘ ftp.pp,
.
e:r→r )
–
‘ ( T- ABS)
environment
being checked
f. any any
n
assumption
Rte, int rt ez : int (g.ADD)
natural number variable add logical add
pppt
e : : – . .nl/-ruelfalselxleiez1xx:J.e/eltez1einez
Ji:: IntlbootlJi → Tz
natural has type int
, preez:rtendril
ptcxx.ir
b input type
}type safe } type unsafe
b output
type
: int
T) e : (Name
Ptfe p)yres
pt e.nez) (
assumptions
“tn:”t
n p I
0 t 3 : Dit j
empty /
Symbol XV
table inputsto
type system
: boot
typing \s type judgementha
inference conclusions rule
t
typing N°9
P bb’
thus typed. e
output inputs to type system
rite : T’
V
,tea)
Pte, : bool Ptez : bool (g.AND)
type system: takes a program as input and returns either
the type of the program o r rejects the program
I type J BI- n : intft- NUM) pi- true: bool l’T- TRUE) rt- false: bool (T- FALSE) r x: rtx :(T- VAR)
o n
, r,x: rte : r’
P where ha, 7 X hastypeT,thevalueofx
under a
PROOF TREE
(xx: int. x)
int
2
does not satisfy isn’t a function Theres no way to interpet a boolean type a s a function type ^ t Tat’ bebool
7 1- true : bool 01-3 : int for any program there is a unique proof tree
X
0 t (true 3) :
ttt
1/1/11
P C, ez
V cannot type
check
inputs y
‘
0,×:intt- o’ 1 n f
x:
int
0.x:htt- l:
T
0. X: inttx:
f II fi pl
finna 0,x: inttlxtl):
e t
int
‘
^
1
int
AVtm
v
inputs
Ot Xx: int. 1×+1): infant tl
V
: int —
x: intent
ly ; p,
be
.
p type
1
# 28/29
1 Introduction
CMPSC 461: Programming Language Concepts Note 3: Simply Typed Lambda Calculus
Type checking is a lightweight technique for proving simple properties of programs. Type checking usually cannot determine if a program will produce the correct output or not. Instead, it is a way to test whether a program is well-formed, with the idea that a well-formed program satisfies certain desirable properties. The traditional application of type checking is to show that a program cannot get stuck. In other words, a type-correct program will never reach a state from which the programs behavior is undefined (e.g., adding two floats by an integer addition). This is a weak notion of program correctness, but nevertheless very useful in practice for catching bugs. Type systems are a powerful technique. In the past couple of decades, researchers have discovered how to use type systems for a variety of different verification tasks, such as verifying information flow security at compile time.
2 A simply typed lambda calculus
To understand how type system works, we consider a typed variant of the -calculus we have seen earlier in this class. In this language, we assign types to certain -terms according to some typing rules. A -term is considered to be well-formed if a type can be derived for it using the rules. In this lecture, we will not formalize and prove the properties of this type system. But we will see the properties informally in the end.
2.1 Syntax
The language syntax is similar to that of untyped -calculus, with some notable differences. There are two kinds of inductively-defined expressions, terms and types. The definition of this language can be written in context-free grammar as follows. As standard, we use e1 and e2 to distinguish multiple uses of the same nonterminal e in the grammar.
terms e::= n|true|false|x|e1 e2 | x:⌧ .e|e1 +e2 |e1 ^e2 types ⌧ ::= int|bool| ⌧1 ! ⌧2
One difference from the pure -calculus is that the natural numbers (n) and Boolean constants (true and false) are taken to be primitive symbols (just as in C and Scheme). This language also contains primitive operations: the plus operation (+) on natural numbers, as well as the and operation (^) on Boolean values. Another difference is that a -abstraction explicitly mentions the type of its argument (symbol ⌧ in the term x : ⌧ . e is the type of parameter x). A type ⌧ in this language is either a primitive type (int, bool), or a constructed function type. Here, ⌧1 ! ⌧2 represents a function from type ⌧1 to type ⌧2.
A value is either a number, a Boolean constant, or a closed -abstraction x : ⌧ . e. There is a set of typing rules, given below, that can be used to associate a type with a term. If a type ⌧ can be derived for a term e according to the typing rules, we write ` e : ⌧ , read as “e has type ⌧ ”. This metaexpression is called a type judgment. For example, every number has type int, thus ` 3 : int. Boolean value true has type bool, thus `true:bool. A function x :int. y :int. x has the typeint! (int!int). The ! constructor associates to the right, so int ! (int ! int) is the same as int ! int ! int. Thus we can write
` ( x :int. y :int. x) :int!int!int
Not all -terms can be typed, for instance ( x : int . x x) or (true 3). These expressions are considered nonsensical. Intuitively, such terms are ill-formed. These terms should be rejected by a type system that ensures type-safety. Next, we will define a type system that ensures that any well-typed term will never get stuck. For example, both ( x : int. x x) and (true 3) are rejected by the type system.
1/3
2.2 Typing rules
The typing rules will determine which terms are well-formed terms. They are a set of rules that allow the derivation of type judgments of the form ` e : ⌧ , read as “e has type ⌧ in context ”.
Here is a type environment, a partial map from variables to types used to determine the types of the free variables in e. The domain of , written as dom( ), is a subset of variables, whose types are defined in the type environment. The environment [⌧ /x] is obtained by rebinding x to ⌧ in (or creating the binding anew if x is not in the domain of ):
8><>: (y), if y 6= x and y 2 dom( ) [⌧/x](y) , ⌧, if y = x
undefined, otherwise
We can also view type context as a sequence of variables and their types. The notation , x : ⌧ , which is equivalent to [⌧ /x], follows this view. In this lecture, we also use , x : ⌧ to represent that ⌧ is the most recent binding of x in the context. This notation is used frequently in the literature. Also, one often sees x:⌧ 2 ,whichmeansjust (x)=⌧.
We also write ` e : ⌧ as a metaexpression to mean that the type judgment ` e : ⌧ is derivable from the typing rules. The environment ; is the empty environment, and the judgment ` e : ⌧ is short for ; ` e : ⌧. The typing rules are:
` n :int(T-NUM) `true:bool(T-TRUE) `e1 :⌧ !⌧0 `e2 :⌧ (T-APP)
`e1 e2 :⌧0
`e1 :int `e2 :int(T-ADD)
`(e1 +e2):int
`false:bool(T-FALSE) ,x : ⌧ ` x : ⌧ (T-VAR) ,x:⌧ `e:⌧0 (T-ABS)
`( x:⌧ .e):⌧ !⌧0
`e1 :bool `e2 :bool(T-AND)
`(e1 ^e2):bool
The typing rules are presented as inference rules, which inductively define relations consisting of the derivation of types. Each typing rule has three parts: the possibly empty assumption (type judgments above the bar), the conclusion (the type judge below the bar), and a name for the typing rule (in parentheses). When the assumption is empty, we simply write the conclusion without the bar.
Now let’s take a closer look at these typing rules:
• The first three rules just say that all the base values have their corresponding base types.
• (T-Var) For a variable x, x has a type ⌧ if the binding x : ⌧ appears as the most recent binding of x in the type environment.
• (T-App) An application expression e1 e2 represents the result of applying the function represented by e1 to the argument represented by e2. For this to have type ⌧0, e1 must be a function of type ⌧ ! ⌧0 for some ⌧, and its argument e2 must have type ⌧.
• (T-Abs) A -abstraction ( x : ⌧ . e) is supposed to represent a function. The type of the input should match the annotation in the term, thus the type of the function must be ⌧ ! ⌧0 for some ⌧0. The type ⌧ 0 of the result is the type of the body under the extra type assumption x : ⌧ .
• The last two rules say the operands of add (resp. and) must have type int (resp. bool), and the result has type int (resp. bool).
Note 3, CMPSC 461 2020 Fall 2/3
Every well-typed term has a proof tree consisting of applications of the typing rules to derive a type for the term. We can type-check a term by constructing this proof tree. For example, consider the term x : int . (x + 1), which takes a natural number x, and returns x + 1, another natural number. So we expect ` ( x : int . (x + 1)) : int ! int. Here is a proof of that fact:
x :int` x :int x :int` 1 :int x : int ` (x + 1) : int
` ( x : int . (x + 1)) : int ! int
Type checking starts from the bottom judgement. It has an empty typing environment (there is no global variable), and the goal of type checking is to derive the type on the right (initially unknown). To derive the type, at each step, we use the typing rule that matches the term being typed. Here, we first apply rule T-Abs to check function body x + 1 under environment x : int. The rule application gives us the second to last judgement in the proof tree, whose type after ‘:’ is still unknown at at this moment. Next, we apply rule T-Add to type-check x and 1. Here, we can fill in the rightmost types, since axioms T-Var and T-Num apply, and we no longer need the assumptions to derive the correct type. With the complete assumption, we can finish the type of x + 1 using rule T-Add, and similarly, fill in the type in the last judgement to complete the proof tree. Deriving types using a proof tree typically involves this “bottom-up” and then “top-down” flavor. You can use the next proof trees as an excise.
Consider another term ( x : int . y : bool . x) 2 true, which evaluates to 2. Since ` 2 : int,we expect ` (( x :int. y :bool. x) 2true) :intas well. Here is a proof of that fact:
x :int,y :bool` x :int
x :int` ( y :bool. x) :bool!int
` ( x :int. y :bool. x) :int!bool!int ` 2 :int ` (( x :int. y :bool. x) 2) :bool!int
` ((( x :int. y :bool. x) 2)true) :int
`true:bool
An automated type checker can effectively construct proof trees like this in order to test whether a program is type-correct. A term type-checks if there is a proof tree for that term; otherwise, the term is rejected by the type system. Most type-systems are decidable, and in fact, efficient. The complexity of most type systems ensuring type safety (including the one in this lecture) is linear to program size.
Note that types, if they exist, are unique. That is, if ` e : ⌧ and ` e : ⌧0, then ⌧ = ⌧0. This can be proved easily by structural induction on e, using the fact that there is exactly one typing rule that applies in each case, depending on the form of e.
3 Type Safety
The type system enforces a very important property. Informally, we can state the property as follows: if a term e has type ⌧ in context , and e is executed in a memory state that is consistent with (e.g., x has value 1 if (x) = int), the final value of e must be a value of type ⌧ . One implication is that since the type system checks that all operands to an operator (e.g., + and ^ in our language) have the proper type, an operator can never get a value with wrong type at run time. Hence, any well-formed program is type safe. However, the formal definition and proof of this property are beyond the scope of this lecture.
Note 3, CMPSC 461 2020 Fall 3/3
Type Inference
> assign type of x to a
IX.e f
> int
.
XX- i¥(Xtl) > add constraint 2 — int
(lax :#t.ly#xooX.x) 2) true
‘l lt ‘ l
int
bool
Xt1=2⇒X.- l
y- 10=2 ⇒ 9=12 → constraint solution t
constraints wt unknown
variables ( K y )
1) declare type constraints from program 2) solve type constraints
Type constraints
(:: = C,;Cz/ti:Tz
t.in/r::=alint1boollr
,→ ra infant;boo at
a int
← constraints
Pte: Ty,
n 0,X:2tX : a ) 0 0,x:2tl : int ) 0
0,X:Dtxt): int 71- XX- (XH):
) ← int;int’.int a→int aint;infant
↳e
inputs v
outputs
su
v
r
uve x
solution: X → int
type: int→int
‘
T
0,x – a. y :B tx :
x
)
0
An 0,x:atXyx :p -s a
‘
s
) 0
0ty.x.ae#t2int)00tfXX.Xy.x)2 #
7 true :B-a)x=int +:b
X) 2) true Jesuit;B e, ez
(:a
01
.-
bool
–
XX Ay
.-
solution: a → int
p→ boot type : int
< → (a → int) : int → P
T, and Tz are identical int:int,-=L,int→a = int→a
partial solution
- Unify ( constraints) > Cas
traints)
J]o(
Unit
y(
cons
( simplify)
Unify(asint;int.- 2)
1)
= (x ← int]o UnifyHrt=L (a ← Int] : (a ← int]0 Unify(int’ int)
P
:(a ← in1-ToUnify’ 0)
replacedbyint La←int)l) .o
:[a ← int)
ll T, Tz T, Tz
T Unify
(x→int)– int→B) : Unify(x — int;a → int= B)
a)
— Cx← int)o Unify(a → int:P Kent))
: Kent]o Unify(intent=P)
:[a← int]offs- lint→ int))o Unify107 :[a ←int)off←dnt→ int))
# 291*30
1 Introduction
CMPSC 461: Programming Language Concepts Note 4: Type Inference and Unification
Type checking is a lightweight technique for proving type safety. However, writing down one type an- notation for each variable can be tedious for programmers. Type inference is the process of determining the appropriate types for expressions based on how they are used. For example, OCaml knows that in the expression (x + 3), x must be a natural number, because it is an operand of the plus function. A type inference algorithm automatically infers types for variables whose type annotations are missing. For (x + 3), the OCaml type inference engine would assign x with type int.
2 Typing Constraints
To understand how type inference works, we first define a simple constraint language that can be used to model a type system. In this constraint language, constraints are equalities on types. The constraint language syntax (written in context-free grammar) is as follows:
C ::= C1;C2 | ⌧1 = ⌧2
⌧ ::= ↵ |int|bool| ⌧1 ! ⌧2
In this constraint language, a constraint element may be a type variable ↵ whose value is to be solved for, or a type in the simply-typed -calculus we defined in Note 3. Constraints C is in general a set of type equality constraints, separated by semicolons (;).
Validity and Satisfiability A constraint ⌧1 = ⌧2 with no type variable is valid if types ⌧1 and ⌧2 are equivalent, defined by the follow induction rules:
⌧ 1 = ⌧ 10 ⌧ 2 = ⌧ 20 int=int bool=bool ⌧1 !⌧2 =⌧10 !⌧20
Validity as defined above works for constraints without variables. When constraints mention variables, they are satisfiable if there exists a valuation of all variables such that the constraints after value substitution are all valid. For example, a constraint ↵ = int is satisfiable, since replacing ↵ with int gives a valid constraint int = int. A constraint int = ↵ ! ↵ is unsatisfiable, since no value substitution makes it valid. Constraints ↵ = int; ↵ = bool are unsatisfiable, since only one of these two constraints can be satisfied, but not both. We will look at an efficient algorithm that solves constraints later in this lecture.
3 Type Checking as Constraint Solving
A type system with type inference maps naturally into constraint solving, since typing rules are just equality constraints on types. We first extend the simply-typed lambda calculus defined in Note 3 to include variable declaration without type annotation:
terms e ::= · · · | x . e
Here the dots represent all -terms defined in Note 3. The added term x . e defines a function without providing a type for the parameter x.
1/4
To formally define type inference, we introduce a new typing relation ` e : ⌧ | C, read as “e has type ⌧ in type environment if constraints C are satisfiable. In other words, if ` e : ⌧ | C, then expression e has type ⌧ provided that every constraint in C is satisfied.
Here are the new typing rules:
` n :int| ; (T-NUM) `true:bool| ; (T-TRUE) `false:bool| ; (T-FALSE)
`e1 :⌧ !⌧1 |C1 `e2 :⌧2 |C2 (T-APP) `e1 e2 :⌧1 |C1;C2;⌧ =⌧2
,x:⌧ `x:⌧ |;(T-VAR) ,x:⌧ `e:⌧0 |C (T-ABS1)
`( x:⌧ .e):⌧ !⌧0 |C
`e1 :⌧1 |C1
,x:↵x `e:⌧0 |C ↵x afreshvariable (T-ABS2) `( x.e):↵x !⌧0 |C
`e2 :⌧2 |C2 (T-ADD) `(e1 +e2):int|C1;C2;⌧1 =int;⌧2 =int
`e1 :⌧1 |C1 `e2 :⌧2 |C2
`(e1 ^e2):bool|C1;C2;⌧1 =bool;⌧2 =bool
Now let’s take a closer look at these typing rules:
(T-AND)
• The first four rules are just the same as in Note 3, except that each of them also produces an empty set of constraints.
• (T-App) An application expression e1 e2 represents the result of applying the function represented by e1 to the argument represented by e2. e1 must be a function of type ⌧ ! ⌧1 for some ⌧, ⌧1. Hence, the result type is ⌧1. This rule accumulates constraints C1, C2 and produces a new constraint ⌧ = ⌧2, assuming that argument e2 has type ⌧2. The reason is that formal parameter’s type must match real parameter’s type.
• (T-Abs1 and T-Abs2) A -abstraction is supposed to represent a function. The type of the input should match the annotation in the definition (T-Abs1). When no annotation is provided, a fresh type variable ↵x is generated for the input. The type ⌧ 0 of the result is the type of the body. This rule does not generate new constraints; it just carries the constraints generated by the function body.
• The last two rules say the types of operands of add (resp. and) must be equivalent to type int (resp. bool), and the result has type int (resp. bool).
Using these typing rules, the proof tree of every term generates a set of constraints to be solved. For example, consider the term x . (x + 1), which takes x, whose type is to be inferred, and returns x + 1. Here is the proof tree of this term:
x:↵x `x:↵x |; x:↵x `1:int|; x:↵x `(x+1):int|↵x =int;int=int `( x.(x+1)):↵x !int|↵x =int;int=int
It is easy to check that the top-level constraints (↵x = int; int = int) are satisfiable, with solution ↵x int. Hence, the term x . (x + 1) has type int ! int, and the inferred type of x is int.
Note 4, CMPSC 461 2020 Fall 2/4
Consider another term ( x . y . x) 2 true, which evaluates to 2. Here is the proof tree of this term:
x:↵x,y:↵y `x:↵x |; x:↵x `( y.x):↵y !↵x |;
`( x. y.x):↵x !↵y !↵x |; `2:int|; `(( x. y.x)2):↵y !↵x |↵x =int `true:bool|;
` ((( x . y . x) 2)true) : ↵x | ↵x =int;↵y =bool
It is easy to check that the top-level constraints ↵x = int; ↵y = bool are satisfiable. Hence, the term
((( x . y . x) 2) true) has type int, and the inferred type of x is int; the inferred type of y is bool. 4 Unification
Constraints defined in this lecture can be solved by a very general mechanism called unification. Briefly, unification is the process of finding a substitution that makes two given terms equal. A type substitution is a finite map from type variables to types. For example, we write [↵ int, (int ! int)] for the substitution that maps type variable ↵ to int, and type variable to int ! int.
The essential task of unification is to find a substitution S that unifies two given terms (that is, makes them equal). Let us write s[S] for the result of applying the substitution S to the term s. For example, the substitution [↵ int, (int ! int)] unifies terms (↵ ! ( ! int)) and (int ! ):
(↵ ! (↵ !int) =int! )[↵ int, (int!int)] = (int! (int!int) =int! (int!int))
In general, the same variable may occur in both the domain and range of a substitution. In that case, the intention is that the substitutions are performed simultaneously. For example the substitution [↵ int, (int ! ↵)] maps to int ! ↵, instead of int ! int. Unlike the substitution in -reduction, we do not need to worry about avoiding variable capture, since there are no constructs in the language that bind type variables.
Given two substitutions S1 and S2, we write S1 S2 for their composition. For example, ([↵ int] [ bool]) = [↵ int, bool]. Note that composition performs substation in sequence. For example, ([↵ ] [ ] [ bool]) = [↵ bool, bool, bool].
To solve a set of constraints C, we need to find a substitution that unifies C. More specifically, suppose that ` e : ⌧ | C; a solution for constraints C is a substitution S such that all equalities in C[S] are valid. If there is no substitution that satisfies C, then we know that e is not typeable.
4.1 Unification Algorithm
The unification algorithm is known as Robinson’s algorithm. The unification algorithm is given in terms of a function Unify that takes a set of constraints C and produces a substitution, if it exists. If C is a set of constraints, then C[↵ t] denotes the result of applying the substitution ↵ t to all the constraints in C. The algorithm is defined as follows:
• Unify(;) , [], the empty substitution
• Unify((⌧1 = ⌧2); E) , Unify(E) if ⌧1 = ⌧2
• Unify((↵ = ⌧ ); E ) , [↵ ⌧ ] (Unify(E [↵ ⌧ ])) if ↵ is not a free variable in ⌧
Note 4, CMPSC 461 2020 Fall 3/4
•Unify((⌧ =↵);E),[↵ ⌧] (Unify(E[↵ ⌧]))if↵isnotafreevariablein⌧ • Unify((⌧1 ! ⌧2 = ⌧10 ! ⌧20);E) ,Unify(⌧1 = ⌧10;⌧2 = ⌧20;E)
• Unify(⌧1 = ⌧2) , fail if none of the previous rules apply
The check that ↵ is not a free variable of the other type ensures that the algorithm does not produce a cyclic substitution (e.g., [↵ (↵ ! ↵)], which does not make sense since our language does not support recursive types.
Forexample,Unify(↵x =int;↵y =int)=[↵x int] Unify(↵y =int)=[↵x int] [↵y int] = [↵x int, ↵y int]. On the other hand, Unify(↵x = int; ↵x = bool) = [↵x int] Unify(int = bool) = fail.
The unification algorithm always terminates. Moreover, it produces a solution if and only if a solution exists. The solution found is the most general solution, in the sense that if S = Unify(C) and S0 is a solution to C, then there is some S00 such that S0 = S00 S.
Note 4, CMPSC 461 2020 Fall 4/4
# 31
Types
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Midterm 2
Midterm 2 is scheduled to Nov. 13th 6:15 to 7:30
Conflict exam? Send request to TA Zeyu (dxd437@psu.edu) and me (dbz5017@psu.edu) by Nov. 11th
HW6 due next Wednesday midnight, no late submission
Midterm 2
Midterm 2 covers materials from HW3 to HW6, which is the same as lecture 17 to lecture 31 (this lecture)
Next Wednesday: review section for Midterm 2 Practice questions will be posted soon
Changes to the ZOOM exam
Midterm 2 will be like Midterm 1, with a few changes
• 50 minutes examà60 minutes exam
• 10pt bonus questionà10pt extra question (110 points in total, graded as 100 points exam)
• Students have 15 minutes to copy questions (without working on them!) before the exam starts
• Assign students to different ZOOM rooms, rather than using breakout rooms
A detailed instruction will be posted soon.
Function performs an operation for different values
of the same type
class IntArrayList {
…
boolean add (int i);
int remove(int idx);
}
class DoubleArrayList{
…
boolean add (double d);
double remove(int idx);
}
(Polymorphic) Functions takes values of different types
5
Java used to do this (before Java 1.5):
class ArrayList {
…
boolean add (Object o);
Object remove (int idx);
}
6
Does it work?
List langList = new ArrayList();
langList.add(new Language(“Java”));
Language java = (Language) langList.remove(0); langList.add(new System(“Linux”)); // Whoops! Language c = (Language) langList.remove(0);// Error
Compiler doesn’t know langList should only contain values of type Language
7
Generics
A facility added to Java 5.0, which allows “a type or method to operate on objects of various types while providing compile-time type safety.”
Not essentially an object-oriented idea Not originated in Java
8
Polymorphism: function with multiple forms Parametric polymorphism (fun. with a set of types)
• Explicit para. polymorphism (Generics, aka.
Templates in C++) Java
C++
template
T GetMax (T a, T b) {
T result;
result = (a>b)? a : b;
return (result);
}
class ArrayList
…
boolean add (T o);
T remove (int idx);
}
9
Polymorphism
Parametric polymorphism (fun. with a set of types) • Explicit para. polymorphism
• Implicit para. polymorphism (Lisp, Haskell, ML)
(define (min a b) (if (< a b) a b))
No mention of type
In Haskell
min a b = if a < b then a else b
10
Type Parameterization
ArrayList
ArrayList
• an application of a type-level function (ArrayList) • to the type parameter Integer
• gives an array list of integers
The idea of generics is to allow user-defined parameterized types
11
User-Defined Parameterized Type
Java
Formal type parameter
C++
Formal type parameter
template
class ArrayList {
…
bool add (T o);
T remove (int idx);
}
class ArrayList
…
boolean add (T o);
T remove (int idx);
}
Intuitively, we defined a type-level function
12
Generic Functions
Java
Formal type C++ Formal type parameter parameter
template
T GetMax (T a, T b) {
T result;
result = (a>b)? a : b;
return (result);
}
T max
T result;
result = (a.compareTo(b)
>=0)? a : b;
return (result);
}
Intuitively, a function that takes a type, and parameters
13
Generics Enforce Type-Safety
List
Language java = (Language) langList.remove(0); langList.add(new System(“Linux”)); // Type error! Language c = (Language) langList.remove(0);
Compiler knows langList only contain values of type Language (no run-time errors)
14
int GetMax (int a,int b) {
int result;
…
}
char GetMax (char a, char b) {
char result;
… }
int i = GetMax(1,2);
char c = GetMax(‘c’,’e’);
template
T GetMax (T a, T b) {
T result;
result = (a>b)? a : b;
return (result);
}
int i = GetMax(1,2);
char c = GetMax(‘c’,’e’);
Implementing Generics
Static mechanism (Ada, C++)
• Compiler generates separate copy for every unique instance • Each copy is type-checked separately
Defined by programmer
Generated/Checked by the compiler
15
Implementing Generics
Dynamic mechanism (aka. type erasure in Java) • Type system checks type safety
• All instances share one code, without generics!
16
Type Erasure in Java
class ArrayList
…
boolean add (T o) {…};
T remove (int idx) {…};
}
ArrayList
langLst.add(new Language(“Java”));
Language java=langLst.remove(0);
Defined by programmer
class ArrayList {
…
boolean add (Object o) {…};
Object remove (int idx) {…};
}
ArrayList langLst = …;
langLst.add(new Language(“Java”));
Language java=
(Language)langLst.remove(0); 17
Executed by JVM
Type system ensures all dynamic casts are safe
Pros and Cons of Type Erasure
Pros
• Backward compatibility
• One code copy shared by all instances
Cons
• Cannot use type parameter at run time
class Example
void method(Object item) {
if (item instanceof T) { … } // cannot compare to T
T anotherItem = new T(); // cannot use constructor
T[] itemArray = new T[10]; // cannot create T[10]
}}
18
# 32
Program Verification
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Correctness
A program implements the desired property for all possible inputs
E.g.,
Functional correctness: a program calculates n!
Type safety: no typing errors at run time
Memory safety: no buffer overflow in a program
Security: no information leakage, no violation of integrity, …
We will focus on functional correctness in this course 2
Functional Correctness
r:=0, i:=0;
while (i
And: ( ∧ )
Or: ( ∨ )
not: ¬(
E.g., 1 > 0 ∧ 0×5 = 0 (true)
true iff both ( and ) are true false iff both ( and ) are false
true iff ( is false 0×5=0∨ 5−1=5(true)
5
Implication
A proposition ! implies another proposition ” (written as ! ⇒ “) iff (¬! ∨ “)
(intuitively, ” is true whenever ! is true)
E.g., 7 > 5 ⇒ 5 > 3 (true)
1 > 2 ⇒ 2 > 3 (true, due to the false condition) 2 > 1 ⇒ 2 > 3 (false)
Derivation:
Modus Ponens: given ! ⇒ ” and !, we have ”
6
(First-Order) Predicate Logic A formula can mention bounded variables
For all (Universally quantified): ∀). + E.g., ∀). ) = 5 (false)
Exists (Existentially quantified):∃). + E.g., ∃). ) = 5 (true)
(We assume the domain of integers for simplicity)
7
(First-Order) Predicate Logic
E.g., ∀). () > 5 ⇒ ) > 3) (true)
∀). () > 1 ⇒ ) > 3) (false, consider ) = 2) ∃). () > 1 ⇒ ) > 3) (true, consider ) = 4)
Truth value of a formula can be proved based on derivation rules from predicate logic and number theory
We only use simple formulas in this course; in general, some tools (e.g., an SMT solver) can automatically tell the truth value of many formulas
8
Program and Logics
We need to formally specify 1) The desired property
int Max(int a, int b) {
int m;
if (a>b) m:=a;
else m:=b;
return m; }
Assignment
Precondition (true) Postcondition (5 = max(9, ;))
function symbol in logics
9
Program and Logics
We need to formally specify 1) The desired property
int factorial(int n) {
int r:=1, i=n;
while (i>0) {
r := r*i;
i –; }
return r; }
Precondition (n ≥ 0)
Postcondition (! = %!) 10
Program and Logics Is a program correct?
We need to formally specify 1) The desired property
2) The behavior of program
11
Informally …
x := 5; y := 1;
After second assignment, we know {) = 5, B = 1}
Why?
1. Initially, we assume nothing
2. After the first assignment, we know {- = 5}
3. After the second assignment, we know {. = 1} is true as well
12
Formalizing the Reasoning
x := 5; y := 1;
1. Initially, we assume nothing
2. After the first assignment, we know {- = 5}
3. After the second assignment, we know {. = 1} is true as well
The reasoning:
truex:=5 {-=5} y:=1 {-=5∧.=1}
pre.
of T T past
Each predicate specifies the assertion that must
a
ofof of
pre
post
be true before/after a statement
13
Hoare Triple
Assertion: a predicate that describes the state of a program at a point in its execution
Hoare Triple: C D{F}
Precondition C: an assertion before execution
Postcondition F: an assertion after execution Program D: program being analyzed
A triple is valid If we start from a state satisfying C, and execute D, then final state must satisfy F
14
Examples true x:=5{) = 5}
valid
B = 6 x:=5{) = 5, B = 6}
valid
true x:=5{) < 10}
) = B x:=x+3{) = B + 3} )=9if(x<0)thenx:=-x {)=|9|}
All of these triples are valid
{true ) x : :S EX:O } invalid
valid
valid
valid
x: - S {Yeo} invalid {true) x:> Xtl {x’03 invalid
{true}
15
Program Correctness
A program is correct (w.r.t. pre/postcondition) if the corresponding Hoare triple is valid
{true}
int m;
if (a>b) m:=a; else m:=b; {m=max(a,b)}
{! >= 0}
int r:=1, i:=n; while (i>0) {
r := r*i;
i –; }
{% = !!}
How can we tell the validity of a Hoare triple?
16
CMPSC 461: Programming Language Concepts Midterm 2 Practice Questions
Use these problems in addition to Assignment 4, 5 and 6 to prepare for the 2nd midterm.
Problem 1 For each of the following Scheme programs, circle all x’s that refer to (i.e., are in the scope of) the definition of x at the FIRST LINE. You don’t need to circle anything if no such x exists.
(let ((x 1)) (let ((x 2))
(+ x y)))
(let ((x 3))
(let ((x 4) (y x))
(+ x y)))
(let ((x 5) (y 6)) (let* ((y x) (x y))
(+ x y)))
Problem 2 What is the difference between static, stack, and heap allocation and how do they affect the lifetime of a variable?
Problem 3 What is a tail-recursive function? What makes it an interesting concept?
Problem 4 What outputs are produced by the following pseudo code if the language uses static scoping?
What are the outputs if the language uses dynamic scoping?
a=2; b=3; int f1(a) {
return a + b; }
int f2(b) {
return 2 * f1(b); }
print f1(a) * f2(a);
Problem 5 Consider the following pseudo code.
int a=0;
void A(int m) {
print a;
m = a; }
void main () { int a=1, b=2; A(b);
print b;
}
1. What are the outputs if the language uses static scoping and all parameters are passed by value?
2. Whataretheoutputsifthelanguageusesdynamicscopingandallparametersarepassedbyreference?
1/2
Problem 6 Use the following typing rules to write down the proof tree for the term (( x : bool . ( y : bool . x ^ y)) true).
Typing rules: ` true : bool (T-TRUE) ` false : bool (T-FALSE) , x : ⌧ ` x : ⌧ (T-VAR)
`e1 :⌧ !⌧0 `e2 :⌧ (T-APP) ,x:⌧ `e:⌧0 `e1 e2 :⌧0 `( x:⌧ .e):⌧ !⌧0
` e1 :bool ` e2 :bool (T-AND) ` (e1 ^ e2) : bool
(T-ABS)
Problem 7 Follow the constraint unification rules in Lecture Note 4 to solve the following constraint (int ! ↵ = ! ; = int)
Midterm 2 Practice Questions, Cmpsc 461 2020 Fall 2/2
Review –
T B
i
build a proof tree based CFG
‘
2 fill in missing types J -7J
boot boot→ boot
0 , X : booby: bool tx : bool 0, X : bool, y : bool TY: 0, X : boot, y : bool t Xny: bool c- T
bool
Ot true: bool
bool
8.X : bool 1- (Xy: Xny
‘
): bool→boot ← t Ot (xx : bool. lay: bool- Xny): boot→ (boot→ bool)
.
Ot (xx: boot. (Xy: bool. xny))true: bool→ bool
e,
Cz
line 4 2
1
T Tz Ti Tz’ E
Unify(int→ a =P→ B; P– int)
E
=
= Unify(int=P,a=p;P int→
)
= [BeinHo Unify(x =P; B: int(B← int))
= (p←intoUnifyCa–int;int:int)
: (B←int]o laeint)o Unity(intsintla←intl)
— Cp←int)ok←Int]o Unify(int= int) :[paint)o [a ← int)0 Unify10)
:[Bent. ]o la←int]
:i:::÷÷:±÷÷
::
a
b
id
f,
id
fun
f2
fun
stat
ic scoping : global call Ald)
return 2+3 call fdl2)
f,
FL
return dxfl (2) –
(21-3)
print 540=50
Dyn
amic Scoping call f-ICH
:
ba’ glo
,
FL ← b d H2) so big 2
indb
eturn 21-3 :S
4f
r
I
2x HH) fl C- no b
272.4
pr
int s > (2×4)=40
# 33
Program Verification
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Overview of Program Verification
y precondition {true}
{x+y>5}
post condition known as
Hoare triple
Write down specification
Check if the Triple is valid
x := 5; y := 1;
x+y>5 after execution from any initial state?
Program is verified if the triple is valid
x := 5; y := 1;
A triple ! ” # is valid If we start from a state satisfying !, and execute “, then final state must satisfy #
2
Goal: check if {!}”{#} is valid
most useful ✓
true x:=5{% = 5}
true x:=5{% = 5 ∨ % = 2}
true x:=5{% > 0} true x:=5{% < 10}
Observation: some postconditions are more useful %=5⇒%=5∨%=2
%=5⇒%>0
% = 5 ⇒ % < 10
Need to compute the strongest postcondition
3
✓
✓
✓
Compute strongest postcondition and check the truth value
/0 % ≔ 5; 3 ≔ 1, 5678 ⇒ % + 3 > 5?
Write down specification
{true} {x+y>5}
Check if the Triple is valid
Program is verified if the triple is valid
x := 5; y := 1;
x := 5; y := 1;
x+y>5 after execution from any initial state?
4
Goal: check if {!}”{#} is valid Method 1: check sp “, ! ⇒ #
Strongest Postcondition
sp(/, 😉 is the strongest postcondition of /, w.r.t. ; Property: ; / = is valid iff sp(s, 😉 ⇒ =
Hence,validityofatriple ;/{=}isequivalentto the truth value of proposition sp(s, 😉 ⇒ =
5
Assignment Rule (Floyd’s Axiom)
value of x before assignment
sp(x:=e,”)=(∃%. ‘= )’←% ∧(“‘←%))
p
t
new value ofX Substitutexwithvine 5fx←v] trucker]
Examples:
sp(x:=5, true)= (∃C. % = 5 ∧ true)= % = 5
sp(x:=x+3,%=3)= ∃C. %=C+3 ∧ C=3 = %=3+3
Xsylxev ) my
Xt3lx←v ]
: Vt3
6
Composition Rule
←¥(Si ) sp ,p
S2
– sp(Sa, SpCyp))
sp(s1;s2, “)=sp(s2, sp(s1, “))
x := 5; y := 1;
sp(x:=5;y:=1, true) = sp(y:=1, sp(x:=5, true)) = sp(y:=1, % = 5) (previous slide)
= ∃C. 3=1 ∧ %=5 =3=1∧%=5
7
Composition Rule
sp(s1;s2, “)=sp(s2, sp(s1, “))
x := 5; x := 2;
c-
sp(x:=5;x:=2, true) = sp(x:=2, sp(x:=5, true)) = sp(x:=2, % = 5)
= ∃C. %= 2%←C ∧(%=5%←C) = ∃C. %=2 ∧ C=5
= %=2
The existential quantifier complicates the formula … 8
{true}
x := 5;
y := 1; {(!=1)∧(#=5)}
sp(x:=5;y:=1, true) = sp(y:=1, sp(x:=5, true)) = sp(y:=1, % = 5)
= ∃C. 3=1 ∧ %=5
19=1) n CHS) ?
The existential quantifier complicates the formula …
Goal: check if {;}/{=} is valid Method 1: check sp /, ; ⇒ =
=3=1∧%=5
⇒
9
{true}
x := 5;
y := 1; {(!=1)∧(#=5)}
sp(x:=5;y:=1, true) = sp(y:=1, sp(x:=5, true)) = sp(y:=-1, % = 5)
-= 3=1 ∧ %=5 The reasoning:
truex:=5 {(=5} y:=1 {(=5∧,=1} sp computation (forward):
Backward?
Goal: check if {;}/{=} is valid Method 1: check sp /, ; ⇒ = Method 2: check ; ⇒wp(/, =)
10
Goal: check if {!}”{#} is valid Method 1: check sp “, ! ⇒ # Method 2: check ! ⇒wp(“, #)
Weakest
– Precondition
wp(/, =) is the weakest precondition of /, w.r.t. = Property: ; / = is valid iff ; ⇒wp(s, =)
Hence,validityofatriple ;/{=}isequivalentto the truth value of proposition ; ⇒wp(s, =)
11
Assignment Rule (Hoare’s Axiom)
wp(x:=e, #)=#[% ← ‘]
x.s Ks ‘( )
Examples:
wp(x:=5, % = 5)= (5 = 5)= true wp(x:=x+3,%=3+3)= %+3=3+3
b
lx← x– y- 3(
= %=3
..
This rule is simpler than Floyd’s axiom, hence
“s.
weakest precondition is used in most systems
.
12
( pf’s! S2
{Q) V
wmk, QD
Composition Rule
wp(s1;s2, #)=wp(s1, wp(s2, #))
{true}
x := 5;
y := 1; {(!=1)∧(#=5)}
Q
wp(x:=5;y:=1, %=5 ∧ 3=1 )
=wp(x:=5,wp(y:=1, %=5 ∧ 3=1 ))
=wp(x:=5, %=5 ∧ 1=1 )
Q
=5=5∧1=1 = true
Q
←I] QCx← s)
Q
13
Composition Rule
wp(s1;s2, #)=wp(s1, wp(s2, #))
{true} x := 5; x := 2; {#=2}
wp(x:=5;x:=2, % = 2) Q
= wp(x:=5, wp(x:=2, % = 2)) = wp(x:=5, 2 = 2)Q
= (2 = 2)
= true
Q
Q(x← 27 Qlxes]
14
[⇒ SiIQ)
Sz IQ)
wp(if , s1elses2,-)=
(, ⇒ wp(s1, #)∧ ¬, ⇒ wp(s2,#))
wp(P, 3 ≥ 0)
= % > 0 ⇒ wp(y ≔ x, 3 ≥ 0)∧
¬(% > 0) ⇒wp(y≔ −x,3 ≥ 0))
= (% > 0 ⇒ % ≥ 0) ∧ (% ≤ 0 ⇒ −% ≥ 0) = true
Branch Rule
V. ⇒
{true}
if (x>0)
y := x; else
y := -x; {!≥0}
15
Computing WP
wp(x:=e, #)=#[% ← ‘]
wp(s!; s”, #)=wp(s!,wp(s”,#))
wp(if, s!elses”,-)=
(, ⇒ wp(s!, #)∧ ¬, ⇒ wp(s”,#))
wp(nop, #)=#
16
# 34
Program Verification
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Goal: check if {“}${%} is valid Method 1: check sp $, ” ⇒ % Method 2: check ” ⇒wp($, %)
wp(x:=e, “)=”[$ ← &]
wp(s!; s”, “)=wp(s!,wp(s”,”)) wp(if+ s!elses”,”)=
(+ ⇒ wp(s!, “)∧ ¬+ ⇒ wp(s”,”)) wp(nop, “)=”
A dummy operation that has no effects
2
{x>0}
x := x+1;
y := x * (x+5);
{y>0}
Goal: show the Hoare triple is valid 1) Compute wp(prog, postcondition)
wp(x:=x+1; y:=x*(x+5), ! > 0)
= wp (x:=x+1, wp (y:=x∗(x+5), ! > 0))
= wp (x:=x+1, ‘ ∗ (‘ + 5) > 0))
= ‘+1∗’+6>0
2) Show the precondition implies wp
(x>0)⇒( ‘+1 ∗ ‘+6 >0)
4
wp(x:=e, %)=%[) ← +]
wp(s!; s”, %)=wp(s!,wp(s”,%))
wp(if0 s!elses”,”)=
(0 ⇒ wp(s!, %)∧ ¬0 ⇒ wp(s”,%))
wp(nop, %)=%
{x=a}
if (a<0) x := -a;
{x=|a|}
Goal: show the Hoare triple is valid 1) Compute wp(prog, postcondition)
wp(if (a<0) x:=-a ,' = |0|)
= (0 < 0 ⇒wp (x:= −a, ' = |0|)) ∧
(0 ≥ 0 ⇒wp (nop, ' = |0|))
= (0 < 0 ⇒ −0 = |0|) ∧ (0 ≥ 0 ⇒ ' = |0|) = (0 ≥ 0 ⇒ ' = |0|)
2) Show the precondition implies wp
(x = 0) ⇒ (0 ≥ 0 ⇒ ' = |0|)
7
wp(x:=e, %)=%[) ← +]
wp(s!; s", %)=wp(s!,wp(s",%))
wp(if0 s!elses",")=
(0 ⇒ wp(s!, %)∧ ¬0 ⇒ wp(s",%))
wp(nop, %)=%
Dafny: A Verification Tool
https://rise4fun.com/Dafny
8
{true}
if (x<0) r:=-x;
else r:=x;
{r=|x|}
Example
Observation: program verification is systematic and automatic if there is no loop! 9
{! ≥ 0}
r:=0, i:=0; while (i
Hence,567∧8⇒wp ;,567 and567∧¬8⇒> (Proof is beyond the scope of this lecture)
Loop invariant (567) is a proposition that is:
1) Initially true (? ⇒ 567)
2) True after each iteration (567 ∧ 8 ⇒ wp ;, 567 ) 3) Termination of loop implies the postcondition
(567 ∧ ¬8 ⇒ >)
12
Loop Invariant and Induction
Loop invariant (567) is a proposition that is:
1) Initially true (? ⇒ 567)
2) True after each iteration (567 ∧ 8 ⇒ wp ;, 567 ) 3) Termination of loop implies the postcondition
(567 ∧ ¬8 ⇒ >)
Intuitively, we are proving the correctness of an
arbitrary number of loop iterations, by induction! 13
{! ≥ 0}
r:=0, i:=0; while (i
r := r*i;
i –; }
{$ = !!}
Example
Goal: show the Hoare triple is valid
1) Write down a tentative loop invariant (!”#)
$ = ∏& 4 ∧ ( ≥ 0 ∧ ” ≥ 0 !”#$%
2) Show Inv is a loop invariant
– {” ≥ 0} r:=1, i=n; {!”#} is valid -!”#∧(>0⇒wp r:=r∗i;i−−;,!”# – !”# ∧ ( ≤ 0 ⇒ $ = “!
4
{! ≥ 0}
r:=1, i:=n; while (i>0) {
r := r*i;
i –; }
{$ = !!}
Example
5
Verification in Practice
Goal: show the Hoare triple is valid
1) Write down a tentative loop invariant (!”#) 2) Show Inv is a loop invariant
Step 2) is automatic, but 1) is mostly manual …
Significant artifacts (e.g., simple OS) have been verified, but with pains (e.g., 3 person-years)
6
Total vs. Partial Correctness
!while ‘ ({*}
Is this program correct?
{) ≥ 0, +,-./0! > 0}
float sqrt (float a, float epsilon) {
float x := 1.0;
while (x*x > a+epsilon || x*x < a-epsilon)
x := x;
return x;
}{) − +,-./0! ≤ 4! ≤ ) + +,-./0!}
Wecanverifyitwithinvariant !"#$.Why?
7
Total vs. Partial Correctness
!while ' ({*}
Partial correctness: if the loop terminates, 8 must
be true. However, the loop might not terminate E.g., 9 while true A 8 ,!"#∧¬true⇒8istrue
The loop invariant only enforces partial correctness
Total correctness: prove loop determinates (undecidable in general)
8
Summary
Goal: prove a program A is correct
Step 1: formalize “correctness” by writing down the precondition 9 and postcondition 8
Step 2: show that the Hoare tripe ( 9 A{8}) is valid - Mostly automatic, except for the loops
What is verified?
Given any state satisfying P, the final state after executing A must satisfy 8, if s terminates
9
#
36
OOP and Garbage Collection
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
HW7
• Last assignment
• Due on the last day of class (Dec. 11) midnight (no late submission)
2
Abstract Data Types
Primitive types: values and operations on values User-defined types: records, lists, ...
Focus on values
ADT: defined by a set of operations on a type
Focus on operation
Stack is a type with new, pop, push, empty ... Internal representation is less relevant
3
Classifying Operations Creators: create new objects of type
Producers: create new objects from old ones Mutators: change objects, e.g., list.add(n)
Observers: take objects of ADT and return objects with different type, e.g., list.size()
4
ADT Example int
Creators: numeric literals 1,2,3,... Producers: arithmetic operations +,-,*,/,...
Observers: comparison operators ==,!=,<,> Mutators: none (immutable)
5
ADT Example List
Creators: ArrayList, LinkedList, … Producers: Collections.unmodifiableList()
Observers: size(),get() Mutators: add(),remove(), …
6
ADT Example String
Creators: String(), String(char[]) Producers: concat(),substring(),…
Observers: length(), charAt(),… Mutators: none (immutable)
7
OOP Terminology
Class: a richer version of ADT
Object (Instance): a variable of a class (a
value of a type) Field: variable in a class
Method: operation in a class
8
Object-Oriented Programming
Key elements: • Encapsulation • Subtyping
• Inheritance
9
Encapsulation (Information Hiding)
• Group data and operations in one place (typically, in one class)
• Hide irrelevant details (using visibility modifiers, such as public, private, protected)
10
Subtyping
Sometimes, every value of type B is of type A
– e.g., a Rectangle is always a Shape We say B is a subtype of A, meaning
“every object that satisfies interface of B also satisfies interface of A”
11
Subtyping
interface Shape {
public double area();
public int edges();
}
class Circle implements Shape {
double radius;
public double area() {return 3.14*radius*radius}; public int edges() {return 1};
}
12
Subtyping Relation
We write A ≤ B to mean A is a subtype of B
Relation ≤ defines a partial ordering on types Object
Integer String Shape Rectangle
Reflexibility: ∀”. ” ≤ ”
Antisymmetry: ∀”!,”” .”! ≤ “” ∧ “” ≤ “! ⇒ “! = “” Transitivity:∀”!,””,”#.”! ≤””∧”” ≤”# ⇒”! ≤”#
Circle
13
Subtype Polymorphism
“Casting” a subtype to its supertype is ALWAYS safe (known as “Upcast”)
Hence, a function with parameter of type B can take any data with type A ≤ B
int f (Object o) {…) // takes object of any type
int g (Shape s) {…} // takes object of subtype
// of Shape, such as Circle
14
Upcasting and Downcasting
– Upcast: change to a supertype – Always safe
– Downcast: change to a subtype – Not always safe
Rectangle r = new Rectangle(2,3); // OK
Shape s = r; // Upcast is always safe
r = (Rectangle) s; // OK, but downcast
// is not always safe
Java checks type casts at run time
15
Motivation of Inheritence
class Circle implements Shape {
double radius;
public double area() {return 3.14*radius*radius}; public int edges() {return 1};
}
How to implement ColoredCircle? Option 1: copy-and-paste
Circle Circle (fixed bug)
ColoredCircle ?
16
Inheritance
class ColoredCircle extends Circle {
private Color color;
…
Color getColor {return color};
// methods area, edges are inherited from Circle }
Circle Circle (fixed bug)
ColoredCircle
17
Inheritance
class ColoredCircle extends Circle {
private Color color;
…
Color getColor {return color};
// methods area, edges are inherited from Circle }
Inheritance introduces subtyping: ColoredCircle inherits all fields & methods Circle is the supertype of ColoredCircle ColoredCircle is the subtype of Circle
18
Overriding
class DoubleCircle extends Circle {
pubic DoubleCircle(double r){super(r);} public int edges {return 2};
}
Subclass may redefine methods in super class
19
Object-Oriented Programming
Key elements: • Encapsulation • Subtyping
• Inheritance
How are these features implemented?
20
Running Example
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges has a new filed “color”
DoubleCircle: edges return 2
21
Memory Layout (Fields) An object has
• Fields (and ones from super class)
Circleobject1: DoubleCircleobject: ColoredCircleobject:
radius
radius
radius color
foo (Circle s) {
s.radius; // offset?
}
Caution: new fields in subtype should extend inherited fields due to subtype polymorphism
22
Memory Layout (Functions)
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
Static dispatch: s.edges() always returns 1
Dynamic dispatch: return value of s.edges() controlled by the type of s
23
Static Dispatch
subtype of
subtype of
Circle: edges return 1
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
Dispatch to the implementation in class Circle
Hence, s.edges() always returns 1
24
Implementation: Static
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
The compiler can always tell which implementation at compile time (e.g., the edges method in class Circle)
25
# 37
OOP and Garbage Collection
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Memory Layout (Functions)
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
Static dispatch: s.edges() always returns 1
Dynamic dispatch: return value of s.edges() controlled by the type of s
2
Implementation: Static
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
The compiler can always tell which implementation at compile time (e.g., the edges method in class Circle)
3
Dynamic Dispatch
subtype of
subtype of
Circle: edges return 1
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
Dispatch to the implementation based on the type of the object s
Hence, s.edges() returns 2 when s is an object of class DoubleCircle
4
Implementation: Dynamic
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
The compiler does not know the type of s. How can it dispatch the method call to the correct implementation?
5
Trivial Memory Layout
An object has
• Fields (and ones from super class)
• Methods (and ones from super class)
Circle object1: Circle object2: DoubleCircle object:
radius
edges’: binary area: binary
radius
edges: binary area: binary
radius
edges: binary area: binary
Issues: each object has a copy of impl. code (waste space) polymorphic functions need to distinguish different
layouts of classes (to find method offset)
6
Virtual Table (vtable) – Tentative design
A (shared) table containing method binaries To save memory: one table per Class
Circle objects:
Circle’s vtable
edges: binary
area: binary
radius
radius
Points to methods stored in Fields stored in each object one shared vtable
7
Virtual Table (vtable) – Tentative design
A (shared) table containing methods
Override?
Circle’s vtable
DoubleCircle’s vtable
edges: binary
area: binary
edges’: binary
area: binary
1010
1020
foo (Circle s) {
s.area(); // code has different offsets
}
Issue: foo is compiled to different binaries with different offsets for different types of s
8
Virtual Table (vtable)
A (shared) table containing pointers to methods
Circle’s vtable DoubleCircle’s vtable
Code for edges
edges
area
edges’
area
Code for
edges’
:
Code for area
9
Virtual Table (vtable)
Circle objects:
Circle’s vtable
Impl. code DoubleCircle’s vtable
edges
area
radius
radius
edges’
area
DoubleCircle objects:
ColoredCircle objects:
ColoredCircle vtable
radius
edges
area
getColor
radius
Color
10
Downcasting/Upcasting
Circle objects:
Circle’s vtable
edges
area
radius
Impl. code
Upcasting (always type safe)
ColoredCircle objects:
Downcasting (be careful!)
ColoredCircle vtable
edges
area
getColor
radius
Color
11
Member Lookup: Case 1
When s is an object of class Circle
foo (Circle s) {
s.radius;
s.area();
s.edges(); }
this this+4
vt vt+4
foo (Circle s) {
vt = *this;
*(this+4); //value of radius
call *(vt+4); // method area
call *vt; // method edges
}
s.edges returns 1
12
Member Lookup: Case 2
When s is an object of class ColoredCircle
foo (Circle s) {
s.radius;
s.area();
s.edges(); }
this this+4
vt vt+4
foo (Circle s) {
vt = *this;
*(this+4); //value of radius
call *(vt+4); // method area
call *vt; // method edges
}
s.edges returns 1 Upcasting is type safe
13
Member Lookup: Case 3
When s is an object of class DoubleCircle
foo (Circle s) {
s.radius;
s.area();
s.edges(); }
this this+4
vt vt+4
foo (Circle s) {
vt = *this;
*(this+4); //value of radius
call *(vt+4); // method area
call *vt; // method edges
}
s.edges returns 2
14
Dynamic Dispatch with VTables
foo (Circle s) {
s.radius;
s.area();
s.edges(); }
foo (Circle s) {
vt = *this;
*(this+4); //value of radius
call *(vt+4); // method area
call *vt; // method edges
}
One implementation for all subtypes!
15
foo (Circle s) {
vt = *this;
*(this+4); //value of radius
call *(vt+4); // method area
call *vt; // method edges
}
foo (Circle s) {
s.radius;
s.area();
s.edges(); }
foo (Circle s) {
*(this+4); //value of radius
call Circle.area; // not in vt
call Circle.edges;// not in vt
}
HD→ Dutt/binary Static vs. Dynamic Dispatching
Methods are dynamic
Methods are static
16
Cost of Dynamic Dispatch
Dynamic dispatch has costs, but is better for extensibility
In C++: static by default, except the virtual methods
In Java: dynamic by default, expect the ones that cannot be overridden (e.g., final and static methods)
In Python: all methods use dynamic dispatch
17
Static vs Dynamic Dispatch
foo (Circle s) {
s.edges(); // which implementation?
}
Static dispatch: s.edges() always returns 1
Dynamic dispatch: return value of s.edges() controlled by the type of s
Static dispatch is easier to implement and more efficient Dynamic dispatch less efficient, but provides better extensibility (central to object-oriented programming)
18
CMPSC 461: Programming Language Concepts Assignment 7. Due: Dec. 11, 11:59PM (no late submission)
For this assignment, you need to submit your solution to Gradescope. You may discuss the assignment with other students, but all submitted work must be your own work.
Problem 1 [8pt] Prove that the following two Hoare triples are valid. (Hint: in predicate logic P1 ) P2 is equivalent to ¬P1 _ P2).
a) (4pt)
{x = 1}
y := x + 2; y := y * 3; {y > 8}
b) (4pt)
{x = 2, y = 3}
if (x<1) y := 5; else y := y - 1; {y = 2}
Problem 2 [12pt] Below is the pseudo code that computes 2x + 3y, along with the pre- and post-conditions to be verified.
{z = 2x ^ n = 0 ^ y > 0} while (n < y) {
z = z + 3;
n = n + 1; }
{z = 2x + 3y}
Write down a loop invariant and prove the correctness of this program. That is, write down an invariant, and show that the loop invariant is 1) true before the first execution of the loop, 2) true after the execution of each loop iteration, given that the loop condition and the invariant are both true before the iteration, and 3) strong enough to prove the correctness of the code (i.e., z = 2x + 3y is true after the loop, if it terminates).
Problem 3 [12pt] Consider the following Java classes:
class A {
public int foo () { return 1;}
public void message () { System.out.println( "A"); }
}
class B extends A {
public void message () { System.out.println( "B"); } }
class C extends A {
public int foo() {return 3;}
}
class D extends C {
public void message () {System.out.println( "D");} }
1/2
a) (8pt) Draw the virtual tables of classes A, B, C, D, and the code implementations that they point to. b) (4pt) Consider the following function:
void func(C c) { c.message(); }
With subtype polymorphism, this function prints “A” given an instance of class C; prints “D” given an instance of class D. Give the pseudo code for the body of func after compilation to explain how does this happen with dynamic dispatching. Assume that the base address of c is stored in a pointer this, and that the target machine has 4-byte addresses. You can ignore the calling sequence (e.g., save and restore registers, store return address and so on).
Problem 4 [12pt] Consider the heap state as shown below before garbage collection:
a) (3pt) For the Mark-and-Compact algorithm, what objects remain on the heap after collection? Do they stay in the same location after collection?
b) (9pt) Consider algorithms Mark-and-Sweep, Mark-and-Compact, Stop-and-Copy. For each of them, write down the objects being visited during the collection in sequence. Assume 1) reachability analysis uses depth-first search, which will skip objects that have already been visited, 2) search starts from p, and then q, 3) when the entire heap is traversed, objects are visited from left to right. You don’t need to write down the newly created object copies, if any.
Problem 5 [6pt] For each of the following data characteristics, briefly explain which of the garbage col- lection algorithms discussed in class (Mark and Sweep, Mark and Compact, Stop and Copy) would be the most appropriate and why. Make no other assumptions than those given.
a (3pt) All objects are of the same size and many have relatively long lifetime.
b (3pt) Most objects have short lifetime and data locality greatly improves application’s performance.
Stack p
q
Heap
A
B
C
D
E
F
G
Assignment 7, Cmpsc 461 2020 Fall 2/2
#38
OOP and Garbage Collection
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Final
Dec. 16 (Wednesday), 6:50PM – 8:40PM
Cumulative, covers everything in the semester Roughly a uniform distribution on course materials 30% of final grade
Format: ZOOM-based exam, similar to Midterm 2 Next Friday: review for the final, HW7 due
Teaching Evaluation Don’t forgot the teaching evaluation
Feedbacks are appreciated
Heaps
new(k) b new(k’) free(b)
Independent lifetimes of objects make heap management difficult
4
Heap Management
Allocator: a routine takes size of requested heap space, and search for free space
Usually, heap is managed in blocks. Allocator may return larger block than requested
Deallocator: collect free space and merge with other free space when possible
Heap compaction: move all used blocks to one end
5
Heap-Based Allocation
Higher Address Lower Address
Allocation Request
Internal fragmentation:
the allocation request is smaller than the assigned memory block
External fragmentation:
none of the scattered free space is large enough for the request6
Heap Management Algorithms (not covered in this course)
First-fit: select the first free block that is large enough
Best-fit: select the smallest free block that fits
Buddy system: maintain various pools of free blocks with size of 2k
Fibonacci heap: maintain various pools of free blocks with size of Fibonacci numbers
7
Heap Management
Programmer Management (C, C++)
• Pros: implementation simplicity, performance
• Cons: error prone (dangling pointers, memory leaks)
Node *p, *q;
p = new Node();
q = new Node();
q = p; // memory leaks
delete(p); // q becomes dangling pointer
8
Heap Management
Automatic Management (Java, Scheme)
• No dangling pointers, no memory leaks
• Cost: Slower than programmer management
9
Garbage Collection Garbage: inaccessible heap objects
void foo () {
int* a = new int[10];
return;
}
Issues of heap management:
• Collect too aggressively: dangling pointers • Collect too conservatively: memory leaks
• Key problem: collect only objects that are inaccessible from program
10
GC I: Reference Counting
Maintain a reference count with each heap object Set to 1 when object is created
Incremented each time new reference to it is created Decremented each time reference to it is deleted Collect when count becomes 0
11
Node *p, *q;
p = new Node();
q = new Node();
q.next = new Node();
q = p;
p = null;
Stack Heap p
q
1
Node
1
Node
1
Node
12
Node *p, *q;
p = new Node();
q = new Node();
q.next = new Node();
q = p;
p = null;
Stack Heap p
q
Object referenced by q has RC-1 Object referenced by p has RC+1
2
Node
0
Node
1
Node
13
Node *p, *q;
p = new Node();
q = new Node();
q.next = new Node();
q = p;
p = null;
Stack Heap p
q
Object referenced by p has RC-1
1
Node
0
Node
1
Node
14
Garbage Collection
Stack Heap p
q
Object where RC=0 is collected
1
Node
0
Node
1
Node
15
Garbage Collection
Stack Heap p
q
Object referenced by collected object has RC-1
1
Node
0
Node
16
GC I: Reference Counting
When is an object dereferenced?
• Reference is LHS of Assignment
• Reference on stack is destroyed when function returns
• Reference is destroyed when an object with count 0 is collected
17
GC I: Reference Counting Reference Counting is about Object Ownership
• when one object creates a reference to another object, it owns that object (retain)
• when the object deletes that reference, it relinquishes ownership (release)
Multiple owners of an object Zero owners of an object
18
Problem of Reference Counting
Ownership ≠ Accessibility
Garbage, but RC>0
19
Taxonomy
Garbage Collection
Reference Counting
Trace-Based
20
What Is Garbage
Ideally, any heap block not used in the future
In practice, the garbage collector identifies blocks inaccessible from program
Essentially a reachability problem (from alive variables)
21
GC II: Tracing Collection
What allocated blocks are still accessible (live)? • Reference in registers, static area, or stack
• Reference from reachable objects to other objects
Essentially a reachability problem (heap as directed graph)
Stack Heap
22
#39
OOP and Garbage Collection
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
HW7
• Last assignment
• Due on the last day of class (Dec. 11) midnight (no late submission)
2
Abstract Data Types
Primitive types: values and operations on values User-defined types: records, lists, …
Focus on values
ADT: defined by a set of operations on a type
Focus on operation
Stack is a type with new, pop, push, empty … Internal representation is less relevant
3
Classifying Operations Creators: create new objects of type
Producers: create new objects from old ones Mutators: change objects, e.g., list.add(n)
Observers: take objects of ADT and return objects with different type, e.g., list.size()
4
ADT Example int
Creators: numeric literals 1,2,3,… Producers: arithmetic operations +,-,*,/,…
Observers: comparison operators ==,!=,<,> Mutators: none (immutable)
5
ADT Example List
Creators: ArrayList, LinkedList, … Producers: Collections.unmodifiableList()
Observers: size(),get() Mutators: add(),remove(), …
6
ADT Example String
Creators: String(), String(char[]) Producers: concat(),substring(),…
Observers: length(), charAt(),… Mutators: none (immutable)
7
OOP Terminology
Class: a richer version of ADT
Object (Instance): a variable of a class (a
value of a type) Field: variable in a class
Method: operation in a class
8
Object-Oriented Programming
Key elements: • Encapsulation • Subtyping
• Inheritance
9
Encapsulation (Information Hiding)
• Group data and operations in one place (typically, in one class)
• Hide irrelevant details (using visibility modifiers, such as public, private, protected)
10
Subtyping
Sometimes, every value of type B is of type A
– e.g., a Rectangle is always a Shape We say B is a subtype of A, meaning
“every object that satisfies interface of B also satisfies interface of A”
11
Subtyping
interface Shape {
public double area();
public int edges();
}
class Circle implements Shape {
double radius;
public double area() {return 3.14*radius*radius}; public int edges() {return 1};
}
12
Subtyping Relation
We write A ≤ B to mean A is a subtype of B
Relation ≤ defines a partial ordering on types Object
Integer String Shape Rectangle
Reflexibility: ∀”. ” ≤ ”
Antisymmetry: ∀”!,”” .”! ≤ “” ∧ “” ≤ “! ⇒ “! = “” Transitivity:∀”!,””,”#.”! ≤””∧”” ≤”# ⇒”! ≤”#
Circle
13
Subtype Polymorphism
“Casting” a subtype to its supertype is ALWAYS safe (known as “Upcast”)
Hence, a function with parameter of type B can take any data with type A ≤ B
int f (Object o) {…) // takes object of any type
int g (Shape s) {…} // takes object of subtype
// of Shape, such as Circle
14
Upcasting and Downcasting
– Upcast: change to a supertype – Always safe
– Downcast: change to a subtype – Not always safe
Rectangle r = new Rectangle(2,3); // OK
Shape s = r; // Upcast is always safe
r = (Rectangle) s; // OK, but downcast
// is not always safe
Java checks type casts at run time
15
Motivation of Inheritence
class Circle implements Shape {
double radius;
public double area() {return 3.14*radius*radius}; public int edges() {return 1};
}
How to implement ColoredCircle? Option 1: copy-and-paste
Circle Circle (fixed bug)
ColoredCircle ?
16
Inheritance
class ColoredCircle extends Circle {
private Color color;
…
Color getColor {return color};
// methods area, edges are inherited from Circle }
Circle Circle (fixed bug)
ColoredCircle
17
Inheritance
class ColoredCircle extends Circle {
private Color color;
…
Color getColor {return color};
// methods area, edges are inherited from Circle }
Inheritance introduces subtyping: ColoredCircle inherits all fields & methods Circle is the supertype of ColoredCircle ColoredCircle is the subtype of Circle
18
Overriding
class DoubleCircle extends Circle {
pubic DoubleCircle(double r){super(r);} public int edges {return 2};
}
Subclass may redefine methods in super class
19
Object-Oriented Programming
Key elements: • Encapsulation • Subtyping
• Inheritance
How are these features implemented?
20
Running Example
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges has a new filed “color”
DoubleCircle: edges return 2
21
Memory Layout (Fields) An object has
• Fields (and ones from super class)
Circleobject1: DoubleCircleobject: ColoredCircleobject:
radius
radius
radius color
foo (Circle s) {
s.radius; // offset?
}
Caution: new fields in subtype should extend inherited fields due to subtype polymorphism
22
Memory Layout (Functions)
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
Static dispatch: s.edges() always returns 1
Dynamic dispatch: return value of s.edges() controlled by the type of s
23
Static Dispatch
subtype of
subtype of
Circle: edges return 1
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
Dispatch to the implementation in class Circle
Hence, s.edges() always returns 1
24
Implementation: Static
Circle: edges return 1
subtype of
subtype of
ColoredCircle: inherits edges
DoubleCircle: edges return 2
foo (Circle s) {
s.edges(); // which implementation?
}
The compiler can always tell which implementation at compile time (e.g., the edges method in class Circle)
25
#40
OOP and Garbage Collection
CMPSC 461
Programming Language Concepts Penn State University
Fall 2020
Stop-the-World Collectors
A: number of alive objects; N: all objects on heap H: total heap size
Mark-And-Sweep
Mark-And-Compact
Complexity
O(N)
O(N)
Fragmentation?
Yes
No
Memory move
No
Yes
Effective heap size
H
H
2
Taxonomy
Garbage Collection
Reference Counting
Trace-Based Stop-the-World
Mark-and-Sweep Mark-and-Compact
Stop-and-Copy
3
Stop-and-Copy (Copying Collector) Time-Space Tradeoff:
• 2 Heaps: allocate space in one, copy to second when the other is full (half of the heap is unused)
• Only one pass over live objects is needed (much faster than previous algorithms)
MB bit is not needed
4
Stop-and-Copy
Triggered when heap in use is full, interrupt program
For each visited object
1. Copy it to the other heap (no fragmentation)
2. For the object in the old heap, keep a forwarding pointer to the new address
3. For each object it points to:
a) if visited: update links (follow forwarding pointer) b) otherwise, (recursively) visit object and update link
Retarget root references Switch heaps
5
Example
Initial state
Stack p
q
Old Heap
New Heap
6
Example
Visit red node
Stack p
q
Forwarding Pointer
Old Heap
New Heap
7
visit(o1) {
copy
visit next link
update links }
Example
Forwarding Pointer
Visit red node (following link from new heap)
Stack p
q
Old Heap
New Heap
8
visit(o1) { copy
visit(o2) {
copy
visit next link
visit next link
update links }
update links }
Example
Forwarding Pointer
Copy red node (internal link is updated)
Stack p
q
Old Heap
New Heap }
}
9
visit(o1) { copy
visit(o2) { copy
visit(o3)
visit next link visit next link
update links
update links
Example
Update links
Stack p
q
Forwarding Pointer
Old Heap
visit(o1) { copy
visit(o2) { copy
New Heap
visit next link visit next link
update links update links }}
10
Example
Update links
Stack p
q
Forwarding Pointer
Old Heap
visit(o1) { copy
visit next link
update links
New Heap
}
11
Example
Retarget root references
Stack
Forwarding Pointer
Old Heap
p q
New Heap
12
Example
Swap Heaps (no need to clean the old one)
Stack
p q
Heap for copying
Heap in use
13
Stop-the-World Collectors
A: number of alive objects; N: all objects on heap H: total heap size
Mark-And-Sweep
Mark-And-Compact
Stop-And-Copy
Complexity
O(N)
O(N)
O(A)
Fragmentation?
Yes
No
No
Memory move
No
Yes
Yes
Effective heap size
H
H
H/2
14
Stop-and-Copy
Pros:
Eliminate fragmentation
Collection time proportional to live objects Great for short-lived data
Cons:
Only half of heap is in use, requires more collecting (virtual memory alleviates this limitation)
Bad for long living data (copying data back and forth)
15
Taxonomy
Garbage Collection
Reference Counting
Trace-Based
Stop-the-World Short-Pause
Mark-and-Sweep Mark-and-Compact
Stop-and-Copy
Partial
16
Short-Pause Collection Partial
• stop the program, but only briefly, to garbage collect a part of the heap
17
Taxonomy
Garbage Collection
Reference Counting
Trace-Based
Stop-the-World Short-Pause
Mark-and-Sweep Mark-and-Compact
Stop-and-Copy
Partial
Generational
18
The Object Life-Cycle “Most objects die young.”
• ButthosethatsurviveoneGCarelikelytosurvivemany Idea: tailor GC to spend more time on regions of
the heap where objects have just been created A better ratio of reclaimed space per unit time
19
Generational
Divide heap into generations g1, g2 …
• gi holds older objects than gi-1
Create new objects in g1, until it fills up
GC g1 only; move reachable objects to g2 after several collections (typically one collection)
20
Generational
When g2 fills, garbage collect g1 and g2, and put the reachable objects in g3
In general: When gi fills, collect g1, g2, …, gi, and put the reachable objects in gi+1
What GC algorithm is better for young generation? How about old generation?
21
Algorithm for Young Generation A: number of alive objects; N: all objects on heap
A << N
Mark-And-Sweep
Mark-And-Compact
Stop-And-Copy
Complexity
O(N)
O(N)
O(A)
Fragmentation?
Yes
No
No
Memory move
No
Yes
Yes
Effective heap size
H
H
H/2
Stop-and-copy is the most efficient
22
Algorithm for Old Generation
A: number of alive objects; N: all objects on heap
A is close to N
Mark-And-Sweep
Mark-And-Compact
Stop-And-Copy
Complexity
O(N)
O(N)
O(A)
Fragmentation?
Yes
No
No
Memory move
No
Yes
Yes
Effective heap size
H
H
H/2
Mark-And-Compact removes fragmentation
23
Algorithm for Old Generation
A: number of alive objects; N: all objects on heap
What if most objects have the same size?
Mark-And-Sweep
Mark-And-Compact
Stop-And-Copy
Complexity
O(N)
O(N)
O(A)
Fragmentation?
Yes
No
No
Memory move
No
Yes
Yes
Effective heap size
H
H
H/2
Mark-And-Sweep saves the cost of moving objects
24
Generational
Pros:
Divide heap according to lifetimes of data Great for data with mixed lifetimes
Cons:
Garbage might survive a collection Small heap size for each generation More frequent collection
25
Case Study: Java SE1 Generational Collectors
• Young Generation, 3 partitions:
• Eden, 2 Survivor Spaces (stop-and-copy) • Minor collections (high mortality rate)
• Old Generation
• surviving objects promoted from Young Generation • Major collections (infrequent)
• Permanent Generation
• objects needed by JVM, Strings
26
CMPSC 461: Programming Language Concepts Final Exam Practice Questions
Use these problems in addition to previous HWs, midterms, practice problems to prepare for the final exam.
Problem 1 There are two approaches of checking the validity of a Hoare triple: forward reasoning and backward reasoning. The former starts from the desired precondition, and checks whether the computed strongest postcondition for the program implies the desired postcondition; the latter starts from the desired postcondition, and computes the weakest precondition rather than the strongest postcondition. What makes the second approach more appealing and popular in practice?
Problem 2 Use the following three rules we gave in lecture to prove that the following two Hoare triples are valid.
• wp(x:=e, Q)=Q[x e]
• wp(s1;s2, Q)=wp(s1,wp(s2,Q))
• wp(if (E) s1 else s2, Q) = E )wp(s1,Q) ^ ¬E )wp(s2,Q)
1. {x > 1}
y := x + x; y := y + 2; {y > 0}
2. {y>=0}
if (y<1) x := y+1;
else x := y; {x>0}
Problem 3 The following program sets the return value r to 1 + 2 + · · · + n where n is an input. Write down a loop invariant and show that the Hoare triple is valid. Recall that a loop invariant should be 1) true before the first execution of the loop, 2) true after the execution of each loop iteration, given that the loop condition and the invariant are both true before the iteration, and 3) strong enough to prove the correctness of the code (i.e., r = Sum(n) is true after the loop, if it terminates).
To simplify your notation, assume Sum(n) is a function in logics that returns
{n>0}
i := 1;
r := 0;
while (i n) {
r := r + i;
i++; }
Pn
i=1 i.
{r = Sum(n)}
Problem 4 What are the key elements in any OOP language? Briefly explain every element in your own
words.
Problem 5 What is the difference between static and dynamic dispatching? Which dispatching method is used to implement subtype polymorphism?
Problem 6 Consider the following classes defined in Java, which uses dynamic dispatching.
1/2
class A {
public int a () { return 5;}
public void b () { System.out.println( “A” + a()); }
}
class B extends A {
public void b () { System.out.println( “C” + a()); }
}
class C extends B {
public int a() {return 10;}
}
Givenapolymorphicfunctionvoid func(A a){a.b();},whataretheoutputsofthefollowingcode?
B b = new B(); C c = new C(); func(b); func(c);
Problem 7
Consider the heap state as shown below before garbage collection. For each of the following algorithms, write down the objects being visited during the collection in sequence. Assume 1) reachability analysis uses depth-first search, which will skip objects that have already been visited, 2) search starts from p, and then q, 3) when the entire heap is traversed, objects are visited from left to right. You don’t need to write down the newly created object copies, if any.
1. Mark-and-Sweep 2. Mark-and-Compact 3. Stop-and-Copy
Stack Heap p
q
A
B
C
D
E
F
G
Final Exam Practice Questions, Cmpsc 461 2020 Fall 2/2
loop invariant:
n
Inv Er : sunlit) nisntl before loop
3 Inv n i – n ⇒ rssumln) n =3
,i:L, r:O isnt) after one iteration:
i Tn> 03 n:3 i:L r:L sunk) ,,
is ? after two iteration :
valid n :3 :3 , r :3 sum (2)
V::O is i ,
{ Inv I after three iteration :
:3, i.4, r:b sump)
n
Wp(i: – 1;r:- O, Inv)
=wpfi:-. ?, wplr: :O, Inv) swpli:-. 1,0: sum(i- 1) a isntl)
(O) A1 sntl
= Os sum
: Osn
M O ⇒ Osn so triple is valid
:3
n before loop :
n:3, i:/,r:O
after one intention: isn → I c.3 r
n:3, i :L , ril sunlit
ite r::rti
after two intention : 253
V
n:3, i :3 r :3 sunk)
,
after three intention: 353 V
n:3, i: 4, r:b sunk)
loop invariant :
T.nu!rn. sum(i- 1 )A isntl
3 Env n in ⇒ rssumcn)
7
i
÷::÷:”
hear parents tunes
→
is
,
(Dfs)
depth- first – search
(Dfs)
D 7 Mark A. D. F. G
”
(7 Sweep :i:i÷÷
! ABCDEFG Sam& .
3) > stop loopy : ADH