11 Pointers and References
Pointers & References
We have discussed that we are modeling the memory using the environment. Such a model is a simplified view of the memory, where the memory is an associative list of elements (associated with variable names). The environment keeps track of the execution-stack of the program (including block-contexts).
In this chapter, we will review a finer abstraction of the memory model, which involves associ- ating the memory with locations. The facilitates the description of collection of values (arrays) and further allows to perform computation through side-e↵ects as is natural in procedural languages.
Di↵erent languages allow access to memory in di↵erent ways; we will review these approaches by augmenting the language we have developed so far with new constructs for specifically related to memory access/updates.
Directly Dealing with Memory
So far, the location of where a variables is stored is not considered . . . • Memory Allocation
• Memory Deallocation
• Memory Access (reference and dereference)
• Memory Operations
Languages di↵er in the way the above features are realized (more control does not necessarily imply
better capabilities).
Implication of Reference/Dereference
• Each variable/entity can be accessed using referencing mechanism • A variable can be updated by di↵erent program parts
• Call-by-reference
Side E↵ects
A computation may impact entities through references. Di↵erent languages deal with and allow side-e↵ects at di↵erent levels.
Pure Functional Languages are side-e↵ect free. Imperative Languages rely heavily on side-e↵ects.
90
Runtime Environment Outline
1. Static Area: Allocated when the execution starts. Primary function is to store the “static” entities. E.g.,
public static void main(..) { … }
2. Stack Area: Allocated when the execution requires LIFO data access. Primary function is to store function calls (recursion).
3. Heap Area: Allocated dynamically. Typically, this is user controlled/defined.
Recall: in the previous chapter, we discussed function-applications, where each application results in a new element (activation record) in the stack and the semantics is computed (program is evaluated) based on the element at the top of the stack.
Pointers: From Variables Perspective
A variable has
• a name (by which it is accessed) • a location (where it is stored)
• a value
From Pointer’s own Perspective
A pointer stores the address to some entity.
1. Pointee: The location at the address is what the pointer points to.
2. Dereference: The access operation of the pointee starting from the pointer.
3. Assignment: Assigning one pointer to another makes them to point to the same pointee (the pointers, therefore, hold the same address).
A variable has a name, an address and a value stored at that address. The value stored at an address depends on the type of the variable: e.g., the value stored at the address of an integer variable is an integer. A pointer variable is a type of variable, where the value stored at its address is the address of a memory location. The address can be an address of some other variable (as in C/C++) or can be the address of some complex structure (as in C/C++/Java).
If a pointer variable holds the address of another variable v, then the address may refer to a memory location in the stack. On the other hand, if a pointer variable holds the address of a complex structure (e.g., object), then the address may refer to the memory location in heap. In C/C++, it is possible for a pointer variable to hold addresses of any memory location; while in Java/C#, pointer variable typically holds addresses of heap memory area.
91
In addition to pointer variables, there is another way of handling addresses of variables— reference variables. A reference variable is assigned one-time and its address is same as the address of the variable that is assigned to it. A reference variable x assigned y has the same address as y; therefore, x and y have the same address and hold the same value. A reference variable is way to assign a di↵erent name to an existing variable.
While C/C++ allows explicit management of memory locations by explicitly accessing addresses of variables, structures and objects; Java does not allow explicitly access the address of any variable; rather the address of complex structures/objects are implicitly assigned to the variables.
Pointer
C/C++
int* p1; // a pointer variable: does not point to anything
p1 = malloc(sizeof int)); // pointee is a block that can hold some integer
*p1 = 10;
int* p2;
p2 = p1;
p1 %! p2
Reference
C++
int x = 10;
int& y = x;
y = 20;
int x = 10;
int& y = x;
y = 20;
int* p = &x;
*p = 30;
Variables, Pointers & References
// Deferencing to pointee, which now holds the value 10
// y and x both points to same pointee, which holds 10
int x = 10;
int& y = x;
int* p = &x;
Name x
y
p
Entity/Type integer variable reference variable pointer variable
Address Value L1 10 L1 10 L2 L1
10
// an integer variable
// y and x has the same location/reference
92
Variables in Functions
Given a function signature f : I ! O with a function definition as expression E.
1. Application of function: (f V ) results in the computation of E[I 7! V ] at the call site (aka
inline the computation).
Does change in variable values in V visible outside the scope of E? Call-by-value vs.
Call-by-Reference.
2. Usage of variables in E that are not local to the scope of E.
Static vs. Dynamic Scope.
Languages
C/C++ allows explicit call-by-value.
void swap(int x, int y) {
int temp;
temp = x;
x = y;
y = temp; }
…
int i = 1;
int j = 2;
swap(i, j);
Languages
C/C++ allows call-by-reference.
void swap(int* x, int* y) { // pointer parameters
int temp;
temp = *x;
*x = *y;
*y = temp;
// temp = the pointee-value of x
// pointee-value of x = pointee-value of y
// pointee-value of y = temp
}
…
int i = 1;
int j = 2;
swap(&i, &j); // address of i and j
Languages
C++ References
93
void swap(int& x, int& y) {
int temp;
temp = x;
x = y;
y = temp; }
…
int i = 1;
int j = 2;
swap(i, j);
Languages
Java only allows call-by-value. But remember that any object in Java is always implicitly refer- enced. This implies the reference-value gets copied, when actual argument is an object.
public class MyNumber {
public int val = 32;
}
public class SomeClass {
public static void main(String[] args) {
MyNumber n = new MyNumber();
change(n);
System.out.println(n.val);
}
public static void change(MyNumber input) {
input.val = input.val + 32;
}
}
The di↵erence between call-by-value and call-by-reference (using pointers or reference variables) is the entity that is being copied from the actual argument to formal parameter.
1. If both the actual argument and formal parameter represent values (not addresses to any location), then that value of the actual argument is copied to the formal parameter. I.e., they share the same value and changing one will not change the other. (call-by-value)
2. If the actual argument represents an address and the formal parameter represents a pointer to address, then the address is copied to the formal parameter. I.e., formal parameter points to the location where the actual argument is stored. Updating the pointee using the formal parameter results in the changes to the value of the actual argument. (call-by-reference using pointers)
3. If the actual argument represents a reference to a variable and the formal parameter is a variable, then the formal parameter holds the reference to the same variable as the actual argument. I.e., the formal parameter is a new name to the same variable that is referenced
94
by the actual argument. Any change to the value of the formal parameter is a change to the value of the variable referenced by the actual argument. (call-by-reference using reference variables)
Language-related Distinguishing Aspects
1. Memory Management: Does the language allows programmers to control the memory alloca- tion and deallocation explicitly?
(a) Manual: The programmer is required to deallocate memory when he/she is done with it. Major issues with such control: memory leakage (allocation without deallocation). C, C++.
(b) Automatic: The programmer is not responsible for deallocation. The runtime environ- ment for the language takes care of collecting unused memory. Java, C#.
Language-related Distinguishing Aspects
2. Referencing: Does the language allow programmers to refer to specific memory locations?
(a) Explicit: The programmers can control what he/she wants to refer. This allows the programmers to refer to a variable value as well as its location (call-by-value, call-by- reference). C, C++, C# (with some restrictions).
(b) Implicit: The programmers cannot access locations through references. Certain accesses are however reference based. Java (non-primitive structure accesses such as arrays, objects).
Language-related Distinguishing Aspects
3. Reference operations: What type of operations are allowed on references?
(a) Arithmetic: The programmer can perform arithmetic operations on locations to obtain new locations. This can result in vulnerabilities. C, C++, C# (with some restrictions)
(b) Dereferencing only: The programmer can find/update the value at the reference location and compare them.
Language-related Distinguishing Aspects
4. Typing: Does the language allow the locations to hold any type of values?
(a) Statically typed: Each pointer must have a type and it can only refer to a location holding the values of that type.
(b) Untyped or Dynamically typed:
95
Broadly speaking (w/o considering global and static contexts), any (non-structure/non-object) variable used during execution are stored in the execution stack. The structures and objects that are dynamically created are typically stored in the heap (though C++ do not specify how/where they should be stored, Java requires them to be stored in the heap).
A pointer variable in C/C++ therefore may point to locations in the stack and heap depending on its pointee. For instance, if a pointer variable is assigned the address of a local variable, then the pointer variable points to a variable that is present in the stack; on the other hand, if a pointer variable is pointing to a newly allocated memory location, then it is likely to be pointing to a heap location. If a pointer variable is part of a structure that is dynamically created, then the pointer variable, itself may be stored in the heap.
In case of Java, all objects are stored in the heap and any variable assigned an object is a variable stored in the stack that reference some location in the heap. Variables, in general, cannot reference any location in the stack.
In the following, we will have C/C++ style pointers but we will restrict the pointer-pointee relationship in a Java style. That is, we will only have pointer variables in stack and they can only point to locations in heap. Variables in the stack cannot be pointed to by any other variable. The features are outlined below.
Adding Pointers to our Language C-pointer flavor
• Allocating & Referencing: Declare pointers • Dereferencing: Accessing the pointees
• Deallocating: Reclaiming unused memory
Aliasing (Similar to Java)
Ability to access the same location using di↵erent names. • Aliasing for locations in the heap.
Necessary language-level extensions
• New expression constructs for the above operations • New expression to describe pointers
Adding Pointers to our Language
Program -> Expr
Expr -> Number | Var | …| FExpr | ApplyF | DRef | WRef | Ref
…
DRef -> (deref Expr)
WRef -> (wref Expr Expr)
Ref -> (ref Expr) | (free Expr)
96
Valid programs
(define prog1
’(deref (ref (+ 4 2)))
)
(define prog2
’(var (x (ref (+ 4 2)))
(deref x) )
)
Informal Semantics
Program
Expr
…
DRef
WRef
Ref
• •
• •
Expr): write the value of the expression in the first free block in the memory (heap) and return the location value.
int* p = malloc(…);
*p = 10; // same as (var (p (ref 10)))
free
free
(free Expr): remove the values stored in at the location whose address is the evaluation of Expr; the location is ready to be used in future (ref commands). Free returns the location that is freed. Same is free(pointer) command in C
97
-> Expr
-> Number | Var | …| FExpr | ApplyF | DRef | WRef | Ref
-> (deref Expr)
-> (wref Expr Expr)
-> (ref Expr) | (free Expr)
(deref R): read and return the value stored at the location of the address obtained by evaluating R.
(wref R Expr): write the value of Expr at the location of the address obtained by evaluating R and return
the value of Expr.
(ref Expr): write the value of Expr to the first available free location and return the address of the location
(similar to allocate).
(free R): deallocate (i.e., can be written to in future) the location of the address obtained by evaluating R and return the location.
ref
ref
(ref
deref deref
(deref R): evaluate R to find the value of an address.
int* p = …;
…
*p = 10;
p++;
int x = *p;
// Similar to (var (p (ref 10)) (var x (deref (+ p 1))))
wref
wref
(wref R Expr): evaluate R to find the address of a memory location and write at that location the valuation of Expr.
int* p = …;
…
*p = 20;
p++;
*p = 10;
Examples
// (var (p (ref 20))
// (var (p (+ p 1))
// (wref p 10)))
• (deref R): read and return the value stored at the location of the address obtained by evaluating R.
• (wref R Expr): write the value of Expr at the location of the address obtained by evaluating R and return
the value of Expr.
• (ref Expr): write the value of Expr to the first available free location and return the address of the location
(similar to allocate).
• (free Expr): deallocate (i.e., can be written to in future) the location of the address obtained by evaluating
Expr and return the location.
(define prog1
’(deref (ref (+ 4 2)))
)
(define prog2
’(var (x (ref (+ 4 2)))
(deref x) )
)
Examples
• (deref R): read and return the value stored at the location of the address obtained by evaluating R.
• (wref R Expr): write the value of Expr at the location of the address obtained by evaluating R and return the value of Expr.
98
• (ref Expr): write the value of Expr to the first available free location and return the address of the location (similar to allocate).
• (free Expr): deallocate (i.e., can be written to in future) the location of the address obtained by evaluating Expr and return the location.
(define prog1
’(wref (free 10) 20)
)
(define prog2
’(var (x (free 10))
(wref x 20) )
)
Examples: Side-e↵ects
(define prog
’(var (x (ref 10))
(var (y x)
(+ (deref x) (wref y 20)))))
(define prog1
’(var (x (ref 10))
(var (y x)
(+ (wref y 20) (deref x)))))
Change of heap while evaluating one expression can impact the evaluation of another expression (in a di↵erent lexical scope)
Examples: Side-e↵ects
(define prog
(define prog1
’(var (x (ref 10))
(+ (deref x)
((gt (deref x) 0)
(wref x 20)
(wref x 10) )
’(var (x (+
(ref 10))
((gt (deref x) 0)
(wref x 20)
(wref x 10) )
(deref x) ))
)) ))
What is the result of evaluating prog? 20 or 30 or 40? 20 or 30 or 40?
What is the result of evaluating prog1?
Side-e↵ects
(define prog
’(var (x (ref 10))
(+ ((gt (deref x) 0)
(wref x 20)
99
) )
(wref x 10) )
(deref x) )
• x is the value of first free location l which holds 10 • for the arith-expression,
– For the first operand,
⇤ evaluate (deref x) (which is 10) and check whether 10 > 0
⇤ evaluate (wref x 20)–now the location l holds 20
⇤ the result of the first operand of the arith-expression is 20 – for the second operand,
⇤ evaluate (deref x)
⇤ the result of the second operand of the arith-expression is 20
We will present here a formal specification of semantics. As stated before, we are going to restrict the pointer-pointee relationship to be such that pointer variables are only present in the stack and pointees are on the heap. Therefore, our stack representation using environment will remain as is (locations in the environment are not necessary as they can be only accessed via variable names). We will add a new structure to represent heap, which will contain entities that are associated with a location and a value. The location information is necessary because the entities in the heap can be only accessed via location.
Semantics
Semantics without Heap
[] : E ⇥ ⌃ ! N Evaluate the value of an expression in the context of environment
Semantics with Heap
[] : E ⇥⌃⇥H ! N⇥H Evaluate the value of an expression and the state of the heap in the context of environment and heap
Environment is a sequence of variable-value pair or function signature-definition pair Heap is a sequence of location-value pair. There is no duplicate location value in the heap.
Notations for Semantics
: Environment is a sequence of variable-value pair or function signature-definition pair h: Heap is a sequence of location-value pair
• (x v) : (x v)-pair is the first element in the sequence 100
• (x): value of the left-most occurrence of x in environment . if = (x 2) (x 3) (y 1), then (x) = 2 and (y) = 1
• h(l): value at the location l in heap h if h = (1 10) (2 20) (3 free) (4 20) (5 free), then h(2) = 20
• h 1(free): left-most location which holds the value free for the above heap h, h 1(free) = 3 Operational Semantics
[] : E ⇥ ⌃ ⇥ H ! N ⇥ H
[(+ E1 E2)]h =
[(CCond E1 E2)] [(var (x E1) E2)]h
Operational Semantics
= [E2]h0 if h[CCondi]h = (false h0)
[n]h = [x]h =
(n h) ( (x) h)
(v1+v2h2)
w(here [E1]h = (v h ) and [E2]h1 = (v h )
11 22 [E1]h0 if h[CCondi]h = (true h0)
h
= [E2]h1
(x v1)
where [E1]h = (v1 h1)
[] : E ⇥ ⌃ ⇥ H ! N ⇥ H
[(fun ((f (V1 V2 …)) Expr1) Expr2)]h =
[Expr2]h((f (V1 V2 …)) Expr1)
[(apply (f (E1 E2 …)))]h =
[(var (V1 E1) (var (V2 E2) … Expr1 )…))]h
where ((f (V1 V2 …)) Expr1) is the left-most occurrence of f 2 Operational Semantics
101
[] : E ⇥ ⌃ ⇥ H ! N ⇥ H
deref: returns the value from a location in heap
[(deref E)]h = (h0(`) h0) where [E]h = (` h0)
wref: writes at a location a value and returns the value [(wref E1 E2)]h = (v h3)
where[E1]h=(`h)^[E2]h1 =(vh)^ 1 2
h3 = h2/(h2(`) 7! v)
free: writes free at a location in the heap and returns that location [(free E)]h = (` h0)
where [E]h = (` h1) ^ h0 = h1/(h1(`) 7! free) ref: write to the first available free location and returns that location
Example
(define prog
’(var (x (ref (+ 4 2)))
(deref x) )
)
[ref E]h = (h 1(free) h0)
1 h 0 1
(free)) 7! v)
[Expr1]h = (v h ) and [Expr2]h1 = (v h ) 11 22
h[(eq Expr1 Expr2)i]h = (v1 == v2 h2)
[Expr1]h = (v h ) and [Expr2]h1 = (v h )
h[CCond1i]h = (v h ) and h[CCond2i]h1 = (v h ) 11 22
where [E] = (v h1) ^ h = h1/h1(h1 h[i]:B⇥⌃⇥H!{true, false}⇥H
Operational Semantics
h[(gt Expr1 Expr2)i]h = (v1 > v2 h2)
[Expr1]h = (v h ) and [Expr2]h1 = (v h )
11 22 h[(lt Expr1 Expr2)i]h = (v1 < v2 h2)
11 22 h[(or CCond1 CCond2)i] = (v1 _ v2 h2)
h[(not CCond)i]h
= (¬v h0) where h[CCondi]h = (v h0)
102
Let the heap contains free locations: h = (1 f ree) (2 f ree) (3 f ree)
[prog]h✏ = [(var (x (ref (+ 4 2))) (deref x))]h✏
Example
(define prog
’(var (x (ref 10))
(+ ((gt (deref x) 0)
(wref x 20)
(wref x 10)
)
(deref x) )
) )
8><>: [(ref (+ 4 2))]h✏ = (` h0 )
= [(deref x)]h0 where ^ [(+ 4 2)]h✏ = (6 h)
(x `)>^ h0 = h/h(h 1(free)) 7! 6
^ ` = h 1(free) = 1
= [(deref x)](1 6) (2 f ree) (3 f ree)
(x 1)
= (h0(`) h0) where ⇢ [x](1 6) (2 free) (3 free) = (` h0) (x 1)
` = 1^ h0 = (1 6) (2 free) (3 free) = (6 (1 6) (2 free) (3 free))
Example
(define prog
’(wref (ref 10) (+ (deref (ref 20)) (deref 1)))
)
Let the heap contains free locations: h = (1 f ree) (2 f ree) (3 f ree)
[prog]h✏ =[(wref(ref10)(+(deref(ref20))(deref1)))]h✏ = (v0 h0)
where [(ref 10)]h✏ = (` h1)
and [(+ (deref (ref 20)) (deref 1))]h1 = (v0 h2)
and h0 = h2/h2(`) 7! v0
[(ref10)]h✏ =(1(110) (2free) (3free))
[(+(deref(ref20))(deref1))](1 10) (2 free) (3 free) ✏
Let the heap contains free locations: h = (1 f ree) (2 f ree) (3 f ree) [prog]h✏ = [(var (x (ref 10)) …)]h✏
= [(+ ((gt (deref x) 0) …) (deref x))](1 10) (2 f ree) (3 (x 1)
f ree)
= (v1+v2h2)where
[((gt (deref x) 0) …)](1 10) (2 free) (3 free) = (v1 h1) =??
(x 1)
[deref x)]h1 = (v2 h2) =?? (x 1)
103
✏
??
Exceptions
1. if deref reads from a location which does not hold any value, then exception free memory access
2. if deref reads from a location which is not in the heap memory, then exception out of memory access
3. if wref writes to a location which does not hold any value, then exception free memory access
4. if wref writes to a location which is not in the heap memory, then exception out of memory
access
5. if free frees up a location which does not hold any value, then exception free memory access
6. if free frees up a location which is not in the heap memory, then exception out of memory access
7. if ref cannot find any free location to write to, then exception out of memory
Adding Exceptions to the Syntax and Semantics
Any operations over exception values returns the same exception value. di↵erent exceptions?
Example
Assume that the heap is ((1 free))
(define prog
’(+ (deref (free 1))
(deref (ref (deref (ref 10))))
)
)
Exception free memory access
How about?
(define prog
’(+ (deref (ref (deref (ref 10))))
What if there are two
(deref (free 1))
)
)
Dynamics of Evaluation
(define prog
’(var (x (ref (+ 4 2)))
(deref x) )
)
104
Evaluate the above program in the context of no free memory in the heap.
• ref will return Exception OOM
• var expression tries to map x to Exception OOM, which returns Exception OOM
(define prog1
’(var (x (ref (+ 4 2)))
(+ 1 2) )
)
The var expression returns an exception.
Example
Assume that there is only one free memory location available
(define prog
’(deref (ref (deref (ref 10))))
)
Exception out of memory
Exercise 11.1. Code Comprehension w/ pointers Consider the following programs and show the valuation of each variable as the program executes (you can present the address value symbolically, i.e., *x = 10 means x has the value a, where a is the address of the location where 10 is stored.)
1. void fun(int* i) { // what is the value of i? int* j = new int; // allocate memory
*j = 21; // j?
i = j; // i?
}
int main() {
int* x = new int; // allocate memory
}
*x=11;
fun(x);
return 0;
// x has the value a, where a is the address
// of the location where 11 is stored.
// call to function fun with actual argument x
// x?
2. void fun(int* i) {
int* j = new int; // allocate memory *j = 21; // j?
*i = *j; // i?
}
int main() {
int* x = new int; // allocate memory
}
*x=11;
fun(x);
return 0;
// x has the value a, where a is the address
// of the location where 11 is stored.
// call to function fun with actual argument x
// x?
105
// what is the value of i?
3. int main() {
int* x = new int; // allocate memory
*x = 11; // x has the value a, where a is the address
}
}
// of the location where 11 is stored.
int* y = new int;
if (true) {
y = x; *y=21;
}
return 0;
// y? // y?
// x?
4. int main() {
int* x = new int; // allocate memory
*x = 11; // x has the value a, where a is the address
// of the location where 11 is stored.
int* y = new int;
if (true) {
x = y; *y=21;
}
return 0;
// x? // y?
// x?
5. int main() {
int* x = new int; // allocate memory
*x = 11;
if (true) {
int y = 21;
Exercise 11.2 (Expression Language with References). Write the syntax checker and interpreter for the grammar discussed in the section.
Exercise 11.3 (Freeing Un-referred Memory). Write a makefree function, which is invoked after each expression and checks whether the environment holds values of variables corresponding to the location for every non-free location in the memory (i.e., whether every non-free memory location can be accessed directly by a variable). If there is some non-free memory location that is not directly accessible, the function frees that location.
Exercise 11.4 (Semantic Di↵erences at Function Calls). Review the semantics of function/method calls in Java and C/C++. Discuss the similarities and di↵erences between the call semantics of these languages and the call semantics discussed in this section.
Exercise 11.5 (Arrays on the Heap). Add array (of numbers) structure in the language. An array is described by its size (number of elements that can be stored in the array). Elements in the array can be accessed by providing the index value (you can assume that the first index is 1). We can
106
} }
x = &y;
return 0;
// x has the value a, where a is the address
// of the location where 11 is stored.
// x? // x?
write to an array index as well. If the index value (for reading from and writing to array) is larger than the array size, then there is out of bound exception.
1. Write the semantics of the language.
2. Implement the semantics.
3. Discuss the problems that might occur if the makefree function in Exercise 11.3 is included in the implementation. Modify makefree to address the problems.
Exercise 11.6 (Analyzer for Sequential Language w/ References). We add the following statements in the sequential language:
Stmt -> (ref Var Expr) | (deref Var Expr) | (free Expr) | (wref Expr Expr)
The intuitive meaning of these newly added statements are as follows:
1. (ref Var Expr): write the value of the Expr at the first free location in the heap and writes the location value to the variable Var. It may result in out of memory exception if there is no free memory available.
2. (deref Var Expr): reads the value at the memory location defined by the value of the Expr and writes the value to variable Var. It may result in out of memory access if the location is not present in the heap; free memory access if the location being accessed is free.
3. (free Expr): frees up the memory location defined by the value of the Expr. It may result in out of memory access if the location is not present in the heap.
4. (wref Expr1 Expr2): write value of Expr2 at the memory location defined by the value of the Expr1. It may result in out of memory access if the location is not present in the heap.
Write the operational semantics of the language following the above grammar and implement the semantics in Racket.
107