CPSC 213 Introduction to Computer Systems
Winter Session 2020, Term 2
Unit 1c – Feb 1
Structs and Dynamic Allocation
Overview
‣ Reading
• Companion: 2.4.4-2.5
‣ Reference
• Textbook: 3.9.1, 9.9, 3.10
‣ Learning Objectives
• read and write C code that includes structs
• compare Java classes/objects with C structs
• explain the difference between static and non-static variables in Java and C
• explain why ISAs have both base-plus-offset and indexed addressing modes by showing how each is useful for implementing specific features of programming languages like C and Java
• distinguish static and dynamic computation for access to members of a static struct variable in C
• distinguish static and dynamic computation for access to members of a non-static struct variable in C
• translate C struct-access code into assembly language
• count memory references required to access struct elements
• compare Java dynamic allocation the C’s explicit-delete approach by identifying relative strengths and weaknesses
• identify and correct dangling-pointer and memory leak bugs in C caused by improper use of free()
• write C code that uses techniques that avoid dynamic allocation as a way to minimize memory-allocation bugs
• write C code that uses reference counting as a way to minimize memory-allocation bugs
• explain why Java code does not have dangling-reference errors but can have memory-leak errors
2
So Far …
The Simplest Variables in Any Language
‣ Static Variables
• the compiler allocates them; i.e., their memory address is a constant
• ADVANTAGES:
• DISADVANTAGES:
‣ Scalars and Arrays
• a scalar is a variable that stores a single value
• an array stores multiple values named by a single variable and an index – the array data can be allocated either statically or dynamically
– the array variable can either BE the array (static) or can store a pointer to it (dynamic) – the index is generally a dynamic value
‣ Now for some other types of variables … 3
Instance Variables
Class X
static int i;
int j;
X anX ;nce of X
X.i anX.j
‣ Variables that are an instance of a class or struct • created dynamically
• many instances of the same variable can co-exist
‣Java vs C
• Java: objects are instances of non-static variables of a class
• C: structs are named variable groups, instance is also called a struct
‣ Accessing an instance variable
• requires a reference to a particular object (pointer to a struct)
• then variable name chooses a variable in that object (struct)
4
Object instance of X Object instance of X
Object instance of X
Objec
Object i
j
t
in
i
n
t
a
in
s
st
ta
n
;ce of X int j;
n
t
j
int j; int j;
Structs in C (S4-instance-var)
structD{ ≈classD{
int e; public int e; int f; public int f;
}; }
‣A struct is a
• collection of variables of arbitrary type, allocated and accessed together
‣ Declaration
• similar to declaring a Java class without methods
• name is “struct” plus name provided by programmer
• static
• dynamic
‣ Access
• direct/static
• indirect/dynamic
struct D d0;
struct D *d1;
d0.e = d0.f;
d1->e = d1->f; // or (*d1).e = (*d1).f; 5
Struct Allocation
struct D {
int e;
int f;
};
‣ Static structs are allocated by the compiler Static Memory Layout
struct D d0;
0x1000: value of d0.e
0x1004: value of d0.f
‣ Dynamic structs are allocated at runtime
• the variable that stores the struct pointer may be static or dynamic
• the struct itself is allocated when the program calls malloc Static Memory Layout
struct D *d1;
0x1000: value of d1 6
struct D {
int e;
int f;
};
• runtime allocation of dynamic struct
void foo() {
d1 = malloc(sizeof(struct D));
}
• assume that this code allocates the struct at address 0x2000
0x1000: 0x2000
0x2000: value of d1->e
0x2004: value of d1->f
7
Struct Access
struct D {
int e;
int f;
};
‣ Static and dynamic differ by an extra memory access
• dynamic structs have dynamic address that must be read from memory • in both cases the offset to variable from base of struct is static
d0.e = d0.f;
m[0x1000] ← m[0x1004]
r[0] ← 0x1000
r[1] ← m[r[0]+4]
m[r[0]] ← r[1]
d1->e = d1->f;
m[m[0x1000]+0] ← m[m[0x1000]+4]
r[0] ← 0x1000
r[1] ← m[r[0]] load d1
r[2] ← m[r[1]+4]
m[r[1]] ← r[2]
8
struct D {
int e;
int f;
};
d0.e = d0.f;
r[0] ← 0x1000
r[1] ← m[r[0]+4]
m[r[0]] ← r[1]
ld $0x1000, r0 # r0 = address of d0
ld4(r0),r1 #r1=d0.f str1,(r0) #d0.e=d0.f
d1->e = d1->f;
r[0] ← 0x1000 r[1] ← m[r[0]] r[2] ← m[r[1]+4] m[r[1]] ← r[2]
load d1
‣ The load/store base plus offset instructions
• dynamic base address in a register plus a static offset (displacement)
ld $0x1000, r0 ld(r0),r1 ld4(r1),r2 st r2, (r1)
# r0 = address of d1 #r1=d1 #r2=d1->f
# d1->e = d1->f
ld 4(r1), r2
9
The Load-Store ISA
‣ Machine format for base + offset
• note that the offset will in our case always be a multiple of 4
• also note that we only have a single instruction byte to store it • and so, we will store offset / 4 in the instruction
‣ The Revised ISA
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d– vvvvvvvv
load base+offset
r[d] ← m[r[s]+(o=p*4)]
ld o(rs), rd
1psd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]+(o=p*4)] ← r[s]
st rs, o(rd)
3spd
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
10
Memory Addressing Modes
‣ Scalars
• address in register (r1)
• access memory at address in register
ld (r1), r0
‣ Arrays
• base address in register (r1)
i=a
r1 stores address of a r2 stores value of j
‣ Struct Members (Instance Variables)
• dynamic index in register (r2)
• access memory at base plus index times element-size (4)
i = a[j]
ld (r1,r2,4), r0
• base address in register (r1)
• static/constant offset (X)
r1 stores address of a or value of b
X is offset to j/k from beginning of struct
• access memory at base plus offset
i=a.j i=b->k
ld X(r1), r0
11
Structs Declared Inside of Other Structs
‣ The struct variable is
• the variable that stores the struct for static structs or
• the variable that stores a pointer to the struct for dynamic structs
‣ Struct variables can be declared inside of other structs
struct W {
int a;
int j;
};
struct A {
int a;
int b;
};
struct X {
int i;
struct A s;
int j;
struct Y {
int i;
};
};
12
struct A *s;
int j;
Question 1c.1
‣ Struct variables can be declared inside of other structs
struct W { struct A { int a; int a; int j; int b;
}; };
struct X {
int i;
struct A s;
int j;
struct Y {
int i;
struct A *s;
int j;
};
‣ What are the offsets to j within X and Y on a 32-bit machine? A. 4, 4
B. 8, 4 C. 8, 8 D. 12, 8 E. 12, 12
};
13
Structs Declared Inside of Other Structs
‣ A struct variable of type X can be declared inside a struct X
struct Z {
int i;
struct Z *s;
int j; };
‣ But not
struct R {
int i;
struct R s;
int j; };
‣ Why?
🚫
14
Size and Alignment
‣ Alignment rules apply inside of a struct
• each instance variable will be aligned according to its type size
• structs are aligned according to their largest instance variable type
‣ Structs can mix types – but padding might be added
struct X {
char a;
char b;
int i;
struct Y {
char a;
int i; char b;
};
};
• what are the sizes of the structs above? (i.e. sizeof(struct X)) ‣ Similarly, struct members can be arrays
struct X { inti;
int a[10]; intj;
struct Y {
int i;
int *a;
int j;
};
• what are the offsets to the j’s? 15
};
Arrays of Structs
‣ Array of structs struct S a[10];
‣ Array of pointers to structs struct S *b[10];
‣ Or combine struct and array declaration • struct name is optional (but recommended)
};
Aside: What’s this?
int *e[10];
16
struct S {
struct {
int i[2];
char c;
int i[2];
char c;
struct {
int i[2];
char c;
} *d[10];
} c[10];
Struct Values
‣ Structs can be copied (by value!)
‣ Structs can be returned (by value!) }
‣ Be careful with pointers inside structs! 17
};
struct S s0;
struct S s1;
struct S {
s0 = s1;
struct S makeS() {
int i[2];
char c;
struct S s = { { 0, 1 }, ‘c’};
return s;
Dynamic Allocation
18
Dynamic Allocation in C and Java
‣ Programs can allocate memory dynamically
• allocation reserves a range of memory for a purpose
• in Java, instances of classes are allocated by the new statement • in C, byte ranges are allocated by call to malloc procedure
‣
In C
void *malloc (n)
• n is the number of bytes to allocate
• void*
– is a pointer that is automatically cast to/from any other pointer – can not be dereferenced directly
‣ Use sizeof to determine number of bytes to allocate • sizeof(x) statically computes # bytes in type or variable
– why can’t you use sizeof() to compute the size of a dynamic array?
struct Foo *f = malloc(sizeof(struct Foo)) 19
Deallocation in C and Java
‣ Wise management of memory requires deallocation • memory is a scarce resource
• deallocation frees previously allocated memory for later re-use • Java and C take different approaches to deallocation
‣ Memory Leak
• when dynamically allocated data is not deallocated when it is no longer needed • size of program gradually increases … available memory leaks away
‣ How is memory deallocated in Java?
‣ Deallocation in C
• programs must explicitly deallocate memory by calling the free procedure • free releases the memory immediately, with no check to see if its still in
‣ The Heap
• malloc and free work together to manage an abstraction called the heap
• the heap is a large section memory from which malloc allocates objects • all objects are stored in the heap
20
malloc
‣ Dumb but functional, illustrative implementation int next = 0;
int memory[100000000];
void *malloc(int size) {
int *result = &memory[next];
next += (size + 3) / 4;
return result;
}
‣ The hard part is free – releasing memory you don’t need
21
The Problem with Explicit Delete
struct Foo *f = malloc(sizeof(struct Foo))
free(f);
You should add this after the free (but its not enough)
f = 0;
‣ What free(f) does
• deallocates “object” at address f
• so that this memory can be reused by subsequent call to malloc
‣ What free(f) does not do • it doesn’t change the value of f
– what would have to be true of the language to change the value of f
‣ Therefore
• after a call free(f)
• the variable f still points at the freed object
• other variables may point there too …
22
Considering Explicit Delete
‣ Let’s look at this example
How can you be sure that this memory is eventually freed?
struct MBuf *receive() {
struct MBuf *mBuf = malloc(sizeof(struct MBuf));
…
return mBuf;
}
void foo() {
struct MBuf *mb = receive();
bar(mb);
free(mb);
mb = 0;
}
• what bad things can happen?
Is it safe to free it here?
23
What actually happens in malloc/free?
‣ Organization of memory
• statically allocated – code
– static data
• the rest of memory
– unused or dynamically allocated
‣ Malloc/Free
• implemented in library
• manage a chunk of memory called the Heap • keep track of what’s allocated or free
‣Malloc a=malloc(16); • find a free memory
• mark it allocated
• return pointer to it
– keeping track of its size
?
?
Code
Static Data
Heap
‣ Free
• use pointer to determine allocated size • mark referenced memory as free
free(a);
24
The Problem with Explicit Delete
‣ Lets extend the example to see
• what might happen in bar()
• and why a subsequent call to baz() would expose a serious bug
struct MBuf *receive() {
1 struct MBuf *mBuf = malloc(sizeof(struct MBuf));
…
return mBuf;
}
void foo() {
struct MBuf *mb = receive();
baz(); }
struct MBuf *aMB;
void bar(struct MBuf *mb) {
3 aMB = mb; }
void baz() { 5 aMB->x = 0;
}
0
2
bar(mb); 4 free(mb);
This statement writes to
unallocated (or re-allocated) memory.
25
Dangling Pointers
‣ A dangling pointer
• is a pointer to an object that has been freed
• could point to unallocated memory or to another object
‣ Why they are a problem
• program thinks its writing to object of type X, but isn’t
• it may be writing to an object of type Y, consider this sequence of events
(1) Before free:
aMB: 0x2000
0x2000: a struct mbuf
(3) After another malloc:
aMB: 0x2000
0x2000: another thing
(2) After free:
aMB: 0x2000
0x2000: free memory
dangling pointer that is really dangerous
26
dangling pointer
Avoiding Dangling Pointers in C
‣ Understand the problem
• when allocation and free appear in different places in your code
• for example, when a procedure returns a pointer to something it allocates
‣ Avoid the problem cases, if possible
• restrict dynamic allocation/free to single procedure or module, if possible
• don’t write procedures that return pointers, if possible
• use local variables instead, where possible
– we’ll see later that local variables are automatically allocated on call and freed on return
‣ Engineer for memory management, if necessary
• define rules for which procedure is responsible for deallocation, if possible
• implement explicit reference counting if multiple potential deallocators
• define rules for which pointers can be stored in data structures
• use coding conventions and documentation to ensure rules are followed
27
Co-locate Allocation and Deallocation
‣ If procedure returns value of dynamically allocated object • allocate that object in caller and pass pointer to it to callee
• good if caller does both malloc / free itself
struct MBuf *receive() {
struct MBuf *mBuf = malloc(sizeof(struct MBuf));
…
return mBuf;
}
void foo() {
struct MBuf *mb = receive();
free(mb); }
void receive(struct MBuf *mBuf) {
… Delegate problem to caller }
void foo() {
struct MBuf *mb = malloc(sizeof(struct MBuf);
receive(mb);
free(mb); }
Transfer malloc to foo
28
Another Example – Strings
‣ A string “like this” is in C • an array of characters
• the end of the string is indicated by the first 0 (null) in the array
• so, every string has a
– maximum length (the length of the array – 1)
– and a length determined by the position of the first null
‣ The standard C library has many operations on strings • e.g., strlen(s) returns the length of a string
‣ Lets consider
• how best to create a new string
• that is a copy of an existing string
29
String Copy Version 1
// in the string library
char *copy(char *s) {
int len = strlen(s);
char *d = malloc(len + 1); for (int i=0; i
sLen = dSize – 1;
for (int i=0; i < sLen; i++)
d[i] = s[i];
d[sLen] = 0; }
// in your application program
void foo(char *s) {
int len = strlen(s);
char* d = malloc(len + 1);
copy(d, s, len+1);
printf ("%s\n", d);
free(d);
}
32
But, sometimes we need more
‣ If a reference needs to be passed among multiple modules • then each module knows when it needs and doesn’t need reference
• but, what do you need to know in order to correctly free referent? • why is this bad?
• can we do better?
struct MBuf *receive() {
struct MBuf *mBuf = malloc(sizeof(struct MBuf));
...
return mBuf;
}
void foo() {
struct MBuf *mb = receive(); bar(mb);
free(mb); // ???
mb = 0;
}
33
Consider interaction between foo & bar
struct MBuf *receive() {
struct MBuf *mBuf = malloc(sizeof(struct MBuf));
// ...
return mBuf;
}
void foo() {
struct MBuf *mb = receive();
bar(mb);
// maybe do something else with mb free(mb) ??? is it okay to do this ???
}
struct MBuf *aMB = 0;
void bar(struct MBuf *mb) { if(something_complicated_or_private)
aMB = mb; }
34
Reference Counting
‣ Struct can store a reference count
• that records the number of variables that store pointers to the struct
• maintained explicitly by code the programmer adds (i.e., this is manual reference counting)
‣ Allocation of reference-counted struct • set the reference count to 1
• because the caller has a reference
‣ Saving additional references
• call rc_keep_ref to increment the reference count
‣ Just before a reference is removed
• call rc_free_ref to decrement reference count
‣ Never call free directly
• free_ref calls free automatically when the reference count goes to zero
‣ Implementation
• this is something you would have to implement yourself
• for example, need a sort of super class that implements rc_keep_ref() and rc_free_ref() • more shortly ...
35
‣ The example code then uses reference counting like this
struct MBuf *receive() {
struct MBuf *mBuf = rc_malloc(sizeof(*mBuf));
...
return mBuf;
}
void foo() {
struct MBuf *mb = receive();
bar(mb);
rc_free_ref(mb);
}
struct MBuf *aMB = 0;
void bar(struct MBuf *mb) {
// free existing reference in aMB
if(aMB != 0)
rc_free_ref(aMB);
rc_keep_ref(mb);
aMB = mb;
}
mb->ref_count==1
mb==aMB; mb->ref_count==2 mb==aMB; mb->ref_count==1
if aMB currently points to an MBuf, changing its value removes a reference to that MBuf. As this is the last reference to the MBuf, it is freed from the heap.
aMB!=mb; aMB->ref_count==1 free (aMB) is called
mb->ref_count==2
aMB==mb; aMB->ref_count==2
36
Implementing Reference Counting
refcount.h
refcount.c
}
}
alternative malloc with room for ref_count call when reference added
call when reference removed
void *rc_malloc (int nbytes);
void rc_keep_ref (void *p);
void rc_free_ref (void *p);
void *rc_malloc(int nbytes) {
}
int *ref_count = malloc(nbytes + 8);
*ref_count = 1;
return ((void *)ref_count) + 8;
void rc_keep_ref(void *p) {
int *ref_count = p – 8;
(*ref_count)++;
void rc_free_ref(void *p) {
int *ref_count = p – 8;
(*ref_count)–;
if(*ref_count == 0) free(ref_count);
SIDE NOTES: including header files, compiling multiple files, and valgrind. 37
Reference Counting Example
38
Reference Counting Limitations
‣ What if you have state you need to clean up at free()?
• Solution: Finalizers – call a function (specific to the object being allocated)
when you’re about to deallocate it
• Problem: What happens if the finalizer itself frees references? Finalizer (and free_ref) might recurse – unexpected side effect
• Problem: What happens if the finalizer needs to do something complicated? Memory management has to wait. (For example, some finalizers shut down a network connection – need to wait for network activity)
‣ What if you create a cycle in your memory?
• Solution: Garbage Collection – walk through memory and clear out stuff
that isn’t being used
• Problems: When do you run the garbage collection cycle? Can your program still run during garbage collection? What happens if you touch memory while the garbage collector is running?
39
Garbage Collection
‣ In Java objects are deallocated implicitly • the program never says free
• the runtime system tracks every object reference
• when an object is unreachable then it can be deallocated
• a garbage collector runs periodically to deallocate unreachable objects
‣ Advantage compared to explicit delete • no dangling pointers
• reference cycles are not a problem
– cycles & reference counting => memory leak – cycles are collected by garbage collector
MBuf receive() {
MBuf mBuf = new MBuf();
…
return mBuf;
}
void foo() {
MBuf mb = receive();
bar(mb);
}
40
Memory Management in Java
‣ Memory leak
• occurs when the garbage collector fails to reclaim unneeded objects
• memory is a scarce resource and wasting it can be a serous bug
• its huge problem for long-running programs where the garbage accumulates
‣ How is it possible to create a memory leak in Java? • Java can only reclaim an object if it is unreachable
• but, unreachability is only an approximation of whether an object is needed • an unneeded object in a hash table, for example, is never reclaimed
‣ The solution requires engineering
• just as in C, you must plan for memory deallocation explicitly
• unlike C, however, if you make a mistake, you can not create a dangling pointer • in Java you remove the references, Java reclaims the objects
‣ Further reading
• http://www.ibm.com/developerworks/library/j-leaks/
41
Ways to Avoid Unintended Retention
ENRICHMENT: You are not required to know this
‣ imperative approach with explicit reference annulling
• explicitly set references to NULL when referent is longer needed
• add close() or free() methods to classes you create and call them explicitly • use try-finally block to ensure that these clean-up steps are always taken • these are imperative approaches; drawbacks?
‣ declarative approach with reference objects
• refer to objects without requiring their retention
• store object references that the garbage collector can reclaim
WeakReference
Widget widget = weakRef.get() // may return NULL
• different levels of reference stickiness
– soft
– weak
– phantom
discarded only when new allocations put pressure on available memory discarded on next GC cycle when no stronger reference exists
unretrievable (get always returns NULL), used to register with GC reference queue
42
Using Reference Objects
ENRICHMENT: You are not required to know this
‣ Creating a reclaimable reference
• the Reference class is a template that be instantiated for any reference • store instances of this class instead of the original reference
void bar(MBuf mb) {
aMB = new WeakReference
}
• allows the garbage collector to collect the MBuf even if aMB points to it
‣ This does not reclaim the weak reference itself
• while the GC will reclaim the MBuf, it can’t reclaim the WeakReference
• the problem is that aMB stores a reference to WeakReference • not a big issue here, there is only one
• but, what if we store a large collection of weak references?
43
Using Reference Queues
ENRICHMENT: You are not required to know this
‣ The problem
• reference objects will be stored in data structures
• reclaiming them requires first removing them from these data structures
‣ The reference queue approach
• a reference object can have an associated reference queue
• the GC adds reference objects to the queue when it collects their referent • your code scans the queue periodically to update referring data structures
ReferenceQueue
void bar(MBuf mb) {
aMB = new WeakReference
}
void removeGarbage() {
while((WeakReference
// remove ref from data structure where it is stored
if(aMB == ref)
aMB = null;
}
44