Carnegie Mellon
Malloc Bootcamp
Adithi, Anja, Eugene June 28, 2020
Carnegie Mellon
Agenda
▪ Conceptual Overview ▪ Explicit List
▪ Segregated list
▪ Splitting, coalescing ▪ Hints on hints
▪ Advanced debugging with GDB ▪ Fun GDB tricks
▪ Writing a good heap checker ▪ Appendix
Carnegie Mellon
Conceptual Outline
Carnegie Mellon
Dynamic Memory Allocation
■ Used when
■ we don’t know at compile-time how much memory we will need
■ when a particular chunk of memory is not needed for the entire run
■ lets us reuse that memory for storing other things
■ Important terms:
■ malloc/calloc/realloc/free
■ mem_sbrk
■ payload
■ fragmentation (external vs internal)
■ Splitting / coalescing
4
Carnegie Mellon
Anonymous Structs/Unions
struct A {
int x;
struct
name
Same idea with unions. For the difference between unions and structs, refer to the C bootcamp slides.
struct B {
int y;
float z; } my_b;
} my_a;
int
my_a.x
struct B
my_a.my_b.y
member
name
struct A {
int x;
struct B {
int y;
float z; };
} my_a;
● What is the type of x?
● How do we access x or my_a?
● What is the type of my_b?
● How do we access y of my_a?
struct B
my_a.y
5
Carnegie Mellon
Zero-Length Arrays
struct line { int length;
char contents[0]; };
int main() {
struct line my_line;
printf(“sizeof(contents) = %zu\n”,
printf(“sizeof(struct line) = %zu\n”, sizeof(struct line)); // 4
}
● It’s a GCC extension – not part of the C specification!
● Must be at the end of a struct / union
● It simply allows us to represent variable-length structures
● sizeof on a zero-length array returns zero
sizeof(L.contents)); // 0
7
Carnegie Mellon
mm_init
▪ Why prologue footer and epilogue header?
▪ Payload must be 16-byte aligned
▪ But, the size of payload doesn’t have to be a multiple of 16 – just the block does!
▪ Things malloc’d must be within the prologue and epilogue
n
n +8
n
Prologue footer
Epilogue header
n+ 8
Prologue footer
Size = chunk size rounded up
0
prev
Size = chunk size rounded up
0
Epilogue header
n + 16
8
Carnegie Mellon
If We Can’t Find a Usable Free Block ■ Assume an implicit list implementation
■ Need to extend the heap ■ mem_sbrk()
■ sbrk(num_bytes) allocates space and returns pointer to start
■ sbrk(0) returns a pointer to the end of the current heap
■ For speed, extend the heap by a little more than you need immediately
■ use what you need out of the new
space, add the rest as a free block
■ What are some tradeoffs you can
make?
current brk pointer
stack
heap
uninitialized data
initialized data
program code
0
6
Carnegie Mellon
Tracking Blocks: Explicit List
▪ Maintain a list of free blocks instead of all blocks
▪ means we need to store forward/backward pointers, not just sizes
▪ we only track free blocks, so we can store the pointers in the payload area!
▪ need to store size at end of block too, for coalescing
allocated block
free block
size
1
payload and padding
size
1
size
0
next
prev
unused
size
0
10
Carnegie Mellon
Splitting a Block
■ If the block we find is larger than we need, split it and leave the remainder for a future allocation
■ explicit lists: correct previous and
next pointers
■ Segregated lists: same as
explicit
■ When would we not split a
block?
n
0
m
next
payload
prev
m
n-m
next
prev
n
0
n-m
0
11
1
1
0
Carnegie Mellon
Coalescing Memory
■ Combine adjacent blocks if both are free
■ explicit lists: look forward and backward in the heap, using block
sizes, not next/prev
■ Four cases:
block to be freed
Allocated
Allocated
Allocated
Free
Free
Allocated
Free
Free
12
Carnegie Mellon
Coalescing Memory
m1
0
next
prev
m1
0
n
1
n
1
m2
1
payload
m2
1
n+m1
0
m1
0
next
next
prev
prev
m1
n
0
1
n+m1
m2
0
1
n
m2
payload
prev
1
0
next
m2
1
m2
0
n+m1+m2
next
0
prev
n+m1+m2
0
13
Carnegie Mellon
Design Considerations
■ Finding a matching free block
■ First fit vs. next fit vs. best fit vs. “good enough” fit
■ continue searching for a closer fit after finding a big-enough free block?
■ Free block ordering
■ LIFO, FIFO, or address-ordered?
■ When to coalesce
■ while freeing a block or while searching for free memory?
■ How much memory to request with sbrk()
■ larger requests save time in system calls but increase maximum
memory use
14
Carnegie Mellon
Hints on hints
For the final, you must greatly increase the utilization and keep a high throughput.
● Reducing external fragmentation requires achieving something closer to best-fit allocated
○ Using a better fit algorithm
○ Combine with a better data structure that lets you run more
complex algorithms
● Reducing internal fragmentation requires reducing data structure overhead and using a ‘good’ free block
15
Carnegie Mellon
Segregated Lists
• Multiple explicit lists where the free blocks are of a certain size range
• Increases throughput and raises probability of choosing a better-sized block
• Need to decide what size classes (only 128 bytes of stack space)
○ Diminishing returns
○ What do you do if you can’t find something in the current size class?
• RootSizeClass1 -> free-block 1 -> free-block 2 -> free-block 3 ->
• RootSizeClass2 -> free-block 1 -> free-block 2 -> free-block 3 -> …
• RootSizeClass3 -> free-block 1 -> free-block 2 -> free-block 3 -> … • …
16
Carnegie Mellon
Modularity and Design
● Now you need to have more than one list
○ List operations are the same for all lists
■ Insert
■ Remove
○ Deciding which size class a block should go into
■ 14 if statements 🙁
■ A small const array of sizes + a loop 🙂
● It would be quite painful to maintain copy-pasted code
○ Abstractions are nice – it’s what CS is all about!
17
Carnegie Mellon
Modularity and Design
● Make sure you have modular, extensible code
○ It will save you a lot of time spent debugging and style points
○ It will make you happy when you come back to your code
■ In 6 days when you start the final submission
■ Or in 6 hours if you’re missing sleep – please get some rest!
○ It will make it easier to explain to students when you become a TA later 🙂
● Labs in this course are NOT meant to be done in one sitting
● Labs in this course are NOT meant to be done in 2-3 nights
● Plan ahead, leave plenty of time for design
○ Measure twice, cut once ● Take a break between sittings
○ Your brain can keep working subconsciously
○ Leave time for “aha!” moments
18
Carnegie Mellon
Coalescing Memory
■ Combine adjacent blocks if both are free
■ segregated lists: look forward and back using block sizes, then
■ Use the size of the coalesced block to determine the proper list ● What else might you need to do to maintain your seglists?
■ Insert into list using the insertion policy (LIFO, address-ordered, etc.)
■ Four cases:
block to be freed
Allocated
Allocated
Allocated
Free
Free
Allocated
Free
Free
19
Carnegie Mellon
Eliminate footers in allocated
b●loRcekdusces internal fragmentation (increase utilization)
● Why do we need footers?
○ Coalescing blocks
○ What kind of blocks do we coalesce?
● Do we need to know the size of a block if we’re not going to coalesce it?
● Based on that idea, can you design a method that helps you determine when to coalesce?
○ Hint: where could you store a little bit of extra information for each block?
free blocks still have footers
m1
next
0
prev
m1
m2
0
1
allocated blocks don’t have footers!
payload
m3
1
payload
20
Carnegie Mellon
Coalescing Memory
■ Combine adjacent blocks if both are free
■ footerless: if free, obtain info from footer then use next/prev
■ Four cases:
block to be freed
Allocated
Allocated
Allocated
Free
Free
Allocated
Free
Free
21
Carnegie Mellon
Decrease the minimum block size
● Reduces internal fragmentation (increase utilization)
● Currently, min block size is 32.
○ 8 byte header
○ 16 byte payload (or 2 8 byte pointers for free)
○ 8 byte footer
● If you just need to malloc(5), and the payload size
is 16 bytes, you waste 11 bytes.
● Must manage free blocks that are too small to hold
the pointers for a doubly linked free list
9 bytes are wasted!
How can we prevent this?
22
Carnegie Mellon
Debugging: GDB & The Almighty Heap Checker
Carnegie Mellon
What’s better than printf? Using GDB
● Use GDB to determine where segfaults happen!
● gdb mdriver will open the malloc driver in gdb
○ Type run and your program will run until it hits the segfault! ● step/next – (abbrev. s/n) step to the next line of code
○ next steps over function calls
● finish – continue execution until end of current function, then break
● print
results of function calls!)
○ Consider writing a heap printing function to use in GDB!
● x
○ x /a
○ See help p and help x for information about more formats
24
Carnegie Mellon
Using GDB – Fun with frames
■ backtrace – (abbrev. bt) print call stack up until current function
■ backtrace full – (abbrev. bt full) print local variables in each frame
(gdb) backtrace
#0 find_fit (…)
#1 mm_malloc (…)
#2 0x0000000000403352 in eval_mm_valid
(…) #3 run_tests (…)
#4 0x0000000000403c39 in main (…)
■ frame 1 – (abbrev. f 1) switch to mm_malloc’s stack frame ■ Good for inspecting local variables of calling functions
25
Carnegie Mellon
Using GDB – Setting breakpoints/watchpoints
■ break mm_checkheap – (abbrev. b) break on “mm_checkheap()” ■ b mm.c:25 – break on line 25 of file “mm.c” – very useful!
■ b find_fit if size == 24 – break on function “find_fit()” if the local variable “size” is equal to 24 – “conditional breakpoint”
■ watch heap_listp – (abbrev. w) break if value of “heap_listp” changes – “watchpoint”
■ w block == 0x80000010 – break if “block” is equal to this value
■ w *0x15213 – watch for changes at memory location 0x15213
■ Can be very slow
■ rwatch
■ awatch
26
What’s better than GDB? Using CGDB! ● CGDB is just like GDB
○ but with COLOR and SOURCE_CODE
● Breaking at mm_malloc in GDB vs CGDB
Colors! C Code!
GDB terminal!
Using CGDB
● Initializes the same as GDB ○ Just write cgdb mdriver-dbg
instead of gdb mdriver-dbg
Source
● Source and gdb windows
○ To go from gdb (default) to source
press esc (just like normal mode
in vim!!!!!)
○ to go from source to gdb press i
(just like insert mode in vim!!!!!)
GDB
Source Mode
● Benefits
○ see breakpoints in the file
○ reread code while debugging
○ review what is coming
● Usage
○ Very similar to vim!!
○ j – move down a line
○ k – move up a line
○ :### jump to line ###
○ /### search for ###
○ Most other vim commands!
Current Line to execute
Viewed line
● Notes
○ Green line number is current line
○ Red line number is breakpoint
Breakpoint at mm.c 610
Next Breakpoint
CGDB Misc
● GDB mode functions exactly like normal GDB!
○ All the commands you know and love work the same! ● Shark machines use version 0.6.8
○ Means unfortunately no assembly viewer 🙁 though that is not often needed. ● Website
○ https://cgdb.github.io/
● Documentation
○ https://cgdb.github.io/docs/cgdb.pdf
○ Version 0.7.1
Carnegie Mellon
Heap Checker
■ int mm_checkheap(int verbose); ■ critical for debugging
■ write this function early!
■ update it when you change your implementation
■ check all heap invariants, make sure you haven’t lost track of any part
of your heap
■ check should pass if and only if the heap is truly well-formed
■ should only generate output if a problem is found, to avoid cluttering up your program’s output
■ meant to be correct, not efficient
■ call before/after major operations when the heap should be
well-formed
31
Carnegie Mellon
Heap Invariants (Non-Exhaustive)
■ Block level
■ What are some things which should always be true of every block
in the heap?
32
Carnegie Mellon
Heap Invariants (Non-Exhaustive)
■ Block level
■ header and footer match
■ payload area is aligned, size is valid
■ no contiguous free blocks unless you defer coalescing
■ List level
■ What are some things which should always be true of every
element of a free list?
33
Carnegie Mellon
Heap Invariants (Non-Exhaustive)
■ Block level
■ header and footer match
■ payload area is aligned, size is valid
■ no contiguous free blocks unless you defer coalescing
■ List level
■ next/prev pointers in consecutive free blocks are consistent
■ no allocated blocks in free list, all free blocks are in the free list
■ no cycles in free list unless you use a circular list
■ each segregated list contains only blocks in the appropriate size
class
■ Heap level
■ What are some things that should be true of the heap as a whole? 34
Carnegie Mellon
Heap Invariants (Non-Exhaustive) ■ Block level
■ header and footer match
■ payload area is aligned, size is valid
■ no contiguous free blocks unless you defer coalescing ■ List level
■ next/prev pointers in consecutive free blocks are consistent
■ no allocated blocks in free list, all free blocks are in the free list
■ no cycles in free list unless you use a circular list
■ each segregated list contains only blocks in the appropriate size
class
■ Heap level
■ all blocks between heap boundaries, correct sentinel blocks (if 29 used)
Carnegie Mellon
How to Ask for Help
● Be specific about what the problem is, and how to cause it
○ BAD: “My program segfaults.”
○ GOOD: “I ran mdriver in gdb and it says that a segfault occurred due to
an invalid next pointer, so I set a watchpoint on the segfaulting next
pointer. How can I figure out what happened?”
○ GOOD: “My heap checker indicates that my segregated list has a block
of the wrong size in it after performing a coalesce(). Why might that be
the case?”
○ What sequence of events do you expect around the time of the
error? What part of the sequence has already happened?
● Have you written your mm_checkheap function, and is it working?
○ We WILL ask to see it! ● Use a rubber duck!
36
Carnegie Mellon
If You Get Stuck
■Please read the writeup!
■ CS:APP Chapter 9
■ View lecture notes and course FAQ at
http://www.cs.cmu.edu/~213
■ Office hours Sunday through Friday 5:30-9:30pm over Zoom. See pinned Piazza post!
■ Post a private question on Piazza
■ Obtain a rubber duck at
https://tinyurl.com/malloc-f18
37
Carnegie Mellon
APPENDIX
38
Carnegie Mellon
Internal Fragmentation
■ Occurs when the payload is smaller than the block size
■ due to alignment requirements
■ due to management overhead
■ as the result of a decision to use a larger-than-necessary block
■ Depends on the current allocations, i.e. the pattern of previous requests
39
Carnegie Mellon
Internal Fragmentation
■ Due to alignment requirements – the allocator doesn’t know how you’ll be using the memory, so it has to use the strictest alignment:
■ void *m1 = malloc(13); void *m2 = malloc(11);
■ m1 and m2 both have to be aligned on 8-byte boundaries
■
Dlue
ton
m1anpagae
myenlt
ov
earhedad1 (elaceh
cnell2isp2
baytyes)l:
o
a
d
2
40
Carnegie Mellon
External Fragmentation
■ Occurs when the total free space is sufficient, but no single free block is large enough to satisfy the request
■ Depends on the pattern of future requests
■ thus difficult to predict, and any measurement is at best an estimate
■ Less critical to malloc traces than internal fragmentation
p5 = malloc(4)
free(p1)
p6 = malloc(5) Oops! Seven bytes available, but not in one chunk….
41
Carnegie Mellon
C: Pointer Arithmetic
■ Adding an integer to a pointer is different from adding two integers
■ The value of the integer is always multiplied by the size of the type that the pointer points at
■ Example:
■ type_a *ptr = …;
■ type_a *ptr2 = ptr + a;
■ is really computing
■ ptr2 = ptr + (a * sizeof(type_a));
■ i.e. lea (ptr, a, sizeof(type_a)), ptr2
■ Pointer arithmetic on void* is undefined (what’s the size of a void?) 42
Carnegie Mellon
C: Pointer Arithmetic
■int *ptr = (int*)0x152130; int *ptr2 = ptr + 1;
■ char *ptr = (char*)0x152130; char *ptr2 = ptr + 1;
■ char *ptr = (char*)0x152130; void *ptr2 = ptr + 1;
■ char *ptr = (char*)0x152130;
char *p2 = ((char*)(((int*)ptr)+1));
43
Carnegie Mellon
C: Pointer Arithmetic
■ int *ptr = (int*)0x152130;
int *ptr2 = ptr + 1; // ptr2 is 0x152134
■ char *ptr = char *ptr2 =
■ char *ptr = void *ptr2 =
■ char *ptr =
char *p2 = ((char*)(((int*)ptr)+1));// p2 is 0x152134
(char*)0x152130;
ptr + 1; // ptr2 is 0x152131
(char*)0x152130;
ptr + 1; // ptr2 is still 0x152131
(char*)0x152130;
44
Carnegie Mellon
Dynamic Memory Allocation: Example
p1 = malloc(3)
p2 = malloc(7)
p3 = malloc(5)
free(p2)
p4 = malloc(4)
p5 = malloc(4)
45
Carnegie Mellon
The Memory-Block Information Data Structure
■ Requirements:
■ tells us where the blocks are, how big they are, and whether
they are free
■ must be able to update the data during calls to malloc and free
■ need to be able to find the next free block which is a “good enough fit” for a given payload
■ need to be able to quickly mark a block as free or allocated
■ need to be able to detect when we run out of blocks
■ what do we do in that case?
■ The only memory we have is what we’re handing out
■ …but not all of it needs to be payload! We can use part of it to store the block information.
46
Carnegie Mellon
Finding a Free Block
■ First Fit
■ search from beginning, use first block that’s big enough
■ linear time in total number of blocks
■ can cause small “splinters” at beginning of list
■ Next Fit
■ start search from where previous search finished
■ often faster than first fit, but some research suggests worse fragmentation
■ Best Fit
■ search entire list, use smallest block that’s big enough
■ keeps fragments small (less wasted memory), but slower than first
fit
42
Carnegie Mellon
Freeing Blocks
■ Simplest implementation is just clearing the “allocated” flag
■ but leads to external fragmentation root
4
4
4
4
8
free(p)
malloc(8) Oops!
p
4
4
4
4
8
48
Carnegie Mellon
Insertion Policy
■ Where do you put a newly-freed block in the free list?
■ LIFO (last-in-first-out) policy
■ add to the beginning of the free list
■ pro: simple and constant time (very fast)
block->next = freelist; freelist = block;
■ con: studies suggest fragmentation is worse ■ Address-ordered policy
■insert blocks so that free list blocks are always sorted by address addr(prev) < addr(curr) < addr(next)
■ pro: lower fragmentation than LIFO
■ con: requires search
49
Carnegie Mellon
C: Pointer Casting
■ Notation: (b*)a “casts” a to be of type b* ■ Casting a pointer doesn't change the bits!
■ type_a *ptr_a=...; type_b *ptr_b=(type_b*)ptr_a; makes ptr_a and ptr_b contain identical bits
■ But it does change the behavior when dereferencing ■ because we interpret the bits differently
■ Can cast type_a* to long/unsigned long and back
■ pointers are really just 64-bit numbers
■ such casts are important for malloclab
■ but be careful – this can easily lead to hard-to-find errors
50
Carnegie Mellon
Cycle Checking: Hare and Tortoise Algorithm
■ This algorithm detects cycles in linked lists
■ Set two pointers, called “hare” and “tortoise”, to the beginning of the list
H T
H
T
■ During each iteration, move
“hare” forward by two nodes, “tortoise” by one node
■ if “tortoise” reaches the end
of the list, there is no cycle
■ if “tortoise” equals “hare”, the list has a cycle
T
H
T
H
46
Carnegie Mellon
Debugging Tip: Using the Preprocessor
■ Use conditional compilation with #if or #ifdef to easily turn debugging code on or off
#ifdef DEBUG
#define DBG_PRINTF(...) fprintf(stderr, __VA_ARGS__) #define CHECKHEAP(verbose) mm_checkheap(verbose)
#else
#define DBG_PRINTF(...)
#define CHECKHEAP(verbose)
#endif /* DEBUG */
// comment line below to disable debug code!
#define DEBUG
void free(void *p) {
DBG_PRINTF(“freeing %p\n”, p);
CHECKHEAP(1);
...
}
47
Carnegie Mellon
Debugging Tip: Using the Preprocessor (contd)
#define DEBUG
void free(void *p) {
DBG_PRINTF(“freeing %p\n”, p);
CHECKHEAP(1);
...
}
void free(void *p) {
fprintf(stderr, “freeing %p\n”, p);
mm_checkheap(1);
...
}
𝓻 𝓶𝓪𝓰𝓲𝓬
Replaced with debug code!
𝓹𝓻𝓮𝓹𝓻𝓸𝓬𝓮𝓼
𝓼𝓸
// #define DEBUG
void free(void *p) {
DBG_PRINTF(“freeing %p\n”, p);
CHECKHEAP(1);
...
}
𝓹𝓻𝓮𝓹𝓻𝓸𝓬𝓮𝓼
𝓼𝓸
void free(void *p) {
...
}
𝓻 𝓶𝓪𝓰𝓲𝓬
Debug code gone!
48
Carnegie Mellon
16 byte
Header Reduction
● Note: this is completely optional and generally discouraged due to its relative difficulty
○ Do NOT attempt unless you are satisfied with your implementation as-is
● When to use 8 or 4 byte header? (must support all possible block sizes)
● If 4 byte, how to ensure that payload is aligned?
● Arrange accordingly
● How to coalesce if 4 byte header block is followed
by 8 byte header block?
● Store extra information in headers
footerless
hd1
payload
hd1
free
hd1
0
ftr1
0
hd2
1
54
1
1