Multi-Level Paging, Inverted Page Table, TLB
Reading assignment
Dinosaur Chapter 8
Comet Chapters 16, 17, 18
Midterm: 3/7 8pm-10pm EE170
[lec 13] Summary: Evolution of Memory Management (so far)
Scheme                  | How                                     | Pros                                             | Cons
Simple uniprogramming   | 1 segment loaded at starting address 0  | Simple                                           | 1 process, 1 segment, no protection
Simple multiprogramming | 1 segment relocated at loading time     | Simple, multiple processes                       | 1 segment/process, no protection, external frag.
Base & Bound            | Dynamic mem relocation at runtime       | Simple hardware, multiple processes, protection  | 1 segment/process, external frag.
Multiple segments       | Dynamic mem relocation at runtime       | Sharing, protection, multiple segs/process       | More hardware, external frag.
[lec13] Managing memory
Simple multiprogramming – 4 drawbacks
1. No protection
2. Low utilization – cannot relocate dynamically, cannot do anything about holes
3. No sharing – single segment per process
4. Entire address space needs to fit in memory – need to swap the whole thing, very expensive!
[lec13] Paging implementation – how does it really work?
[Figure: virtual address (VPage #, offset) -> page table -> physical address (PPage #, offset)]
Where to store the page table? How does the MMU use it?
Even small page tables are too large to load into the MMU
Page tables are kept in memory; the MMU only holds their base addresses
What does the MMU have to do?
Page size?
Small page -> big page table (e.g. 32-bit address space with 4KB pages -> 1M-entry page table)
Large page -> small page table, but large internal fragmentation
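To make the "small page -> big table" point concrete, here is a minimal sketch that computes the page table size for the 32-bit / 4KB case, assuming 4-byte PTEs (the PTE size is an assumption, not given on the slide):

    #include <stdio.h>

    int main(void) {
        unsigned long long va_bits   = 32;    /* 32-bit virtual address space          */
        unsigned long long page_size = 4096;  /* 4 KB pages -> 12-bit offset           */
        unsigned long long pte_size  = 4;     /* assumption: 4-byte page table entries */

        unsigned long long num_ptes   = (1ULL << va_bits) / page_size;  /* 2^20 = 1M PTEs    */
        unsigned long long table_size = num_ptes * pte_size;            /* ~4 MB per process */

        printf("PTEs per process : %llu\n", num_ptes);
        printf("Page table size  : %llu MB per process\n", table_size >> 20);
        return 0;
    }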
[lec13] Deep thinking
Why does the page table we talked about so far have to be contiguous in the physical memory?
Why did a segment have to be contiguous in memory?
For a 4GB virtual address space (32-bit machine), we just need 1M PTEs (~4MB), so what is the big deal?
My PC has 2GB, why do we need PTEs for the entire 4GB address space?
Page table
The page table has to be contiguous in memory
Potentially large
Contiguous free pages in memory are hard to find
How can we be flexible?
“All computer science problems can be solved with an extra level of indirection.”
Two-level page tables
[Figure: the virtual address is split into (Dir, Page, offset); Dir indexes the page directory to get a page table address, Page indexes that page table to get PPage #, and PPage # + offset form the physical address]
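A rough sketch of the translation the figure describes, assuming the common 10/10/12 split of a 32-bit virtual address (the function names and the flat in-memory representation are illustrative, not a particular MMU's format):

    #include <stdint.h>

    #define DIR_BITS    10
    #define PAGE_BITS   10
    #define OFFSET_BITS 12

    /* Split a 32-bit virtual address into its three fields. */
    static uint32_t dir_index (uint32_t va) { return va >> (PAGE_BITS + OFFSET_BITS); }
    static uint32_t page_index(uint32_t va) { return (va >> OFFSET_BITS) & ((1u << PAGE_BITS) - 1); }
    static uint32_t page_off  (uint32_t va) { return va & ((1u << OFFSET_BITS) - 1); }

    /* Walk the two levels: directory -> L2 page table -> physical frame. */
    uint32_t translate(uint32_t **directory, uint32_t va) {
        uint32_t *l2    = directory[dir_index(va)];     /* level 1: page directory gives the L2 table */
        uint32_t  ppage = l2[page_index(va)];           /* level 2: page table entry gives PPage#     */
        return (ppage << OFFSET_BITS) | page_off(va);   /* physical address = PPage# + offset         */
    }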
32-bit Two-Level Page Table
What does this buy us?
Have we saved memory space to store page tables?
Multiple-level page tables
Multi-level page tables
Advantages
The L2 page tables do not have to be contiguous with each other
Note: a single L2 page table is still contiguous
Not all L2 page tables have to be allocated up front
Individual L2 page tables can be allocated on demand
Individual L2 page tables can be swapped out to disk
An advanced OS feature
The power of an extra level of indirection!
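A sketch of the on-demand allocation idea above (a hypothetical helper; a real kernel would allocate page-aligned frames and fill in hardware- or OS-defined PTE bits):

    #include <stdint.h>
    #include <stdlib.h>

    #define L2_ENTRIES 1024   /* 10 page-table index bits per level on a 32-bit machine */

    /* Map one virtual page, allocating the covering L2 table only on first use. */
    void map_page(uint32_t **directory, uint32_t dir_idx, uint32_t page_idx, uint32_t ppage) {
        if (directory[dir_idx] == NULL) {
            /* First touch of this 4 MB region: allocate its L2 table on demand.
             * Untouched regions never pay for an L2 table at all. */
            directory[dir_idx] = calloc(L2_ENTRIES, sizeof(uint32_t));
        }
        directory[dir_idx][page_idx] = ppage;   /* install the translation */
    }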
[review] How many PTEs do we need?
Worst case for 32-bit address machine
# of processes × 2^20 PTEs (if page size is 4096 bytes)
What about a 64-bit address machine? # of processes × 2^36 PTEs
Hmm, but my PC only has 1GB, 256K PTEs should be enough?!
Inverted Page Table
Motivation
Example: 2 processes, each page table has 1M entries, but only 10 physical pages
(An extreme case of very low physical memory)
Is there a way to save page table space?
Ideally,
One PTE for each physical page frame,
regardless of how many processes there are
Assuming the rest of the virtual addresses are not allocated/used
i.e., a linear inverted page table (an array indexed by physical page, mapping back to virtual addresses)
[Figure: linear inverted page table, an array indexed 0..n-1 by physical frame; entry k holds (pid, vpage)]
But,
How do we do lookups with a linear inverted page table?
We have to go through the entire array and compare!
[Figure: the CPU issues a virtual address; the MMU translates it by searching the inverted page table (frames 0..n-1, each holding (pid, vpage)) and sends the physical address to physical memory / I/O devices]
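A sketch of why this is slow: with a linear inverted page table every translation may scan all entries before finding the matching (pid, vpage) pair (illustrative types; a real system would raise a page fault instead of returning -1):

    #include <stdint.h>

    struct ipte { uint32_t pid; uint32_t vpage; };   /* one entry per physical frame */

    /* Linear inverted page table: the frame number is the index of the matching entry. */
    int lookup_frame(const struct ipte *table, int nframes, uint32_t pid, uint32_t vpage) {
        for (int k = 0; k < nframes; k++) {
            if (table[k].pid == pid && table[k].vpage == vpage)
                return k;             /* hit: k is the physical frame number */
        }
        return -1;                    /* not mapped: page fault */
    }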
(Hashed) Inverted page tables
[Figure: virtual address (pid, vpage, offset) is hashed into the inverted page table; the index k of the matching entry is the physical frame, so the physical address is (k, offset)]
Main idea
One PTE for each physical page frame
Hash (Vpage, pid) to Ppage#
Pros
Small page table for large address space
Cons
Lookup is difficult
Overhead of managing hash chains, etc.
Ex: 64-bit UltraSPARC, PowerPC
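A sketch of the hashed lookup under simple assumptions (open hashing with per-frame chain links; the hash function and layout are illustrative, not what UltraSPARC or PowerPC actually implement):

    #include <stdint.h>

    struct hipte {
        uint32_t pid, vpage;   /* which virtual page this frame currently holds  */
        int      next;         /* next frame index on the same hash chain, or -1 */
    };

    /* Hash (pid, vpage) to a chain head, then walk the chain of candidate frames. */
    int hashed_lookup(const struct hipte *table, const int *chain_head,
                      int nbuckets, uint32_t pid, uint32_t vpage) {
        uint32_t h = (pid ^ (vpage * 2654435761u)) % (uint32_t)nbuckets;
        for (int k = chain_head[h]; k != -1; k = table[k].next) {
            if (table[k].pid == pid && table[k].vpage == vpage)
                return k;      /* hit: k is the physical frame number */
        }
        return -1;             /* miss: fall back to the OS page fault path */
    }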
Deep thinking
How can two processes share memory under an inverted page table?
Since the inverted page table holds only one (pid, vpage) mapping per physical frame, it is very difficult to share memory among processes. For this reason most modern OSes use multi-level page tables.
Performance problem with paging
How many extra memory references to access page tables?
One-level page table?
Two-level page table?
Solution: reference locality!
In a short period of time, a process is likely accessing only a few pages
Instead of storing only the page table's starting address in hardware (MMU), store the part of the page table that is "hot"
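To make the cost concrete, a small sketch that counts memory references per access (a TLB hit needs only the data reference; a miss adds one reference per page table level; the 98% hit rate is just an illustrative number):

    #include <stdio.h>

    int main(void) {
        double hit_rate = 0.98;   /* assumed TLB hit rate                   */
        int    levels   = 2;      /* page table levels walked on a TLB miss */

        /* Hit: 1 data reference. Miss: 'levels' PTE references + 1 data reference. */
        double refs = hit_rate * 1.0 + (1.0 - hit_rate) * (levels + 1.0);

        printf("1-level table, no TLB : 2 references per access\n");
        printf("2-level table, no TLB : 3 references per access\n");
        printf("2-level table with TLB: %.2f references per access\n", refs);
        return 0;
    }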
Translation Look-aside Buffer (TLB)
[Figure: virtual address (VPage #, offset) is checked against the TLB; on a hit the TLB supplies PPage # directly, on a miss the real page table is walked; PPage # + offset form the physical address]
TLB is often fully associative -> fewest conflict misses
Expensive -> typically only 64 – 1024 entries
Bits in a TLB Entry
Common (necessary) bits
Virtual page number: matched against the virtual address
Physical page number
Optional (useful) bits
ASIDs — Address-space identifiers (process tags)
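A sketch of what a TLB entry and a fully associative lookup might look like (the field widths, entry count, and ASID handling are assumptions, not any specific CPU's layout):

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 64

    struct tlb_entry {
        bool     valid;
        uint32_t asid;    /* address-space identifier (process tag) */
        uint32_t vpage;   /* virtual page number: the match key     */
        uint32_t ppage;   /* physical page number: the payload      */
    };

    /* Fully associative lookup: hardware compares all entries in parallel;
     * modeled here as a loop. Returns true and fills *ppage on a hit. */
    bool tlb_lookup(const struct tlb_entry tlb[TLB_ENTRIES],
                    uint32_t asid, uint32_t vpage, uint32_t *ppage) {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpage == vpage) {
                *ppage = tlb[i].ppage;
                return true;          /* TLB hit */
            }
        }
        return false;                 /* TLB miss: walk the page table */
    }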
Handling of TLB miss:
Hardware-controlled TLB
On a TLB hit, MMU checks the valid bit
If valid, perform address translation
If invalid (e.g. page not in memory), MMU generates a page fault
OS performs page fault handling
Restart the faulting instruction
On a TLB miss
MMU walks the page table and loads the PTE into the TLB
Needs to replace an entry if the TLB is full
Page table layout is defined by the hardware design
Then same as a hit…
Miss handling:
Software-controlled TLB
On a TLB hit, MMU checks the valid bit
If valid, perform address translation
If invalid (e.g. page not in memory), MMU generates a page fault
OS performs page fault handling
Restart the faulting instruction
On a TLB miss, HW raises an exception and traps to the OS
OS walks the page table and loads the PTE into the TLB
Needs to replace an entry if the TLB is full
Page table layout is defined by the OS
Then same as in a hit…
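A sketch of a software TLB miss handler under the assumptions above (tlb_write(), page_fault(), and the two-level layout are stand-ins chosen for illustration; the whole point of the software approach is that the OS picks this structure):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical hooks: the CPU traps here on a TLB miss; tlb_write()
     * installs an entry into a hardware- or software-chosen TLB slot. */
    extern void tlb_write(uint32_t asid, uint32_t vpage, uint32_t pte);
    extern void page_fault(uint32_t asid, uint32_t vpage);

    void tlb_miss_handler(uint32_t **directory, uint32_t asid, uint32_t vaddr) {
        uint32_t vpage = vaddr >> 12;                 /* assume 4 KB pages              */
        uint32_t *l2   = directory[vpage >> 10];      /* OS-defined level-1 lookup      */
        if (l2 == NULL || l2[vpage & 0x3ff] == 0) {   /* L2 table absent or PTE invalid */
            page_fault(asid, vpage);                  /* let the OS bring the page in   */
            return;
        }
        tlb_write(asid, vpage, l2[vpage & 0x3ff]);    /* refill the TLB, then restart the instruction */
    }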
Hardware vs. software controlled
Hardware approach
Efficient – TLB misses are handled by hardware
OS intervention is required only in case of page fault
Page structure prescribed by MMU hardware — rigid
Software approach
Less efficient — TLB misses are handled by software
MMU hardware very simple, permitting larger, faster TLB
OS designer has complete flexibility in choice of MM data structure
e.g. 2-level page table, inverted page table
Deep thinking
Without a TLB, how the MMU locates PTEs is pre-determined by the hardware designer
With a TLB, it can be flexible, e.g. software-controlled miss handling is possible
What enables this?
TLB is an extra level of indirection!
More TLB Issues
Which TLB entry should be replaced?
Random
LRU
What happens when changing a page table entry (e.g. because of swapping, or changing read/write permissions)?
Change the entry in memory
Flush (i.e. invalidate) the TLB entry – INVLPG on x86
What happens to the TLB in a process context switch?
During a process context switch, cached translations cannot be used by the next process
Invalidate all entries during a context switch – lots of TLB misses afterwards
Or tag each entry with an ASID
Add a HW register that contains the process id of the currently executing process
The TLB hits only if an entry's process id matches that register
Cache vs. TLB
Similarities:
Both cache a part of the physical memory
Differences:
Associativity
TLB is usually fully associative
Cache can be direct-mapped
Consistency
TLB does not deal with consistency with memory
TLB can be controlled by software
More on consistency Issues
Snoopy cache protocols can maintain consistency with DRAM, even in the presence of DMA
No hardware maintains consistency between DRAM and TLBs:
OS needs to flush related TLBs whenever changing a page table entry in memory
On multiprocessors, when you modify a page table entry, you need to do “TLB shoot-down” to flush all related TLB entries on all processors
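A sketch of the shoot-down sequence under assumed primitives (invalidate_local_tlb_entry(), send_ipi(), and the acknowledgement check are hypothetical stand-ins for what a real SMP kernel provides):

    #include <stdint.h>
    #include <stdbool.h>

    extern int  ncpus, this_cpu;                               /* assumed globals       */
    extern void invalidate_local_tlb_entry(uint32_t vpage);    /* e.g. INVLPG on x86    */
    extern void send_ipi(int cpu, uint32_t vpage);             /* interrupt another CPU */
    extern bool all_cpus_acknowledged(void);

    /* Change a PTE, then make sure no CPU keeps using a stale cached translation. */
    void update_pte(uint32_t *pte, uint32_t new_val, uint32_t vpage) {
        *pte = new_val;                          /* 1. change the entry in memory        */
        invalidate_local_tlb_entry(vpage);       /* 2. flush this CPU's stale TLB entry  */
        for (int cpu = 0; cpu < ncpus; cpu++)    /* 3. shoot down everyone else's copy   */
            if (cpu != this_cpu)
                send_ipi(cpu, vpage);            /*    each CPU invalidates and acks     */
        while (!all_cpus_acknowledged())
            ;                                    /* 4. wait until all flushes complete   */
    }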