Virtual Memory III CSE 351 Autumn 2016
Virtual Memory III
https://xkcd.com/648/
CMPT 295
L21 Virtual Memory
Address Translation: Page Hit
2
1) Processor sends virtual address to MMU (memory management unit)
2-3) MMU fetches PTE from page table in cache/memory
(Uses PTBR to find beginning of page table for current process)
4) MMU sends physical address to cache/memory requesting data
5) Cache/memory sends data to processor
MMU
Cache/
Memory
PA
Data
CPU
VA
CPU Chip
PTEA
PTE
1
2
3
4
5
VA = Virtual Address PTEA = Page Table Entry Address PTE= Page Table Entry
PA = Physical Address Data = Contents of memory stored at VA originally requested by CPU
CMPT 295
L21 Virtual Memory
2
Address Translation: Page Fault
3
1) Processor sends virtual address to MMU
2-3) MMU fetches PTE from page table in cache/memory
4) Valid bit is zero, so MMU triggers page fault exception
5) Handler identifies victim (and, if dirty, pages it out to disk)
6) Handler pages in new page and updates PTE in memory
7) Handler returns to original process, restarting faulting instruction
MMU
Cache/
Memory
CPU
VA
CPU Chip
PTEA
PTE
1
2
3
4
5
Disk
Page fault handler
Victim page
New page
Exception
6
7
CMPT 295
L21 Virtual Memory
3
Hmm… Translation Sounds Slow
The MMU accesses memory twice: once to get the PTE for translation, and then again for the actual memory request
The PTEs may be cached in L1 like any other memory word
But they may be evicted by other data references
And a hit in the L1 cache still requires 1-3 cycles
What can we do to make this faster?
Solution: add another cache! 🎉
4
CMPT 295
L21 Virtual Memory
Speeding up Translation with a TLB
Translation Lookaside Buffer (TLB):
Small hardware cache in MMU
Split VPN into TLB Tag and TLB Index based on # of sets in TLB
Maps virtual page numbers to physical page numbers
Stores page table entries for a small number of pages
Modern Intel processors have 128 or 256 entries in TLB
Much faster than a page table lookup in cache/memory
5
Virtual Page Number
Page offset
TLBT
TLBI
TLB
PTE
TLBT
PTE
PTE
PTE
Set
0
1
V
TLBT
V
TLBT
V
V
TLBT
CMPT 295
L21 Virtual Memory
“The architecture of the IBM System/370” R.P. Case and A. Padegs Communications of the ACM. 21:1, 73-96, January 1978.
Perhaps the first paper to use the term translation lookaside buffer. The name arises from the historical name for a cache, which was a lookaside buffer as called by those developing the Atlas system at the University of Manchester; a cache of address translations thus became a translation lookaside buffer. Even though the term lookaside buffer fell out of favor, TLB seems to have stuck, for whatever reason.
“x86info -c” on Intel Core2 Duo CPU:
L1 Data TLB: 4KB pages, 4-way set associative, 16 entries
Data TLB: 4K pages, 4-way associative, 256 entries.
L1 Data TLB: 4MB pages, 4-way set associative, 16 entries
Data TLB: 4MB pages, 4-way associative, 32 entries
5
TLB Hit
A TLB hit eliminates a memory access!
6
MMU
Cache/
Memory
PA
Data
CPU
VA
CPU Chip
PTE
1
2
4
5
TLB
VPN
3
TLB
PTE
VPN
→
PTE
VPN
→
PTE
VPN
→
CMPT 295
L21 Virtual Memory
6
TLB Miss
A TLB miss incurs an additional memory access (the PTE)
Fortunately, TLB misses are rare
7
MMU
Cache/
Memory
PA
Data
CPU
VA
CPU Chip
PTE
1
2
5
6
TLB
VPN
4
PTEA
3
TLB
PTE
VPN
→
PTE
VPN
→
PTE
VPN
→
CMPT 295
L21 Virtual Memory
Does a TLB miss require disk access?
7
Fetching Data on a Memory Read
Check TLB
Input: VPN, Output: PPN
TLB Hit: Fetch translation, return PPN
TLB Miss: Check page table (in memory)
Page Table Hit: Load page table entry into TLB
Page Fault: Fetch page from disk to memory, update
corresponding page table entry, then load entry into TLB
Check cache
Input: physical address, Output: data
Cache Hit: Return data value to processor
Cache Miss: Fetch data value from memory, store it in cache, return it to processor
8
CMPT 295
L21 Virtual Memory
8
Address Translation
9
Virtual Address
TLB Lookup
Check the
Page Table
Update
TLB
Page Fault
(OS loads page)
Protection
Check
Physical
Address
TLB Miss
TLB Hit
Page not
in Mem
Access
Denied
Access
Permitted
Protection
Fault
SIGSEGV
Page
in Mem
Check cache
Find in Disk
Find in Mem
Hit
Miss
CMPT 295
L21 Virtual Memory
CS252 S05
9
Need to restart instruction.
Soft and hard page faults.
Address Manipulation
10
Page offset
Page Offset
Virtual Page Number
TLB Index
request from CPU:
-bit physical address:
split to access TLB:
(on TLB miss) access PT:
-bit virtual address
Page offset
Physical Page Number
Offset
Cache Index
TLB Tag
Cache Tag
split to access cache:
TRANSLATION
CMPT 295
L21 Virtual Memory
CS252 S05
10
Need to restart instruction.
Soft and hard page faults.
Context Switching Revisited
What needs to happen when the CPU switches processes?
Registers:
Save state of old process, load state of new process
Including the Page Table Base Register (PTBR)
Memory:
Nothing to do! Pages for processes already exist in memory/disk and protected from each other
TLB:
Invalidate all entries in TLB – mapping is for old process’ VAs
Cache:
Can leave alone because storing based on PAs – good for shared data
11
CMPT 295
L21 Virtual Memory
Summary of Address Translation Symbols
Basic Parameters
Number of addresses in virtual address space
Number of addresses in physical address space
Page size (bytes)
Components of the virtual address (VA)
VPO Virtual page offset
VPN Virtual page number
TLBI TLB index
TLBT TLB tag
Components of the physical address (PA)
PPO Physical page offset (same as VPO)
PPN Physical page number
12
CMPT 295
L21 Virtual Memory
12
Address Translation
VM is complicated, but also elegant and effective
Level of indirection to provide isolated memory & caching
TLB as a cache of page tables
avoids two trips to memory
for every memory access
13
Virtual Address
TLB
Lookup
Check the
Page Table
Update
TLB
Page Fault
(OS loads page)
Protection
Check
Physical
Address
TLB Miss
TLB Hit
Page not
in Mem
Access
Denied
Access
Permitted
Protection
Fault
SIGSEGV
Page
in Mem
Check cache
Find in Disk
Find in Mem
CMPT 295
L21 Virtual Memory
CS252 S05
13
Need to restart instruction.
Soft and hard page faults.
Simple Memory System Example (small)
Addressing
14-bit virtual addresses
12-bit physical address
Page size = 64 bytes
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
VPO
VPN
Virtual Page Number
Virtual Page Offset
11
10
9
8
7
6
5
4
3
2
1
0
PPO
PPN
Physical Page Number
Physical Page Offset
CMPT 295
L21 Virtual Memory
Simple Memory System: Page Table
Only showing first 16 entries (out of _____)
Note: showing 2 hex digits for PPN even though only 6 bits
Note: other management bits not shown, but part of PTE
15
VPN PPN Valid
0 28 1
1 – 0
2 33 1
3 02 1
4 – 0
5 16 1
6 – 0
7 – 0
VPN PPN Valid
8 13 1
9 17 1
A 09 1
B – 0
C – 0
D 2D 1
E – 0
F 0D 1
CMPT 295
L21 Virtual Memory
Size of page table = 2^(n-p) entries
15
Simple Memory System: TLB
16 entries total
4-way set associative
16
13
12
11
10
9
8
7
6
5
4
3
2
1
0
virtual page offset
virtual page number
TLB index
TLB tag
0
–
02
1
34
0A
1
0D
03
0
–
07
3
0
–
03
0
–
06
0
–
08
0
–
02
2
0
–
0A
0
–
04
0
–
02
1
2D
03
1
1
02
07
0
–
00
1
0D
09
0
–
03
0
Valid
PPN
Tag
Valid
PPN
Tag
Valid
PPN
Tag
Valid
PPN
Tag
Set
Why does the TLB ignore the page offset?
CMPT 295
L21 Virtual Memory
Calculate VPN for valid TLB entry using Tag & Set
16
Simple Memory System: Cache
Direct-mapped with = 4 B, = 16
Physically addressed
17
11
10
9
8
7
6
5
4
3
2
1
0
physical page offset
physical page number
cache offset
cache index
cache tag
Note: It is just coincidence that the PPN is the same width as the cache Tag
Index Tag Valid B0 B1 B2 B3
0 19 1 99 11 23 11
1 15 0 – – – –
2 1B 1 00 02 04 08
3 36 0 – – – –
4 32 1 43 6D 8F 09
5 0D 1 36 72 F0 1D
6 31 0 – – – –
7 16 1 11 C2 DF 03
Index Tag Valid B0 B1 B2 B3
8 24 1 3A 00 51 89
9 2D 0 – – – –
A 2D 1 93 15 DA 3B
B 0B 0 – – – –
C 12 0 – – – –
D 16 1 04 96 34 15
E 13 1 83 77 1B D3
F 14 0 – – – –
CMPT 295
L21 Virtual Memory
17
Current State of Memory System
Cache:
TLB:
Page table (partial):
Index Tag V B0 B1 B2 B3
0 19 1 99 11 23 11
1 15 0 – – – –
2 1B 1 00 02 04 08
3 36 0 – – – –
4 32 1 43 6D 8F 09
5 0D 1 36 72 F0 1D
6 31 0 – – – –
7 16 1 11 C2 DF 03
Index Tag V B0 B1 B2 B3
8 24 1 3A 00 51 89
9 2D 0 – – – –
A 2D 1 93 15 DA 3B
B 0B 0 – – – –
C 12 0 – – – –
D 16 1 04 96 34 15
E 13 1 83 77 1B D3
F 14 0 – – – –
Set Tag PPN V Tag PPN V Tag PPN V Tag PPN V
0 03 – 0 09 0D 1 00 – 0 07 02 1
1 03 2D 1 02 – 0 04 – 0 0A – 0
2 02 – 0 08 – 0 06 – 0 03 – 0
3 07 – 0 03 0D 1 0A 34 1 02 – 0
VPN PPN V
0 28 1
1 – 0
2 33 1
3 02 1
4 – 0
5 16 1
6 – 0
7 – 0
VPN PPN V
8 13 1
9 17 1
A 09 1
B – 0
C – 0
D 2D 1
E – 0
F 0D 1
CMPT 295
L21 Virtual Memory
TLB and Cache are separate pieces of hardware; page table stored in memory
Show correlation between TLB & page table (VPN 0x0D and 0x0F in particular)
18
Memory Request Example #1
Virtual Address: 0x03D4
Physical Address:
19
TLBI
TLBT
0
13
0
12
0
11
0
10
1
9
1
8
1
7
1
6
0
5
1
4
0
3
1
2
0
1
0
0
VPO
VPN
11
10
9
8
7
6
5
4
3
2
1
0
PPO
PPN
CO
CI
CT
VPN ______ TLBT _____ TLBI _____ TLB Hit? ___ Page Fault? ___ PPN _____
CT ______ CI _____ CO _____ Cache Hit? ___ Data (byte) _______
Note: It is just coincidence that the PPN is the same width as the cache Tag
CMPT 295
L21 Virtual Memory
TLB Hit, Cache Hit
19
Memory Request Example #2
Virtual Address: 0x038F
Physical Address:
20
TLBI
TLBT
0
13
0
12
0
11
0
10
1
9
1
8
1
7
0
6
0
5
0
4
1
3
1
2
1
1
1
0
VPO
VPN
11
10
9
8
7
6
5
4
3
2
1
0
PPO
PPN
CO
CI
CT
VPN ______ TLBT _____ TLBI _____ TLB Hit? ___ Page Fault? ___ PPN _____
CT ______ CI _____ CO _____ Cache Hit? ___ Data (byte) _______
Note: It is just coincidence that the PPN is the same width as the cache Tag
CMPT 295
L21 Virtual Memory
TLB Miss, page fault
20
Memory Request Example #3
Virtual Address: 0x0020
Physical Address:
21
TLBI
TLBT
0
13
0
12
0
11
0
10
0
9
0
8
0
7
0
6
1
5
0
4
0
3
0
2
0
1
0
0
VPO
VPN
11
10
9
8
7
6
5
4
3
2
1
0
PPO
PPN
CO
CI
CT
VPN ______ TLBT _____ TLBI _____ TLB Hit? ___ Page Fault? ___ PPN _____
CT ______ CI _____ CO _____ Cache Hit? ___ Data (byte) _______
Note: It is just coincidence that the PPN is the same width as the cache Tag
CMPT 295
L21 Virtual Memory
TLB Miss, Page Table Hit, Cache Miss
21
Memory Request Example #4
Virtual Address: 0x036B
Physical Address:
22
TLBI
TLBT
0
13
0
12
0
11
0
10
1
9
1
8
0
7
1
6
1
5
0
4
1
3
0
2
1
1
1
0
VPO
VPN
11
10
9
8
7
6
5
4
3
2
1
0
PPO
PPN
CO
CI
CT
VPN ______ TLBT _____ TLBI _____ TLB Hit? ___ Page Fault? ___ PPN _____
CT ______ CI _____ CO _____ Cache Hit? ___ Data (byte) _______
Note: It is just coincidence that the PPN is the same width as the cache Tag
CMPT 295
L21 Virtual Memory
TLB Hit, Cache Hit
22
Memory Overview
23
Disk
Main memory
(DRAM)
Cache
CPU
Page
Page
Line
Block
requested 32-bits
movl 0x8043ab, %rdi
TLB
MMU
CMPT 295
L21 Virtual Memory
Page Table Reality
Just one issue… the numbers don’t work out for the story so far!
The problem is the page table for each process:
Suppose 64-bit VAs, 8 KiB pages, 8 GiB physical memory
How many page table entries is that?
About how long is each PTE?
Moral: Cannot use this naïve implementation of the virtual→physical page mapping – it’s way too big
24
This is extra (non-testable) material
CMPT 295
L21 Virtual Memory
2^64/2^13 = 2^51
PPN = 20 bits; plus Valid, R/W/X; so ~ 3 B per PTE
So each page table takes up 2^52 + 2^51 B!
A Solution: Multi-level Page Tables
25
Page table
base register
(PTBR)
VPN 1
0
p-1
n-1
VPO
VPN 2
…
VPN k
PPN
0
p-1
m-1
PPO
PPN
Virtual Address
Physical Address
…
…
Level 1
page table
Level 2
page table
Level k
page table
TLB
PTE
VPN
→
PTE
VPN
→
PTE
VPN
→
This is called a page walk
This is extra (non-testable) material
CMPT 295
L21 Virtual Memory
Multi-level Page Tables
A tree of depth where each node at depth has up to children if part of the VPN has bits
Hardware for multi-level page tables inherently more complicated
But it’s a necessary complexity – 1-level does not fit
Why it works: Most subtrees are not used at all, so they are never created and definitely aren’t in physical memory
Parts created can be evicted from cache/memory when not being used
Each node can have a size of ~1-100KB
But now for a -level page table, a TLB miss requires cache/memory accesses
Fine so long as TLB misses are rare – motivates larger TLBs
26
This is extra (non-testable) material
CMPT 295
L21 Virtual Memory
Practice VM Question
Our system has the following properties
1 MiB of physical address space
4 GiB of virtual address space
32 KiB page size
4-entry fully associative TLB with LRU replacement
Fill in the following blanks:
27
________ Entries in a page table ________ Minimum bit-width of PTBR
________ TLBT bits ________ Max # of valid entries in a page table
CMPT 295
L21 Virtual Memory
Practice VM Question
One process uses a page-aligned square matrix mat[] of 32-bit integers in the code shown below:
#define MAT_SIZE = 2048
for(int i = 0; i < MAT_SIZE; i++)
mat[i*(MAT_SIZE+1)] = i;
What is the largest stride (in bytes) between successive memory accesses (in the VA space)?
28
CMPT 295
L21 Virtual Memory
28
Practice VM Question
One process uses a page-aligned square matrix mat[] of 32-bit integers in the code shown below:
#define MAT_SIZE = 2048
for(int i = 0; i < MAT_SIZE; i++)
mat[i*(MAT_SIZE+1)] = i;
Assuming all of mat[] starts on disk, what are the following hit rates for the execution of the for-loop?
29
________ TLB Hit Rate ________ Page Table Hit Rate
CMPT 295
L21 Virtual Memory
/docProps/thumbnail.jpeg