24. Virtual Memory: TLB and Caches
EECS 370 – Introduction to Computer Organization – Fall 2020
Satish Narayanasamy
EECS Department
University of Michigan in Ann Arbor, USA
© Narayanasamy 2020
The material in this presentation cannot be copied in any form without written permission
Final Exam
Online exam through Gradescope
Practice exam on Gradescope will be made available
Topics
Strong emphasis on topics since the midterm:
Pipelining
Branch prediction
Caches
Virtual memory
Virtual Memory: An Example
Page size = 4 KB
Virtual memory: 2^20 bytes (2^20 / 4 KB = 256 pages)
Physical memory: 16 KB (16 KB / 4 KB = 4 pages)
Virtual address: virtual page number = bits 19-12, page offset = bits 11-0
Physical address: physical page number = bits 13-12, page offset = bits 11-0
Page Table (256 entries): maps a virtual page number to a physical page number; the page offset is unchanged
Disk (swap partition)
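The sizes in this example follow from the page size. The sketch below recomputes them; the constant and helper names are illustrative choices, not from the slides.

```python
PAGE_SIZE = 4 * 1024          # 4 KB pages
VIRTUAL_BYTES = 2 ** 20       # size of virtual memory in the example
PHYSICAL_BYTES = 16 * 1024    # 16 KB of physical memory


def log2_exact(n: int) -> int:
    """Return log2(n) for a positive power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


offset_bits = log2_exact(PAGE_SIZE)                    # bits 11-0
virtual_pages = VIRTUAL_BYTES // PAGE_SIZE             # page table entries
physical_pages = PHYSICAL_BYTES // PAGE_SIZE
vpn_bits = log2_exact(VIRTUAL_BYTES) - offset_bits     # bits 19-12
ppn_bits = log2_exact(PHYSICAL_BYTES) - offset_bits    # bits 13-12

print(offset_bits, virtual_pages, physical_pages, vpn_bits, ppn_bits)
```

The 12 offset bits are shared by the virtual and the physical address, which is why only the page number is translated.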
Recap
Address Translation
Virtual address = 0x000040F3
Virtual page number = 0x00004, page offset = 0x0F3
Translation process: virtual page number 0x00004 -> physical page number 0x020C0
Physical address = 0x020C00F3 (physical page number 0x020C0, page offset 0x0F3)
Every instruction fetch, load/store needs to do address translation
(optimized later with virtually-addressed caches)
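A minimal sketch of this translation, assuming 4 KB pages and a page table modeled as a Python dict. The names `split`, `translate`, and `page_table` are illustrative, not from the slides.

```python
PAGE_OFFSET_BITS = 12  # 4 KB pages


def split(va: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, page offset)."""
    return va >> PAGE_OFFSET_BITS, va & ((1 << PAGE_OFFSET_BITS) - 1)


def translate(va: int, page_table: dict[int, int]) -> int:
    """Replace the virtual page number with the physical page number."""
    vpn, offset = split(va)
    ppn = page_table[vpn]          # page table lookup
    return (ppn << PAGE_OFFSET_BITS) | offset


# Mapping taken from the example: VPN 0x00004 -> PPN 0x020C0.
page_table = {0x00004: 0x020C0}
print(hex(translate(0x000040F3, page_table)))
```

Only the upper bits change; the page offset 0x0F3 is copied through unmodified.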
Recap
Page table lookups for address translation
N-level page table
Each address translation requires N memory accesses, one per level
Single-level page table
Page table register (holds the base address of the page table)
Virtual address = 0x000040F3 (virtual page number 0x00004, page offset 0x0F3)
Recap
Problem: Address translation overhead
Address translation is on the critical path
Can be done only after the virtual address is known
Needs to be done before accessing memory (optimized later with virtually-addressed caches)
Address translation requires accesses to the page table(s) in physical memory
A memory access (instruction fetch, load/store) performs N additional memory accesses, where N is the number of levels in a hierarchical page table
Slow
After address translation, the memory hierarchy is accessed to perform the memory access
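The cost of N levels can be sketched as a walk that counts one memory access per level. This is an illustrative model only: the tables are nested Python dicts, and the 10-bit/10-bit split of the virtual page number in the two-level case is an assumption, not something the slides specify.

```python
def walk(vpn: int, root: dict, index_bits: list[int]):
    """Walk a hierarchical page table; return (ppn, memory_accesses)."""
    table = root
    accesses = 0
    shift = sum(index_bits)
    for bits in index_bits:
        shift -= bits
        index = (vpn >> shift) & ((1 << bits) - 1)
        table = table[index]       # one memory access per level
        accesses += 1
    return table, accesses


# Single-level table: one extra memory access per translation.
single = {0x00004: 0x020C0}
print(walk(0x00004, single, [20]))

# Two-level table (assumed 10 + 10 bit split): two extra accesses.
two_level = {0: {4: 0x020C0}}
print(walk(0x00004, two_level, [10, 10]))
```

Both walks return the same physical page number; only the number of memory accesses differs.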
Solution: Translation look-aside buffer (TLB)
TLB is a special cache for page tables.
Speeds up address translation by reducing main memory accesses to page tables.
On a TLB miss, access the page table(s) in main memory.
Stores a small subset of valid page table entries. 16-512 entries are common.
Typically has a low miss rate (< 1%).
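Translation look-aside buffer (TLB)
Figure: TLB (fully associative). Each entry holds a valid bit (v), a tag, and a physical page number. The virtual page number of the virtual address is compared against the tags; on a match, the entry's physical page number is combined with the unchanged page offset to form the physical address.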
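A fully associative TLB can be sketched as a small map from virtual page number to physical page number. The class below is a toy model: the `TLB` name, its methods, and the LRU replacement policy are assumptions for illustration, since the slides do not name a replacement policy.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement (policy assumed)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> ppn, least recently used first

    def lookup(self, vpn: int):
        """Return the PPN on a hit, or None on a miss."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn: int, ppn: int) -> None:
        """Install a translation, evicting the LRU entry if the TLB is full."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = ppn


tlb = TLB(capacity=2)
first = tlb.lookup(0x00004)        # miss: nothing cached yet
tlb.insert(0x00004, 0x020C0)       # filled from the page table
second = tlb.lookup(0x00004)       # hit
```

In hardware the tag comparison happens against all entries at once; the dict lookup stands in for that parallel compare.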
Putting it all together
OS: loading a program in memory
Creates a new process P
Constructs a page table for P
Marks all page table entries as invalid, with a pointer to the disk image of the program (that is, they point to the executable file containing the binary)
Runs the program
Will get an immediate page fault on the first instruction (everything is on disk initially)
Loading a program into memory
Page size = 4 KB, Page table entry size = 4 B
Page table register points to physical address 0x0000
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134
Disk Pages: D0 D1 D2 D3, D1000 D1001 D1002 D1003 (executable: text1, text2, global data)
Memory: M0 M1 M2 M3
2 entry TLB (VPN, PPN, Permission): empty
Page Table: entries 0 1 2 3 4 5 6 7
Physical Refs: none yet
Step 1: Read executable header & initialize page table
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1, M2, M3
2 entry TLB (VPN, PPN, Permission): empty
Page Table: 0 -> D1000 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: none yet
Step 2: Load PC from header & start execution
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x0000, TLB MISS!)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1, M2, M3
2 entry TLB (VPN, PPN, Permission): empty
Page Table: 0 -> D1000 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: none yet
Fetching instruction 0000
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x0000)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1, M2, M3
2 entry TLB (VPN, PPN, Permission): empty
Page Table: 0 -> D1000 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: 0x0000, Page fault
Fetching instruction 0000
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x0000)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2, M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: 0x0000, Page fault
Fetching instruction 0000
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x0000)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2, M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: 0x0000, Page fault, 0x1000
Fetching instruction 0004
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x0004, TLB HIT!)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2, M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004
Reference 7FFC
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x7FFC, TLB MISS!)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2, M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004
Reference 7FFC
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x7FFC)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2, M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-7 -> no map
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004, 0x001C, No map page fault
Reference 7FFC
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x7FFC)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2 (set to 0s), M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro), (7, M2, rw)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-6 -> no map, 7 -> M2
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004, 0x001C, No map page fault, 0x2FFC
Fetching instruction 0008
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x0008, TLB HIT!)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2 (set to 0s), M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro), (7, M2, rw)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-6 -> no map, 7 -> M2
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004, 0x001C, No map page fault, 0x2FFC, 0x1008
Reference 2134
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x2134, TLB MISS!)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2 (set to 0s), M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro), (7, M2, rw)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-6 -> no map, 7 -> M2
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004, 0x001C, No map page fault, 0x2FFC, 0x1008
Reference 2134
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x2134)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2 (set to 0s), M3
2 entry TLB (VPN, PPN, Permission): (0, M1, ro), (7, M2, rw)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> D1002, 3-6 -> no map, 7 -> M2
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004, 0x001C, No map page fault, 0x2FFC, 0x1008, 0x0008, Page fault
Reference 2134
References: 0x0000 0x0004 0x7FFC 0x0008 0x2134 (current reference: 0x2134)
Disk Pages: D1000 (text1), D1001 (text2), D1002 (global data)
Memory: M0 (reserved), M1 (text1), M2 (set to 0s), M3 (global data)
2 entry TLB (VPN, PPN, Permission): (0, M1, ro), (2, M3, rw)
Page Table: 0 -> M1 (ro), 1 -> D1001 (ro), 2 -> M3, 3-6 -> no map, 7 -> M2
Physical Refs: 0x0000, Page fault, 0x1000, 0x1004, 0x001C, No map page fault, 0x2FFC, 0x1008, 0x0008, Page fault, 0x3134
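The whole walkthrough can be replayed with a short simulation. This is a sketch under stated assumptions: LRU replacement in the 2-entry TLB (consistent with the VPN 7 entry being replaced at the end), free memory pages handed out in the order M1, M2, M3, and permissions left out. The function name `run` and the data layout are illustrative.

```python
from collections import OrderedDict

PAGE_SIZE = 4096    # 4 KB pages
PTE_SIZE = 4        # 4 B page table entries
PT_BASE = 0x0000    # page table register


def run(references):
    """Replay the references; return (physical refs, TLB, page table)."""
    # Page table after Step 1: entries 0-2 point to disk, 3-7 have no map.
    page_table = {0: "D1000", 1: "D1001", 2: "D1002"}
    free_pages = [1, 2, 3]        # M0 is reserved (assumed allocation order)
    tlb = OrderedDict()           # vpn -> ppn, LRU first (policy assumed)
    trace = []
    for va in references:
        vpn, offset = divmod(va, PAGE_SIZE)
        if vpn in tlb:                                   # TLB hit
            tlb.move_to_end(vpn)
        else:                                            # TLB miss
            trace.append(PT_BASE + vpn * PTE_SIZE)       # read the page table entry
            entry = page_table.get(vpn)
            if not isinstance(entry, int):               # page is not in memory
                trace.append("page fault" if entry else "no map page fault")
                entry = free_pages.pop(0)                # load from disk or zero-fill
                page_table[vpn] = entry
            if len(tlb) == 2:
                tlb.popitem(last=False)                  # evict the LRU entry
            tlb[vpn] = entry
        trace.append(tlb[vpn] * PAGE_SIZE + offset)      # the access itself
    return trace, tlb, page_table


trace, tlb, page_table = run([0x0000, 0x0004, 0x7FFC, 0x0008, 0x2134])
print([hex(t) if isinstance(t, int) else t for t in trace])
```

The page table entry address is the base plus VPN times the entry size, so the miss on VPN 7 reads physical address 0x001C and the miss on VPN 2 reads 0x0008.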
Can we skip address translation? Virtually-addressed caches
Address translation in a pipeline
Load/store: Memory stage
Instruction fetch: Fetch stage
When to do address translation?
After the VA is computed, but before the memory access is performed
Is it possible to skip address translation for some memory accesses? Yes: virtually-addressed caches.
Option 1: Address translation before cache access
Physically-addressed Cache
Slower
Low complexity
Flow: CPU -> virtual address (VA) -> TLB (page table on a TLB miss) -> physical address (PA) -> cache; on a cache hit, data returns to the CPU; on a cache miss, access memory
Use physical address to access cache (tag, set index, and block offset bits)
Address translation for all accesses
Physically addressed caches
Step 1: Virtual address to physical address
The virtual page number is compared against the TLB tags; the matching entry supplies the physical page number. The page offset is unchanged.
Step 2: Access the cache with the physical address obtained from translation
The physical address (physical page number, page offset) is split into tag, set index, and block offset. The set index selects a set in the 2-way set associative cache, and the tag is compared (tag comp) against both tags in that set.
Option 2: Address translation after cache access
Virtually-addressed Cache
High complexity (process isolation is hard)
Faster
Use virtual address to access cache (tag, set index, and block offset bits)
Address translation only for cache misses
Flow: CPU -> virtual address (VA) -> cache; on a cache hit, data returns to the CPU; on a cache miss, VA -> TLB (page table on a TLB miss) -> physical address (PA) -> memory
Option 2: Address translation after cache access
Virtually-addressed Cache
High complexity (process isolation is hard)
Faster
Flow: CPU -> virtual address (VA) -> cache and TLB in parallel; on a cache hit, data returns to the CPU; on a cache miss, the physical address (PA) from the TLB is used to access memory
TLB can be accessed in parallel with cache lookup.
Saves time on a cache miss, where the physical address is needed to access main memory
But TLB accesses are wasted on cache hits
Virtually addressed caches
The virtual address is split directly into tag, set index, and block offset. The set index selects a set in the 2-way set associative cache, and the tag is compared (tag comp) against both tags in that set.
Address translation: Before or After Cache Access
Physically-addressed cache (translation before cache):
Generate virtual address -> Access TLB -> Access cache -> if cache miss, access main memory
Virtually-addressed cache (translation after cache):
Generate virtual address -> Access cache -> if cache miss, access TLB -> Access main memory
Tradeoffs
Physically-addressed caches: Slow; Simple
vs.
Virtually-addressed caches: Fast; Complex
Physical Vs Virtual Caches: Latency
Physically-addressed caches
Cache is accessed with physical address (after VM translations).
– Slow. Cache can be accessed only after address translation
– Inefficient, because all accesses need address translation
Virtually-addressed caches
Cache is accessed with virtual address (before VM translation).
– Fast. Skips address translation for cache hits.
– Efficient, because only cache misses need address translation
Physical Vs Virtual Caches: Complexity
Problem for virtually-addressed cache:
The same virtual address can refer to different physical addresses in two different processes.
To ensure process isolation, on a context switch:
The virtual cache needs to be invalidated, and dirty cache blocks written back.
The physical cache need not be invalidated.
So, a physical cache incurs fewer cache misses if context switches are very frequent (but generally they are not)
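The isolation problem can be seen in a toy model where the cache is keyed only by virtual address. Everything here is hypothetical: the addresses, the two per-process mappings, and the `read` helper. The model is read-only, so the write-back of dirty blocks is not shown.

```python
def read(cache: dict, va: int, page_map: dict, memory: dict):
    """Read through a virtually-addressed cache (keyed by virtual address only)."""
    if va not in cache:                    # miss: translate, then fetch from memory
        cache[va] = memory[page_map[va]]
    return cache[va]


memory = {0x1000: "P1 data", 0x2000: "P2 data"}
p1_map = {0x0040: 0x1000}    # process P1: VA 0x0040 -> PA 0x1000
p2_map = {0x0040: 0x2000}    # process P2: same VA -> different PA

cache = {}
first = read(cache, 0x0040, p1_map, memory)   # P1 runs and fills the cache
stale = read(cache, 0x0040, p2_map, memory)   # switch to P2 without invalidating
cache.clear()                                 # invalidate on a context switch
fresh = read(cache, 0x0040, p2_map, memory)   # P2 now misses and gets its own data
print(first, "|", stale, "|", fresh)
```

Without the invalidation, P2 hits on P1's cached block and reads the wrong data. A cache keyed by physical address would not have this problem, because the two processes use different physical addresses.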
Typically, in modern processors
Level-1 (L1) caches are virtually-addressed. L1 needs to be fast.
L2 and L3 are physically-addressed. Larger structures.
Checking TLB before accessing them does not significantly affect their latency.
Physically addressed caches: detailed flow
CPU issues a virtual address (VA) to the TLB (~ 1 cycle)
TLB hit: physical address (PA) goes to the cache (1-10s cycles)
Cache hit: data to processor
Cache miss: access memory (100s cycles), then data to processor
TLB miss: access the page table (1 to 100s cycles, cached or not)
Valid page: update TLB
Invalid page: bring in page from disk (1,000,000s cycles); update memory, page table & TLB
Virtually addressed caches: detailed flow
CPU issues a virtual address to the cache (1-10s cycles)
Cache hit: data to processor
Cache miss: virtual address (VA) goes to the TLB (~ 1 cycle)
TLB hit: physical address (PA) goes to memory (100s cycles); update cache, then data to processor
TLB miss: access the page table (1 to 100s cycles, cached or not)
Valid page: update TLB
Invalid page: bring in page from disk (1,000,000s cycles); update memory, page table & TLB
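The two flows can be compared with a rough latency model. The cycle counts below are assumed values picked from the ranges on the slides, the steps are modeled as strictly serial, and a TLB hit is assumed whenever the TLB is consulted. The function names are illustrative.

```python
# Assumed cycle counts, chosen from the ranges given in the detailed flows.
TLB_CYCLES = 1        # "~ 1 cycle"
CACHE_CYCLES = 2      # somewhere in "1-10s cycles"
MEMORY_CYCLES = 100   # somewhere in "100s cycles"


def physically_addressed_latency(cache_hit: bool) -> int:
    """Translate first (TLB hit assumed), then access the cache."""
    cycles = TLB_CYCLES + CACHE_CYCLES
    if not cache_hit:
        cycles += MEMORY_CYCLES
    return cycles


def virtually_addressed_latency(cache_hit: bool) -> int:
    """Access the cache first; translate (TLB hit assumed) only on a miss."""
    cycles = CACHE_CYCLES
    if not cache_hit:
        cycles += TLB_CYCLES + MEMORY_CYCLES
    return cycles


for hit in (True, False):
    print(hit, physically_addressed_latency(hit), virtually_addressed_latency(hit))
```

Under these assumptions the two designs cost the same on a cache miss; the virtually-addressed cache wins only on hits, which is the common case.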
OS Support for Virtual Memory
OS must be able to modify the page table register, update page table values, etc.
To enable the OS to do this, but not the user program, there are different execution modes for a process:
• Executive (or supervisor, or kernel level) permissions
• User level permissions
References (not part of Course Syllabus)
See how Intel’s memory management hardware works, Intel x86 Software Manual:
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html
Chapter 4 is on Paging
Linux page table management:
https://www.kernel.org/doc/gorman/html/understand/understand006.html