
24. Virtual Memory: TLB and Caches
EECS 370 – Introduction to Computer Organization – Fall 2020
Satish Narayanasamy
EECS Department
University of Michigan in Ann Arbor, USA
© Narayanasamy 2020
The material in this presentation cannot be copied in any form without written permission

Final Exam
Online exam through Gradescope
Practice exam on Gradescope will be made available
Topics
Strong emphasis on topics since the midterm: pipelining, branch prediction, caches, virtual memory

Virtual Memory: An Example (recap)
Page size = 4 KB
Virtual memory: 2^20 bytes, so 2^20 / 4 KB = 256 pages
Physical memory: 16 KB, so 16 KB / 4 KB = 4 pages
Virtual address (20 bits): virtual page number = bits 19-12, page offset = bits 11-0
Physical address (14 bits): physical page number = bits 13-12, page offset = bits 11-0
Page table: 256 entries, one per virtual page
Pages not resident in physical memory are kept on disk (swap partition)
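The field widths above follow directly from the sizes: the page offset needs log2(page size) bits and the virtual page number takes the remaining bits. A minimal Python sketch, assuming the 2^20-byte virtual space, 16 KB physical memory, and 4 KB pages of this example; the function name `split_va` and the sample address 0x7A0F3 are illustrative choices, not from the slides:

```python
PAGE_SIZE = 4 * 1024        # 4 KB pages
VA_BITS = 20                # 2^20-byte virtual address space
PHYS_MEM = 16 * 1024        # 16 KB physical memory

# bit_length() - 1 equals log2(x) only because these sizes are powers of two.
OFFSET_BITS = PAGE_SIZE.bit_length() - 1            # log2(4096) = 12
VPN_BITS = VA_BITS - OFFSET_BITS                    # 20 - 12 = 8
NUM_VIRTUAL_PAGES = (1 << VA_BITS) // PAGE_SIZE     # 256 page-table entries
NUM_PHYSICAL_PAGES = PHYS_MEM // PAGE_SIZE          # 4 physical pages
PPN_BITS = NUM_PHYSICAL_PAGES.bit_length() - 1      # log2(4) = 2

def split_va(va):
    """Split a 20-bit virtual address into (VPN, page offset)."""
    assert 0 <= va < (1 << VA_BITS), "address outside the virtual space"
    vpn = va >> OFFSET_BITS             # bits 19-12
    offset = va & (PAGE_SIZE - 1)       # bits 11-0
    return vpn, offset

print(OFFSET_BITS, VPN_BITS, NUM_VIRTUAL_PAGES, NUM_PHYSICAL_PAGES, PPN_BITS)  # 12 8 256 4 2
print([hex(x) for x in split_va(0x7A0F3)])  # ['0x7a', '0xf3']
```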

Address Translation (recap)
Virtual address = 0x000040F3
Virtual page number = 0x00004, page offset = 0x0F3
Translation process: virtual page number 0x00004 -> physical page number 0x020C0; the page offset is copied unchanged
Physical address = 0x020C00F3 (physical page number 0x020C0, page offset 0x0F3)
Every instruction fetch and load/store needs address translation (optimized later with virtually-addressed caches)
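A minimal Python sketch of this translation step, assuming 32-bit addresses and 4 KB pages (a 12-bit offset), with the page table modeled as a plain dict from VPN to PPN. The dict and the function name `translate` are illustrative, not part of the slides:

```python
OFFSET_BITS = 12                     # 4 KB pages
OFFSET_MASK = (1 << OFFSET_BITS) - 1

# Hypothetical page table holding only the mapping from this slide.
page_table = {0x00004: 0x020C0}

def translate(va):
    vpn = va >> OFFSET_BITS          # upper 20 bits select the page
    offset = va & OFFSET_MASK        # lower 12 bits pass through unchanged
    if vpn not in page_table:
        raise LookupError(f"page fault: no mapping for VPN {vpn:#07x}")
    ppn = page_table[vpn]
    return (ppn << OFFSET_BITS) | offset

print(f"{translate(0x000040F3):#010x}")   # 0x020c00f3
```

Only the page number changes; the offset bits are reused as-is, which is why page size determines how many low-order bits skip translation.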

Page table lookups for address translation (recap)
Single-level page table: the page table register holds the base address of the page table; the virtual page number (0x00004 for virtual address 0x000040F3) selects the entry
N-level page table: each address translation requires N memory accesses, one per level

Problem: Address translation overhead
Address translation is on the critical path
It can be done only after the virtual address is known
It needs to be done before accessing memory (optimized later with virtually-addressed caches)
Address translation requires accesses to the page table(s) in physical memory
A memory access (instruction fetch, load/store) performs N additional memory accesses, where N is the number of levels in a hierarchical page table
Slow
After address translation, the memory hierarchy is accessed to perform the memory access
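A worked count using only the rule above (each memory access adds N page-table accesses), assuming for illustration no TLB and a load instruction, which makes one instruction fetch and one data access; N = 4 is an example value, not from the slides:

```latex
\begin{aligned}
\text{memory accesses per load} &= \underbrace{(N + 1)}_{\text{instruction fetch}} + \underbrace{(N + 1)}_{\text{data load}} = 2(N + 1) \\
N = 4:\quad 2(4 + 1) &= 10 \ \text{memory accesses, versus 2 without translation}
\end{aligned}
```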

Solution: Translation look-aside buffer (TLB)
The TLB is a special cache for page table entries.
Speeds up address translation by reducing main memory accesses to page tables. On a TLB miss, the page table(s) in main memory are accessed.
Stores a small subset of valid page table entries; 16-512 entries are common.
Typically has a low miss rate (< 1%).

Translation look-aside buffer (TLB)
[Figure: virtual address (virtual page number, page offset) -> fully associative TLB (v, tag, physical page) -> physical address (physical page, page offset)]

Putting it all together
OS: loading a program in memory
Creates a new process P
Constructs a page table for P
Marks all page table entries as invalid, with a pointer to the disk image of the program (that is, to the executable file containing the binary)
Runs the program
The program gets an immediate page fault on the first instruction (everything is on disk initially)

Loading a program into memory
Page size = 4 KB, page table entry size = 4 B
Page table register points to physical address 0x0000
Setup: disk pages D0-D3 and D1000-D1003; physical memory pages M0-M3; a 2-entry TLB (VPN, PPN, permission); an 8-entry page table (VPN 0-7)
Program contents: text1, text2, global data
References (virtual addresses, in order): 0x0000, 0x0004, 0x7FFC, 0x0008, 0x2134

Step 1: Read executable header & initialize page table
M0 is reserved (it holds the page table at physical address 0x0000)
Page table: VPN 0 -> D1000 (text1, ro), VPN 1 -> D1001 (text2, ro), VPN 2 -> D1002 (global data), VPN 3-7 -> no map

Step 2: Load PC from header & start execution
The first instruction fetch (0x0000) misses in the TLB

Fetching instruction 0x0000
TLB miss: read the page table entry for VPN 0 at physical address 0x0000; it points to disk (D1000), so a page fault occurs
The OS loads text1 into M1 and updates the page table (VPN 0 -> M1) and the TLB (VPN 0 -> M1, ro)
The instruction is then fetched from physical address 0x1000

Fetching instruction 0x0004
TLB hit (VPN 0 -> M1): physical address 0x1004

Reference 0x7FFC
TLB miss: read the page table entry for VPN 7 at physical address 0x001C; no map, so a page fault occurs
The OS allocates M2, sets it to 0s, and maps VPN 7 -> M2 (rw) in the page table and the TLB
The access goes to physical address 0x2FFC

Fetching instruction 0x0008
TLB hit (VPN 0 -> M1): physical address 0x1008

Reference 0x2134
TLB miss: read the page table entry for VPN 2 at physical address 0x0008; it points to disk (D1002), so a page fault occurs
The OS loads global data into M3 and maps VPN 2 -> M3 (rw); in the full 2-entry TLB, this entry replaces the one for VPN 7
The access goes to physical address 0x3134

Final state
TLB: VPN 0 -> M1 (ro), VPN 2 -> M3 (rw)
Page table: VPN 0 -> M1, VPN 1 -> D1001, VPN 2 -> M3, VPN 3-6 -> no map, VPN 7 -> M2
Physical references: 0x0000 (page fault), 0x1000, 0x1004, 0x001C (no map, page fault), 0x2FFC, 0x1008, 0x0008 (page fault), 0x3134

Can we skip address translation? Virtually-addressed caches

Address translation in a pipeline
Load/store: Memory stage
Instruction fetch: Fetch stage
When to do address translation? After the VA is computed, but before the memory access is performed
Is it possible to skip address translation for some memory accesses? Yes: virtually-addressed caches
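Before moving on, the program-loading walkthrough can be replayed in a few lines of Python. This is a sketch under stated assumptions: 4 KB pages, a page table at physical address 0x0000 with 4-byte entries, physical pages handed out in the order M1, M2, M3, and LRU replacement in the 2-entry TLB (LRU is an assumption here, chosen because it matches the slides' eviction of VPN 7). The names `access`, `free_frames`, and the `("disk", ...)` markers are hypothetical.

```python
from collections import OrderedDict

PAGE = 4096                      # 4 KB pages
PT_BASE = 0x0000                 # page table register
PTE_SIZE = 4                     # 4-byte page table entries
TLB_ENTRIES = 2

# VPN -> ("disk", block) or ("mem", ppn); a missing VPN means "no map".
page_table = {0: ("disk", "D1000"), 1: ("disk", "D1001"), 2: ("disk", "D1002")}
perms = {0: "ro", 1: "ro", 2: "rw"}
free_frames = [1, 2, 3]          # M0 is reserved for the page table
tlb = OrderedDict()              # VPN -> (PPN, permission), LRU entry first
phys_refs = []                   # (physical address, note) in order

def access(va):
    vpn, offset = divmod(va, PAGE)
    if vpn in tlb:                               # TLB hit
        tlb.move_to_end(vpn)
    else:                                        # TLB miss: read the PTE
        pte_addr = PT_BASE + vpn * PTE_SIZE
        entry = page_table.get(vpn)
        if entry is None or entry[0] == "disk":  # page fault
            note = "no map, page fault" if entry is None else "page fault"
            phys_refs.append((pte_addr, note))
            # The OS loads the page from disk, or zero-fills a new one;
            # page contents are not modeled here.
            page_table[vpn] = ("mem", free_frames.pop(0))
            perms.setdefault(vpn, "rw")          # new zeroed pages are rw
        else:
            phys_refs.append((pte_addr, None))   # valid page: just refill TLB
        if len(tlb) == TLB_ENTRIES:
            tlb.popitem(last=False)              # evict least recently used
        tlb[vpn] = (page_table[vpn][1], perms[vpn])
    pa = tlb[vpn][0] * PAGE + offset
    phys_refs.append((pa, None))
    return pa

for va in (0x0000, 0x0004, 0x7FFC, 0x0008, 0x2134):
    access(va)

for pa, note in phys_refs:
    print(f"{pa:#06x}", note or "")
print("TLB:", {vpn: (f"M{ppn}", perm) for vpn, (ppn, perm) in tlb.items()})
```

The printed physical references follow the same sequence as the walkthrough's final state, including the page-table reads at 0x0000, 0x001C, and 0x0008.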
Option 1: Address translation before cache access
Physically-addressed cache
CPU -> virtual address (VA) -> TLB (page table on a TLB miss) -> physical address (PA) -> cache; on a cache miss -> memory
Slower; low complexity
Use the physical address to access the cache (tag, set index, and block offset bits)
Address translation needed for all accesses

Physically addressed caches
Step 1: Translate the virtual address to a physical address: look up the virtual page number in the TLB (each entry holds a tag and a physical page number)
Step 2: Access the cache with the physical address obtained from translation: the physical address (physical page number + page offset) is split into tag, set index, and block offset; the set index selects a set in a 2-way set-associative cache, and both tags in the set are compared

Option 2: Address translation after cache access
Virtually-addressed cache
CPU -> virtual address (VA) -> cache; on a cache miss -> TLB (page table on a TLB miss) -> physical address (PA) -> memory
Faster; high complexity (process isolation is hard)
Use the virtual address to access the cache (tag, set index, and block offset bits)
Address translation only for cache misses

Option 2: Address translation after cache access (TLB in parallel)
The TLB can be accessed in parallel with the cache lookup
Saves time on a cache miss, where the physical address is needed to access main memory
But wastes TLB accesses on cache hits

Virtually addressed caches
The virtual address is split directly into tag, set index, and block offset; the set index selects a set in a 2-way set-associative cache, and both tags in the set are compared, with no translation before the lookup

Address translation: Before or After Cache Access
Before cache (physically-addressed cache):
Generate virtual address -> Access TLB -> Access cache -> if cache miss, access main memory
After cache (virtually-addressed cache):
Generate virtual address -> Access cache -> if cache miss, access TLB -> Access main memory
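As a rough illustration of the efficiency argument, this Python sketch replays a trace of cache hits and misses through both orderings and counts TLB lookups. The trace is made up, and the sketch leaves out the parallel-TLB variant of option 2 (which would look up the TLB on every access, like option 1); the function names are hypothetical.

```python
def steps(design, cache_hit):
    """Ordered steps for one access; design is 'physical' or 'virtual'."""
    if design == "physical":                    # translate before the cache
        path = ["generate VA", "TLB", "cache"]
        if not cache_hit:
            path.append("main memory")
    elif design == "virtual":                   # translate only on a miss
        path = ["generate VA", "cache"]
        if not cache_hit:
            path += ["TLB", "main memory"]
    else:
        raise ValueError(f"unknown design: {design}")
    return path

def tlb_lookups(design, trace):
    return sum(steps(design, hit).count("TLB") for hit in trace)

# Hypothetical trace: True = cache hit, False = cache miss.
trace = [True, True, False, True, True, True, False, True, True, True]
print(tlb_lookups("physical", trace), tlb_lookups("virtual", trace))  # 10 2
```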

Tradeoffs
Physically-addressed caches: slow; simple
vs.
Virtually-addressed caches: fast; complex

Physical Vs Virtual Caches: Latency
Physically-addressed caches
Cache is accessed with physical address (after VM translations).
– Slow. Cache can be accessed only after address translation.
– Inefficient, because all accesses need address translation.
Virtually-addressed caches
Cache is accessed with virtual address (before VM translation).
– Fast. Skips address translation for cache hits.
– Efficient, because only cache misses need address translation.

Physical Vs Virtual Caches: Complexity
Problem for virtually-addressed caches:
The same virtual address can map to different physical addresses in two different processes.
To ensure process isolation, on a context switch:
A virtually-addressed cache needs to be invalidated, and its dirty cache blocks written back.
A physically-addressed cache need not be invalidated.
So a physically-addressed cache incurs fewer cache misses if context switches are very frequent (but generally they are not).
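The isolation problem can be seen in a small Python model. It assumes a write-back cache keyed only by virtual address (no process IDs in the tags), two processes whose page tables map virtual page 0 to different physical pages, and an optional flush on each context switch; all class and function names are hypothetical.

```python
PAGE = 4096

def translate(va, page_table):
    vpn, offset = divmod(va, PAGE)
    return page_table[vpn] * PAGE + offset

class VirtualCache:
    """Toy write-back cache indexed and tagged by virtual address only."""
    def __init__(self, memory):
        self.memory = memory          # physical address -> value
        self.lines = {}               # VA -> [value, dirty, PA]

    def read(self, va, page_table):
        if va not in self.lines:      # miss: translate, then fill
            pa = translate(va, page_table)
            self.lines[va] = [self.memory.get(pa, 0), False, pa]
        return self.lines[va][0]      # hit: no translation at all

    def write(self, va, value, page_table):
        self.read(va, page_table)     # write-allocate
        line = self.lines[va]
        line[0], line[1] = value, True

    def context_switch(self, flush=True):
        if not flush:
            return
        for value, dirty, pa in self.lines.values():
            if dirty:                 # write dirty blocks back first
                self.memory[pa] = value
        self.lines.clear()            # then invalidate every line

# Same virtual page 0, different physical pages per process (assumed values).
pt_a, pt_b = {0: 5}, {0: 9}
memory = {}

def demo(flush):
    memory.clear()
    cache = VirtualCache(memory)
    cache.write(0x10, 111, pt_a)          # process A writes VA 0x10
    cache.context_switch(flush)
    return cache.read(0x10, pt_b)         # process B reads VA 0x10

print(demo(flush=False), demo(flush=True))   # 111 0
print(memory.get(5 * PAGE + 0x10))           # 111
```

Without the flush, process B hits on process A's line and reads 111; with the flush, A's dirty value reaches physical memory and B misses and reads its own page.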

Typically, in modern processors
Level-1 (L1) caches are virtually-addressed. L1 needs to be fast.
L2 and L3 are physically-addressed. Larger structures.
Checking TLB before accessing them does not significantly affect their latency.

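Physically addressed caches: detailed flow
CPU -> virtual address (VA) -> TLB (~1 cycle)
TLB hit -> physical address (PA) -> cache (1-10s cycles)
Cache hit -> data to processor
Cache miss -> memory (100s cycles) -> data to processor
TLB miss -> page table (1 to 100s cycles, depending on whether the page table entries are cached or not)
Valid page -> update TLB
Invalid page -> page fault: bring in the page from disk (1,000,000s cycles), updating memory, page table & TLB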

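Virtually addressed caches: detailed flow
CPU -> virtual address (VA) -> cache (1-10s cycles)
Cache hit -> data to processor
Cache miss -> TLB (~1 cycle)
TLB hit -> physical address (PA) -> memory (100s cycles) -> update cache -> data to processor
TLB miss -> page table (1 to 100s cycles, depending on whether the page table entries are cached or not)
Valid page -> update TLB
Invalid page -> page fault: bring in the page from disk (1,000,000s cycles), updating memory, page table & TLB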
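The two detailed flows can be compared outcome by outcome with a small Python cost model. The cycle counts are assumptions picked from within the ranges on the slides (TLB 1, cache 4, memory 200, page-table access 100, disk 1,000,000); real values vary widely by machine, and the sketch ignores retries after a TLB refill and the cache-update step. Function names are hypothetical.

```python
# Assumed latencies in cycles, chosen inside the ranges on the slides.
COST = {"tlb": 1, "cache": 4, "memory": 200, "page_table": 100, "disk": 1_000_000}

def translation_cost(tlb_hit, page_valid):
    """Cycles to turn a VA into a PA: TLB, then page table, then disk."""
    cycles = COST["tlb"]
    if not tlb_hit:
        cycles += COST["page_table"]
        if not page_valid:
            cycles += COST["disk"]        # page fault: bring in the page
    return cycles

def access_cycles(design, cache_hit, tlb_hit=True, page_valid=True):
    if design == "physical":              # translation on every access
        cycles = translation_cost(tlb_hit, page_valid) + COST["cache"]
        if not cache_hit:
            cycles += COST["memory"]
    elif design == "virtual":             # translation only on a cache miss
        cycles = COST["cache"]
        if not cache_hit:
            cycles += translation_cost(tlb_hit, page_valid) + COST["memory"]
    else:
        raise ValueError(f"unknown design: {design}")
    return cycles

for design in ("physical", "virtual"):
    print(design,
          access_cycles(design, cache_hit=True),
          access_cycles(design, cache_hit=False),
          access_cycles(design, cache_hit=False, tlb_hit=False))
```

Under these assumed numbers the virtually-addressed flow saves the TLB cycle on every cache hit and costs the same on misses, which matches the "skips address translation for cache hits" point earlier.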

OS Support for Virtual Memory
The OS must be able to modify the page table register, update page table values, etc.
To allow the OS, but not the user program, to do this, a process runs in one of two execution modes:
• Executive (or supervisor or kernel level) permissions and
• User level permissions.
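A toy Python model of the idea: a privileged operation, here writing the page table register, succeeds only in kernel (supervisor) mode. The `Mode` enum, the `CPU` class, and the use of `PermissionError` are illustrative assumptions; on real hardware an attempted privileged operation in user mode typically raises an exception that traps into the OS.

```python
from enum import Enum

class Mode(Enum):
    USER = 0
    KERNEL = 1          # executive / supervisor level

class CPU:
    def __init__(self):
        self.mode = Mode.USER
        self.page_table_register = 0x0000

    def set_page_table_register(self, base):
        # Privileged operation: only allowed in kernel mode.
        if self.mode is not Mode.KERNEL:
            raise PermissionError("privileged operation attempted in user mode")
        self.page_table_register = base

cpu = CPU()
try:
    cpu.set_page_table_register(0x8000)     # user program: rejected
except PermissionError as err:
    print("trap:", err)
cpu.mode = Mode.KERNEL                      # e.g. after a system call or trap
cpu.set_page_table_register(0x8000)         # OS: allowed
print(hex(cpu.page_table_register))         # 0x8000
```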
