
Multi-Level Paging, Inverted Page Table, TLB
Reading assignment
 Dinosaur Chapter 8
 Comet Chapters 16, 17, 18
 Midterm: 3/7 8pm-10pm EE170
[lec 13] Summary: Evolution of Memory Management (so far)
Scheme                  | How                                     | Pros                                             | Cons
Simple uniprogramming   | 1 segment loaded at starting address 0  | Simple                                           | 1 process, 1 segment, no protection
Simple multiprogramming | 1 segment relocated at loading time     | Simple, multiple processes                       | 1 segment/process, no protection, external frag.
Base & Bound            | Dynamic mem relocation at runtime       | Simple hardware, multiple processes, protection  | 1 segment/process, external frag.
Multiple segments       | Dynamic mem relocation at runtime       | Sharing, protection, multiple segs/process       | More hardware, external frag.
[lec13] managing memory
 Simple multiprogramming – 4 drawbacks
 1. No protection
 2. Low utilization – cannot relocate dynamically
  Cannot do anything about holes
 3. No sharing – single segment per process
 4. Entire address space needs to fit in memory
  Need to swap the whole thing, very expensive!

[lec13] Paging implementation – how does it really work?
Virtual address -> Page table -> Physical address
 Where to store the page table?
 How to use the MMU?
 Even small page tables are too large to load into the MMU
 Page tables are kept in memory; the MMU only holds their base addresses
 What does the MMU have to do?
 Page size?
 Small pages -> big page table
  e.g. 32-bit addresses with 4K pages: 2^20 PTEs (~4 MB per process)
 Large pages -> small page table, but large internal fragmentation
[Figure: virtual address (VPage #, offset) indexes the page table; the PTE supplies PPage #, giving physical address (PPage #, offset)]
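A minimal sketch of what the MMU does with a one-level table, assuming a 32-bit virtual address, 4 KB pages, a page_table[] array kept in memory (the MMU register would hold only its base address), and a made-up PTE format with the valid bit in bit 0:

#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4 KB pages                     */
#define NUM_PTES   (1u << 20)         /* 2^20 entries for a 32-bit VA   */

typedef uint32_t pte_t;               /* PPage # in high bits, valid bit in bit 0 */
static pte_t page_table[NUM_PTES];    /* ~4 MB, and it must be contiguous */

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;              /* VPage #            */
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
    pte_t pte = page_table[vpn];                        /* one extra memory access */
    if (!(pte & 1))
        return 0;                                       /* invalid -> page fault, OS takes over */
    uint32_t ppn = pte >> PAGE_SHIFT;                   /* PPage #            */
    return (ppn << PAGE_SHIFT) | offset;
}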
[lec13] Deep thinking
 Why does the page table we have talked about so far have to be contiguous in physical memory?
 Why did a segment have to be contiguous in memory?
 For a 4GB virtual address space (32-bit machine), we need just 1M PTEs (~4MB); what is the big deal?
 My PC has 2GB of RAM; why do we need PTEs for the entire 4GB address space?
Page table
 The page table has to be contiguous in memory
 Potentially large
 Contiguous pages in memory are hard to find
 How can we be flexible?
“All computer science problems can be solved with an extra level of indirection.”
Two-level page tables
[Figure: virtual address split into (Dir, Page, offset); the directory entry gives the L2 page table's address, the L2 page table entry gives PPage #, and the physical address is (PPage #, offset)]
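A sketch of the two-level walk, assuming the classic 10/10/12 split of a 32-bit virtual address (1024-entry directory, 1024-entry L2 tables, 4 KB pages); the types and the valid-bit-in-bit-0 PTE format are assumptions for illustration:

#include <stdint.h>

#define PAGE_SHIFT 12
#define ENTRIES    1024                   /* 10 bits of Dir index, 10 bits of Page index */

typedef uint32_t pte_t;                   /* PPage # in the high bits, valid bit in bit 0 */
typedef struct { pte_t *table; } pde_t;   /* directory entry: pointer to one L2 page table
                                             (NULL means that L2 table does not exist yet) */
static pde_t directory[ENTRIES];          /* only the directory must be present and contiguous */

uint32_t translate2(uint32_t vaddr) {
    uint32_t dir    = (vaddr >> 22) & 0x3ff;   /* Dir    */
    uint32_t page   = (vaddr >> 12) & 0x3ff;   /* Page   */
    uint32_t offset =  vaddr        & 0xfff;   /* offset */

    pte_t *l2 = directory[dir].table;          /* first extra memory reference  */
    if (l2 == NULL)
        return 0;                              /* no L2 table here -> fault     */
    pte_t pte = l2[page];                      /* second extra memory reference */
    if (!(pte & 1))
        return 0;                              /* page not present -> fault     */
    return ((pte >> PAGE_SHIFT) << PAGE_SHIFT) | offset;
}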

32-bit Two-Level Page Table
What does this buy us?
Have we saved memory space to store page tables?
Multiple-level page tables
Multi-level page tables
 Advantages
 The L2 page tables do not have to be contiguous with one another
 Note: a single L2 page table is still contiguous
 The L2 page tables do not all have to be allocated up front
 Individual L2 page tables can be allocated on demand (see the sketch below)
 Individual L2 page tables can be swapped out to disk
 An advanced OS feature
The power of an extra level of indirection!
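One way the on-demand allocation could look, continuing the two-level sketch above (directory[], ENTRIES, pte_t, and PAGE_SHIFT are the same illustrative declarations; map_page is a hypothetical helper, not a real OS API):

#include <stdlib.h>   /* calloc */

/* Install a mapping vpn -> ppn, allocating the covering L2 table only
 * when the first page in its 4 MB region is mapped. */
int map_page(uint32_t vpn, uint32_t ppn) {
    uint32_t dir  = (vpn >> 10) & 0x3ff;
    uint32_t page =  vpn        & 0x3ff;

    if (directory[dir].table == NULL) {
        /* First use of this region: allocate its L2 table on demand. */
        directory[dir].table = calloc(ENTRIES, sizeof(pte_t));
        if (directory[dir].table == NULL)
            return -1;
    }
    directory[dir].table[page] = (ppn << PAGE_SHIFT) | 1;   /* set valid bit */
    return 0;
}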
[review] How many PTEs do we need?
 Worst case for 32-bit address machine
 # of processes × 2^20 (if page size is 4096 bytes)
 What about a 64-bit address machine?
 # of processes × 2^36 (2^36 corresponds to 48-bit virtual addresses with 4KB pages)
Hmm, but my PC only has 1GB, 256K PTEs should be enough?!
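The arithmetic behind those numbers, as a small stand-alone check (matching 2^36 to a 48-bit virtual address space is an assumption; a full 64-bit space with 4 KB pages would need 2^52 PTEs per process):

#include <stdio.h>

int main(void) {
    /* PTEs per process = 2^(virtual address bits - page offset bits) */
    printf("32-bit VA, 4KB pages: %llu PTEs per process\n",
           1ULL << (32 - 12));      /* 2^20 = 1,048,576 (~4 MB of PTEs) */
    printf("48-bit VA, 4KB pages: %llu PTEs per process\n",
           1ULL << (48 - 12));      /* 2^36 */
    return 0;
}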

Inverted Page Table
 Motivation
 Example: 2 processes, each page table has 1M entries, but only 10 physical pages
 (An extreme case of very low physical memory)
 Is there a way to save page table space?
Ideally,
 One PTE for each physical page frame, regardless of how many processes there are
 Assuming the rest of the virtual addresses are not allocated/used
 i.e., linear inverted page table (an array indexed by physical page, recording which virtual address maps to each)
[Figure: linear inverted page table – array indexed 0..n-1 by physical frame; entry k holds (pid, vpage)]
But,
 How do we do lookups with linear inverted page table?
 Has to go through the entire array and compare!
[Figure: CPU issues a virtual address; the MMU searches the inverted table (entries 0..n-1) for a matching (pid, vpage); the matching index gives the physical address into physical memory]
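A toy model of the linear lookup, assuming a machine with NFRAMES physical frames; the struct layout is illustrative only:

#include <stdint.h>

#define NFRAMES 1024                 /* one entry per physical page frame */

/* Linear inverted page table: indexed by physical frame number; each
 * entry records which (pid, vpage) currently owns that frame. */
struct ipte { int pid; uint32_t vpage; int valid; };
static struct ipte ipt[NFRAMES];

/* Lookup must scan the whole array; the index of the matching entry
 * is the physical frame number. */
int ipt_lookup(int pid, uint32_t vpage, uint32_t *pframe) {
    for (uint32_t k = 0; k < NFRAMES; k++) {
        if (ipt[k].valid && ipt[k].pid == pid && ipt[k].vpage == vpage) {
            *pframe = k;
            return 0;
        }
    }
    return -1;                       /* no match: not resident -> page fault */
}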
(Hashed) Inverted page tables
 Main idea
 One PTE for each physical page frame
 Hash (vpage, pid) to Ppage#
 Pros
 Small page table for a large address space
 Cons
 Lookup is difficult
 Overhead of managing hash chains, etc.
 Ex: 64-bit UltraSPARC, PowerPC
[Figure: virtual address (pid, vpage, offset) is hashed to index k in the inverted page table; physical address = (k, offset)]
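A sketch of the hashed variant, with a toy hash function and a per-frame chain field to resolve collisions (hash_anchor[] is assumed to be initialized to -1 by setup code; all names are illustrative):

#include <stdint.h>

#define NFRAMES 1024

struct hipte {
    int      pid;
    uint32_t vpage;
    int      valid;
    int      next;                   /* next frame in this hash chain, -1 = end */
};
static struct hipte hipt[NFRAMES];   /* still only one entry per physical frame  */
static int hash_anchor[NFRAMES];     /* hash bucket -> first frame in its chain  */

static unsigned hash_vp(int pid, uint32_t vpage) {
    return (vpage * 31u + (unsigned)pid) % NFRAMES;   /* toy hash function */
}

int hipt_lookup(int pid, uint32_t vpage, uint32_t *pframe) {
    for (int k = hash_anchor[hash_vp(pid, vpage)]; k != -1; k = hipt[k].next) {
        if (hipt[k].valid && hipt[k].pid == pid && hipt[k].vpage == vpage) {
            *pframe = (uint32_t)k;   /* the index is the PPage#             */
            return 0;
        }
    }
    return -1;                       /* miss: fall back to the OS / page fault */
}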

Deep thinking
 How can two processes share memory under an inverted page table?
 Since the inverted page table holds only one (pid, vpage) mapping per physical frame, a frame cannot appear in two address spaces at once, which makes sharing memory among processes very difficult. For this reason most modern OSes use multi-level page tables.
Performance problem with paging
 How many extra memory references to access page tables?
 One-level page table?
 Two-level page table?
 Solution: reference locality!
 In a short period of time, a process is likely accessing only a few pages
 Instead of storing only the page table's starting address in hardware (MMU), store the part of the page table that is "hot"
Translation Look-aside Buffer (TLB)
 The TLB is often fully associative -> fewest conflict misses, but expensive -> typically 64 – 1024 entries
[Figure: virtual address (VPage #, offset) is looked up in the TLB; on a hit, physical address = (PPage #, offset); on a miss, the real page table is walked]
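A software model of the fully associative lookup (real hardware compares all entries in parallel); the entry layout and the 64-entry size are assumptions:

#include <stdint.h>

#define TLB_ENTRIES 64               /* typical sizes: 64 – 1024 entries */
#define PAGE_SHIFT  12

struct tlb_entry { uint32_t vpn, pfn; int valid; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns 1 on a hit (paddr filled in), 0 on a miss. */
int tlb_translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {          /* TLB hit  */
            *paddr = (tlb[i].pfn << PAGE_SHIFT) | offset;
            return 1;
        }
    }
    return 0;   /* TLB miss: walk the real page table, then refill an entry */
}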
Bits in a TLB Entry
 Common (necessary) bits
 Virtual page number: matched against the virtual address
 Physical page number
 Optional (useful) bits
 ASIDs — Address-space identifiers (process tags)
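One way the bits could be laid out, as an illustrative C bit-field (real TLBs fix these fields in hardware; the field widths and the extra protection/dirty bits are assumptions beyond what the slide lists):

#include <stdint.h>

struct tlb_entry_bits {
    uint32_t vpn;        /* virtual page number: matched against the address    */
    uint32_t pfn;        /* physical page number                                */
    unsigned valid : 1;  /* entry holds a usable translation                    */
    unsigned asid  : 8;  /* address-space identifier (process tag), optional    */
    unsigned prot  : 3;  /* read/write/execute permissions (commonly present)   */
    unsigned dirty : 1;  /* page written through this entry (commonly present)  */
};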

Handling of TLB miss:
Hardware-controlled TLB
 On a TLB hit, MMU checks the valid bit
 If valid, perform address translation
 If invalid (e.g. page not in memory), MMU generates a page fault
 OS performs fault handling
 Restart the faulting instruction
 On a TLB miss
 MMU walks the page table and loads the PTE into the TLB
 Needs to replace if TLB is full
 Page table layout is defined by the hardware design
 Then proceed the same as on a hit …
Miss handling:
Software-controlled TLB
 On a TLB hit, MMU checks the valid bit
 If valid, perform address translation
 If invalid (e.g. page not in memory), MMU generates a page fault
 OS performs page fault handling
 Restart the faulting instruction
 On a TLB miss, HW raises an exception and traps to the OS
 OS walks the page table and loads the PTE into the TLB
 Needs to replace if TLB is full
 Page table layout is defined by the OS
 Then proceed the same as on a hit …
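A sketch of what such an OS-level miss handler might look like (MIPS-style, as one example); every helper below (walk_page_table, handle_page_fault, choose_victim, tlb_write_entry) is a hypothetical hook, not a real kernel API:

#include <stdint.h>

#define PAGE_SHIFT 12
typedef uint32_t pte_t;              /* valid bit in bit 0, PPage # above (illustrative) */

/* Hypothetical hooks supplied by the rest of the kernel: */
extern pte_t walk_page_table(uint32_t vpn);       /* layout chosen by the OS    */
extern void  handle_page_fault(uint32_t vaddr);   /* page not in memory         */
extern int   choose_victim(void);                 /* random or LRU replacement  */
extern void  tlb_write_entry(int slot, uint32_t vpn, uint32_t pfn);

/* On a miss the hardware raises an exception and traps here. */
void tlb_miss_handler(uint32_t faulting_vaddr) {
    uint32_t vpn = faulting_vaddr >> PAGE_SHIFT;
    pte_t pte = walk_page_table(vpn);
    if (!(pte & 1)) {                /* PTE invalid: the page is not in memory */
        handle_page_fault(faulting_vaddr);
        return;
    }
    int slot = choose_victim();      /* replace an entry if the TLB is full    */
    tlb_write_entry(slot, vpn, pte >> PAGE_SHIFT);
    /* Returning from the exception restarts the faulting instruction. */
}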
Hardware vs. software controlled
 Hardware approach
 Efficient – TLB misses are handled by hardware
 OS intervention is required only in case of page fault
 Page structure prescribed by MMU hardware — rigid
 Software approach
 Less efficient — TLB misses are handled by software
 MMU hardware very simple, permitting larger, faster TLB
 OS designer has complete flexibility in the choice of memory-management data structure
 e.g. 2-level page table, inverted page table
Deep thinking
 Without a TLB, how the MMU locates PTEs is pre-determined by the hardware designer
 With a TLB, it can be flexible; e.g. software-controlled TLB handling is possible
 What enables this?
 TLB is an extra level of indirection!

More TLB Issues
 Which TLB entry should be replaced?
 Random
 LRU
 What happens when changing a page table entry (e.g. because of swapping, or changing read/write permissions)?
 Change the entry in memory
 Flush (i.e. invalidate) the TLB entry – INVLPG on x86 (see the sketch below)
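For the x86 case, a common kernel-mode idiom is an inline-asm wrapper around INVLPG; this is a sketch of that idiom (GCC/Clang syntax), not portable code:

/* Invalidate the TLB entry covering one virtual address on x86
 * (privileged instruction; usable only in kernel mode). */
static inline void invlpg(void *vaddr) {
    __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}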
What happens to TLB in a process context switch?
 During a process context switch, cached translations cannot be used by the next process
 Invalidate all entries during a context switch
 Lots of TLB misses afterwards
 Alternatively, tag each entry with an ASID
 Add a HW register that contains the process id of the currently executing process
 The TLB hits only if an entry's process id matches that register (see the sketch below)
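A toy comparison of the two approaches; the structures mirror the earlier TLB sketch with an ASID tag added, and current_asid models the hardware register mentioned above (all names are illustrative):

#include <stdint.h>

#define TLB_ENTRIES 64
struct tlb_entry { uint32_t vpn, pfn; uint8_t asid; int valid; };
static struct tlb_entry tlb[TLB_ENTRIES];
static uint8_t current_asid;         /* models the HW register with the running process id */

/* Without ASIDs: wipe everything on a context switch (misses follow). */
void tlb_flush_all(void) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        tlb[i].valid = 0;
}

/* With ASIDs: just retarget the register; cached entries of other
 * processes stop matching but stay in place for when they run again. */
void context_switch_to(uint8_t asid) {
    current_asid = asid;
}

/* A hit now also requires the entry's tag to match current_asid. */
int tlb_hit(uint32_t vpn, uint32_t *pfn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == current_asid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return 1;
        }
    }
    return 0;
}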
Cache vs. TLB
 Similarities:
 Both cache a part of the physical memory
 Differences:  Associatively
 TLB is usually fully associative
 Cache can be direct mapped  Consistency
 TLB does not deal with consistency with memory  TLB can be controlled by software
More on consistency Issues
 Snoopy cache protocols can maintain consistency with DRAM, even in the presence of DMA
 No hardware maintains consistency between DRAM and TLBs:
 OS needs to flush related TLBs whenever changing a page table entry in memory
 On multiprocessors, when you modify a page table entry, you need to do “TLB shoot-down” to flush all related TLB entries on all processors
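A rough sketch of the shoot-down sequence; invlpg_local and send_ipi_tlb_flush stand in for real kernel machinery and are hypothetical:

#include <stdint.h>

extern void invlpg_local(void *vaddr);         /* flush this CPU's own TLB entry  */
extern void send_ipi_tlb_flush(void *vaddr);   /* ask all other CPUs to flush it  */

void update_pte(volatile uint32_t *pte, uint32_t new_val, void *vaddr) {
    *pte = new_val;                /* 1. change the entry in memory                 */
    invlpg_local(vaddr);           /* 2. flush the stale local TLB entry            */
    send_ipi_tlb_flush(vaddr);     /* 3. shoot down stale copies on other CPUs
                                         (a real OS also waits for acknowledgement) */
}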