CS计算机代考程序代写 data structure flex cache MEMORY: SMALLER PAGETABLES

MEMORY: SMALLER PAGETABLES
Andrea Arpaci-Dusseau CS 537, Fall 2019

ADMINISTRIVIA
– Project 3 available: Shell in Linux (still solo)
– Discussion sections (fork() and exec())
– Test scripts available after/during weekend
– Midterm 1: Thursday, Oct 10th from 7:30-9:30pm
– Fill out Exam Conflict form in Canvas by TODAY
– Two sample exams posted – with answers!
– Next discussion sections on practice exam
– Canvas Homeworks
– “Due” each Tuesday and Thursday

AGENDA / LEARNING OUTCOMES
Memory virtualization
How we reduce the size of page tables?
What can we do to handle large address spaces?

RECAP

When are page tables created?
OS creates new page table when creates process
– OSchooseswhereprocesscode,heap,and stack are placed in RAM
– OSsetsuppagetablestocontaininitial mappings
OS modifies page tables when it allocates more process address space
Picks physical locations in RAM
CPU Memory
code static data heap
stack
Process
code static data
Program

PAGING TRANSLATION STEPS
For each mem reference:
1. extract VPN (virt page num) from VA (virt addr) 2. calculate addr of PTE (page table entry)
3. read PTE from memory
4. extract PFN (page frame num)
5. build PA (phys addr)
6. read contents of PA from memory

Disadvantages of Paging
Additional memory reference to page tableàVery inefficient – Pagetablemustbestoredinmemory
– MMUstoresonlybaseaddressofpagetable
Storage for page tables may be substantial
– Simplepagetable:RequiresPTEforallpagesinaddressspace
Entry needed even if page not allocated ?

Strategy: Cache Page Translations
CPU
Translation Cache
RAM
PT
memory interconnect

PAGING TRANSLATION STEPS
For each mem reference:
1. extract VPN (virt page num) from VA (virt addr) 2. check TLB for VPN
if miss:
3. calculate addr of PTE (page table entry)
4. read PTE from memory, replace some entry in TLB
5. extract PFN from TLB (page frame num) 6. build PA (phys addr)
7. read contents of PA from memory

0 KB 4 KB 8 KB
12 KB 16 KB 20 KB 24 KB 28 KB
PTBR
P1 pagetable load 0x0004
TLB Accesses: SEQUENTIAL Example
Virt Phys
PT
PT
P1
P2
P2
P1
P1
P2
load 0x0000 379……
0123 …load0x2000 CPU’s TLB
Valid
VPN
PPN

HW and OS Roles
Who Handles TLB Hit?
Who Handles TLB Miss? HW or OS H/W
H/W must know where pagetables are stored in memory
– CR3registeronx86
– PagetablestructurefixedandagreeduponbetweenHWandOS – HW“walks”knownpagetablestructureandfillsTLB
H/W

HW AND OS ROLES
Who Handles TLB MISS? H/W or OS? OS:
CPU traps into OS upon TLB miss “Software-managed TLB”
OS interprets pagetables as it chooses; any data structure possible Modifying TLB entries is privileged instruction

TLB Summary
Pages are great, but accessing page tables for every memory access is slow Cache recent page translationsàTLB
– HardwareperformsTLBlookuponeverymemoryaccess TLB performance depends strongly on workload
– Sequentialworkloadsperformwell
– Workloads with temporal locality can perform well (if enough TLB entries) In different systems, hardware or OS handles TLB misses
TLBs increase cost of context switches
– FlushTLBoneverycontextswitch – Add ASID to every TLB entry

Disadvantages of Paging
Additional memory reference to page tableàVery inefficient – Pagetablemustbestoredinmemory
– MMUstoresonlybaseaddressofpagetable
Storage for page tables may be substantial
– Simplepagetable:RequiresPTEforallpagesinaddressspace
Entry needed even if page not allocated ?

SMALLER PAGE TABLES

QUIZ: How big are page Tables?
1. PTE’s are 2 bytes, and 32 possible virtual page numbers
2. PTE’s are 2 bytes, virtual addrs are 24 bits, pages are 16 bytes 3. PTE’s are 4 bytes, virtual addrs are 32 bits, and pages are 4 KB 4. PTE’s are 4 bytes, virtual addrs are 64 bits, and pages are 4 KB How big is each page table?

Why ARE Page Tables so Large?
code Virt Mem Phys Mem heap
Not used, not allocated
stack

Many invalid PT entries
PFN valid prot
Problem:
Linear page tables must still allocate PTE for each page in address space
how to avoid storing these?
10 1 r-x
-0-
23 1 rw-
-0-
-0-
-0- (even for unallocated pages) -0-
…many more invalid… -0-
-0- -0- -0- 28 1 rw- 4 1 rw-

AVOID SIMPLE LINEAR PAGE TABLES?
Use more complex page tables, instead of just big array With software-managed TLB any data structure is possible Hardware looks for vpn in TLB on every memory access
– IfTLBdoesnotcontainvpn,TLBmiss
• Trap into OS and let OS find vpn->ppn translation • OSnotifiesTLBofvpn->ppnforfutureaccesses

1. 2.
3.
– –
Other Approaches
Segmented Pagetables Multi-level Pagetables
Page the page tables
Page the pagetables of page tables… Inverted Pagetables

how to avoid storing these P TEs?
PFN valid prot
10 1 r-x -0- 23 1 rw- -0- -0- -0- -0-
Note “hole” in addr space: valids vs. invalids are clustered
How did OS avoid allocating holes in address space?
Segmentation
valid Ptes are Contiguous
…many more invalid… -0-
-0- -0- -0- 28 1 rw- 4 1 rw-

Combine Paging and Segmentation
Divide address space into segments (code, heap, stack) – Segments can be variable length
Divide each segment into fixed-sized pages Logical address divided into three portions
seg # (4 bits)
page number (8 bits)
page offset (12 bits)
Implementation
• Each segment has a page table
• Each segment track base (physical address) and bounds of the page table

seg # (4 bits)
Chat: Paging and Segmentation
0x001000
page number (8 bits)
page offset (12 bits)

seg
base
bounds
RW
0
0x002000
0xff
10
1
0x000000
0x00
00
2
0x001000
0x0f
11
0x01f
0x011
0x003
0x02a
0x013
0x004070
0x003016
error
error
error
0x002070 read:
0x202016 read:
0x104c84 read:
0x010424 write:
0x210014 write:
0x203568 read:
0x002000
Assume bounds: # PTE entries
0x02a568

0x00c
0x007
0x004
0x00b
0x006

Advantages of Paging and Segmentation
Advantages of Segments
– Supportssparseaddressspaces.
• Decreases size of page tables.
• If segment not used, not need for page table
Advantages of Pages
– Noexternalfragmentation
– Segmentscangrowwithoutanyreshuffling
– Canrunprocesswhensomepagesareswappedtodisk(nextlecture)
Advantages of Both
– Increasesflexibilityofsharing:Shareeithersinglepageorentiresegment

seg # (4 bits)
page number (8 bits)
page offset (12 bits)
0x01f
seg
base
bounds
RW
8
0x002000
0xff
10
9
0x000000
0x00
00
a
0x001000
0x0f
11
P1:
Sharing: Paging and Segmentation
0x001000
0x011
0x003
seg
base
bounds
RW
8
0x000000
0x00
00
9
0x002000
0xff
11
a
0x003000
0x0f
11
P2:
0x002000
P1: 0x802070 read: P2: 0x902070 read: P2: 0xa00100 read:
0x003000

0x02a
0x013

0x00c
0x007
0x004
0x00b
0x006

0x01f

Disadvantages of Paging and Segmentation
Potentially large page tables (for each segment)
– Mustallocateeachpagetablecontiguously
– Moreproblematicwithmoreaddressbits – Pagetablesize?
Assume 2 bits for segment, 18 bits for page number, 12 bits for offset
Each page table is:
= Number of entries * size of each entry = Number of pages * 4 bytes
= 2^18 * 4 bytes = 2^20 bytes = 1 MB!!!

1. 2.
3.
– –
Other Approaches
Segmented Pagetables Multi-level Pagetables
Page the page tables
Page the pagetables of page tables… Inverted Pagetables

Multilevel Page Tables
Goal: Allow each page tables to be allocated non-contiguously
Idea: Page the page tables
– Createsmultiplelevelsofpagetables;outerlevel“pagedirectory” – Onlyallocatepagetablesforpagesinuse
– Usedinx86architectures(hardwarecanwalkknownstructure)

30-bit address:
Multilevel Page Tables
outer page(8 bits)
inner page (10 bits)
page offset (12 bits)
base of page directory

Multilevel example
page directory page of PT (@PPN:0x3) page of PT (@PPN:0x92)
PPN valid PPN valid PPN valid 0x31 0x101 -0 -0 0x231 -0 -0-0-0 -0-0-0 -0 0x801 -0 -0 0x591 -0 -0-0-0 -0-0-0 -0-0-0 -0-0-0 -0-0-0 -0-0-0 -0 -0 0x550 -0 -0 0x451 0x921-0 1
20-bit address:
translate 0x01ABC
0x23ABC
outer page(4 bits)
inner page(4 bits)
page offset (12 bits)

Neighbor Chat: Multilevel
page directory page of PT (@PPN:0x3) page of PT (@PPN:0x92)
PPN valid PPN valid PPN valid 0x31 0x101 -0 -0 0x231 -0 -0-0-0 -0-0-0 -0 0x801 -0 -0 0x591 -0 -0-0-0 -0-0-0 -0-00
translate 0x04000
-0-0-0 translate 0xFEED0 -0-0-0 -0-0-0
-0 -0 0x550
0x80000
-0 -0 0x451 0x921-0 1
20-bit address:
0x55ED0
outer page(4 bits)
inner page(4 bits)
page offset (12 bits)

Canvas Homework
paging-multilevel-translate.py

Address format for multilevel Paging
30-bit address:
How should logical address be structured? How many bits for each paging level? Goal?
– Each page table fits within a page
– PTE size * number PTE = page size
Assume PTE size = 4 bytes Page size = 2^12 bytes = 4KB
– How many page table entries can we fit on page?
– 4KB / 4bytes = 1K (1024) entries
à # bits for selecting inner page = à10
Remaining bits for outer page: 30-10-12=8 bits
outer page
inner page
page offset (12 bits)

Problem with 2 levels?
Problem: page directories (outer level) may not fit in a page
64-bit address:
– Splitpagedirectoriesintopieces
outer page?
inner page (10 bits)
page offset (12 bits)
Solution:
– Useanotherpagedirtorefertothepagedirpieces.
VPN
4KB / 4 bytesà1K entries per level 1 level:
2 levels: 3 levels:
PD idx 0
PD idx 1
PT idx
OFFSET
How large is virtual address space with 4 KB pages, 4 byte PTEs,
(each page table fits in page)
1K * 4K = 2^22 = 4 MB
1K * 1K * 4K = 2^32 ≈ 4 GB
1K*1K*1K*4K=2^42 ≈4TB

FULL SYSTEM WITH TLBS
On TLB miss: lookups with more levels more expensive
Assume 3-level page table Assume 256-byte pages
Assume 16-bit addresses
Assume ASID of current process is 211
Howmanyphysicalaccessesforeachinstruction? (IgnoreopschangingTLB) (a) 0xAA10: movl 0x1111, %edi
ASID
VPN
PFN
Valid
211
0xbb
0x91
1
211
0xff
0x23
1
122
0x05
0x91
1
211
0x05
0x12
0
0xaa: (TLB miss -> 3 for addr trans) + 1 instr fetch
0x11: (TLB miss -> 3 for addr trans) + 1 movl
Total: 8
(b) 0xBB13: addl $0x3, %edi
Total:1
(c) 0x0519: movl %edi, 0xFF10
Total: 5
0xbb: (TLB hit -> 0 for addr trans) + 1 instr fetch from 0x9113
0x05: (TLB miss -> 3 for addr trans) + 1 instr fetch
0xff: (TLB hit -> 0 for addr trans) + 1 movl into 0x2310

1. 2.
3.
– –
Other Approaches
Segmented Pagetables Multi-level Pagetables
Page the page tables
Page the pagetables of page tables… Inverted Pagetables

Inverted Page TAble
Observation:
– Only need entries for virtual pages w/ valid physical mappings
– Have entries based on physical pages (àinverted!)
Naïve approach:
– Search through data structure to find match
– Too much time to search entire structure
Faster:
– Find possible matches entries by hashing vpn+asid
– Smaller number of entries to search for exact match
TLB still manages most cases
– Managinginvertedpagetablerequiressoftware-controlledTLB

Summary: Better PAGE TABLES
Problem: Simple linear page tables require too much contiguous memory
Many options for efficiently organizing page tables
If OS traps on TLB miss, OS can use any data structure
– Invertedpagetables(hashing)
If Hardware handles TLB miss, page tables must follow specific format
– Multi-levelpagetablesusedinx86architecture – Eachpagetablefitswithinapage
Next Topic:
What if desired address spaces do not fit in physical memory?