CS计算机代考程序代写 x86 data structure flex [537] Smaller Page Tables

[537] Smaller Page Tables

Virtualizing Memory:
Smaller Page TAbles
Questions answered in this lecture:
Review: What are problems with paging?
Review: How large can page tables be?
How can large page tables be avoided with different techniques?
Inverted page tables, segmentation + paging, multilevel page tables
What happens on a TLB miss?
CSE 2431
Introduction to Operating Systems
Based on slides by Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau

Announcements
Reading for today: Chapter 20

Disadvantages of Paging
Additional memory reference to look up in page table
Very inefficient (doubles number of memory references; one reference for page table translation information, and one reference to access the data/instruction the process needs)
Page table must be stored in memory
MMU stores only base address of page table
Avoid extra memory reference for lookup with TLBs (last lecture)
Storage for page tables may be substantial
Simple page table: Requires PTE for all possible pages in address space
Entry needed even if page not allocated (that is, pages the process may never use)
Problematic with dynamic stack and heap within address space (as stack/heap either grow/shrink, page table has to be modified)

QUIZ: How big are page Tables?
PTE’s are 2 bytes, and 32 possible virtual page numbers

PTE’s are 2 bytes, virtual addrs are 24 bits, pages are 16 bytes

PTE’s are 4 bytes, virtual addrs are 32 bits, and pages are 4 KB

PTE’s are 4 bytes, virtual addrs are 64 bits, and pages are 4 KB

How big is each page table?
REMINDER: lg = log2
32 * 2 bytes = 64 bytes
2 bytes * 2^(24 – lg 16) = 2^21 bytes (2 MB)
4 bytes * 2^(32 – lg 4K) = 2^22 bytes (4 MB)
4 bytes * 2^(64 – lg 4K) = 2^54 bytes

code
heap
stack
Virt Mem
Phys Mem

Waste!

Why ARE Page Tables so Large?

Many invalid PT entries
PFN valid prot
10 1 r-x
– 0 –
23 1 rw-
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
28 1 rw-
4 1 rw-
…many more invalid…

how to avoid
storing these?

Format of linear page tables:

Avoid
simple linear Page Table
Use more complex page tables, instead of just big array
Any data structure is possible with software-managed TLB (managed by OS, which is software!)
Hardware looks for vpn in TLB on every memory access
If TLB does not contain vpn, TLB miss
Trap into OS and let OS find vpn->ppn translation
OS stores vpn->ppn translation information in TLB for future accesses

Approach 1:
Inverted Page TAble
Inverted Page Tables
Only need entries for virtual pages w/ valid physical mappings (that is, only virtual pages that are currently in physical memory)

Naïve approach:
Search through data structure for to find match
Too much time to search entire table

Better: Find possible matches entries by hashing vpn+asid
Smaller number of entries to search for exact match

Managing inverted page table requires software-controlled TLB

For hardware-controlled TLB, need well-defined, simple approach (so this is not possible for inverted page table, because the structure is too complex)

Inverted page tables
HUGE advantage: The OS needs to keep only a single page table for all processes running in the system!
Why is it called “inverted”?
In an “ordinary” (that is non-inverted) page table, ppns are stored; vpns are used to index into the page table array, to find ppn and other information for the vpn; VPNs, however, ARE NOT STORED
In an inverted page table, vpns (and ASIDs) ARE stored, and ppns are stored along with the vpn/ASID data; it is called an inverted page table because vpn/ASID data is stored, which is not true in an “ordinary”/non-inverted page table.
Note: Searching the inverted page table is hard to do efficiently, but is critical for good performance, so that’s why a hash function is used.

Other Approaches
Inverted Pagetables (already considered above)
Segmented Pagetables (next below)
Multi-level Pagetables
Page the page tables
Page the pagetables of page tables…

SEGMENTED PAGE TABLE – notice that valid Ptes are Contiguous
PFN valid prot
10 1 r-x
– 0 –
23 1 rw-
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
– 0 –
28 1 rw-
4 1 rw-
…many more invalid…

how to avoid
storing these?

Note “hole” in addr space:
valids vs. invalids are clustered

How did OS avoid allocating holes in phys memory?

Segmentation

Combine Paging and Segmentation
Divide address space into segments (code, heap, stack)
Segments can be variable length
Divide each segment into fixed-sized pages
Logical address divided into three portions; for example, see below:
page offset (12 bits)
page number (8 bits)
seg #
(4 bits)
Implementation
Each segment has a page table
Each segment tracks base (physical address) and bounds of page table for that segment (so size of page table for segment can change if number of pages in segment changes); base-bounds pair of registers used in MMU for page table of each segment (instead of single PTBR).

Quiz: Paging and Segmentation
seg base bounds R W
0 0x002000 0xff 1 0
1 0x000000 0x00 0 0
2 0x001000 0x0f 1 1


0x01f
0x011
0x003
0x02a
0x013

0x00c
0x007
0x004
0x00b
0x006

0x001000
0x002000
0x002070 read:
0x202016 read:
0x104c84 read:
0x010424 write:
0x210014 write:
0x203568 read:
page offset (12 bits)
page number (8 bits)
seg #
(4 bits)
0x004070
0x003016
Segmentation error
Segmentation error

Segmentation error

0x02a568

Advantages of Paging and Segmentation
Advantages of Segments
Supports sparse (“spread out”) address spaces
Decreases size of page tables
If segment not used, not need for page table, so memory use reduced
Advantages of Pages
No external fragmentation
Segments can grow without any reshuffling
Can run process when some pages are swapped to disk (next lecture)
Advantages of Both
Increases flexibility of sharing
Share either single page or entire segment
How?

Disadvantages of Paging and Segmentation
Potentially large page tables (for each segment)
Must allocate each page table contiguously
(Maximum) Page table size?
Assume 2 bits for segment, 18 bits for page number, 12 bits for offset
Each page table is:
= Number of entries * size of each entry
= Number of pages * 4 bytes
= 2^18 * 4 bytes = 2^20 bytes = 1 MB!!!
This will still usually be better than a simple linear array page table, though!

Other Approaches
Inverted Pagetables (already considered above)
Segmented Pagetables (already considered above)
Multi-level Pagetables (below)
Page the page tables
Page the pages of page tables…

3) Multilevel
Page Tables
Goal: Allow each page table to be allocated non-contiguously
Idea: Page the page tables
Creates multiple levels of page tables; outer level “page directory”
Only allocate page tables for pages in use
Used in x86 architectures (hardware can walk known structure)
outer page
(8 bits)
inner page
(10 bits)
page offset (12 bits)
30-bit address:

base of page directory

Quiz: Multilevel
PPN
0x3














0x92
valid
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1

page directory
PPN
0x10
0x23


0x80
0x59









valid
1
1
0
0
1
1
0
0
0
0
0
0
0
0
0

page of PT (@PPN:0x3)
PPN













0x45
0x55
valid
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1

page of PT (@PPN:0x92)
translate 0x01ABC

outer page
(4 bits)
inner page
(4 bits)
page offset (12 bits)
20-bit address:

translate 0xFEED0
translate 0x00000
0x23ABC
0x10000
0x55ED0

What is the page table here?
It does not appear so clear what actually makes up the page table here.
It consists of the page directory (with entries indexed by the outer page in the virtual address) PLUS the inner pages (the pages of the page table).
This kind of page table is also sometimes referred to as a hierarchical page table (the page table has “levels”).
As we will see in the next slide set, by paging the page table, only pages of the page table which have mappings for pages which are currently being use by the process need to be kept in memory; the other page of the page table can be left on disk (they will be marked invalid in the page directory).

QUIZ: Address format for multilevel Paging
How should logical address be structured?
How many bits for each paging level?
Goal?
Each page table fits within a page
PTE size * number PTE = page size
Assume PTE size = 4 bytes
Page size = 2^12 bytes = 4KB
2^2 bytes * number PTE = 2^12 bytes
 number PTE = 2^10
 # bits for selecting inner page = 10
Remaining bits for outer page:
30 – 10 – 12 = 8 bits
outer page
inner page
page offset (12 bits)
Assume 30-bit virtual address:

Problem with 2 levels?
Problem: page directories (outer level) may not fit in a page

Solution:
Split page directories into pieces
Use another page dir to refer to the page dir pieces.
PT idx
OFFSET
PD idx 1
VPN

PD idx 0
How large is virtual address space with 4 KB pages, 4 byte PTEs,
each page table fits in page given 1, 2, 3 levels?
4KB / 4 bytes  1K entries per level
1 level: 1K * 4K = 2^22 = 4 MB
2 levels: 1K * 1K * 4K = 2^32 ≈ 4 GB
3 levels: 1K * 1K * 1K * 4K = 2^42 ≈ 4 TB
outer page?
inner page
(10 bits)
page offset (12 bits)
64-bit address:

On TLB miss: lookups with more levels more expensive
How much does a miss cost?
Assume 3-level page table
Assume 256-byte pages
Assume 16-bit addresses
Assume ASID of current process is 211
How many physical accesses for each instruction? (Ignore previous ops changing TLB)
(a) 0xAA10: movl 0x1111, %edi

(b) 0xBB13: addl $0x3, %edi

(c) 0x0519: movl %edi, 0xFF10
ASID VPN PFN Valid
211 0xbb 0x91 1
211 0xff 0x23 1
122 0x05 0x91 1
211 0x05 0x12 0

QUIZ: FULL SYSTEM WITH TLBS
0xaa: (TLB miss -> 3 for addr trans) + 1 instr fetch
0x11: (TLB miss -> 3 for addr trans) + 1 movl
Total: 8
Total: 1
0xbb: (TLB hit -> 0 for addr trans) + 1 instr fetch from 0x9113
0x05: (TLB miss -> 3 for addr trans) + 1 instr fetch
0xff: (TLB hit -> 0 for addr trans) + 1 movl into 0x2310
Total: 5

Summary:
Better PAGE TABLES
Problem:
Simple linear page tables require too much contiguous memory
Many options for efficiently organizing page tables
If OS traps on TLB miss, OS can use any data structure
Inverted page tables (hashing) is one option
If Hardware handles TLB miss, page tables must follow specific format
Multi-level page tables used in x86 architecture
Each page table fits within a page

Next Topic:
What if desired address spaces do not fit in physical memory?
.