
Recall 61C: Average Memory Access Time
• Used to compute access time probabilistically:
AMAT = Hit Rate_L1 × Hit Time_L1 + Miss Rate_L1 × Miss Time_L1
Hit Rate_L1 + Miss Rate_L1 = 1


Hit Time_L1 = time to get value from the L1 cache
Miss Time_L1 = Hit Time_L1 + Miss Penalty_L1
Miss Penalty_L1 = average time to get value from the lower level (DRAM)
So, AMAT = Hit Time_L1 + Miss Rate_L1 × Miss Penalty_L1
• What about more levels of hierarchy?
AMAT = Hit Time_L1 + Miss Rate_L1 × Miss Penalty_L1
Miss Penalty_L1 = average time to get value from the lower level (L2)
               = Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2
Miss Penalty_L2 = average time to fetch from below L2 (DRAM)
AMAT = Hit Time_L1 + Miss Rate_L1 × (Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2)
• And so on … (can do this recursively for more levels – see the sketch below!)
[Figure: a processor with a single L1 cache, and a processor with both L1 and L2 caches.]
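As a minimal illustrative sketch (not from the slides), the recursion above can be written directly in C; hit_time[i] and miss_rate[i] are assumed per-level parameters, with the last level being DRAM:

    /* AMAT_i = HitTime_i + MissRate_i * MissPenalty_i, where the miss
       penalty of level i is the AMAT of the level below it. */
    double amat(const double *hit_time, const double *miss_rate, int nlevels) {
        if (nlevels == 1)
            return hit_time[0];   /* bottom level (DRAM): plain access time */
        return hit_time[0] +
               miss_rate[0] * amat(hit_time + 1, miss_rate + 1, nlevels - 1);
    }

For example, amat((double[]){1, 10, 100}, (double[]){0.05, 0.10, 0}, 3) evaluates to 1 + 0.05 × (10 + 0.10 × 100) = 2.0 ns.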

Recall: Caching Applied to Address Translation
[Figure: the TLB caches translations. The processor issues a virtual address; if the translation is cached ("Cached? Yes"), the physical address is used directly, otherwise the MMU translates via the page table and saves the result in the TLB. Data reads and writes then go to physical memory untranslated.]
• Question is one of page locality: does it exist?
– Instruction accesses spend a lot of time on the same page (accesses sequential)
– Stack accesses have definite locality of reference
– Data accesses have less page locality, but still some…
• Can we have a TLB hierarchy?
– Sure: multiple levels at different sizes/speeds

Management & Access to the Memory Hierarchy
[Figure: the memory hierarchy. Registers and the L1, L2, and L3 (shared) caches are managed in hardware; main memory (DRAM) and secondary storage are managed in software by the OS. Everything down through DRAM is accessed in hardware.]

Level                      Speed (ns)           Size (bytes)
Registers                  …                    100Bs
L1 cache                   …                    10kBs
L2 cache                   3                    100kBs
L3 cache (shared)          10-30                MBs
Main memory (DRAM)         100                  GBs
Secondary storage (SSD)    100,000 (0.1 ms)     …
Secondary storage (disk)   10,000,000 (10 ms)   …

Page Fault ⇒ Demand Paging
[Figure: a memory reference presents a virtual address (page# + offset); the MMU translates it to a physical address (frame# + offset). If the PTE is invalid, the instruction traps to the Operating System's Page Fault Handler, which loads the page from disk, updates the PT entry, and restarts the faulting instruction.]

Demand Paging as Caching, …
• What “block size”?
– 1 page (e.g., 4 KB)
• What “organization”, i.e., direct-mapped, set-associative, fully-associative?
– Fully associative, since arbitrary virtual → physical mappings are allowed
• How do we locate a page?
– First check the TLB, then do a page-table traversal
• What is the page replacement policy? (i.e., LRU, Random, …)
– This requires more explanation… (kinda LRU)
• What happens on a miss?
– Go to the lower level to fill the miss (i.e., disk)
• What happens on a write? (write-through, write-back)
– Definitely write-back – need a dirty bit!

Illusion of Infinite Memory
[Figure: a 4 GB virtual memory, mapped through the page table onto 512 MB of physical memory, with the rest on a 500 GB disk.]
• Disk is larger than physical memory ⇒
– In-use virtual memory can be bigger than physical memory
– Combined memory of running processes much larger than physical memory
» More programs fit into memory, allowing more concurrency
• Principle: Transparent Level of Indirection (page table)
– Supports flexible placement of physical data
» Data could be on disk or somewhere across network
– Variable location of data transparent to user program
» Performance issue, not correctness issue

Review: What is in a PTE?
• What is in a Page Table Entry (or PTE)?
– Pointer to next-level page table or to actual page
– Permission bits: valid, read-only, read-write, write-only
• Example: Intel x86 architecture PTE:
– 2-level page table (10-bit directory index, 10-bit table index, 12-bit page offset)
– Intermediate page tables called “Directories”
31-12: Page Frame Number | 11-9: free for OS use | 8: 0 | 7: PS | 6: D | 5: A | 4: PCD | 3: PWT | 2: U | 1: W | 0: P
P: Present (same as “valid” bit in other architectures)
W: Writeable
U: User accessible
PWT: Page write transparent: external cache write-through
PCD: Page cache disabled (page cannot be cached)
A: Accessed: page has been accessed recently
D: Dirty (PTE only): page has been modified recently
PS: Page Size: PS=1⇒4MB page (directory only). Bottom 22 bits of virtual address serve as offset
PFN: Page Frame Number (Physical Page Number), bits 31-12
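As a sketch, these bits can be written out as C constants; the layout matches the standard 32-bit x86 PTE described above, but the helper names here are illustrative, not a real kernel API:

    #include <stdint.h>
    typedef uint32_t pte_t;
    #define PTE_P   (1u << 0)   /* Present (valid) */
    #define PTE_W   (1u << 1)   /* Writeable */
    #define PTE_U   (1u << 2)   /* User accessible */
    #define PTE_PWT (1u << 3)   /* external cache write-through */
    #define PTE_PCD (1u << 4)   /* page cannot be cached */
    #define PTE_A   (1u << 5)   /* Accessed recently */
    #define PTE_D   (1u << 6)   /* Dirty: modified recently */
    #define PTE_PS  (1u << 7)   /* 4 MB page (directory entries only) */
    #define PTE_FRAME(pte) ((pte) & ~0xFFFu)   /* bits 31-12 */

    /* e.g., a present, user-writable mapping of physical frame pa: */
    static inline pte_t make_pte(uint32_t pa) {
        return PTE_FRAME(pa) | PTE_P | PTE_W | PTE_U;
    }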

Demand Paging Mechanisms
• PTE makes demand paging implementable
– Valid ⇒ Page in memory, PTE points at physical page
– Not Valid ⇒ Page not in memory; use info in PTE to find it on disk when necessary
• Suppose user references page with invalid PTE?
– Memory Management Unit (MMU) traps to the OS
» The resulting trap is a “Page Fault”
• What does the OS do on a Page Fault?
» Choose an old page to replace
» If the old page was modified (“D=1”), write its contents back to disk
» Change its PTE and any cached TLB entry to be invalid
» Load the new page into memory from disk
» Update the page table entry, invalidate the TLB for the new entry
» Continue the thread from the original faulting location
» The TLB entry for the new page will be loaded when the thread is continued!
• While pulling pages off disk for one process, the OS runs another process from the ready queue
» Suspended process sits on wait queue
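Putting these steps together, a hedged sketch of the fault path in C, reusing make_pte from the PTE sketch above; every other helper name here (walk_pt, alloc_frame, choose_victim, swap_out, swap_in, tlb_invalidate) is hypothetical, not a real CS162 or kernel API:

    /* Handle a fault on fault_va in process p, per the steps above. */
    void page_fault_handler(struct proc *p, uint32_t fault_va) {
        pte_t *pte = walk_pt(p->pagedir, fault_va);   /* locate the invalid PTE */
        struct frame *f = alloc_frame();              /* try the free list first */
        if (f == NULL) {                              /* no free frame: evict */
            f = choose_victim();                      /* replacement policy */
            if (*f->pte & PTE_D)                      /* old page modified? */
                swap_out(f);                          /* write it back to disk */
            *f->pte &= ~PTE_P;                        /* invalidate its PTE... */
            tlb_invalidate(f->va);                    /* ...and cached TLB entry */
        }
        swap_in(p, fault_va, f);    /* blocks on disk I/O; OS runs another process */
        *pte = make_pte(f->pa);     /* mark valid; TLB refills when thread resumes */
    }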

Origins of Paging
• Keep most of the address space on disk
– Disks provide most of the storage
• Relatively small memory, shared by many processes
• Actively swap pages to/from disk
• Keep memory full of the frequently accessed pages
• Many clients on dumb terminals running different programs

Very Different Situation Today
• Powerful system
• Huge memory
• Huge disk
• Single user

A Picture on one machine
• Memory stays about 75% used, 25% for dynamics
• A lot of it is shared: 1.9 GB

Many Uses of Virtual Memory and “Demand Paging” …
• Extend the stack
– Allocate a page and zero it
• Extend the heap (sbrk of old, today mmap)
• Process Fork
– Create a copy of the page table
– Entries refer to parent pages – NO-WRITE
– Shared read-only pages remain shared
– Copy page on write (see the sketch after this list)
• Exec
– Only bring in parts of the binary in active use
– Do this on demand
• MMAP to explicitly share region (or to access a file as RAM)
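A hedged sketch of the copy-on-write part of fork, reusing the pte_t / PTE_* definitions from the PTE sketch above (frame_incref is a hypothetical refcount helper):

    /* At fork: share one frame read-only between parent and child. */
    void cow_share(pte_t *parent, pte_t *child) {
        *parent &= ~PTE_W;      /* mark the parent's page NO-WRITE */
        *child = *parent;       /* child's entry refers to the same frame */
        frame_incref(PTE_FRAME(*parent));
    }
    /* Later, on a write fault to such a page, the handler allocates a
       fresh frame, copies the page into it, and restores PTE_W in the
       faulting process's PTE. */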

Classic: Loading an executable into memory
[Figure: an executable on a huge disk being loaded into a process's memory image.]
• The executable lives on disk in the file system
• It contains the contents of the code & data segments, relocation entries, and symbols
• The OS loads it into memory and initializes registers (and the initial stack pointer)
• The program sets up its stack and heap upon initialization: crt0 (C runtime init)

Create Virtual Address Space of the Process
[Figure: the process VAS beside kernel memory: user page frames, the user page table, and kernel code & data, with a huge disk below.]
• Utilized pages in the VAS are backed by a page block on disk
– Called the backing store or swap file
– Typically in an optimized block store, but can think of it like a file

Create Virtual Address Space of the Process
[Figure: the same diagram; the process VAS (GBs, with stack and heap) is mapped by the user page table onto user page frames, with a huge (TB) disk below.]
• User page table maps entire VAS
• All the utilized regions are backed on disk
– Swapped into and out of memory as needed
• One page table (and backing store) per process

Create Virtual Address Space of the Process
[Figure: the same diagram, now showing the per-process page table.]
• User Page table maps entire VAS
– Resident pages map to the frames in memory they occupy
– The portion of it that the HW needs to access must be resident in memory

Provide Backing Store for VAS
[Figure: the same diagram; non-resident pages point to blocks in the backing store on disk.]
• User Page table maps entire VAS
• Resident pages mapped to memory frames
• For all other pages, OS must record where to find them on disk

What Data Structure Maps Non-Resident Pages to Disk?
• FindBlock(PID, page#) → disk_block
– Some OSs utilize spare space in the PTE for paged-out blocks
– Like the PT, but purely software
• Where to store it?
– In memory – can be compact representation if swap storage is contiguous on disk
– Could use a hash table (like the Inverted PT) – see the sketch after this list
• Usually want backing store for resident pages too
• May map the code segment directly to the on-disk image
– Saves copying the code to the swap file
• May share code segment with multiple instances of the program
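A minimal sketch of such a structure, assuming a chained hash table keyed by (PID, page#) as suggested above (all names are illustrative):

    #include <stdint.h>

    struct swap_map_entry {
        uint32_t pid, vpn;            /* key: process ID and virtual page # */
        uint32_t disk_block;          /* where the page lives in swap */
        struct swap_map_entry *next;  /* chaining on hash collisions */
    };
    #define NBUCKETS 1024
    static struct swap_map_entry *buckets[NBUCKETS];

    static unsigned bucket_of(uint32_t pid, uint32_t vpn) {
        return (pid * 2654435761u ^ vpn) % NBUCKETS;
    }

    /* FindBlock(PID, page#) → disk_block; -1 if no swap copy exists */
    int64_t find_block(uint32_t pid, uint32_t vpn) {
        for (struct swap_map_entry *e = buckets[bucket_of(pid, vpn)];
             e != NULL; e = e->next)
            if (e->pid == pid && e->vpn == vpn)
                return e->disk_block;
        return -1;
    }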


On page Fault …
[Figure: the same VAS/backing-store diagram, stepping the active process & its page table through the fault sequence.]
• Find the faulting page on disk & start the load
• Schedule another process or thread while the load is in flight
• Update the PTE when the load completes
• Eventually reschedule the faulting thread

Summary: Steps in Handling a Page Fault
[Figure: diagram of the page-fault handling steps, as walked through above.]

Some questions we need to answer!
• During a page fault, where does the OS get a free frame?
– Keeps a free list
– Unix runs a “reaper” if memory gets too full
» Schedule dirty pages to be written back on disk
» Zero (clean) pages which haven’t been accessed in a while
– As a last resort, evict a dirty page first
• How can we organize these mechanisms?
– Work on the replacement policy
• How many page frames/process?
– Like thread scheduling, need to “schedule” memory resources:
» Utilization? Fairness? Priority?
– Allocation of disk paging bandwidth

Working Set Model
• As a program executes it transitions through a sequence of “working sets” consisting of varying-sized subsets of the address space

Cache Behavior under WS model
[Figure: hit rate vs. cache size; the hit rate rises sharply once the new working set fits in the cache.]
• Amortized by the fraction of time the Working Set is active
• Transitions from one WS to the next
• Capacity, Conflict, Compulsory misses
• Applicable to memory caches and pages. Others?

Another model of Locality: Zipf
• Likelihood of accessing an item of rank r is ∝ 1/r^a
• Although it is rare to access items below the top few, there are so many of them that they yield a “heavy-tailed” distribution
• Substantial value from even a tiny cache
• Substantial misses from even a very large cache
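A small illustrative calculation of this effect (assuming access probability exactly proportional to 1/r^a; the function name is ours):

    #include <math.h>

    /* Fraction of accesses that hit if we cache only the k most
       popular of n items under a Zipf(a) popularity distribution. */
    double zipf_hit_rate(int n, int k, double a) {
        double top = 0.0, total = 0.0;
        for (int r = 1; r <= n; r++) {
            double w = 1.0 / pow((double)r, a);  /* unnormalized P(rank r) */
            total += w;
            if (r <= k) top += w;
        }
        return top / total;
    }

With a = 1 and n = 1,000,000 items, caching just the top k = 1,000 (a tiny cache) already captures roughly half of all accesses, while even a very large cache still misses the long tail.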

Demand Paging Cost Model
• Since Demand Paging is like caching, we can compute the average access time! (“Effective Access Time”)
– EAT = Hit Rate × Hit Time + Miss Rate × Miss Time
– EAT = Hit Time + Miss Rate × Miss Penalty
• Example:
– Memory access time = 200 nanoseconds
– Average page-fault service time = 8 milliseconds
– Suppose p = probability of a miss, 1 − p = probability of a hit
– Then, we can compute EAT as follows:
EAT = 200 ns + p × 8 ms
    = 200 ns + p × 8,000,000 ns
• If one access out of 1,000 causes a page fault, then EAT = 8.2 μs:
– This is a slowdown by a factor of 40!
• What if we want a slowdown of less than 10%?
– EAT < 200 ns × 1.1 ⇒ p < 2.5 × 10^-6
– This is about 1 page fault in 400,000!

What Factors Lead to Misses in Page Cache?
• Compulsory Misses:
– Pages that have never been paged into memory before
– How might we remove these misses?
» Prefetching: loading them into memory before needed
» Need to predict the future somehow! More later
• Capacity Misses:
– Not enough memory. Must somehow increase available memory size.
– Can we do this?
» One option: increase the amount of DRAM (not a quick fix!)
» Another option: if multiple processes are in memory, adjust the percentage of memory allocated to each one!
• Conflict Misses:
– Technically, conflict misses don’t exist in virtual memory, since it is a “fully-associative” cache
• Policy Misses:
– Caused when pages were in memory, but kicked out prematurely because of the replacement policy
– How to fix? Better replacement policy

Page Replacement Policies
• Why do we care about Replacement Policy?
– Replacement is an issue with any cache
– Particularly important with pages
» The cost of being wrong is high: must go to disk
» Must keep important pages in memory, not toss them out
• FIFO (First In, First Out):
– Throw out the oldest page. Be fair – let every page live in memory for the same amount of time.
– Bad – throws out heavily used pages instead of infrequently used ones
• RANDOM:
– Pick a random page for every replacement
– Typical solution for TLBs. Simple hardware
– Pretty unpredictable – makes it hard to give real-time guarantees
• MIN (Minimum):
– Replace the page that won’t be used for the longest time
– Great (provably optimal), but we can’t really know the future…
– But the past is a good predictor of the future…

Replacement Policies (Con’t)
• LRU (Least Recently Used):
– Replace the page that hasn’t been used for the longest time
– Programs have locality, so if something has not been used for a while, it is unlikely to be used in the near future.
– Seems like LRU should be a good approximation to MIN.
• How to implement LRU? Use a list, with the most recently used page at the head and the LRU page at the tail:
– On each use, remove the page from the list and place it at the head
– The LRU page is at the tail
• Problems with this scheme for paging?
– Need to know immediately when a page is used, so that its position in the list can be changed…
– Many instructions for each hardware access
• In practice, people approximate LRU (more later)

Example: FIFO (strawman)
• Suppose we have 3 page frames, 4 virtual pages, and the following reference stream:
– A B C A B D A D B C B
• Consider FIFO page replacement:

Ref:     A  B  C  A  B  D  A  D  B  C  B
Page 1:  A              D           C
Page 2:     B              A
Page 3:        C              B

• FIFO: 7 faults
• When referencing D, replacing A is a bad choice, since we need A again right away

Example: MIN / LRU
• Suppose we have the same reference stream:
– A B C A B D A D B C B
• Consider MIN page replacement:

Ref:     A  B  C  A  B  D  A  D  B  C  B
Page 1:  A                          C
Page 2:     B
Page 3:        C        D

• MIN: 5 faults
– Where will D be brought in? Look for the page not referenced for the longest time in the future
• What will LRU do?
– Same decisions as MIN here, but that won’t always be true!

Is LRU guaranteed to perform well?
• Consider the following reference stream: A B C D A B C D A B C D
• On 3 frames, LRU performs as follows (same as FIFO here):

Ref:     A  B  C  D  A  B  C  D  A  B  C  D
Page 1:  A           D           C           B
Page 2:     B           A           D           C
Page 3:        C           B           A           D

– Every reference is a page fault!
• MIN does much better (6 faults):

Ref:     A  B  C  D  A  B  C  D  A  B  C  D
Page 1:  A                          B
Page 2:     B              C
Page 3:        C  D

• Fairly contrived example: a working set of N+1 pages on N frames

Graph of Page Faults Versus The Number of Frames
• One desirable property: when you add memory the miss rate drops (the stack property)
– Does this always happen?
– Seems like it should, right?
• No: Bélády’s anomaly
– Certain replacement algorithms (FIFO) don’t have this obvious property!

Adding Memory Doesn’t Always Help Fault Rate
• Does adding memory reduce the number of page faults?
– Yes for LRU and MIN
– Not necessarily for FIFO! (Called Bélády’s anomaly)
• The classic example, FIFO on the reference stream A B C D A B E A B C D E:

3 frames – 9 faults:
Ref:     A  B  C  D  A  B  E  A  B  C  D  E
Page 1:  A           D        E
Page 2:     B           A              C
Page 3:        C           B              D

4 frames – 10 faults:
Ref:     A  B  C  D  A  B  E  A  B  C  D  E
Page 1:  A                 E              D
Page 2:     B                 A              E
Page 3:        C                 B
Page 4:           D                 C

• After adding memory:
– With FIFO, the contents of memory can be completely different
– In contrast, with LRU or MIN, the contents of memory with X pages are a subset of the contents with X+1 pages

Approximating LRU: Clock Algorithm
[Figure: the set of all pages in memory arranged in a circle, swept by a single clock hand.]
• Clock Algorithm: arrange physical pages in a circle with a single clock hand
– The hand advances only on a page fault!
– Check for pages not used recently
– Mark pages as not used recently
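The hand’s sweep can be sketched in C (a minimal, illustrative simulation of the description above; the frame array and the “use” bit standing in for the hardware-set PTE A bit are assumptions, not a real kernel’s data structures):

    #define NFRAMES 256

    struct frame {
        int use;    /* set when the page is referenced (the PTE A bit) */
        int page;   /* which virtual page currently occupies this frame */
    };
    static struct frame frames[NFRAMES];  /* the circle of physical pages */
    static int hand = 0;                  /* advances only on page fault */

    /* Sweep the hand until a not-recently-used frame is found. */
    int clock_evict(void) {
        for (;;) {
            if (frames[hand].use) {
                frames[hand].use = 0;         /* mark as not used recently */
                hand = (hand + 1) % NFRAMES;  /* give it a second chance */
            } else {
                int victim = hand;            /* not used since last sweep */
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
        }
    }

Pages touched between faults get their use bit set again by hardware, so a frequently used page is never evicted, while an unreferenced page is evicted within one full sweep – a coarse approximation of LRU.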
• MIN Does much better: Ref: Page: Joseph & Kubiatowicz CS162 © UCB Spring 2022 Graph of Page Faults Versus The Number of Frames • One desirable property:When you add memory the miss rate drops (stack property) – Does this always happen? – Seems like it should, right? • No: Bélády’s anomaly – Certain replacement algorithms (FIFO) don’t have this obvious property! Joseph & Kubiatowicz CS162 © UCB Spring 2022 Adding Memory Doesn’t Always Help Fault Rate • Does adding memory reduce number of page faults? – Yes for LRU and MIN – Not necessarily for FIFO! (Called Bélády’s anomaly) Ref: Page: Ref: Page: • After adding memory: – With FIFO, contents can be completely different – In contrast, with LRU or MIN, contents of memory with X pages are a subset of contents with X+1 Page Joseph & Kubiatowicz CS162 © UCB Spring 2022 Approximating LRU: Clock Algorithm Set of all pages in Memory Single Clock Hand: Advances only on page fault! Check for pages not used recently Mark pages as not used recently • Clock Algorithm: Arrange physical pages in circle with single clock hand – 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com