
Memory Management


Anandha Gopalan
(with thanks to R. Kolcun and P. Pietzuch)

axgopala@imperial.ac.uk

Memory Management
Outline

Basic Concepts

Memory Allocation

Swapping

Virtual Memory

Paging & Segmentation

Demand Paging

Page replacement algorithms

Working set model

Linux Memory Management

2/86

Memory Hierarchy

Hardware: CPU registers and main memory

Register access in one CPU clock cycle (or less)
Main memory can take many cycles
Caches sit between main memory and CPU registers

[Figure: memory hierarchy from CPU registers through L1 and L2 caches down to main memory and backing storage; registers and caches are managed by hardware, main memory and backing storage by hardware/software]

3/86

Memory Management

Memory is a key component of the computer

e.g. every instruction cycle involves memory access ⇒ process
has to be loaded into memory before it can execute

Memory management needs to provide

Memory allocation

Memory protection

Characteristics

No knowledge of how memory addresses are generated
e.g. instruction counter, indexing, indirection, . . .

No knowledge what memory addresses are used for
e.g. instructions or data

True for simple case but may want protection with respect to
read, write, execute, etc.

4/86

Logical vs. Physical Address Space

Memory management binds logical address space to physical
address space

Logical address

Generated by the CPU

Address space seen by the process

Physical address

Address seen by the memory unit

Refers to physical system memory

Logical and physical addresses

Same in compile- and load-time address-binding schemes

Different in execution-time address-binding schemes

How do you achieve this mapping?

5/86

Memory-Management Unit (MMU)

Hardware device for mapping logical to physical addresses

e.g. add value in relocation register to every address
generated by process when sent to memory

User process deals with logical addresses only

Has to be fast → implemented in hardware

[Figure: the CPU issues logical address 900; the MMU adds the relocation register value 14000, giving physical address 14900, which is sent to memory]

6/86

Memory Allocation

Main memory is usually split into two partitions:

Resident operating system (kernel)

Usually held in low memory with interrupt vector

User processes (user)

Held in high memory

How do you decide where to load a new process?

Need to figure out the strategy for process to be loaded into the
correct location

7/86

Contiguous Memory Allocation I

Contiguous allocation with relocation registers

base register contains physical start address for process

limit register contains maximum logical address for process

MMU maps logical address dynamically

Physical address = logical address + base

If logical address > limit then error

8/86
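As a sketch, the base/limit check above in code (base = 14000 matches the earlier MMU example; the limit value is made up for illustration):

```python
def translate(logical_addr, base, limit):
    """MMU-style dynamic relocation with a protection check."""
    if logical_addr > limit:        # outside the process's logical space
        raise MemoryError("addressing error: trap to OS")
    return logical_addr + base      # physical address = logical address + base

# logical address 900 with relocation (base) register 14000
print(translate(900, base=14000, limit=16000))   # → 14900
```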

Contiguous Memory Allocation II

base and limit register define logical address space

e.g. jmp 100 in the program would go to physical location 300140

9/86

Multiple-Partition Allocation

Hole

Block of available memory
Holes of various size scattered throughout memory

When new process arrives:

allocate memory from hole large enough

OS maintains information about:

Allocated partitions
Free partitions (holes)

What is the best algorithm for allocation?

10/86

Dynamic Memory Allocation

First-fit → Allocate first hole that is big enough

Best-fit → Allocate smallest hole that is big enough

Must search entire list, unless ordered by size

Produces smallest leftover hole

Worst-fit → Allocate largest hole

Must also search entire list

Produces largest leftover hole

Why best-fit or worst-fit?

First-fit and best-fit better than worst-fit in terms of speed and
storage utilisation

11/86
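The three strategies above can be sketched over a list of hole sizes (the 212 KB request and the hole sizes are illustrative):

```python
def first_fit(holes, size):
    """Index of the first hole big enough, else None."""
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return None

def best_fit(holes, size):
    """Smallest hole big enough (smallest leftover hole), else None."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return min(fits)[1] if fits else None

def worst_fit(holes, size):
    """Largest hole (largest leftover hole), else None."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return max(fits)[1] if fits else None

holes = [100, 500, 200, 300, 600]   # hole sizes in KB (illustrative)
print(first_fit(holes, 212))        # → 1 (the 500 KB hole)
print(best_fit(holes, 212))         # → 3 (the 300 KB hole)
print(worst_fit(holes, 212))        # → 4 (the 600 KB hole)
```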

Fragmentation

External fragmentation → memory exists to satisfy request, but
not contiguous

Reduce external fragmentation by compaction

Shuffle memory contents to place all free memory together in
one large block → leads to I/O bottlenecks

12/86

Swapping

Problem: Number of processes limited by amount of available
memory

But . . . only running processes need to be in memory

Solution:

Swap processes temporarily out of memory to backing store

Bring back into memory for continued execution

Requires swap space → can be file or dedicated partition on
disk

Transfer time is major part of swap time

13/86

Swapping

What if a process is “too large” to fit into memory ⇒ can only
part of a process exist in memory?

14/86

Virtual Memory

Separation of user logical memory from physical memory

Only part of process needs to be in memory for execution

Logical address space can be much larger than physical
address space

Address spaces can be shared by several processes

Allows for more efficient process creation

15/86

Virtual Memory

16/86

Virtual Memory

Virtual memory can be implemented via

Paging

Segmentation

17/86

Paging

Physical address space of process can be noncontiguous

Process allocated physical memory when available

Avoid external fragmentation
Avoid problems of variable sized memory chunks

Frames

Fixed-sized blocks of physical memory

Keep track of all free frames

Pages

Block of same size (as frame) of logical memory

To run program of size n pages

Find n free frames and load program

Set up page table to translate logical into physical addresses

18/86

Page Table Example

19/86

Address Translation I

How does logical address translate to physical address?

Hint: pages and frames are the same size ⇒ address offset in the
page will be the same as that in the frame

Address now consists of two parts: page number and page offset

only need to translate page number into its corresponding
frame address

20/86

Address Translation II

How do you calculate the page number?

Depends on address size and page/frame size

e.g. Consider a page/frame size of 64 bytes

64 bytes can be addressed ⇒ total of 64 addresses

Number of bits required for 64 addresses = 6 (2^6 = 64)

For a 10-bit virtual address we have:

page offset requires 6 bits (based on above)

page number has 4 bits (remaining bits) ⇒ between 0 . . . 15

21/86

Address Translation III

Page number (p)

Used as an index into page table

Page table has base address of pages in physical memory

Page offset (d)

Defines physical memory address sent to the memory unit

Combined with base address

For a given address size of m bits and page size of 2^n:

page number | page offset
p | d
(m – n) bits | n bits

22/86

Paging Hardware

23/86

Free Frames

24/86

Example Problem

Address Translation

Consider a 32-bit virtual memory address and a page size of 1 KB.
How many pages can a process potentially have?

1 KB page size = 1024 bytes ⇒ total of 1024 addresses
Number of bits needed for 1024 addresses = 10 (2^10 = 1024)

For a 32-bit address you have:

page offset requires 10 bits

page number has 22 bits ⇒ 2^22 (4,194,304) potential pages

25/86

Fragmentation

Internal fragmentation → Allocated memory is larger than
requested memory, but size difference internal to partition

Example – Calculating Internal Fragmentation

Page size = 2048 bytes; Process size = 72,766 bytes

Number of pages = ⌈72766 / 2048⌉ = 36 (35 full pages plus a partial page)

Bytes used in last page = 72766 % 2048 = 1086

Internal fragmentation = 2048 – 1086 = 962 bytes

Worst-case fragmentation ⇒ 1 frame − 1 byte

Average-case fragmentation ⇒ ½ frame size

26/86
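A quick check of the slide's arithmetic:

```python
PAGE_SIZE = 2048                     # bytes per page/frame
process_size = 72766                 # bytes

full_pages, last_page_bytes = divmod(process_size, PAGE_SIZE)
frames_needed = full_pages + (1 if last_page_bytes else 0)
internal_frag = (PAGE_SIZE - last_page_bytes) if last_page_bytes else 0

print(frames_needed, internal_frag)  # → 36 962
```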

Fragmentation
Frame Size

Are small frames desirable?

Each page table entry takes memory to track

Page size growing over time → typically 4 KB but some
architectures support variable page sizes up to 256 MB

27/86

Page Table Implementation

Page table kept in main memory

Page-table base register (PTBR) points to page table

Context switch requires update of PTBR for new process page
table (if necessary)

Page-table length register (PTLR) indicates size

Problem

Inefficient, as every data/instruction access requires two
memory accesses → one for page table and one for
data/instruction

28/86

Associative Memory

Solution: use special fast-lookup hardware cache as associative
memory

Associative memory → supports parallel search

Called Translation Look-aside Buffer (TLB)

Page # | Frame #
1 | 4
3 | 74
9 | 50
7 | 7

Address translation (p, d)

If p in associative register, get frame # out

Otherwise, get frame # from page table in memory

29/86
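A software sketch of that lookup order (TLB first, then the in-memory page table); the TLB contents mirror the table above, and the extra page-table entries are made up:

```python
# hypothetical page table; the TLB caches a few of its entries
page_table = {1: 4, 3: 74, 9: 50, 7: 7, 2: 19, 5: 61}
tlb = {1: 4, 3: 74, 9: 50, 7: 7}

def translate(p, d, page_size=4096):
    if p in tlb:                  # TLB hit (parallel search in hardware)
        frame = tlb[p]
    else:                         # TLB miss: walk page table in memory
        frame = page_table[p]     # (page fault if absent — not modelled)
        tlb[p] = frame            # cache the translation for next time
    return frame * page_size + d

print(translate(3, 17))   # hit  → 74 * 4096 + 17 = 303121
print(translate(5, 0))    # miss → fills TLB, returns 61 * 4096 = 249856
```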

Translation Look-aside Buffers (TLBs) I

TLBs usually need to be flushed after a context switch

Can lead to substantial overhead

What about kernel pages for system calls?

Some TLBs store address-space ids (ASIDs) in entries

Uniquely identifies each process to provide address-space
protection for that process

30/86

Translation Look-aside Buffers (TLBs) II

31/86

Example Problem

Effective Access Time

TLB Lookup = ε (can be < 10% of memory access time m)

Hit Ratio = α

Fraction of times that page is found in associative registers

Ratio related to number of associative registers

Effective Access Time (EAT) = (ε + m) × α + (ε + 2m) × (1 − α) = 2m + ε − mα

Consider α = 80%, ε = 10 ns for TLB search, m = 100 ns for memory access

EAT = 110 × 0.80 + 210 × 0.20 = 130 ns

A more realistic hit ratio might be 99%

EAT = 110 × 0.99 + 210 × 0.01 = 111 ns

32/86

Page Table Size

Why do we need to worry?

Page table can grow to be very large in size

On a 32-bit machine with a 4 KB page size:

Number of page table entries = 2^32 / 2^12 = 2^20

Size of each page table entry = 32 bits

Size of page table = 2^20 × 32 bits = 4 MB

On a 64-bit machine with 4 KB pages → page table needs 2^52 entries; with 8 bytes per entry, that's 30 million GB . . . a lot of memory to be allocated

33/86

Page Table Types

Hierarchical page table

Hashed page table

Inverted page table

34/86

Hierarchical Page Table I

Idea: Let the page table be broken up and paged if it is too large

Simple technique → two-level page table for a machine with 32-bit addresses and a 4 KB page size

page offset needs 12 bits

page table size = 4 MB

How do you break the page table up?
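The break-up worked out on the next slides ends with a 10 + 10 + 12 split of the 32-bit address; as a sketch (the sample address is made up):

```python
def split_two_level(vaddr):
    """Split a 32-bit virtual address into (p1, p2, d) = 10 + 10 + 12 bits."""
    d  = vaddr & 0xFFF             # low 12 bits: page offset
    p2 = (vaddr >> 12) & 0x3FF     # next 10 bits: index into 2nd-level table
    p1 = (vaddr >> 22) & 0x3FF     # top 10 bits: index into outer page table
    return p1, p2, d

print(split_two_level(0x00403004))   # → (1, 3, 4)
```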
Each part of the page table that is being paged must fit on a page

Recall: page size = 4 KB

Number of entries on one page = page size / entry size = 4 KB / 32 bits = 2^10

Number of bits required for 2^10 entries = 10

Address bits left for top-level page table = 32 – 10 – 12 = 10

35/86

Hierarchical Page Table II

Fix outer page table in memory

36/86

Two-Level Paging I

Logical address divided

Page number consisting of 20 bits

Page offset consisting of 12 bits

Since page table is paged, page number further divided

10-bit page number

10-bit page offset within 2nd-level page table

Thus, logical addresses as follows

page number | page offset
p1 | p2 | d
10 | 10 | 12

p1 → index into the outer page table

p2 → displacement within page pointed to by outer page table

37/86

Two-Level Paging II

38/86

Example Problem

Page Table Addressing

Consider a paging system that uses a three-level page table. Virtual addresses are composed of four fields (a, b, c, d) with d being the offset. What is the maximum number of pages in a virtual address space?

Answer: 2^(a+b+c), since there are a total of 2^(a+b+c+d) addresses in the address space and each page has 2^d addresses

Page Table: Another Idea

Don't store entry per page but per frame

Hashed page table

Inverted page table

39/86

Hashed Page Table

Hash virtual page number into page table

Page table contains chain of elements hashing to same location

Search for match of virtual page number in chain

Extract corresponding physical frame if match found

40/86

Inverted Page Table

One entry per physical frame

Decreases memory needed to store page table

But increases time to search table when page reference occurs

41/86

Segmentation

Paging gives one-dimensional virtual address space → what about separate address spaces for code, data, stack?
Segment

Independent address space from 0 to some maximum

Can grow/shrink independently

Supports different kinds of protection (read/write/execute)

Unlike pages, programmers are aware of segments

Segment corresponds to program, procedure, stack, object, array, etc.

Memory allocation harder due to variable size

May need to move segment which grows

May suffer from external fragmentation

But good for shared libraries

42/86

Logical View of Segmentation

43/86

Segmentation Address Translation

One bit in table indicates whether segment is in memory

Another bit indicates whether segment is modified

44/86

Hybrid Segmentation/Paging

Most OSs use only paging

45/86

Memory Control Bits

Protection bits → associated with a frame, indicate read-only, read-write, execute-only

Valid-invalid bit

Valid → page present in physical memory

Invalid → page missing in physical memory

Page fault is generated ⇒ kernel trap to bring in page from backing store

Page replacement bits → indicate if page has been modified or referenced (used later). Also, lock bit to prevent page from being transferred out

46/86

Memory Validity

47/86

Demand Paging I

When do you bring the page into memory?
Bring page into memory only when needed

Lower I/O load

Less memory needed

Faster response time

Support for more users

Page needed → reference it

Invalid reference → abort

Not in memory → bring into memory

Many page faults when process first starts

Eventually required pages are in memory, so page fault rate drops

48/86

Demand Paging II

49/86

Demand Paging III

Use valid-invalid bit to check memory validity

1 → in memory

0 → not in memory

Initially set to 0 on all entries

If 0 during address translation → page fault

50/86

Demand Paging IV

51/86

Page Faults I

First reference, trap to OS → page fault

OS looks at another table to decide

Invalid reference → abort

Valid reference but just not in memory → handle request

To handle valid request

Get empty frame

Swap page into frame

Reset tables, validation bit = 1

Restart last instruction

52/86

Page Faults II

53/86

Performance: Demand Paging

Page Fault Rate (p), 0 ≤ p ≤ 1.0

If p = 0, no page faults

If p = 1, every reference causes a page fault

Effective Access Time (EAT) = (1 – p) × memory access + p × (page fault overhead + [swap page out] + swap page in + restart overhead)

Note: no need to swap page out if not modified

54/86

Virtual Memory Tricks

Copy-on-Write (COW)

Allows parent and child processes to initially share same pages in memory → if either process modifies shared page, then copy page

Efficient process creation: copy only modified pages

Free pages allocated from pool of zeroed-out pages

Memory-mapped files

Map file into virtual address space using paging

Simplifies programming model for I/O

I/O Interlock

Pages must sometimes be locked into memory

e.g. pages used for DMA from disk

55/86

Example Problem

Demand Paging

Memory access time = 200 ns

Average page-fault service time = 8 ms

EAT = (1 – p) × 200 + p × (8 ms) = (1 – p) × 200 + p × 8,000,000 = 200 + p × 7,999,800 (in ns)

If one access out of 1,000 causes a page fault, then EAT ≈ 8.2 µs → slowdown by a factor of 40!
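The numbers above, checked directly (all times in ns):

```python
MEM_NS   = 200           # memory access time
FAULT_NS = 8_000_000     # 8 ms average page-fault service time

def eat(p):
    """Effective access time for page-fault rate p."""
    return (1 - p) * MEM_NS + p * FAULT_NS

print(eat(1 / 1000))     # one fault per 1,000 accesses ≈ 8199.8 ns ≈ 8.2 µs
```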
If we want performance degradation < 10%

EAT < memory access time + 10% of memory access time = 220 ns

200 + 7,999,800 × p < 220

7,999,800 × p < 20

p < 0.0000025

Less than one page fault in every 400,000 memory accesses

56/86

Page Replacement

No free frame? Replace page

How do you decide which page to replace?

Find some unused page in memory to swap out → need strategy for page replacement

Minimise number of page faults

Avoid bringing same page into memory several times

Prevent over-allocation of memory

Page-fault service routine should include page replacement

Use modify (dirty) bit to reduce overhead of page transfers

Only modified pages written to disk

57/86

Basic Page Replacement I

1 Find location of desired page on disk

2 Find free frame

3 Frame found?

Yes? ⇒ use it

No? ⇒ use replacement algorithm to select victim frame

4 Load desired page into (newly) freed frame

5 Update page and frame tables

6 Restart process

58/86

Basic Page Replacement II

59/86

Page Replacement Algorithms

Aim: Lowest page-fault rate

How do we compare page replacement algorithms?

Use a Reference String → particular string of memory references; calculate number of page faults for each algorithm

E.g.
1, 2, 3, 3, 2, 4, 1, 4, 5, 5, 7, 2, 3, 1

60/86

Optimal Algorithm

Replace page that will not be used for the longest period of time

Unimplementable, as knowledge of future references needed

Useful for measuring how well other algorithms perform

With 4 frames and reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 → 6 page faults

With 3 frames and reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 → total of 9 page faults

[Frame-by-frame tables omitted]

61/86

First-In-First-Out (FIFO) Algorithm

Replace oldest page

May replace heavily used page

With 3 frames and reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 → total of 15 page faults

Heavily used pages 0, 2, 3 are being swapped in and out

[Frame-by-frame table omitted]

62/86

Belady’s Anomaly

Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 (FIFO replacement)

With 3 frames → 9 page faults

With 4 frames → 10 page faults

Belady’s Anomaly: more frames can mean more page faults

63/86

Least Recently Used (LRU) Algorithm

Each page entry has a counter

When page referenced, copy clock into counter

When page needs to be replaced, choose lowest counter

With 4 frames and reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 → 8 page faults

With 3 frames and reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 → total of 12 page faults

[Frame-by-frame tables omitted]

64/86

LRU Approximation Algorithms

Proper LRU is expensive → use approximations instead

Reference bit

With each page associate reference bit r, initially r = 0

When page referenced, set r = 1

Replace page with r = 0 (if one exists)

Periodically reset reference bits

Does not provide proper order for LRU

Clock Replacement Policy

Needs reference bit r and uses clock (circular) order

If page to be replaced (in clock order) has r = 1 then

Set r = 0 and leave page in memory

Continue till you find r = 0, and replace that page

If all r = 1, replace starting page

65/86

Clock Page Replacement

When
a page fault occurs, the page being pointed to is inspected

If r = 0, evict page

If r = 1, clear r and advance pointer

66/86

Counting Algorithms

Keep counter of number of references made to each page

LFU (least frequently used) algorithm

Replace page with smallest count

May replace page just brought into memory

Page with heavy usage in past will have high count

Reset counters or use ageing

MFU (most frequently used) algorithm

Replace page with largest count

Page with smallest count probably just brought in and yet to be used

67/86

Example Problem

Page Replacement

Reference string: 1, 2, 1, 3, 2, 1, 4, 3, 1, 1, 2, 4, 1, 5, 6, 2, 1. Assuming the number of frames is 3, calculate the number of page faults for the LRU and Clock page replacement algorithms.

Using LRU → total of 11 page faults

Using Clock → total of 9 page faults

[Frame-by-frame tables omitted]

68/86

Locality of Reference I

For program to run efficiently

System must maintain program’s favoured subset of pages in main memory

Otherwise thrashing

Excessive paging activity causing low processor utilisation

Program repeatedly requests pages from secondary storage

Locality of Reference

Programs tend to request same pages in space and time

69/86

Locality of Reference II

70/86

Working Set Model

Working set of pages → W(t, w)

Set of pages referenced by process during process-time interval (t – w) to t

71/86

Working Set Clock Algorithm

Idea: Add “time of last use” to Clock Replacement algorithm

Keeps track of whether page is in working set

At each page fault, examine page pointed to

If r = 1, then set r = 0 and move to next page

If r = 0, calculate age

If age < working set age w, continue (page in working set)

If age > working set age w

If page is clean, replace

Otherwise trigger write-back, continue to next page

72/86
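The replacement policies discussed above can be simulated; FIFO and LRU below reproduce the fault counts from the earlier slides for the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with 3 frames. For clock, this is one common variant (r set on load and on every hit); other variants may give slightly different counts.

```python
from collections import OrderedDict, deque

def fifo_faults(refs, nframes):
    """Page faults under FIFO: evict the oldest resident page."""
    frames, order, faults = set(), deque(), 0
    for p in refs:
        if p not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.remove(order.popleft())
            frames.add(p)
            order.append(p)
    return faults

def lru_faults(refs, nframes):
    """Page faults under LRU: evict the least recently used page."""
    frames, faults = OrderedDict(), 0      # insertion order tracks recency
    for p in refs:
        if p in frames:
            frames.move_to_end(p)          # hit: refresh recency
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False) # evict least recently used
            frames[p] = True
    return faults

def clock_faults(refs, nframes):
    """Page faults under clock (second chance): skip pages with r = 1."""
    frames, rbits = [None] * nframes, [0] * nframes
    hand, faults = 0, 0
    for p in refs:
        if p in frames:
            rbits[frames.index(p)] = 1     # hit: set reference bit
            continue
        faults += 1
        while frames[hand] is not None and rbits[hand]:
            rbits[hand] = 0                # second chance: clear r, move on
            hand = (hand + 1) % nframes
        frames[hand], rbits[hand] = p, 1
        hand = (hand + 1) % nframes
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))   # → 15 (as on the FIFO slide)
print(lru_faults(refs, 3))    # → 12 (as on the LRU slide)
```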

Working Set Size

Processes transition between working sets

OS temporarily maintains in memory pages outside of current
working set

Goal of memory management is to reduce mis-allocation

What about page fault frequency?

If many faults ⇒ allocate more page frames

73/86

Global vs. Local Page Replacement

Local strategy

Each process gets fixed allocation of physical memory

Need to pick up changes in working set size

Global strategy

Dynamically share memory between runnable processes

Initially allocate memory proportional to process size

Consider page fault frequency (PFF) to tune allocation

Measure page faults/per sec and increase/decrease allocation

No universally agreed solution

Linux: global page replacement

Windows: local page replacement

Depends on scheduling strategy (i.e. round-robin, . . . )

74/86

Linux Memory Management

75/86

Mapping and Sharing Memory

76/86

Memory Management System Calls

System call | Description
s = brk(addr) | Change data segment size
a = mmap(addr, len, prot, flags, fd, offset) | Map a file/device into memory
s = munmap(addr, len) | Unmap a file/device from memory

Return code s is -1 if error

a and addr are memory addresses

len is a length

prot controls protection

flags are miscellaneous bits

fd is a file descriptor

offset is a file offset
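A minimal illustration of the mmap/munmap idea from Python, whose mmap module wraps these calls (the file and its contents are throwaway):

```python
import mmap
import os
import tempfile

# create a small scratch file to map
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # map the whole file, read-write
        mm[0:5] = b"HELLO"                 # write through the mapping

with open(path, "rb") as f:                # changes are visible in the file
    data = f.read()
os.unlink(path)

print(data)   # → b'HELLO world'
```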

77/86

Virtual Memory Layout I

78/86

Virtual Memory Layout II

On a 32-bit machine, process has 4 GB of space

Top 1 GB used for Kernel memory

User processes can make system calls without TLB flush

Kernel space not visible in user mode

Kernel typically resides in 0 – 1 GB of physical memory

Kernel maps lower 896 MB of physical memory to its virtual
address space

All memory access must be virtual but need efficient access to
user memory + DMA in low memory

Create temporary mappings for > 896 MB of physical memory
in remaining 128 MB of virtual memory

79/86

Physical Memory Management

Linux memory zones

ZONE_DMA and ZONE_DMA32: pages used for DMA

ZONE_NORMAL: normal, regularly mapped pages

ZONE_HIGHMEM (> 896 MB): pages with high memory
addresses – not permanently mapped

Kernel and memory map are pinned, i.e. never paged out

80/86

Paging

| | Usually on IA-32 | On x86-64 |
| Page size | 4 KB | Larger page sizes (e.g. 4 MB) |
| Virtual address space | 4 GB | 128 TB |
| Page table | Two-level (or three levels with Physical Address Extension (PAE)) | Up to four-level page table |

Offset bits contain page status: dirty, read-only, . . .

81/86

3-level Paging

82/86

Buddy Memory Allocation

Tries to map contiguous pages to contiguous frames to
optimise transfers

Split and merge frames as required

83/86
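A sketch of the splitting half of a buddy allocator (the 64-frame pool and order sizes are made up; merging of freed buddies is omitted):

```python
# free_lists[k] holds start frames of free blocks of 2**k frames
MAX_ORDER = 6                        # manage 2**6 = 64 frames in total
free_lists = [[] for _ in range(MAX_ORDER + 1)]
free_lists[MAX_ORDER] = [0]          # initially one big free block

def buddy_alloc(order):
    """Allocate a block of 2**order frames, splitting larger blocks."""
    k = order
    while k <= MAX_ORDER and not free_lists[k]:
        k += 1                       # find smallest free block that fits
    if k > MAX_ORDER:
        return None                  # out of memory
    start = free_lists[k].pop()
    while k > order:                 # split, freeing the upper "buddy" half
        k -= 1
        free_lists[k].append(start + (1 << k))
    return start

print(buddy_alloc(3))   # 8 frames → block at frame 0; free blocks of 8, 16, 32 remain
```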

Page Replacement I

Linux uses variation of clock algorithm to approximate LRU
page-replacement strategy

Memory manager uses two linked lists (and reference bits)

Active list
Contains active pages

Most-recently used pages near head of active list

Inactive list
Contains inactive pages

Least-recently used pages near tail of inactive list

Only replace pages in inactive list

84/86

Page Replacement II

85/86

Page Replacement III

kswapd (swap daemon)

Pages in inactive list reclaimed when memory low

Uses dedicated swap partition or file

Must handle locked and shared pages

pdflush kernel thread

Periodically flushes dirty pages to disk

86/86