
Computer Architecture ELEC3441
Lecture 10 – Virtual Memory
Dr. Hayden Kwok-Hay So
Department of Electrical and Electronic Engineering
Bare Machine

[Figure: 5-stage pipeline (PC → Inst. Cache → Decode → Execute → Data Cache → Writeback) connected through a memory controller to main memory (DRAM); every address on the datapath is a physical address]

§ In a bare machine, the only kind of address is a physical address
§ Assume there is only 1 program running in the computer at a time
§ Program owns all the (physical) memory locations
Need for Virtualization
§ Real world systems demand multi-programming
• multiple programs/users running in the system at the same time
§ Requires:
• Virtualization – illusion of accessing more memory than physically exists in the system
• Relocation – illusion of running at a fixed memory address while physically it is not
• Protection – illusion of being the only program running in the system
Virtualization
§ Programs (compilers) assume they can access the entire memory of the system
• 32- or 64-bit addresses
• 4 GiB or 16 EiB of address space
§ Physically, the amount of memory ≠ full address space
• In fact, the amount of DRAM is system dependent
§ A program cannot possess full information about the system configuration at compile time

Motivation: programs can assume access to an infinite (full) memory address space.

Relocation
§ Programs must be compiled to run at a specific address in memory
• e.g. always start execution from location 0x00040000
§ Cannot support running 2 programs at the same time:
• Both programs will try to use the same locations
§ Possible solution: compile all programs that may run at the same time to different locations
• → unrealistic in most situations

Need the ability to relocate a program to a different part of physical memory without recompiling it

[Figure: physical memory holding the OS, Program 1, and Program 2 at different locations]
Protection
§ With multiple programs executing at the same time, need to avoid programs interfering with each other
§ Should not allow Program A to write into the memory of Program B
• due to programming bugs
• due to malicious intentions
§ Should not allow a program to change its own code inadvertently
• e.g. buffer overflow attack

Need a way to limit the parts of memory a program can access
A unified system is needed to provide all of the above functions
Paged Memory Systems
§ Processor-generated address can be split into: [ Page Number | Offset ]
• A Page Table contains the physical address at the start of each page

[Figure: pages 0–3 of User-1's address space scattered non-contiguously through physical memory, located via the Page Table of User-1]

Page tables make it possible to store the pages of a program non-contiguously.
Address Translation & Protection
[Figure: Virtual Address → Address Translation (Virtual Page No. (VPN) | offset → Physical Page No. (PPN) | offset) → Physical Address, with a Protection Check (kernel/user mode, read/write) that can raise an exception]

• Every instruction and data access needs address translation and protection checks

A good VM design needs to be fast (~ one cycle) and space efficient
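To make the translation step concrete, here is a minimal C sketch of the split and recombination; the 4 KiB page size and the flat page-table array are illustrative assumptions, not something specified on the slide:

#include <stdint.h>

#define PAGE_SHIFT 12u                         /* assume 4 KiB pages           */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical flat page table: indexed by VPN, holds the PPN. */
extern uint32_t page_table[];

uint32_t translate(uint32_t va)
{
    uint32_t vpn    = va >> PAGE_SHIFT;        /* virtual page number          */
    uint32_t offset = va & (PAGE_SIZE - 1u);   /* unchanged by the mapping     */
    uint32_t ppn    = page_table[vpn];         /* protection check omitted     */
    return (ppn << PAGE_SHIFT) | offset;       /* physical address             */
}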
Private Address Space per User
[Figure: User 1, User 2 and User 3 each have their own page table; each table maps that user's virtual pages (e.g. VA1) onto frames in physical memory, alongside OS pages and free pages]

• Each user has a page table
• Page table contains an entry for each user page
Demand Paging
§ Virtual and physical memory address spaces do not match in general:
• Virtual address space is huge when compared to physical memory
• 64-bit virtual address space → 16 EiB of physical memory
• Memory requirement is multiplied by the number of concurrent user processes
§ Idea: use the hard disk as memory
• Save memory pages on the hard disk
• Accessing a memory page that is currently residing on the hard disk causes a page fault
Page Fault Handling
§ On a page fault:
• An exception causes the OS to take over
• Create or locate the requested page on the hard disk
• Copy its content to physical memory (DRAM)
• Update the rest of the memory system (e.g. the page table) about the new page
§ If no physical space is available in memory, swap old pages to the hard disk
• Use pseudo-LRU or other similar policies
§ Since it takes a long time to transfer a page (msecs), page faults are handled completely in software by the OS
• An untranslated addressing mode is essential to allow the kernel to access page tables
§ Another job may be run on the CPU while the first job waits for the requested page to be read from disk
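A hedged sketch of what the OS page-fault handler roughly does; every helper function here is a hypothetical placeholder, not a real kernel API:

#include <stddef.h>

struct process;                                /* hypothetical process descriptor   */
extern void *alloc_physical_frame(void);
extern void *evict_victim_page(void);
extern void  read_page_from_disk(struct process *p, unsigned vpn, void *frame);
extern void  update_page_table(struct process *p, unsigned vpn, void *frame);

/* Sketch of the OS page-fault path; all helpers above are placeholders. */
void handle_page_fault(struct process *p, unsigned vpn)
{
    void *frame = alloc_physical_frame();      /* free DRAM frame, if any           */
    if (frame == NULL)
        frame = evict_victim_page();           /* pseudo-LRU: swap an old page out  */

    read_page_from_disk(p, vpn, frame);        /* slow: milliseconds                */
    update_page_table(p, vpn, frame);          /* page is now memory-resident       */
    /* The faulting instruction is restarted afterwards; the OS may run another
       job on the CPU while the disk transfer is in flight.                         */
}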

Page Fault Handling – Hard disk
§ The hard disk area for saving memory pages is usually called a swap space
§ Hard disk access speed << memory speed
• A page fault is extremely slow when compared to any other memory access
§ Excessive swapping causes the processor to thrash
• Processor is busy copying pages between the hard disk and memory without performing any useful work
• e.g. if the working memory requirement of a process > physical memory size
Where Should Page Tables Reside?
§ Space required by the page tables (PT) is proportional to the address space, number of users, …
⇒ Too large to keep in registers
§ Idea: Keep PTs in the main memory
– needs one reference to retrieve the page base address and another to access the data word
⇒ doubles the number of memory references!
Page Tables in Physical Memory
[Figure: the page tables of User 1 and User 2 are themselves stored in physical memory, alongside the users' data pages]

Linear Page Table

§ Page Table Entry (PTE) contains:
– A bit to indicate if a page exists
– PPN (physical page number) for a memory-resident page
– DPN (disk page number) for a page on the disk
– Status bits for protection and usage
§ OS sets the Page Table Base Register whenever the active user process changes
– The PT Base Register is a supervisor-accessible control register inside the CPU

[Figure: the PT Base Register plus the VPN of the virtual address (from the CPU execute stage) select a PTE in a linear page table held in physical memory; each PTE holds either a PPN (resident page) or a DPN (page on disk), and the PPN is concatenated with the page offset to address the data word in the data pages]

Size of Linear Page Table
§ With 32-bit addresses, 4-KB pages & 4-byte PTEs:
– 2^20 PTEs, i.e., 4 MB page table per user
– 4 GB of swap needed to back up the full virtual address space
§ Larger pages?
– Internal fragmentation (not all memory in a page is used)
– Larger page fault penalty (more time to read from disk)
§ What about a 64-bit virtual address space???
– Even 4 MB pages would require 2^42 8-byte PTEs (35 TB!)

What is the “saving grace”?
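Before answering that, the arithmetic behind the numbers above, written out as a small C check (page and PTE sizes as stated on this slide):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* 32-bit addresses, 4 KiB pages, 4-byte PTEs */
    uint64_t ptes32 = 1ull << (32 - 12);                 /* 2^20 entries        */
    printf("32-bit: %llu PTEs, %llu MiB per user\n",
           (unsigned long long)ptes32,
           (unsigned long long)((ptes32 * 4) >> 20));    /* 4 MiB page table    */

    /* 64-bit addresses, 4 MiB pages, 8-byte PTEs */
    uint64_t ptes64 = 1ull << (64 - 22);                 /* 2^42 entries        */
    printf("64-bit: 2^42 PTEs, %llu TiB per user\n",
           (unsigned long long)((ptes64 * 8) >> 40));    /* 32 TiB, i.e. ~35 TB */
    return 0;
}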
Hierarchical Page Table
Virtual Address from CPU:  [31:22] p1 (10-bit L1 index) | [21:12] p2 (10-bit L2 index) | [11:0] offset

[Figure: the Root of the Current Page Table (a processor register) points to the Level 1 Page Table; entry p1 selects a Level 2 Page Table, and entry p2 selects the data page. A PTE may point to a page in primary memory, a page in secondary memory, or mark a nonexistent page]
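A minimal C sketch of the two-level walk in the figure (10-bit L1 index, 10-bit L2 index, 12-bit offset); the PTE layout and the separate array of level-2 tables are assumptions made only for illustration:

#include <stdint.h>
#include <stdbool.h>

#define L1_BITS     10u
#define L2_BITS     10u
#define OFFSET_BITS 12u

/* Hypothetical 32-bit PTE layout: one valid bit plus a 20-bit PPN. */
typedef struct { uint32_t valid : 1, ppn : 20; } pte_t;

extern pte_t *root;                      /* level-1 table (root register)        */
extern pte_t *l2_tables[1u << L1_BITS];  /* one level-2 table per valid L1 entry */

bool walk(uint32_t va, uint32_t *pa)
{
    uint32_t p1     = (va >> (L2_BITS + OFFSET_BITS)) & ((1u << L1_BITS) - 1u);
    uint32_t p2     = (va >> OFFSET_BITS) & ((1u << L2_BITS) - 1u);
    uint32_t offset = va & ((1u << OFFSET_BITS) - 1u);

    if (!root[p1].valid)                 /* no level-2 table: page fault         */
        return false;
    pte_t pte = l2_tables[p1][p2];
    if (!pte.valid)                      /* page nonexistent or on disk          */
        return false;

    *pa = ((uint32_t)pte.ppn << OFFSET_BITS) | offset;
    return true;
}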
Two-Level Page Tables in Physical Memory
[Figure: the Level 1 and Level 2 page tables of User 1 and User 2 are stored in physical memory, together with the users' data pages (User1/VA1, User2/VA1)]
Address Translation Quick Summary
§ Example:
• 32-bit virtual address
• 30-bit physical address
§ Translation through a Page Table
§ The Page Table can be multi-level

[Figure: the virtual page number (upper 20 bits of the 32-bit virtual address) is translated into a physical page number (upper 18 bits of the 30-bit physical address); the 12-bit page offset is copied through unchanged]

Translation Lookaside Buffers (TLB)
Address translation is very expensive!
In a two-level page table, each reference becomes several memory accesses

Solution: Cache translations in a TLB
TLB hit  ⇒ Single-Cycle Translation
TLB miss ⇒ Page-Table Walk to refill

[Figure: the VPN (virtual page number) of the virtual address is compared against the TLB entries; each entry holds V, R, W, D status bits, a tag, and a PPN (physical page number). On a hit, the matching PPN is concatenated with the page offset to form the physical address]
TLB Designs
§ Typically 32-128 entries, usually fully associative
– Each entry maps a large page, hence less spatial locality across pages ⇒ more likely that two entries conflict
– Sometimes larger TLBs (256-512 entries) are 4-8 way set-associative
– Larger systems sometimes have multi-level (L1 and L2) TLBs
§ Random or FIFO replacement policy
§ No process information in TLB?
§ TLB Reach: size of the largest virtual address space that can be simultaneously mapped by the TLB
Example: 64 TLB entries, 4 KB pages, one page per entry
TLB Reach = 64 entries * 4 KB = 256 KB (if contiguous)
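A hedged C sketch of a small fully associative TLB lookup; the 64-entry size and the entry layout are illustrative, and a real TLB performs the tag comparison in parallel hardware rather than in a loop:

#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64
#define PAGE_SHIFT  12u                  /* 4 KiB pages, as in the example above */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;                        /* tag: virtual page number             */
    uint32_t ppn;                        /* data: physical page number           */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a TLB hit; a miss triggers a page-table walk (not shown). */
bool tlb_lookup(uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {   /* compared in parallel in hardware */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].ppn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1u));
            return true;                      /* hit: single-cycle translation    */
        }
    }
    return false;                             /* miss: walk page table, refill    */
}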
Address Translation in CPU Pipeline
[Figure: 5-stage pipeline with an Inst TLB in front of the instruction cache and a Data TLB in front of the data cache; virtual instruction and data addresses are translated to physical addresses before the caches are accessed]

§ Address translation results in a physical memory address
§ The physical address is used to index the data/instruction cache for read/write
Address Translation + Cache
Example: translate virtual address 0xABCD1234 (4 KiB page, direct-mapped cache, 8 entries, 4-word blocks, physically tagged)

[Figure: combined TLB and cache access. The virtual page number is looked up in the TLB (valid/dirty bits, tag, physical page number); on a TLB hit the physical page number is concatenated with the 12-bit page offset to form the physical address, which is then split into cache tag, cache index, block offset and byte offset to access the physically tagged cache]

VA 0xABCD1234 = 1010 1011 1100 1101 0001 0010 0011 0100
PA 0x305BC234 = 0011 0000 0101 1011 1100 0010 0011 0100

The VPN hits in the TLB and maps to PPN 0x305BC; the 12-bit page offset 0x234 passes through unchanged. The resulting physical address hits in the cache and the access returns dmem[ABCD1234].
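A small C check of the arithmetic in this example (4 KiB pages, so the low 12 bits pass through unchanged; the PPN value is the one supplied by the TLB in the slide's example):

#include <stdio.h>
#include <stdint.h>
#include <assert.h>

int main(void)
{
    uint32_t va  = 0xABCD1234u;
    uint32_t vpn = va >> 12;                 /* 0xABCD1                          */
    uint32_t off = va & 0xFFFu;              /* 0x234                            */
    uint32_t ppn = 0x305BCu;                 /* from the TLB / page table lookup */
    uint32_t pa  = (ppn << 12) | off;

    printf("VPN=%05X offset=%03X PA=%08X\n",
           (unsigned)vpn, (unsigned)off, (unsigned)pa);
    assert(pa == 0x305BC234u);               /* matches the slide's answer       */
    return 0;
}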
Address Translation: putting it all together
[Flowchart: the Virtual Address first goes to a TLB Lookup (hardware).
• TLB hit → Protection Check: if the access is permitted, the Physical Address is sent to the cache; if denied, a Protection Fault is raised (SEGFAULT).
• TLB miss → Page Table Walk (in hardware or software): if the page is ∈ memory, the TLB is updated and the access retried; if the page is ∉ memory, a Page Fault is raised and the OS loads the page]
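In C-like form, the whole flow above looks roughly as follows; every helper is a placeholder standing in for the hardware or OS mechanism named in the flowchart, not a real API:

#include <stdint.h>
#include <stdbool.h>

typedef struct { bool in_memory; uint32_t ppn; } pte_t;   /* simplified PTE      */

/* Placeholders for the mechanisms on the slide; the raise_* handlers are
   assumed not to return (the faulting instruction is retried afterwards).       */
extern bool   tlb_lookup(uint32_t va, uint32_t *pa);
extern pte_t  page_table_walk(uint32_t va);
extern void   tlb_refill(uint32_t va, pte_t pte);
extern bool   access_permitted(uint32_t pa, bool is_write);
extern void   raise_page_fault(uint32_t va);
extern void   raise_protection_fault(uint32_t va);

uint32_t translate_and_check(uint32_t va, bool is_write)
{
    uint32_t pa;

    if (!tlb_lookup(va, &pa)) {                 /* TLB miss                        */
        pte_t pte = page_table_walk(va);        /* hardware MMU or OS software     */
        if (!pte.in_memory)
            raise_page_fault(va);               /* OS loads the page, then retries */
        tlb_refill(va, pte);
        pa = (pte.ppn << 12) | (va & 0xFFFu);   /* assume 4 KiB pages              */
    }

    if (!access_permitted(pa, is_write))        /* kernel/user, read/write checks  */
        raise_protection_fault(va);             /* SEGFAULT if denied              */

    return pa;                                  /* physical address, sent to cache */
}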
Flowchart from TLB to Cache

[Flowchart: every virtual address first accesses the TLB. A TLB miss raises a TLB miss exception. On a TLB hit, the physical address is formed and the cache is accessed. For a read, a cache hit delivers the data to the CPU, while a cache miss stalls while the block is read. For a write, the write access bit is checked first (raising a write protection exception if it is off); otherwise the data is written into the cache, the dirty bit is updated, and the data and the address are put into the write buffer, again stalling on a cache miss while the block is read]
Handling a TLB Miss
§ Software (MIPS, Alpha)
• A TLB miss causes an exception and the operating system walks the page tables and reloads the TLB. A privileged “untranslated” addressing mode is used for the walk.
§ Hardware (SPARC v8, x86, PowerPC, RISC-V)
• A memory management unit (MMU) walks the page tables and reloads the TLB.
• If a missing (data or PT) page is encountered during the TLB reloading, the MMU gives up and signals a Page Fault exception for the original instruction.

Page-Based Virtual-Memory Machine (Hardware Page-Table Walk)
[Figure: 5-stage pipeline with an Inst. TLB and a Data TLB in front of the instruction and data caches. A miss in either TLB invokes the Hardware Page Table Walker, which uses the Page-Table Base Register and accesses main memory (DRAM) through the memory controller with physical addresses. TLB lookups can also raise Page Fault and protection violation exceptions]

§ Assumes page tables are held in untranslated physical memory
Handling VM-related exceptions
[Figure: pipeline with an Inst TLB and a Data TLB, each of which can raise TLB miss, Page Fault, and protection violation exceptions]

§ Handling a TLB miss needs a hardware or software mechanism to refill the TLB
§ Handling a page fault (e.g., page is on disk) needs a restartable exception so the software handler can resume after retrieving the page
– Precise exceptions are easy to restart
– Can be imprecise but restartable, but this complicates OS software
§ Handling a protection violation may abort the process
– But often handled the same as a page fault
Hierarchical Page Table Walk: SPARC v8
Virtual Address:  [31:24] Index 1 | [23:18] Index 2 | [17:12] Index 3 | [11:0] Offset

[Figure: the Context Table Register and Context Register select a root pointer in the Context Table; Index 1 selects a page-table pointer (PTP) in the L1 Table, Index 2 a PTP in the L2 Table, and Index 3 the PTE in the L3 Table; the PTE's PPN is concatenated with the 12-bit Offset to form the Physical Address]

The MMU does this table walk in hardware on a TLB miss
Address Translation Performance
[Figure: pipeline with an Inst TLB and a Data TLB in front of the caches; each TLB access can raise TLB miss, Page Fault, and protection violation exceptions]

§ Need to cope with the additional latency of the TLB:
– slow down the clock?
– pipeline the TLB and cache access?
– virtual address caches
– parallel TLB/cache access

Virtual-Address Caches

[Figure: conventional organization – the CPU issues a VA, the TLB translates it to a PA, and a physically addressed cache sits in front of primary memory.
Alternative (e.g. StrongARM) – place the cache before the TLB: the CPU issues a VA to a virtual cache; only on a miss is the VA translated by the TLB into a PA for primary memory]

§ one-step process in case of a hit (+)
§ cache needs to be flushed on a context switch unless address space identifiers (ASIDs) are included in tags (-)
§ aliasing problems due to the sharing of pages (-)
§ maintaining cache coherence (-) (see later in course)
Virtually Addressed Cache (Virtual Index/Virtual Tag)
[Figure: pipeline in which the instruction and data caches are accessed directly with virtual addresses (virtual index / virtual tag); the Inst. TLB and Data TLB, together with the hardware page table walker and the page-table base register, are consulted only on a cache miss ("translate on miss"), producing the physical address sent to the memory controller and main memory (DRAM)]
Concurrent Access to TLB & Cache (Virtual Index/Physical Tag)
[Figure: the virtual address is split into a VPN and a k-bit page offset. A direct-mapped cache of 2^L blocks with 2^b-byte blocks is accessed with the L-bit virtual index (plus the b-bit block offset) while the TLB translates the VPN into the PPN; the PPN is then compared against the cache's physical tag to detect a hit]

Index L is available without consulting the TLB
⇒ cache and TLB accesses can begin simultaneously!
Tag comparison is made after both accesses are completed
Cases: L + b = k, L + b < k, L + b > k
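The concurrent access is alias-free only when the index and block-offset bits fit inside the page offset, i.e. L + b ≤ k. A small illustrative check in C (the cache and page parameters are made-up examples, not from the slide):

#include <stdio.h>

int main(void)
{
    int k = 12;             /* page offset bits: 4 KiB pages                      */
    int b = 4;              /* block offset bits: 16-byte blocks                  */
    int L = 8;              /* index bits: 256 blocks, direct-mapped              */
    /* Cache capacity = 2^(L+b) bytes = 4 KiB in this example.                    */

    if (L + b <= k)
        printf("Index comes entirely from the page offset: VIPT is safe.\n");
    else
        printf("%d index bit(s) come from the VPN: aliasing is possible.\n",
               (L + b) - k);
    return 0;
}

Growing the cache while keeping this property therefore means increasing the associativity or the page size rather than the number of index bits.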
Virtually Indexed Physically Tagged Cache

Example: translate 0xABCD1234 again (4 KiB page, direct-mapped cache, 8 entries, 4-word blocks)

VA 0xABCD1234 = 1010 1011 1100 1101 0001 0010 0011 0100
PA 0x305BC234 = 0011 0000 0101 1011 1100 0010 0011 0100

[Worked example: the cache index and block offset come entirely from the 12-bit page offset, so the cache set is selected in parallel with the TLB lookup; once the TLB supplies PPN 0x305BC, the physical tag comparison completes and the access returns dmem[ABCD1234]]

Virtual-Index Physical-Tag Caches: Associative Organization
[Figure: a 2^a-way set-associative organization: the virtual index (L = k − b bits) selects one set in each of the 2^a ways (each a direct-mapped array of 2^L blocks) while the TLB translates the VPN into the PPN]

After the PPN is known, 2^a physical tags are compared
How does this scheme scale to larger caches?
Aliasing in Virtual-Address Caches

[Figure: two virtual pages, VA1 and VA2, are mapped by the page table to the same physical data page PA; the virtual cache therefore holds two copies of the same physical data, one tagged VA1 and one tagged VA2]

Two virtual pages share one physical page.
A virtual cache can have two copies of the same physical data. Writes to one copy are not visible to reads of the other!

General solution: prevent aliases from coexisting in the cache.
Software (i.e., OS) solution for a direct-mapped cache: VAs of shared pages must agree in the cache index bits; this ensures all VAs accessing the same PA will conflict in a direct-mapped cache (early SPARCs).
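A hedged illustration of that page-coloring rule for a direct-mapped virtual-index cache: the OS allows two virtual addresses to share a physical page only if they agree in the cache-index bits that lie above the page offset (the sizes below are made-up examples, not from the slide):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT  12u          /* 4 KiB pages                                  */
#define CACHE_BITS  14u          /* 16 KiB direct-mapped cache                   */

/* Two virtual addresses may map to the same physical page without aliasing
   only if the index bits above the page offset (bits 12-13 here) are equal.     */
int same_page_color(uint32_t va1, uint32_t va2)
{
    uint32_t mask = (1u << CACHE_BITS) - (1u << PAGE_SHIFT);   /* bits 12-13     */
    return (va1 & mask) == (va2 & mask);
}

int main(void)
{
    printf("%d\n", same_page_color(0x00002000u, 0x0000A000u)); /* 1: same color  */
    printf("%d\n", same_page_color(0x00002000u, 0x00009000u)); /* 0: different   */
    return 0;
}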
Modern Virtual Memory Systems
Illusion of a large, private, uniform store

§ Protection & Privacy
– several users, each with their private address space and one or more shared address spaces
– page table ≡ name space
§ Demand Paging
– provides the ability to run programs larger than the primary memory
– hides differences in machine configurations
§ The price is address translation on each memory reference

[Figure: user i issues a VA that is mapped (via the OS-managed page table, cached in the TLB) to a PA in primary memory; primary memory is backed by secondary storage]
VM features track historical uses:
§ Bare machine, only physical addresses
– One program owned entire machine
§ Batch-style multiprogramming
– Several programs sharing CPU while waiting for I/O
– Base & bound: translation and protection between programs (supports swapping entire programs but not demand-paged virtual memory)
– Problem with external fragmentation (holes in memory), needed occasional memory defragmentation as new jobs arrived
§ Time sharing
– More interactive programs, waiting for user. Also, more jobs/second.
– Motivated move to fixed-size page translation and protection, no external fragmentation (but now internal fragmentation, wasted bytes in page)
– Motivated adoption of virtual memory to allow more jobs to share limited physical memory resources while holding working set in memory
§ Virtual Machine Monitors
– Run multiple operating systems on one machine
– Idea from 1970s IBM mainframes, now common on laptops
• e.g., run Windows on top of Mac OS X
– Hardware support for two levels of translation/protection
• Guest OS virtual -> Guest OS physical -> Host machine physical

Virtual Memory Use Today – 1
§ Servers/desktops/laptops/smartphones have full demand-paged virtual memory
– Portability between machines with different memory sizes
– Protection between multiple users or multiple tasks
– Share small physical memory among active tasks
– Simplifies implementation of some OS features
§ Vector supercomputers have translation and protection but rarely complete demand-paging
(Older Crays: base & bound; Japanese & Cray X1/X2: pages)
– Don't waste expensive CPU time thrashing to disk (make jobs fit in memory)
– Mostly run in batch mode (run set of jobs that fits in memory)
– Difficult to implement restartable vector instructions
Virtual Memory Use Today – 2
§ Most embedded processors and DSPs provide physical addressing only
– Can't afford area/speed/power budget for virtual memory support
– Often there is no secondary storage to swap to!
– Programs custom written for particular memory configuration in product
– Difficult to implement restartable instructions for exposed architectures
Acknowledgements
§ These slides contain material developed and copyright by:
• Arvind (MIT)
• Krste Asanovic (MIT/UCB)
• Joel Emer (Intel/MIT)
• James Hoe (CMU)
• John Kubiatowicz (UCB)
• David Patterson (UCB)
• John Lazzaro (UCB)
§ MIT material derived from course 6.823
§ UCB material derived from courses CS152, CS252