ECE 391 Virtualization
Yih-
Portions taken from ECE 391 Lecture Notes by , , , Wikipedia, the free encyclopedia, ’ x86 Assembly Guide, ’s Programming from the Ground Up, and the X86 Assembly Wikibook
History
Ad from 1994 Byte Magazine
Not Tested On Students
Image by pch.vector (freepik.com)
You should be familiar with the following:
• virtual memory
– virtual/logical address – linear address
– physical address
• problems addressed by virtual memory
– protection between programs
– code/data sharing
– memory fragmentation
– code/data relocation
• x86 segmentation support
– segment
– shadow bits (in a register)
– Global Descriptor Table (GDT)
– Local Descriptor Table (LDT)
– task state segment (TSS)
What is virtualization?
How do we do it?
Virtualization
• Virtualization is the process of creating a software-based, or virtual, representation of something, such as virtual applications, servers, [memory, cpu, ] storage and networks.
https://www.vmware.com/solutions/virtualization.html
How do we do it?
• Time-based sharing
• Space-based sharing
What is virtual memory?
• A useful abstraction
– between memory addresses seen by software and those used by hardware
– Enabled by indirection
• Typically done with large blocks, e.g., 4kB in x86
[Figure: per-program virtual addresses (called an address space) map in 4kB blocks to physical addresses, including memory-mapped I/O; some virtual pages are not in use (not visible to this program), and some are not actually in memory.]
Why use virtual memory?
What does it cost?
Why use virtual memory?
• protection
– one program cannot accidentally or deliberately destroy another’s data
– the memory is simply not accessible
Why use virtual memory?
• more effective sharing
– two (or more) programs that share library code can share a single copy of the code in physical memory
– code and data not actively used by a program can be pushed out to disk to make room for other programs’ active data; provides the illusion of a much larger physical memory
Why use virtual memory?
• no fragmentation [little to none, anyway]
– systems without virtual memory suffer fragmentation effects when they try to multitask
– for example, if we run A followed by B followed by C, and then B finishes, we can’t give D a contiguous block of memory, even though it fits in the absolute sense
[Figure: physical memory over time: A loaded; then A and B; then A, B, and C; after B exits, the free space is split around C, so D cannot get a contiguous block.]
Why use virtual memory?
• simplifies program loading and execution:
no relocation of code, rewriting stored pointer values, etc.
Trade-offs
• Complexity
• Space
• Time
x86 Support for VM
• protection model
• segmentation
• paging
logical address → segmentation unit → linear address → paging unit → physical address
x86 Protection Model
• four rings: kernel (ring 0) through user (ring 3)
– lower numbers are more privileged
– lower numbers never call/trust higher numbers
– higher numbers call lower numbers only through narrow interfaces (e.g., system calls)
x86 Protection Model
• CPL—current privilege level (of executing code) [in CS]
• RPL—requested privilege level; when more-privileged code executes on behalf of less-privileged code, some accesses may voluntarily lower the effective privilege to that of the caller/beneficiary
• DPL—descriptor privilege level; the privilege level necessary to access the code/data
ring 0: kernel; rings 1 and 2: not used by Linux; ring 3: user
if MAX(CPL, RPL) > DPL, the processor generates an exception (general protection fault); a small C sketch of this check follows.
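A minimal sketch of the privilege check in C; the function name and the standalone test harness are illustrative, not taken from any real kernel or manual.

#include <stdio.h>

/* Returns 1 if an access with the given CPL and RPL may touch a
 * segment whose descriptor carries privilege level dpl, else 0. */
static int access_allowed(int cpl, int rpl, int dpl)
{
    int effective = (cpl > rpl) ? cpl : rpl;   /* MAX(CPL, RPL)          */
    return effective <= dpl;                   /* > DPL => #GP exception */
}

int main(void)
{
    printf("%d\n", access_allowed(0, 0, 0));   /* kernel -> kernel seg: 1 */
    printf("%d\n", access_allowed(3, 3, 0));   /* user -> kernel seg: 0   */
    printf("%d\n", access_allowed(3, 3, 3));   /* user -> user seg: 1     */
    return 0;
}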
x86 Segmentation
• x86 actually has two levels of indirection, but one is mostly unused…(this one!)
• a segment is a contiguous portion of a linear address space such as the 32-bit space of physical addresses
• x86 in protected mode always uses segmentation
GDTR is a 48-bit register: bits 47:16 hold the 32-bit linear base address of the global descriptor table (GDT), and bits 15:0 hold the 16-bit table limit (really size – 1 in bytes); so GDTR points to the table & holds the table size as well.
The GDT holds up to 8192 descriptors (indices 0 through 8191 max.). Each 8B descriptor includes: base address, size, DPL, & some other bits. Descriptor #0 is not usable. (A sketch of loading the GDTR follows.)
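A minimal sketch of loading the GDTR with lgdt, assuming a 32-bit build and GCC inline assembly; the struct and variable names are ours. A real OS would also reload the segment registers afterwards so the new descriptors take effect.

#include <stdint.h>

/* 48-bit GDTR image: 16-bit limit (size - 1 in bytes) + 32-bit base. */
struct gdtr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

extern uint64_t gdt[];   /* the 8B descriptors themselves, defined elsewhere */

/* Point GDTR at a table of num_entries descriptors. */
static void load_gdt(uint16_t num_entries)
{
    struct gdtr reg = {
        .limit = (uint16_t)(num_entries * 8 - 1),
        .base  = (uint32_t)(uintptr_t)gdt,
    };
    asm volatile ("lgdt %0" : : "m"(reg));
}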
Segment descriptors
• descriptors can also differentiate
– code (executable and possibly readable) from
– data (readable and possibly writable)
– and a few other somewhat useful things
• finally, descriptors in the GDT can describe certain aspects of program state (e.g., the task state segment, or TSS), which we talk about later
Segment Registers
• CS: code segment
• DS: data segment
• ES: extra data segment
• FS, GS: still more extras
• SS: stack segment
each segment register has 16 bits visible + ~64 bits shadow (not accessible via ISA) that cache the description of the segment # referenced by the visible 16 bits
Segment register (selector) meaning:
• bits 15:3: index in table (0…8191); or, since table entries are 8B, the offset used to find the entry: GDTR base + (segment register & 0xFFF8)
• bit 2: 0 for GDT, 1 for Local Descriptor Table (not mentioned yet, & essentially not used by Linux)
• bits 1:0: RPL
(A sketch of decoding a selector follows the note below.)
note: if a descriptor in the table (GDT) changes, a segment register that references it must be reloaded to update the shadow portion of the register
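A minimal C sketch of decoding a 16-bit selector along these lines; the selector value is an arbitrary example, and the variable names are ours.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t selector = 0x0073;              /* arbitrary example value    */

    unsigned rpl    = selector & 0x3;        /* bits 1:0                   */
    unsigned ti     = (selector >> 2) & 0x1; /* bit 2: 0 = GDT, 1 = LDT    */
    unsigned index  = selector >> 3;         /* bits 15:3                  */
    unsigned offset = selector & 0xFFF8;     /* byte offset into the table */

    printf("index=%u ti=%u rpl=%u offset=%u\n", index, ti, rpl, offset);
    return 0;
}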
LDT
• GDT entries can also describe local descriptor tables (LDTs)
– LDT originally meant to be per-task segment tables
– LDTR points to current LDT (includes base, size, and index of LDT in GDT)
[Figure: GDTR points to the GDT (entries 0 through 8191); one GDT entry describes an LDT (with its base and size), and the LDTR selects that entry.]
Linux GDT layout (each CPU has its own GDT; look at per_cpu__gdt_page in a debugger, and see asm/segment.h for details):
• 0–5: unused
• 6–11: thread-local storage segments (glibc, WINE, etc.)
• 12: kernel code seg.
• 13: kernel data seg.
• 14: user code seg.
• 15: user data seg.
• 16–17: task state & LDT for this CPU
• 18–31: BIOS support, per-CPU data, TSS for double faults, etc.
The kernel and user code/data segments [together on one cache line] each start at address 0 and have size 4GB, so, effectively, segmentation is not used in Linux. The LDT segment is not present by default (all bits are 0).
https://manybutfinite.com/post/memory-translation-and-segmentation/
x86 Support for VM
• protection model
• segmentation
• Paging!
logical address → segmentation unit → linear address → paging unit → physical address
x86 Paging
• Paging is the second level of indirection in x86
• Each page of a virtual address space is in one of three states
– doesn’t exist
– exists and is in physical memory
– exists, but is now on the disk rather than in memory
x86 Paging
• We can encode these possibilities as follows using 4B
• These 4B are called a page table entry (PTE); a group of them is a page table
• present in physical memory: bit 0 = 1; bits 31:12 hold a 4kB-aligned address; bits 11:1 are leftovers for other uses
• not present: bit 0 = 0; the remaining 31 bits differentiate between blocks on disk & blocks that don't exist
(A small C sketch of these fields follows.)
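A sketch of a 32-bit PTE following the layout above; the macro and function names are ours, not from any particular kernel.

#include <stdint.h>

#define PTE_PRESENT   0x001u          /* bit 0: page is in physical memory */
#define PTE_ADDR_MASK 0xFFFFF000u     /* bits 31:12: 4kB-aligned address   */

/* Build a present PTE from a 4kB-aligned physical address plus flag bits. */
static inline uint32_t make_pte(uint32_t phys_addr, uint32_t flags)
{
    return (phys_addr & PTE_ADDR_MASK) | flags | PTE_PRESENT;
}

/* Recover the physical page address from a present PTE. */
static inline uint32_t pte_addr(uint32_t pte)
{
    return pte & PTE_ADDR_MASK;
}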
Question
• If we use 4B for every 4kB, how big is the page table for a single virtual address space?
Question
• If we use 4B for every 4kB, how big is the page table for a single virtual address space?
• (4 / 4096) × 2^32 = 4MB
• too big…
• Solution?
x86 Paging
• Solution
– page the page table
– i.e., use a hierarchical structure
• The page table is just another 4kB page – it holds 4096 / 4 = 1024 PTEs
[Figure: a page directory entry whose present bit is 1 points to a page table page; a page table entry whose present bit is 1 points to a page.]
x86 Paging
• What about the page directory?
– given
• 2^32 bytes total (32-bit address space)
• 2^10 PTEs per table
• 2^12 bytes per page
– the page directory needs 2^32 / (2^10 × 2^12) = 2^10 entries total
– which also fits in one page
(and could be paged out to disk, although Linux does not)
x86 Paging
A virtual (linear) address splits into: bits 31:22 = directory #, bits 21:12 = table #, bits 11:0 = offset.
The page directory base register (PDBR, usually called control register 3, or cr3) points to the page directory. Page directory entries (0…1023) are pointers to page tables (+ access controls), 4B each; page table entries (0…1023) are pointers to 4kB pages (+ access controls), 4B each.
x86 Paging
• To translate a virtual address into a physical address
– start with the PDBR (cr3)
– look up page directory entry (PDE) using the 10 MSb
of virtual address
– extract page table address from PDE
– look up page table entry (PTE) using next 10 bits of virtual address
– extract page address from PTE
– use last 12 bits of virtual address as offset into page
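The same walk as a C sketch. This is a software rendition of what the hardware does in the MMU; it assumes a 32-bit build in which the tables can be dereferenced at their physical addresses, and it ignores present bits, permission checks, and 4MB pages. The function name is ours.

#include <stdint.h>

static uint32_t translate(uint32_t cr3, uint32_t vaddr)
{
    uint32_t *pd  = (uint32_t *)(cr3 & 0xFFFFF000);   /* page directory      */
    uint32_t pde  = pd[(vaddr >> 22) & 0x3FF];        /* top 10 bits: PDE    */

    uint32_t *pt  = (uint32_t *)(pde & 0xFFFFF000);   /* page table from PDE */
    uint32_t pte  = pt[(vaddr >> 12) & 0x3FF];        /* next 10 bits: PTE   */

    return (pte & 0xFFFFF000) | (vaddr & 0xFFF);      /* page base + offset  */
}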
Review
Question
• What are the drawbacks?
• Way too slow to do on every memory access!
• Solution?
x86 Paging
• Cache!
– Caveats?
• Hence the translation lookaside buffers (TLBs)
– keep translations of first 20 bits around and reuse them
– only walk tables when necessary (in x86, OS manages tables, but hardware walks them)
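One common way the OS discards stale TLB entries after editing the page tables is to reload cr3; a minimal sketch with GCC inline assembly (the function name is ours) is below. It must run in ring 0, and, as discussed later, global entries survive the reload.

#include <stdint.h>

/* Reloading cr3 flushes the non-global TLB entries, so stale
 * translations are dropped after the OS edits the page tables. */
static inline void flush_tlb_all(void)
{
    uint32_t cr3;
    asm volatile ("movl %%cr3, %0" : "=r"(cr3));
    asm volatile ("movl %0, %%cr3" : : "r"(cr3) : "memory");
}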
What Does This Do?
x86 Paging
• Remember the 11 free bits in the PTEs?
• What should we use them to do?
– protect
– optimize to improve performance
• Protect
– User/Supervisor (U/S) page or page table
• User means accessible to anyone (any privilege level)
• Supervisor requires PL < 3 (i.e., MAX(CPL,RPL) < 3)
– Read-Only or Read/Write
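As a sketch, the two protection bits just described map onto PTE flag bits roughly like this; the macro names are ours, and the bit positions are the standard x86 ones (compare with make_pte() in the earlier sketch).

#include <stdint.h>

#define PTE_PRESENT 0x001u   /* bit 0                                  */
#define PTE_RW      0x002u   /* bit 1: 1 = read/write, 0 = read-only   */
#define PTE_USER    0x004u   /* bit 2: 1 = user, 0 = supervisor-only   */

/* Flag combination for a page that user code (CPL 3) may read but not
 * write; omitting PTE_USER would restrict it to supervisor code.      */
static const uint32_t user_readonly_flags = PTE_PRESENT | PTE_USER;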
x86 Paging
• Optimize
– TLBs must be fast, so you can't use many:
• 386: 32 TLB entries
• Zen 3: 64 ITLB / 64 DTLB
• : 64+16 ITLB, 64+32+4 DTLB
– nice if
• some translations are the same for all programs
• bigger translations could be used when possible (e.g., use one translation for 4MB rather than 1024 translations)
x86 Paging
• x86 supports both
– G flag—global
• TLB not flushed when changing to new program or address
space (i.e., when cr3 changes)
• used for kernel pages (in Linux)
– 4MB pages
• skip the second level of translation
– indicated by PS (page size) bit in PDE
– PS=1 means that the PDE points directly to a 4MB page
• remaining 22 bits of virtual address used as offset
• Many Intel architectures provide separate TLBs for 4kB & 4MB translations
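A sketch of a page directory entry for a 4MB global page of the kind just described, e.g., for kernel pages; the macro and function names are ours, and the bit positions are the standard x86 ones.

#include <stdint.h>

#define PDE_PRESENT 0x001u   /* bit 0                                 */
#define PDE_RW      0x002u   /* bit 1                                 */
#define PDE_PS      0x080u   /* bit 7: page size (1 = 4MB page)       */
#define PDE_GLOBAL  0x100u   /* bit 8: G flag, survives cr3 reloads   */

/* A PDE that maps a 4MB page directly (no second-level page table);
 * phys must be 4MB-aligned, so bits 31:22 select the page.           */
static inline uint32_t make_4mb_pde(uint32_t phys)
{
    return (phys & 0xFFC00000u) | PDE_PS | PDE_GLOBAL | PDE_RW | PDE_PRESENT;
}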
Which Pages Belong in Memory? (1/2)
• Cache philosophy: recent use predicts future use
• Hardware provides Accessed bit to help OS determine recency-of-access
Which Pages Belong in Memory? (2/2)
• If a page changes after it is brought in from disk, it must be written back to disk (Dirty bit)
TLBs in Multiprocessors
• TLB entries may be inconsistent with
updated page tables if the OS is not careful
Not Tested On Students
Image by pch.vector (freepik.com)