
21

Beyond Physical Memory: Mechanisms

Thus far, we’ve assumed that an address space is unrealistically small
and fits into physical memory. In fact, we’ve been assuming that every
address space of every running process fits into memory. We will now
relax these big assumptions, and assume that we wish to support many
concurrently-running large address spaces.

To do so, we require an additional level in the memory hierarchy.
Thus far, we have assumed that all pages reside in physical memory.
However, to support large address spaces, the OS will need a place to
stash away portions of address spaces that currently aren’t in great de-
mand. In general, the characteristics of such a location are that it should
have more capacity than memory; as a result, it is generally slower (if it
were faster, we would just use it as memory, no?). In modern systems,
this role is usually served by a hard disk drive. Thus, in our memory
hierarchy, big and slow hard drives sit at the bottom, with memory just
above. And thus we arrive at the crux of the problem:

THE CRUX: HOW TO GO BEYOND PHYSICAL MEMORY
How can the OS make use of a larger, slower device to transparently pro-
vide the illusion of a large virtual address space?

One question you might have: why do we want to support a single
large address space for a process? Once again, the answer is convenience
and ease of use. With a large address space, you don’t have to worry
about whether there is enough room in memory for your program’s data
structures; rather, you just write the program naturally, allocating memory as
needed. It is a powerful illusion that the OS provides, and makes your
life vastly simpler. You’re welcome! A contrast is found in older systems
that used memory overlays, which required programmers to manually
move pieces of code or data in and out of memory as they were needed
[D97]. Try imagining what this would be like: before calling a function or
accessing some data, you need to first arrange for the code or data to be
in memory; yuck!


ASIDE: STORAGE TECHNOLOGIES
We’ll delve much more deeply into how I/O devices actually work later
(see the chapter on I/O devices). So be patient! And of course the slower
device need not be a hard disk, but could be something more modern
such as a Flash-based SSD. We’ll talk about those things too. For now,
just assume we have a big and relatively-slow device which we can use
to help us build the illusion of a very large virtual memory, even bigger
than physical memory itself.

Beyond just a single process, the addition of swap space allows the OS
to support the illusion of a large virtual memory for multiple concurrently-
running processes. The invention of multiprogramming (running multi-
ple programs “at once”, to better utilize the machine) almost demanded
the ability to swap out some pages, as early machines clearly could not
hold all the pages needed by all processes at once. Thus, the combina-
tion of multiprogramming and ease-of-use leads us to want to support
using more memory than is physically available. It is something that all
modern VM systems do; it is now something we will learn more about.

21.1 Swap Space

The first thing we will need to do is to reserve some space on the disk
for moving pages back and forth. In operating systems, we generally refer
to such space as swap space, because we swap pages out of memory to it
and swap pages into memory from it. Thus, we will simply assume that
the OS can read from and write to the swap space, in page-sized units. To
do so, the OS will need to remember the disk address of a given page.
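
To make this concrete, here is a minimal sketch in C of how an OS might
move page-sized units to and from swap space, given a per-page disk
address (a swap-slot number). All names here (disk_rw(), swap_in(),
swap_out(), SECTORS_PER_PAGE) are hypothetical, and we assume 4-KB pages
and 512-byte sectors; this is not the API of any real kernel.

    #include <stdint.h>

    #define PAGE_SIZE        4096                       // assumed page size
    #define SECTOR_SIZE      512                        // assumed sector size
    #define SECTORS_PER_PAGE (PAGE_SIZE / SECTOR_SIZE)  // 8 sectors per page

    // Hypothetical low-level disk routine: transfers 'count' sectors
    // starting at sector number 'sector' to or from 'buf'.
    void disk_rw(int is_write, uint64_t sector, void *buf, int count);

    // Read the page stored in swap slot 'slot' into physical page 'page'.
    void swap_in(uint64_t slot, void *page) {
        disk_rw(0, slot * SECTORS_PER_PAGE, page, SECTORS_PER_PAGE);
    }

    // Write the physical page 'page' out to swap slot 'slot'.
    void swap_out(uint64_t slot, void *page) {
        disk_rw(1, slot * SECTORS_PER_PAGE, page, SECTORS_PER_PAGE);
    }

The point is simply that a page’s “disk address” can be as small as a slot
number, which the OS converts to a sector offset within the swap area
when it issues the I/O.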

The size of the swap space is important, as ultimately it determines
the maximum number of memory pages that can be in use by a system at
a given time. Let us assume for simplicity that it is very large for now.

In Figure 21.1, you can see a tiny example of a 4-page physical memory
and an 8-page swap space. In the example, three processes (Proc 0, Proc 1,
and Proc 2) are actively sharing physical memory; each of the three,
however, has only some of its valid pages in memory, with the rest
located in swap space on disk. A fourth process
(Proc 3) has all of its pages swapped out to disk, and thus clearly isn’t
currently running. One block of swap remains free. Even from this tiny
example, hopefully you can see how using swap space allows the system
to pretend that memory is larger than it actually is.

Physical Memory:
    PFN 0: Proc 0 [VPN 0]
    PFN 1: Proc 1 [VPN 2]
    PFN 2: Proc 1 [VPN 3]
    PFN 3: Proc 2 [VPN 0]

Swap Space:
    Block 0: Proc 0 [VPN 1]
    Block 1: Proc 0 [VPN 2]
    Block 2: [Free]
    Block 3: Proc 1 [VPN 0]
    Block 4: Proc 1 [VPN 1]
    Block 5: Proc 3 [VPN 0]
    Block 6: Proc 2 [VPN 1]
    Block 7: Proc 3 [VPN 1]

Figure 21.1: Physical Memory and Swap Space

We should note that swap space is not the only on-disk location for
swapping traffic. For example, assume you are running a program binary
(e.g., ls, or your own compiled main program). The code pages from this
binary are initially found on disk, and when the program runs, they are
loaded into memory (either all at once when the program starts execution,
or, as in modern systems, one page at a time when needed). However, if
the system needs to make room in physical memory for other needs, it
can safely re-use the memory space for these code pages, knowing that it
can later swap them in again from the on-disk binary in the file system.

21.2 The Present Bit

Now that we have some space on the disk, we need to add some ma-
chinery higher up in the system in order to support swapping pages to
and from the disk. Let us assume, for simplicity, that we have a system
with a hardware-managed TLB.

Recall first what happens on a memory reference. The running pro-
cess generates virtual memory references (for instruction fetches, or data
accesses), and, in this case, the hardware translates them into physical
addresses before fetching the desired data from memory.

Remember that the hardware first extracts the VPN from the virtual
address, checks the TLB for a match (a TLB hit), and if a hit, produces the
resulting physical address and fetches the data from memory. This is hopefully
the common case, as it is fast (requiring no additional memory accesses).
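
For concreteness, here is that extraction worked out in C, assuming
32-bit virtual addresses and 4-KB pages (the constant names mirror those
used in Figure 21.2 later in the chapter):

    #include <stdint.h>
    #include <stdio.h>

    #define SHIFT       12           // log2(4096), for 4-KB pages
    #define VPN_MASK    0xFFFFF000u  // top 20 bits: the VPN
    #define OFFSET_MASK 0x00000FFFu  // bottom 12 bits: offset in page

    int main(void) {
        uint32_t vaddr  = 0x3F20;                      // example address
        uint32_t vpn    = (vaddr & VPN_MASK) >> SHIFT; // -> 3
        uint32_t offset = vaddr & OFFSET_MASK;         // -> 0xF20
        printf("VPN %u, offset 0x%x\n", vpn, offset);  // VPN 3, offset 0xf20
        return 0;
    }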

If the VPN is not found in the TLB (i.e., a TLB miss), the hardware
locates the page table in memory (using the page table base register)
and looks up the page table entry (PTE) for this page using the VPN
as an index. If the page is valid and present in physical memory, the
hardware extracts the PFN from the PTE, installs it in the TLB, and retries
the instruction, this time generating a TLB hit; so far, so good.

If we wish to allow pages to be swapped to disk, however, we must
add even more machinery. Specifically, when the hardware looks in the
PTE, it may find that the page is not present in physical memory. The way
the hardware (or the OS, in a software-managed TLB approach) deter-
mines this is through a new piece of information in each page-table entry,
known as the present bit. If the present bit is set to one, it means the
page is present in physical memory and everything proceeds as above; if
it is set to zero, the page is not in memory but rather on disk somewhere.
The act of accessing a page that is not in physical memory is commonly
referred to as a page fault.
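
As a rough illustration, the check might look like the following C
sketch; the PTE layout here (a 32-bit entry with valid and present flag
bits at the bottom and the PFN in the upper bits) is made up for the
example, and real formats such as x86’s differ in detail:

    #include <stdint.h>

    typedef uint32_t pte_t;
    #define PTE_VALID     (1u << 0)   // a legal part of the address space?
    #define PTE_PRESENT   (1u << 1)   // currently in physical memory?
    #define PTE_PFN_SHIFT 12          // PFN lives in the upper bits

    // Mimics the hardware's decision on a TLB miss: valid and present
    // yields a PFN; valid but not present means a page fault; not valid
    // means an illegal access.
    int pte_lookup(pte_t pte, uint32_t *pfn) {
        if (!(pte & PTE_VALID))
            return -1;                   // raise SEGMENTATION_FAULT
        if (!(pte & PTE_PRESENT))
            return -2;                   // raise PAGE_FAULT
        *pfn = pte >> PTE_PFN_SHIFT;     // proceed with translation
        return 0;
    }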


ASIDE: SWAPPING TERMINOLOGY AND OTHER THINGS
Terminology in virtual memory systems can be a little confusing and vari-
able across machines and operating systems. For example, a page fault
more generally could refer to any reference to a page table that generates
a fault of some kind: this could include the type of fault we are discussing
here, i.e., a page-not-present fault, but sometimes can refer to illegal mem-
ory accesses. Indeed, it is odd that we call what is definitely a legal access
(to a page mapped into the virtual address space of a process, but simply
not in physical memory at the time) a “fault” at all; really, it should be
called a page miss. But often, when people say a program is “page fault-
ing”, they mean that it is accessing parts of its virtual address space that
the OS has swapped out to disk.

We suspect the reason that this behavior became known as a “fault” re-
lates to the machinery in the operating system to handle it. When some-
thing unusual happens, i.e., when something the hardware doesn’t know
how to handle occurs, the hardware simply transfers control to the OS,
hoping it can make things better. In this case, a page that a process wants
to access is missing from memory; the hardware does the only thing it
can, which is raise an exception, and the OS takes over from there. As
this is identical to what happens when a process does something illegal,
it is perhaps not surprising that we term the activity a “fault.”

Upon a page fault, the OS is invoked to service it. A particular piece
of code, known as a page-fault handler, runs and services the fault, as
we now describe.

21.3 The Page Fault

Recall that with TLB misses, we have two types of systems: hardware-
managed TLBs (where the hardware looks in the page table to find the
desired translation) and software-managed TLBs (where the OS does). In
either type of system, if a page is not present, the OS is put in charge to
handle the page fault. The appropriately-named OS page-fault handler
runs to determine what to do. Virtually all systems handle page faults in
software; even with a hardware-managed TLB, the hardware trusts the
OS to manage this important duty.

If a page is not present and has been swapped to disk, the OS will need
to swap the page into memory in order to service the page fault. Thus, a
question arises: how will the OS know where to find the desired page? In
many systems, the page table is a natural place to store such information.
Thus, the OS could use the bits in the PTE normally reserved for data such as
the PFN of the page to instead hold a disk address. When the OS receives a page fault
for a page, it looks in the PTE to find the address, and issues the request
to disk to fetch the page into memory.
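
Continuing the made-up PTE layout from the sketch in Section 21.2, the
same upper bits can be reinterpreted as a swap-slot number whenever the
present bit is clear (again, an illustrative sketch, not any particular
OS’s format):

    #include <stdint.h>

    typedef uint32_t pte_t;
    #define PTE_VALID       (1u << 0)
    #define PTE_PRESENT     (1u << 1)
    #define PTE_FIELD_SHIFT 12   // upper bits hold a PFN *or* a swap slot

    // Valid but not present: the upper bits name a slot in swap space.
    uint32_t pte_swap_slot(pte_t pte) {
        return pte >> PTE_FIELD_SHIFT;
    }

    // Once the page has been read into frame 'pfn', rewrite the entry so
    // the upper bits hold the PFN again, and set the present bit.
    pte_t pte_mark_present(uint32_t pfn) {
        return ((pte_t)pfn << PTE_FIELD_SHIFT) | PTE_VALID | PTE_PRESENT;
    }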


ASIDE: WHY HARDWARE DOESN’T HANDLE PAGE FAULTS
We know from our experience with the TLB that hardware designers are
loath to trust the OS to do much of anything. So why do they trust the
OS to handle a page fault? There are two main reasons. First, page
faults to disk are slow; even if the OS takes a long time to handle a fault,
executing tons of instructions, the disk operation itself is traditionally so
slow that the extra overheads of running software are minimal. Second,
to be able to handle a page fault, the hardware would have to understand
swap space, how to issue I/Os to the disk, and a lot of other details which
it currently doesn’t know much about. Thus, for both reasons of perfor-
mance and simplicity, the OS handles page faults, and even hardware
types can be happy.

When the disk I/O completes, the OS will then update the page table
to mark the page as present, update the PFN field of the page-table entry
(PTE) to record the in-memory location of the newly-fetched page, and
retry the instruction. This next attempt may generate a TLB miss, which
would then be serviced and update the TLB with the translation (one
could alternately update the TLB when servicing the page fault to avoid
this step). Finally, a last restart would find the translation in the TLB and
thus proceed to fetch the desired data or instruction from memory at the
translated physical address.

Note that while the I/O is in flight, the process will be in the blocked
state. Thus, the OS will be free to run other ready processes while the
page fault is being serviced. Because I/O is expensive, this overlap of
the I/O (page fault) of one process and the execution of another is yet
another way a multiprogrammed system can make the most effective use
of its hardware.

21.4 What If Memory Is Full?

In the process described above, you may notice that we assumed there
is plenty of free memory in which to page in a page from swap space.
Of course, this may not be the case; memory may be full (or close to it).
Thus, the OS might like to first page out one or more pages to make room
for the new page(s) the OS is about to bring in. The process of picking a
page to kick out, or replace, is known as the page-replacement policy.

As it turns out, a lot of thought has been put into creating a good page-
replacement policy, as kicking out the wrong page can exact a great cost
on program performance. Making the wrong decision can cause a pro-
gram to run at disk-like speeds instead of memory-like speeds; in current
technology that means a program could run 10,000 or 100,000 times slower
(a DRAM access takes on the order of 100 nanoseconds, while a disk access
takes on the order of 10 milliseconds). Thus, such a policy is something
we should study in some detail;
indeed, that is exactly what we will do in the next chapter. For now, it is
good enough to understand that such a policy exists, built on top of the
mechanisms described here.


1    VPN = (VirtualAddress & VPN_MASK) >> SHIFT
2    (Success, TlbEntry) = TLB_Lookup(VPN)
3    if (Success == True)    // TLB Hit
4        if (CanAccess(TlbEntry.ProtectBits) == True)
5            Offset   = VirtualAddress & OFFSET_MASK
6            PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
7            Register = AccessMemory(PhysAddr)
8        else
9            RaiseException(PROTECTION_FAULT)
10   else                    // TLB Miss
11       PTEAddr = PTBR + (VPN * sizeof(PTE))
12       PTE = AccessMemory(PTEAddr)
13       if (PTE.Valid == False)
14           RaiseException(SEGMENTATION_FAULT)
15       else
16           if (CanAccess(PTE.ProtectBits) == False)
17               RaiseException(PROTECTION_FAULT)
18           else if (PTE.Present == True)
19               // assuming hardware-managed TLB
20               TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
21               RetryInstruction()
22           else if (PTE.Present == False)
23               RaiseException(PAGE_FAULT)

Figure 21.2: Page-Fault Control Flow Algorithm (Hardware)

21.5 Page Fault Control Flow

With all of this knowledge in place, we can now roughly sketch the
complete control flow of memory access. In other words, when somebody
asks you “what happens when a program fetches some data from memory?”,
you should have a pretty good idea of all the different possibilities.
See the control flow in Figures 21.2 and 21.3 for more details; the first
figure shows what the hardware does during translation, and the second
what the OS does upon a page fault.

From the hardware control flow in Figure 21.2, notice that there are now
three important cases to understand when a TLB miss occurs. First, the
page may be both present and valid (Lines 18–21); in this case, the
TLB-miss handler can simply grab the PFN from the PTE, retry the
instruction (this time resulting in a TLB hit), and thus continue as
described (many times) before. In the second case (Lines 22–23), the
page-fault handler must be run; although this was a legitimate page for
the process to access (it is valid, after all), it is not present in
physical memory. Third (and finally), the access could be to an invalid
page, due for example to a bug in the program (Lines 13–14). In this
case, no other bits in the PTE really matter; the hardware traps this
invalid access, and the OS trap handler runs, likely terminating the
offending process.

From the software control flow in Figure 21.3, we can see what the OS
roughly must do in order to service the page fault. First, the OS must
find a physical frame for the soon-to-be-faulted-in page to reside
within; if there is no such frame, we’ll have to wait for the replacement
algorithm to run and kick some pages out of memory, thus freeing them for
use here.

1    PFN = FindFreePhysicalPage()
2    if (PFN == -1)               // no free page found
3        PFN = EvictPage()        // run replacement algorithm
4    DiskRead(PTE.DiskAddr, PFN)  // sleep (waiting for I/O)
5    PTE.Present = True           // update page table with present
6    PTE.PFN = PFN                // bit and translation (PFN)
7    RetryInstruction()           // retry instruction

Figure 21.3: Page-Fault Control Flow Algorithm (Software)

With a physical frame in hand, the handler then issues the I/O request to
read in the page from swap space. Finally, when that slow operation
completes, the OS updates the page table and retries the instruction. The
retry will result in a TLB miss, and then, upon another retry, a TLB hit,
at which point the hardware will be able to access the desired item.

21.6 When Replacements Really Occur

Thus far, the way we’ve described how replacements occur assumes that the
OS waits until memory is entirely full, and only then replaces (evicts) a
page to make room for some other page. As you can imagine, this is a
little bit unrealistic, and there are many reasons for the OS to keep a
small portion of memory free more proactively.
To keep a small amount of memory free, most operating systems thus have
some kind of high watermark (HW) and low watermark (LW) to help decide
when to start evicting pages from memory. How this works is as follows:
when the OS notices that there are fewer than LW pages available, a
background thread that is responsible for freeing memory runs. The thread
evicts pages until there are HW pages available. The background thread,
sometimes called the swap daemon or page daemon¹, then goes to sleep,
happy that it has freed some memory for running processes and the OS to
use.

By performing a number of replacements at once, new performance
optimizations become possible. For example, many systems will cluster or
group a number of pages and write them out at once to the swap partition,
thus increasing the efficiency of the disk [LL82]; as we will see later
when we discuss disks in more detail, such clustering reduces seek and
rotational overheads of a disk and thus increases performance noticeably.

To work with the background paging thread, the control flow in Figure
21.3 should be modified slightly; instead of performing a replacement
directly, the algorithm would simply check if there are any free pages
available. If not, it would inform the background paging thread that free
pages are needed; when the thread frees up some pages, it would re-awaken
the original thread, which could then page in the desired page and go
about its work.

¹ The word “daemon”, usually pronounced “demon”, is an old term for a
background thread or process that does something useful. Turns out (once
again!) that the source of the term is Multics [CS94].

TIP: DO WORK IN THE BACKGROUND
When you have some work to do, it is often a good idea to do it in the
background to increase efficiency and to allow for grouping of
operations. Operating systems often do work in the background; for
example, many systems buffer file writes in memory before actually
writing the data to disk. Doing so has many possible benefits: increased
disk efficiency, as the disk may now receive many writes at once and thus
better be able to schedule them; improved latency of writes, as the
application thinks the writes completed quite quickly; the possibility of
work reduction, as the writes may never need to go to disk (i.e., if the
file is deleted); and better use of idle time, as the background work may
be done when the system is otherwise idle, thus better utilizing the
hardware [G+95].

21.7 Summary

In this brief chapter, we have introduced the notion of accessing more
memory than is physically present within a system. To do so requires more
complexity in page-table structures, as a present bit (of some kind) must
be included to tell us whether the page is present in memory or not. When
not, the operating system page-fault handler runs to service the page
fault, and thus arranges for the transfer of the desired page from disk
to memory, perhaps first replacing some pages in memory to make room for
those soon to be swapped in.

Recall, importantly (and amazingly!), that these actions all take place
transparently to the process. As far as the process is concerned, it is
just accessing its own private, contiguous virtual memory. Behind the
scenes, pages are placed in arbitrary (non-contiguous) locations in
physical memory, and sometimes they are not even present in memory,
requiring a fetch from disk.
While we hope that in the common case a memory access is fast, in some
cases it will take multiple disk operations to service it; something as
simple as performing a single instruction can, in the worst case, take
many milliseconds to complete.

References

[CS94] “Take Our Word For It” by F. Corbato, R. Steinberg.
www.takeourword.com/TOW146. Richard Steinberg writes: “Someone has asked
me the origin of the word daemon as it applies to computing. Best I can
tell based on my research, the word was first used by people on your team
at Project MAC using the IBM 7094 in 1963.” Professor Corbato replies:
“Our use of the word daemon was inspired by the Maxwell’s daemon of
physics and thermodynamics (my background is in physics). Maxwell’s
daemon was an imaginary agent which helped sort molecules of different
speeds and worked tirelessly in the background. We fancifully began to
use the word daemon to describe background processes which worked
tirelessly to perform system chores.”

[D97] “Before Memory Was Virtual” by Peter Denning. In the Beginning:
Recollections of Software Pioneers, Wiley, November 1997. An excellent
historical piece by one of the pioneers of virtual memory and working
sets.

[G+95] “Idleness is not sloth” by Richard Golding, Peter Bosch, Carl
Staelin, Tim Sullivan, John Wilkes. USENIX ATC ’95, New Orleans,
Louisiana. A fun and easy-to-read discussion of how idle time can be
better used in systems, with lots of good examples.

[LL82] “Virtual Memory Management in the VAX/VMS Operating System” by
Hank Levy, P. Lipman. IEEE Computer, Vol. 15, No. 3, March 1982. Not the
first place where page clustering was used, but a clear and simple
explanation of how such a mechanism works. We sure cite this paper a lot!

Homework (Measurement)

This homework introduces you to a new tool, vmstat, and how it can be
used to understand memory, CPU, and I/O usage. Read the associated README
and examine the code in mem.c before proceeding to the exercises and
questions below.

Questions

1. First, open two separate terminal connections to the same machine, so
that you can easily run something in one window while watching the other.
Now, in one window, run vmstat 1, which shows statistics about machine
usage every second. Read the man page, the associated README, and any
other information you need so that you can understand its output. Leave
this window running vmstat for the rest of the exercises below. Now, we
will run the program mem.c but with very little memory usage. This can be
accomplished by typing ./mem 1 (which uses only 1 MB of memory). How do
the CPU usage statistics change when running mem? Do the numbers in the
user time column make sense? How does this change when running more than
one instance of mem at once?

2. Let’s now start looking at some of the memory statistics while running
mem. We’ll focus on two columns: swpd (the amount of virtual memory used)
and free (the amount of idle memory). Run ./mem 1024 (which allocates
1024 MB) and watch how these values change. Then kill the running program
(by typing control-c) and watch again how the values change. What do you
notice about the values? In particular, how does the free column change
when the program exits? Does the amount of free memory increase by the
expected amount when mem exits?
3. We’ll next look at the swap columns (si and so), which indicate how
much swapping is taking place to and from the disk. Of course, to
activate these, you’ll need to run mem with large amounts of memory.
First, examine how much free memory is on your Linux system (for example,
by typing cat /proc/meminfo; type man proc for details on the /proc file
system and the types of information you can find there). One of the first
entries in /proc/meminfo is the total amount of memory in your system.
Let’s assume it’s something like 8 GB of memory; if so, start by running
mem 4000 (about 4 GB) and watching the swap in/out columns. Do they ever
give non-zero values? Then, try with 5000, 6000, etc. What happens to
these values as the program enters the second loop (and beyond), as
compared to the first loop? How much data (total) are swapped in and out
during the second, third, and subsequent loops? (Do the numbers make
sense?)

4. Do the same experiments as above, but now watch the other statistics
(such as CPU utilization, and block I/O statistics). How do they change
when mem is running?

5. Now let’s examine performance. Pick an input for mem that comfortably
fits in memory (say 4000 if the amount of memory on the system is 8 GB).
How long does loop 0 take (and subsequent loops 1, 2, etc.)? Now pick a
size comfortably beyond the size of memory (say 12000, again assuming
8 GB of memory). How long do the loops take here? How do the bandwidth
numbers compare? How different is performance when constantly swapping
versus fitting everything comfortably in memory? Can you make a graph,
with the size of memory used by mem on the x-axis, and the bandwidth of
accessing said memory on the y-axis? Finally, how does the performance of
the first loop compare to that of subsequent loops, for both the case
where everything fits in memory and where it doesn’t?

6. Swap space isn’t infinite. You can use the tool swapon with the -s
flag to see how much swap space is available. What happens if you try to
run mem with increasingly large values, beyond what seems to be available
in swap? At what point does the memory allocation fail?

7. Finally, if you’re advanced, you can configure your system to use
different swap devices using swapon and swapoff. Read the man pages for
details. If you have access to different hardware, see how the
performance of swapping changes when swapping to a classic hard drive, a
flash-based SSD, and even a RAID array. How much can swapping performance
be improved via newer devices? How close can you get to in-memory
performance?