CS代写 !iili!i ̧!i

!iili!i ̧!i
Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures
, , , , , , , and
This paper describes the design and implementation of vir- tual memory management within the CMU Mach Operating System and the experiences gained by the Mach kernel group in porting that system to a variety of architectures. As of this writing, Maeh runs on more than half a dozen uniprocessors and multiprocessors including the VAX family of uniproces- sors and multiprocessors, the IBM RT PC, the SUN 3, the Encore MultiMax, the Sequent Balance 21000 and several experimental computers. Although these systems vary con- siderably in the kind of hardware support for memory management they provide, the machine-dependent portion of Mach virtual memory consists of a single code module and its related header file. This separation of software memory management from hardware support has been accomplished without sacrificing system performance. In addition to im- proving portability, it makes possible a relatively unbiased examination of the pros and cons of various hardware memory management schemes, especially as they apply to the support of multiprocessors.

Copyright By PowCoder代写 加微信 powcoder

1. Introduction
While software designers are increasingly able to cope with
variations in instruction set architectures, operating system portability continues to suffer from a proliferation of memory structures. UNIX systems have traditionally addressed the problem of VM portability by restricting the facilities provided and basing implementations for new memory management architectures on versions already done for pre- vious systems. As a result, existing versions of UNIX, such as Berkeley 4.3bsd, offer little in the way of virtual memory management other than simple paging support. Versions of Berkeley UNIX on non-VAX hardware, such as SunOS on the SUN 3 and ACIS 4.2 on the IBM RT PC, actually simu- late internally the VAX memory mapping architecture — in effect treating it as a machine-independent memory manage- ment specification.
This xeseareh was sponsored by the Defense Advanced Research l:Xrojeels Agency (DOD), ARPA Order No. 4864, monitored by the Space and Naval Warfare Systems Command under contraeet N00039-85-C-1034.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and Its date appear, and notice is given that copying Is by permission of the Association of ComputingMachinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
Over the last two years CMU has been engaged in the development of a portable, multiprocessor operating system called Mach. One of the goals of Mach has been to explore the relationship between hardware and software memory ar- chitectures and to design a memory management system that would be readily portable to multiprocessor computing en- gines as well as traditional uniprocessors.
Mach provides complete UNIX 4.3bsd compatibility while significantly extending UNIX notions of virtual memory management and inteerprocess communication [1]. Mach sup- ports:
* large, sparse virtual address spaces,
• copy-on-write virtual copy operations,
• copy-on-write and read-write memory sharing between tasks,
• memory mappedfiles and
n user-provided backing store objects andpagers.
This has been accomplished without patterning Mach’s in- ternal memory representation after any specific architecture. In fact, Math makes relatively few assumptions about avail- able memory management hardware. The primary require- ment is an ability to handle and recover from page faults (for some arbitrary page size).
As of this writing, Math runs on more than half a dozen uniprocessors and multiprocessors including the entire VAX family of uniprocessors and mulfiprocessors, the IBM RT PC, the SUN 3, the Encore MultiMax and the Sequent Balance 21000. Implementations are in progress for several ex- perimentalcomputers. Despitedifferencesbetweensupported architectures, the machine-dependent portion of Mach’s vir- tual memory subsystem consists of a single code module and its related header file. All information important to the management of Mach virtual memory is maintained in machine-independent data structures and machine-dependent data structures contain only those mappings necessary to run- ning the current mix of programs.
Mach’s separation of software memory management from hardware support has been accomplished without sacrificing system performance. In several eases overall system perfor- mance has measurably improved over existing UNIX im- plementations. Moreover, this approach makes possible a
© 1987 ACM 0-89791-238-1/87/1000-0031 $00.75
Department of Computer Science University Pittsburgh, Pennsylvania 15213

relatively unbiased examination of the pros and cons of various hardware memory management schemes, especially as they apply to the support of multiprocessors. This paper describes the design and implementation of virtual memory management within the CMU Mach Operating System and the experiences gained by the Mach kernel group in porting that system to a variety of arChitectures.
There are five basic Mach abstractions:
1. A task is an execution environment in which threads may run. It is the basic unit of resource allocation. A task includes a paged virtual ad- dress space and protected access to system resources (such as processors, port capabilities and virtual memory). A task address space con- sists of an ordered collection of mappings to memory objects (see below). The UNIX notion of a process is, in Mach, represented by a task with a single thread of control.
2. A thread is the basic unit of CPU utilization. It is roughly equivalent to an independent program counter operating within a task. All threads within a task share access to all task resources.
3. A port is a communicationchannel — logically a queue for messages protected by the kernel. Ports are the reference objects of the Mach design. They are used in much the same way that object references could be used in an object oriented system. Send and Receive are the fun- damental primitive operations on ports.
4. A message is a typed collection of data objects used in conmmnication between threads. Mes- sages may be of any size and may contain pointers and typed capabilities for ports.
5. A memory object is collection of data provided and managed by a server which can be mapped into the address space of a task.
Operations on objects other than messages are performed by sending messages to ports. In this way, Mach permits system servicesandresourcestobemanagedbyuser-statetasks. For example, the Mach kernel itself can be considered a task with multiple threads of control. The kernel task acts as a server which in turn implements tasks, threads and memory objects. The act of creating a task, a thread or a memory object, returns access rights to a port which represents the new object and can be used to manipulate it. Incoming messages on such a port results in an operation performed on the object it represents.
The indirection provided by message passing allows objects to be arbitrarily placed in the network (either within a mul- tiprocessor or a workstation) without regard to programming details. For example, a thread can suspend another thread by sending a suspend message to that thread’s threadport even if the requesting thread is on another node in a network. It is thus possible to run varying system configurations on dif-
ferent classes of machines while providing a consistent inter- face to all resources. The actual system running on any particular machine is thus more a function of its servers than its kernel.
Traditionally, message based systems of this sort have operated at a distinct performance disadvantage to conven- tionally implemented operating systems. The key to efficiency in Mach is the notion that virtual memory management can be integrated with a message-oriented communication facility. This integration allows large amounts of data including whole files and even whole address spaces to be sent in a single message with the efficiency of simple memory remapping.
2.1. Basle VM Operations
Each Mach task possesses a large address space that consists
of a series of mappings between ranges of memory addres- sible to the task and memory objects. The size of a Mach address space is limited only by the addressing restrictions of the underlying hardware. An RT PC task, for example, can address a full 4 gigabytes of memory under MachI while the VAX architecture allows at most 2 gigabytes of user address space. A task can modify its address space in several ways, including:
allocate a region of virtual memory on a page boundary,
®deallocate a region ofvirtucd memory,
• set the protection status of a region of virtual memory,
• specify the inheritance of a region of virtual memory and
®create and manage a memory object that can then be mapped into the address space of another task.
The only restriction imposed by Mach on the nature of the regions that may be specified for virtual memory operations is that they must be aligned on system page boundaries. The definition of page size is a boot time system parameter and can be any power of two multiple of the hardware page size. Table 2-1 fists the set of virtual memory operations that can be performed on a task.
Both copy-on-write and read/write sharing of memory are permitted between Mach tasks. Copy-on-write sharing be- tween unrelated tasks is typically the result of large message transfers. An entire address space may be sent in a single message with no actual data copy operations performed. Read/write shared memory can be created by allocating a memory region and setting its inheritance attribute. Sub- sequently created child tasks share the memory of their parent according to its inheritance value. Inheritance may be specified as shared, copy or none, and may be specified on a per-page basis. Pages specified as shared, are shared for read and write. Pages marked as copy are logically copied by value, although for efficiency copy-on-write techniques are
1 T h i s f e a t u r e i s a c t i v e l y u s e d a t C M U b y t h e C M U R T i m p l e m e n t a t i o no f CommonLisp.

lcyii!i: ̧i
employed. An inheritance specification of none signifies that a page is not to be passed to a child. In this case, the child’s corresponding address is left unallocated.
Like inheritance, protection is specified on a per-page basis. For each group of pages there exist two protection values: the current and the maximum protection. The current protection controls actual hardware permissions. The maximum protec- tion specifies the maximum value that the current protection may take. While the maximum protection can never be raised, it may be lowered. If the maximum protection is lowered to a level below the current protection, the current protection is also lowered to that level. Each protection is implemented as a combination of read, write and execute permissions. Enforcement of access permissions depends on hardware support. For example, many machines do not allow for explicit execute permissions, but those that do will have that protection properly enforced.
Virtual Memory Operations vm allerate(target_tstsk,addr ess~lze,anywhere)
Allocate andfill with zeros new virtual memory either anywhere or at a ~pecifiedaddress.
vm_copy(targettask,source addre.~,cotmt,dezt_addre~)
Virtnolly copy a range o/memory from one address to another.
vm~leallocate(target t~k,addre~lze)
Dea//ocatea rangeofaddresses,i.e.~ them no longerwdld. vm_lnherlt(target..task,address,dze,new_lnherltanee)
Set the inheritance at~rlbuteofan addre~ range. vm proteet(,~rget task,addres~dze,set maximurn,new protection) Settheprotection attributeofanaddressrange.
vm_read0arget~sk ,addres,s,sl~,data,dala_eount)
Readtl~ contentsofa regionofa task’saddressspace.
vra_reglons(larget ta.¢k,addressoIze,lements,elements2munt)
R etwvn description o f specO%d region o f tazk”s address space. vm_statlstlcs(target task,vm stats)
Retwpnstatistics about the use o f memory by targetJask.
vm_wrlte(target task,nddress,eotmt,data,dala count)
Write the contents of a region ofa task’s address ~pace. Table 2-1:
All VM operations apply to a target task (represented by a port) and all I~ut vm statmties specify aft-address and size in bytes. anywh,ere is a booloan which indieates whether or not avm allocate ~lloeates meanery anywhere or at a location specified by address.
Mach’s implementationof UNIX fork is an example of how its virtual memory operations can be used. When a fork operation is invoked, the newly created child task address map is created based on the parent’s inheritance values. By default, all inheritance values for an address space are set to copy. Thus the child’s address space is, by default, a copy- on-write copy of the parent’s and UNIX address space copy semantics are preserved.
One of the more unusual features of Mach is that fact that virtual memory related functions, such as pagein and pageout, can be performed directly by user-state tasks for memory objects they create. Section 3.3 describes this aspect of the system.
3. The Implementation of
Four basic memory management data structures are used in
1. the resident page table —
a table used to keep track of information about machine independentpages,
2. the address map —
a doubly linked list of map entries, each of which describes a mapping from a range of ad- dresses to a region of a memory object,
3. the memory object —
a unit of backing storage managed by the kernel or a user task and
4.thepmap —
a machine dependent memory mapping data structure (i.e., a hardware defined physical ad- dress map).
The implementation is split between machine independent and machine dependent sections. Machine dependent code implements only those operations necessary to create, update and manage the hardware required mapping data structures. All important virtual memory information is maintained by machine independent code. In general, the machine depend- ent part of Mach maintains only those mappings which are crucial to system execution (e.g., the kernel map and the mappings for frequently referenced task addressees) and may garbage collect non-important mapping information to save space or time. It has no knowledge of machine independent data structures and is not required to maintain full knowledge of valid mappings from virtual addresses to hardware pages.
3.1.Managing Resident Memory
Physical memory in Mach is treated primarily as a cache for
the contents of virtual memory objects. Information about physical pages (e.g., modified and reference bits) is main- tained in page entries in a table indexed by physical page number. Each page entry may simultaneously be linked into several lists:
• a memory object list.
• a memory allocation queue and
• a object~offset hash bucket.
All the page entries associated with a given object are linked together in a memory object list to speed-up object dealloca- tion and virtual copy operations. Memory object semantics permit each page to belong to at most one memory object. Allocation queues are maintained for free, reclaimable and allocated pages and are used by the Mach paging daemon. Fast lookup of a physical page associated with an object/offset at the time of a page fault is performed using a bucket hash table keyed by memory object and byte offset.
Byte offsets in memory objects are used throughout the system to avoid linking the implementation to a particular notionofphysicalpagesize. AMachphysicalpagedoesnot, in fact, correspond to a page as defined by the memory mapping hardware of a particular computer. The size of a Mach page is a boot time system parameter. It relates to the physical page size only in that it must be a power of two

multiple of the machine dependent size. For example, Mach page sizes for a VAX can be 512 bytes, 1K bytes, 2K bytes, 4K bytes, etc. Mach page sizes for a SUN 3, however, are limited to 8K bytes, 16K bytes, etc. The physical page size used in Mach is also independent of the page size used by memory object handlers (see section below).
3.2. Address Maps
Just as the kernel keeps track of its own physical address
space, it must also manage its virtual address space and that of each task. Addresses within a task address space are mapped to byte offsets in memory objects by a data structure called an address map.
An address map is a doubly linked list of address map entries each of which maps a contiguous range of virtual addresses onto a contiguous area of a memory object. This linked list is sorted ixt order of ascending virtual address and different entries may not map overlapping regions of memory.
Each address map entry carries with it information about the inheritance and protection attributes of the region of memory it defines. For that reason, all addresses within a range mapped by an entry must have the same attributes. This can force the system to allocate two address map entries that map adjacent memory regions to the same memory object simply because the properties of the two regions are different.
This address map data structure was chosen over many alter- natives because it was the simplest that could efficiently im- plement the most frequent operations performed on a task address space, namely:
®page fault lookups,
• copy/protection operations on address ranges and
allocationldeallocation of address ranges.
A sorted linked list allows operations on ranges of addresses (e.g., copy-on-write copy operations) to be done simply and quickly and does not penalize large, sparse address spaces. Moreover, fast lookup on faults can be achieved by keeping last fault “hints”. These hints allow the address map list to be searched from the last entry found for a fault of a particular type. Because each entry may map a large region of virtual addresses, an address map is typically small. A typical VAX UNIX process has five mapping entries upon creation – one for its UNIX u-area and one each for code, stack, initialized and uninitializeddata.
3.3. Memory Objects
A Mach address map need not keep track of backing storage
because all backing store is implemented by Mach memory objects. Logically, a virtual memory object is a repository for data, indexed by byte, upon which various operations (e.g., read and write) can be performed. In many respects it resembles a UNIX file.
A reference counter is maintained for each memory object. This counter allows the object to be garbage collected when all mapped references to it are removed. In some cases, for example UNIX text segments or other frequently used files, it is desirable for the kernel to retain information about an object even after the last mapping reference disappears. By
retaining the physical page mappings for such objects sub- sequent reuse can be made very inexpensive. Mach maintains an cache of such frequently used memory objects. A pager may use domain specific knowledge to request that an object be kept in this cache after it is no longer referenced.
An important feature of Mach’s virtual memory is the ability to handle page faults and page-out requests outside of the kernel. This is accomplished by associating with each memory object a managing task (called a pager). For ex- ample, to implement a memory mapped file, virtual memory is created with its pager specified as the file system. When a page fault occurs, the kernel will translate the fault into a request for data from the file system.
Access to a pager is represented by a port (called the paging_object port) to which the kernel can send messages requestin

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com