
Report No. STAN-CS-83-973
Also numbered CSL 246
The Distributed V Kernel and Its Performance for Diskless Workstations
David R. Cheriton and Willy Zwaenepoel


Department of Computer Science
Stanford University Stanford, CA 94305

The Distributed V Kernel
and its Performance for Diskless Workstations
David R. Cheriton and Willy Zwaenepoel
Computer Systems Laboratory
Departments of Computer¡¯ Science and Electrical Engineering Stanford University
The distributed V kernel is a message-oriented kernel that provides uniform local and network interprocess communication. It is primarily being used in an environment of diskless workstations connected by a high-speed local network to a set of file servers. We describe a performance evaluation of the kernel, with particular emphasis on the cost of network file access. Our results show that over a local network:
1. Diskless workstations can access remote files with minimal performance penalty.
2. The V message facility can be used to access remote files at a cost comparable to that of a well-tuned specialized file access protocol.
We conclude that it is feasible to build a distributed system with all network communication using the V message facility even when most of the network nodes have no secondary storage.
1. Introduction
The distributed V kernel is a message-oriented kernel that provides uniform local and network inter-process communication. The kernel interface is modeled after the Thoth [3, 5] and Verex [4, 5] kernels with some modifications to facilitate efficient local network operation. It is in active use at Stanford and at other research and commercial establishments. The system is implemented on a collection of MC68000-based SUN workstations [2] interconnected by a 3 Mb Ethernet [9] or 10 Mb
This work was sponsored in part by the Defense Advanced Research Projects Agency under contracts MDA903-80-C-0102 and N00039-83-K-0431.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
Ethernet [7]. Network interprocess communication is predominantly used for remote file access, since most SUN workstations at Stanford are configured without a local disk.
This paper reports our experience with the implementation and use of the V kernel. Of particular interest are the controversial aspects of our approach, namely:
1. The use of diskless workstations with all secondary storage provided by backend file servers.
2. The use of a general purpose network interprocess communication facility (as opposed to special-purpose file access protocols) and, in particular, the use of a Thoth-like inter-process communication mechanism.
The more conventional approach is to configure workstations with a small local disk, using network-based file servers for archival storage. Diskless workstations, however, have a number of advantages, including:
1. Lower hardware cost per workstation.
2. Simpler maintenance and economies of scale with shared file servers.
3. Little or no memory or processing overhead on the workstation for file system and disk handling.
4. Fewer problems with replication, consistency and distribution of files.
The major disadvantage is the overhead of performing all file access over the network. One might therefore expect that we use a carefully tuned specialized file-access protocol integrated into the transport protocol layer, as done in LOCUS [11]. Instead, our file access is built on top of a general-purpose interprocess communication (IPC) facility that serves as the transport layer. While this approach has the advantage of supporting a variety of different types of network communication, its generality has the potential of introducing a significant performance penalty over the "problem-oriented" approach used in LOCUS.
Furthermore, because sequential file access is so common, it is conventional to use streaming protocols to minimize the effect of network latency on performance. Instead, we adopted a synchronous "request-response" model of message-passing and data transfer which, while simple and efficient to implement as well as relatively easy to use, does not support application-level use of streaming.
These potential problems prompted a performance evaluation of our methods, with particular emphasis on the efficiency of file access. This emphasis on file access distinguishes our work from similar studies [10, 13]. The results of our study strongly support the idea of building a distributed system using diskless workstations connected by a high-speed local network to one or more file servers. Furthermore, we show that remote file access using the V kernel IPC facility is only slightly more expensive than a lower bound imposed by the basic cost of network communication. From this we conclude that relatively little improvement in performance can be achieved using protocols further specialized to file access.
2. V Kernel Interprocess Communication
The basic model provided by the V kernel is that of many small processes communicating by messages. A process is identified by a 32-bit globally unique process identifier or pid. Communication between processes is provided in the form of short fixed-length messages, each with an associated reply message, plus a data transfer operation for moving larger amounts of data between processes. In particular, all messages are a fixed 32 bytes in length.
The common communication scenario is as follows: A client process executes a Send to a server process, which then completes execution of a Receive to receive the message and eventually executes a Reply to respond with a reply message back to the client. We refer to this sequence as a message exchange. The receiver may execute one or more MoveTo or MoveFrom data transfer operations between the time the message is received and the time the reply message is sent.
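The synchronous shape of this exchange can be sketched in miniature. The following C fragment is an illustrative single-process simulation, not kernel code: the "server" is modeled as a plain callback, and overwriting the caller's buffer mirrors the kernel convention that the reply overwrites the original message area.

```c
#define MSG_BYTES 32  /* all V messages are a fixed 32 bytes */

/* The server side is modeled as a callback that receives the 32-byte
   message and overwrites it in place with its reply, mirroring the
   convention that the reply overwrites the original message area. */
typedef void (*Server)(unsigned char msg[MSG_BYTES]);

/* Simulated Send: the client "blocks" (here, an ordinary call) until
   the server has received the message and replied. */
static void sim_send(Server server, unsigned char msg[MSG_BYTES]) {
    server(msg);                 /* Receive ... Reply happens inside */
}

/* Example server: consumes the request and replies with a tag byte. */
static void tag_server(unsigned char msg[MSG_BYTES]) {
    msg[0] = 'R';                /* reply overwrites the message area */
}
```

The single function call stands in for the Send/Receive/Reply rendezvous; real V processes run on separate machines, but the blocking semantics seen by the client are the same.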
The following sections describe the primitives relevant to this paper. The interested reader is referred to the V kernel manual [6] for a complete description of the kernel facilities.
2.1. Primitives
Send( message, pid )
Send the 32-byte message specified by message to the process specified by pid. The sender blocks until the receiver has received the message and has sent back a 32-byte reply using Reply. The reply message overwrites the original message area.
Using the kernel message format conventions, a process specifies in the message the segment of its address space that the message recipient may access and whether the recipient may read or write that segment. A segment is specified by the last two words of a message, giving its start address and its length respectively. Reserved flag bits at the beginning of the message indicate whether a segment is specified and, if so, its access permissions.
pid = Receive( message )
Block the invoking process, if necessary, to receive a 32-byte message in its message vector. Messages are queued in first-come-first-served (FCFS) order until received.
( pid, count ) = ReceiveWithSegment( message, segptr, segsize )
Block the invoking process to receive a message as with Receive except, if a segment is specified in the message with read access, up to the first segsize bytes of the segment may be transferred to the array starting at segptr, with count specifying the actual number of bytes received.
Reply( message, pid )
Send a 32-byte reply contained in the message buffer to the specified process, providing it is awaiting a reply from the replier. The sending process is readied upon receiving the reply; the replying process does not block.
ReplyWithSegment( message, pid, destptr, segptr, segsize )
Send a reply message as done by Reply but also transmit the short segment specified by segptr and segsize to destptr in the destination process's address space.
MoveFrom( srcpid, dest, src, count )
Copy count bytes from the segment starting at src in the address space of srcpid to the segment starting at dest in the active process's space. The srcpid process must be awaiting reply from the active process and must have provided read access to the segment of memory in its address space, using the message conventions described under Send.
MoveTo( destpid, dest, src, count )
Copy count bytes from the segment starting at src in the active process's space to the segment starting at dest in the address space of the destpid process. The destpid process must be awaiting reply from the active process and must have provided write access to the segment of memory in its address space, using the message conventions described under Send.
SetPid( logicalid, pid, scope )
Associate pid with the specified logicalid in the specified scope, which is one of local, remote or both. Example logicalids are fileserver, nameserver, etc.
pid = GetPid( logicalid, scope )
Return the process identifier associated with logicalid in the specified scope if any, else 0.
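The segment convention described under Send can be pictured with a concrete layout. The fragment below is a hypothetical sketch: the kernel fixes only the total message size, the reserved flag bits at the front, and the segment start address and length in the last two (32-bit) words; the remaining field and macro names are illustrative, not the kernel's.

```c
#include <stdint.h>

/* Hypothetical concrete layout for the 32-byte V message. */
typedef struct {
    uint16_t flags;       /* reserved bits: segment present / writable */
    uint16_t op;          /* request code (illustrative) */
    uint8_t  body[20];    /* uninterpreted message contents */
    uint32_t seg_start;   /* segment start address: second-to-last word */
    uint32_t seg_len;     /* segment length in bytes: last word */
} VMessage;

#define SEG_PRESENT 0x0001u
#define SEG_WRITE   0x0002u   /* if clear, the recipient may only read */

/* Grant the message recipient read access to [start, start + len). */
static void grant_read_segment(VMessage *m, uint32_t start, uint32_t len) {
    m->flags |= SEG_PRESENT;
    m->flags &= (uint16_t)~SEG_WRITE;
    m->seg_start = start;
    m->seg_len   = len;
}
```

A receiver would inspect the flag bits before attempting a MoveFrom (or a ReceiveWithSegment short transfer) against the advertised segment.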
2.2. Discussion
The V kernel's interprocess communication is modeled after that of the Thoth and Verex kernels, which have been used in multi-user systems and real-time applications for several years. An extensive discussion of this design and its motivations is available [5], although mainly in the scope of a single-machine system. We summarize here the highlights of the discussion.
1. Synchronous request-response message communication makes programming easy because of the similarity to procedure calls.
2. The distinction between small messages and a separate data transfer facility ties in well with a frequently observed usage pattern: a vast amount of interprocess communication is transfer of small amounts of control information (e.g. device completion) while occasionally there is bulk data transfer (e.g. program loading).
3. Finally, synchronous communication and small, fixed-size messages reduce queuing and buffering problems in the kernel. In particular, only small, fixed-size message buffers must be allocated in the kernel, and large amounts of data are transferred directly between users' address spaces without extra copies. Moreover, by virtue of the synchrony of the communication, the kernel's message buffers can be statically allocated. As exemplified in Thoth, these factors make for a small, efficient kernel.
internet functionality and local net performance, we have chosen not to burden the dominant (local net) operation with any more overhead than is strictly necessary.
3. The synchronous request-response nature of a reply associated with each message is exploited to build reliable message transmission directly on an unreliable datagram service, i.e. without using an extra layer (and extra packets) to implement reliable transport. The reply message serves as an acknowledgement as well as carrying the reply message data.
4. The mapping from process id to process location is aided by encoding a host specification in the process identifier. The kernel can thus determine quickly whether a process is local or remote and, in the latter case, on which machine it resides.
5. There are no per-packet acknowledgements for large data transfers (as in MoveTo and MoveFrom). There is only a single acknowledgement when the transfer is complete.
6. File page-level transfers require the minimal number of packets (i.e. two) because of the ability to append short segments to messages using ReceiveWithSegment and ReplyWithSegment.
The following sections look at particular aspects of the implementation in greater detail.
3.1. Process Naming
V uses a global (flat) naming space for specifying processes, in contrast to the local port naming used in DEMOS [1] and Accent [12]. Process identifiers are unique within the context of a local network. On the SUN workstation, it is natural for the V kernel to use 32-bit process identifiers. The high-order 16 bits of the process identifier serve as a logical host identifier subfield while the low-order 16 bits are used as a locally unique identifier.
In the current 3 Mb Ethernet implementation, the top 8 bits of the logical host identifier are the physical network address of the workstation, making the process identifier to network address mapping trivial. In the 10 Mb implementation, a table maps logical hosts to network addresses. When there is no table entry for the specified logical host, the message is broadcast. New "logical host-to-network address" correspondences can be discovered from messages received. However, each node must at least know or discover its own logical host identifier during kernel initialization.
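The 10 Mb lookup with its broadcast fallback can be sketched as follows. This is an illustrative model, not the kernel's data structure: the table shape, the zero-means-unknown convention, and the broadcast constant are all assumptions.

```c
#include <stdint.h>

#define MAX_HOSTS  256
#define BCAST_ADDR 0xFFu   /* illustrative broadcast network address */

/* Hypothetical logical-host to network-address table; 0 marks an
   entry whose mapping is not yet known. */
static uint8_t host_to_net[MAX_HOSTS];

/* Destination address for a logical host: broadcast when unknown. */
static uint8_t net_addr(uint16_t lhost) {
    uint8_t a = host_to_net[lhost % MAX_HOSTS];
    return a ? a : BCAST_ADDR;
}

/* Record a correspondence discovered from a received packet. */
static void learn(uint16_t lhost, uint8_t addr) {
    host_to_net[lhost % MAX_HOSTS] = addr;
}
```

Broadcasting on a miss and learning from received packets means the table converges to unicast delivery without any explicit name-resolution exchange.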
The use of an explicit host field in the process identifier allows distributed generation of unique process identifiers between machines and allows an efficient mapping from process id to network address. In particular, it is very efficient to determine whether a process is local or remote. This "locality" test on process identifiers serves as the primary invocation mechanism from the local kernel software into the network IPC portion. In general, most V kernel operations differ from their Thoth implementation by a call to a "non-local" handler when one of the process identifier parameters fails to validate as a local
The V message primitives appear ill-suited in several ways for a network environment, at least on first observation. The short fixed-length messages appear to make inefficient use of large packet sizes typically available on local networks. The synchronous nature of the message sending would seem to interfere with the true parallelism possible between separate workstations. And the economies of message buffering afforded by these restrictions in a single machine implementation are less evident in a distributed environment. Finally, the separate data transfer operations MoveTo and MoveFrom appear only to increase the number of remote data transfer operations that must be implemented in the distributed case.
However, our experience has been that the V message primitives are easily and efficiently implemented over a local network. Moreover, we have found that the semantics of the primitives facilitated an efficient distributed implementation. The only major departure from Thoth was the explicit specification of segments in messages and the addition of the primitives ReceiveWithSegment and ReplyWithSegment. This extension was made for efficient page-level file access, although it has proven useful under more general circumstances, e.g. in passing character-string names to name servers.
3. Implementation Issues
A foremost concern in the implementation of the kernel has been efficiency. Before describing some of the implementation details of the individual primitives, we list several aspects of the implementation that are central to the efficient operation of the kernel.
1. Remote operations are implemented directly in the kernel instead of through a process-level network server. When the kernel recognizes a request directed to a remote process, it immediately writes an appropriate packet on the network. The alternative approach, whereby the kernel relays a remote request to a network server which then writes the packet out on the network, incurs a heavy penalty in extra copying and process switching. (We measured a factor of four increase in the remote message exchange time.)
2. Interkernel packets use the "raw" Ethernet data link level. The overhead of layered protocol implementation has been described many times [10]. An alternative implementation using internet (IP) headers showed a 20 percent increase in the basic message exchange time, even without computing the IP header checksum and with only the simplest routing in the kernel. While we recognize the tradeoff between

process. With the exception of GetPid, kernel operations with no process identifier parameters are implicitly local to the workstation.
GetPid uses network broadcast to determine the mapping of a logical process identifier to real process identifier if the mapping is not known to the local kernel. Any kernel knowing the mapping can respond to the broadcast request. The addition of local and remote scopes was required to discriminate between server processes that serve only a single workstation and those that serve the network.
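The pid layout and the "locality" test it enables can be sketched directly; only the 16/16 bit split is from the paper, the function names are illustrative.

```c
#include <stdint.h>

/* Sketch of the 32-bit pid layout from Section 3.1: the high-order
   16 bits are the logical host identifier, the low-order 16 bits a
   locally unique identifier. */
typedef uint32_t Pid;

static uint16_t logical_host(Pid p) { return (uint16_t)(p >> 16); }
static uint16_t local_id(Pid p)     { return (uint16_t)(p & 0xFFFFu); }

/* The cheap "locality" test that selects between the local code path
   and the network IPC path. */
static int pid_is_local(Pid p, uint16_t my_host) {
    return logical_host(p) == my_host;
}
```

Because the test is a single shift and compare, putting it on the fast path of every kernel operation costs essentially nothing in the purely local case.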
3.2. Remote Message Implementation
When a process identifier is specified to Send with a logical host identifier different from that of the local machine, the local pid validation test fails and Send calls NonLocalSend, which handles transmission of the message over the network.
The NonLocalSend routine writes an interkernel packet on the network addressed to the host machine of this process, or else broadcasts the packet if the host machine is not known. When the host containing the recipient process receives the packet, it creates an alien process descriptor to represent the remote sending process, using a standard kernel process descriptor, and saves the message in the message buffer field of the alien process descriptor. When the receiving process replies to the message, the reply is transmitted back to the sender as well as being saved for a period of time in the alien descriptor. If the sender does not receive a reply within the timeout period T, the original message is retransmitted by the sender's kernel. The receiving kernel filters out retransmissions of received messages by comparing the message sequence number and source process with those represented by the aliens. The kernel responds to a retransmitted message by discarding the message and either retransmitting the reply message or else sending back a "reply-pending" packet to
the sending kernel if the reply has not yet been generated. It also sends back a reply-pending packet if it is forced to discard a new message because no (alien) process descriptors are available. The sending kernel concludes the receiving process does not exist (and thus the Send has failed) if it receives a negative acknowledgement packet or if it retransmits N times without receiving either a reply message or a reply-pending packet.
This description supports the claim made above that reliable message transmission is built immediately on top of an unreliable datagram protocol, with the minimal number of network packets in the normal case.
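The sender-side policy can be simulated in a few lines. This is a sketch under assumptions: the packet-kind enum, the transport callback, and the choice to model a reply-pending packet as resetting the timeout count (one reading of "N times without receiving either a reply or a reply-pending packet") are all illustrative, not kernel interfaces.

```c
/* Outcomes of one transmission attempt, as seen by the sender. */
typedef enum { PKT_REPLY, PKT_REPLY_PENDING, PKT_NEG_ACK, PKT_TIMEOUT } PktKind;
typedef PktKind (*Transport)(void);

/* Returns 1 if a reply arrived, 0 if the Send failed. */
static int send_with_retry(Transport net, int max_retries) {
    int timeouts = 0;
    while (timeouts < max_retries) {
        switch (net()) {              /* (re)transmit and await outcome */
        case PKT_REPLY:         return 1;
        case PKT_NEG_ACK:       return 0;  /* receiver does not exist */
        case PKT_REPLY_PENDING: timeouts = 0; break; /* alive but slow */
        case PKT_TIMEOUT:       timeouts++;  break;
        }
    }
    return 0;  /* N retransmissions, no reply and no reply-pending */
}

/* Scripted transport for demonstration: two losses, then a slow but
   live receiver, then the reply. */
static PktKind script[] = { PKT_TIMEOUT, PKT_TIMEOUT,
                            PKT_REPLY_PENDING, PKT_REPLY };
static int step = 0;
static PktKind scripted_net(void) { return script[step++]; }
```

Note that in the successful case exactly one request packet and one reply packet cross the network; the retry machinery adds traffic only when packets are lost or the receiver is slow.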
3.3. Remote Data Transfer
MoveTo and MoveFrom provide a means of transferring a large amount of data between remote processes with a minimal time increase over the time for transferring the same amount of data in raw network datagrams. MoveTo transmits the data to be moved in a sequence of maximally-sized packets to the destination workstation and awaits a single acknowledgement packet when all the data has been received. Given the observed low error rates of local networks, full retransmission on error introduces only a slight performance degradation. We have, however, implemented retransmission from the last correctly received data packet in order to avoid the pitfall of repeated identical failures arising when back-to-back packets are consistently being dropped by the receiver. The implementation of MoveFrom is similar, except that a MoveFrom request is sent out and acknowledged by the requested data.

Use of standard kernel process descriptors for aliens reduces the amount of specialized code for handling remote messages. However, alien processes do not execute and can reasonably be thought of as message buffers.
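The blast transfer with resumption from the last correctly received packet can be modeled numerically. Everything here is an illustrative simulation: the packet size, the lossy receiver, and the function names are assumptions, and the single final acknowledgement packet is not modeled.

```c
#define MAX_DATA 1024   /* illustrative data bytes per network packet */

/* Number of maximally-sized packets needed for a transfer of `count`
   bytes (ceiling division). */
static int npackets(int count) {
    return (count + MAX_DATA - 1) / MAX_DATA;
}

/* Receiver model: accepts packets in order and drops packet `*drop_at`
   the first time it is offered; returns the last packet received. */
static int lossy_receive(int first, int total, int *drop_at) {
    for (int i = first; i < total; i++)
        if (i == *drop_at) { *drop_at = -1; return i - 1; }
    return total - 1;
}

/* Blast out the data, resending from the packet after the last one
   correctly received until all arrive; returns packets transmitted. */
static int move_to(int count, int drop_at) {
    int total = npackets(count), sent = 0, done = -1;
    while (done < total - 1) {
        sent += total - (done + 1);   /* retransmit done+1 .. total-1 */
        done = lossy_receive(done + 1, total, &drop_at);
    }
    return sent;
}
```

With no loss the packet count is exactly the minimum, and a single drop costs only the packets from the drop point onward, which is why full-blast transfer is attractive on a low-error local network.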
