CS 111 Operating Systems, Summer 2022, Lecture 17
Operating System Principles: Distributed Systems
• Introduction
• Distributed system paradigms
• Remote procedure calls
• Distributed synchronization and consensus
• Distributed system security
• Accessing remote data
Introduction
• Why do we care about distributed systems?
– Because that’s how most modern computing is done
• Why is this an OS topic?
– Because it’s definitely a systems issue
– And even the OS on a single computer needs to worry about distributed issues
• If you don’t know a bit about distributed systems, you’re not a modern computer scientist
Why Distributed Systems?
• Better scalability and performance
– Apps require more resources than one computer has
– Can we grow system capacity/bandwidth to meet demand?
• Improved reliability and availability
– 24×7 service despite disk/computer/software failures
• Ease of use, with reduced operating expenses
– Centralized management of all services and systems
– Buy (better) services rather than computer equipment
• Enabling new collaboration and business models
– Collaborations that span system (or national) boundaries
– A global free market for a wide range of new services
A Few Little Problems
• Different machines don’t share memory
– Or any peripheral devices
– So one machine can’t easily know the state of another
• Might this cause synchronization problems?
• The only way to interact remotely is to use a network
– Usually asynchronous, slow, and error prone
– Usually not controlled by any single machine
• So how can we know what’s going on remotely?
• Failures of one machine aren’t visible to other machines
• How can our computation be reliable if pieces fail?
Transparency
• Ideally, a distributed system would be just like a single machine system
• But better
– More resources
– More reliable
– Faster
• Transparent distributed systems look as much like single machine systems as possible
Deutsch’s “Seven Fallacies of Network Computing”
1. The network is reliable
2. There is no latency (instant response time)
3. The available bandwidth is infinite
4. The network is secure
5. The topology of the network does not change
6. There is one administrator for the whole network
7. The cost of transporting additional data is zero
Here’s an eighth: all locations on the network are equivalent.
Bottom Line: true transparency is not achievable
Distributed System Paradigms
• Parallel processing
– Relying on tightly coupled special hardware
– Not widely used; we won’t discuss them
• Single system images
– Make all the nodes look like one big computer
– Somewhere between hard and impossible
– So these are also not popular, and we won’t discuss them
• Loosely coupled systems
– Work with difficulties as best as you can
– Typical modern approach to distributed systems
• Cloud computing
– A recent variant
Loosely Coupled Systems
• Characterization:
– A parallel group of independent computers
– Connected by a high speed LAN
– Serving similar but independent requests
– Minimal coordination and cooperation required
• Motivation:
– Scalability and price performance
– Availability, if protocol permits stateless servers
– Ease of management, reconfigurable capacity
• Examples:
– Web servers, app servers, cloud computing
Horizontal Scalability
• Each node largely independent
• So you can add capacity just by adding a node “on the side”
• Scalability can be limited by network, instead of hardware or algorithms
– Or, perhaps, by a load balancer
• Reliability is high
– Failure of one of N nodes just reduces capacity
Horizontal Scalability Architecture
[Figure: a load-balancing switch with fail-over connects the WAN (to clients) to a farm of web servers; behind them sit a pool of app servers, a content distribution server, and a highly available (HA) database server. If more web server capacity is needed, another web server is simply added beside the existing ones.]
Elements of Loosely Coupled Architecture
• Farm of independent servers
– Servers run same software, serve different requests
– May share a common back-end database
• Front-end switch
– Distributes incoming requests among available servers
– Can do both load balancing and fail-over
• Service protocol
– Stateless servers and idempotent operations (same result if you do it once, twice, three times, . . ., n times; see the sketch below)
– Successive requests may be sent to different servers
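A minimal sketch of the idempotency idea in Python (illustrative only; the account names and operations are made up):

    store = {}

    def set_balance(account, value):
        # Idempotent: doing it once, twice, ..., n times leaves the same
        # state, so the front-end switch may safely resend it to any server
        store[account] = value

    def add_to_balance(account, delta):
        # NOT idempotent: every retry changes the result, so a resent
        # request could be applied twice
        store[account] = store.get(account, 0) + delta

    set_balance("alice", 100)
    set_balance("alice", 100)       # still 100: a duplicate is harmless
    add_to_balance("alice", 10)
    add_to_balance("alice", 10)     # now 120: a duplicate corrupts state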
Horizontally Scaled Performance
• Individual servers are very inexpensive
– Blade servers may be only $100-$200 each
• Scalability is excellent
– 100 servers deliver approximately 100x performance
• Service availability is excellent
– Front-end automatically bypasses failed servers
– Stateless servers and client retries fail-over easily
• The challenge is managing thousands of servers
– Automated installation, global configuration services
– Self-monitoring, self-healing systems
– Scaling limited by management, not HW or algorithms
Cloud Computing
• The most recent twist on distributed computing
• Set up a large number of machines all identically configured
• Connect them to a high speed LAN
– And to the Internet
• Accept arbitrary jobs from remote users
• Run each job on one or more nodes
• Entire facility probably running mix of single machine and distributed jobs, simultaneously
What Runs in a Cloud?
• In principle, anything
– But general distributed computing is hard
• So much of the work is run using special tools
• These tools support particular kinds of parallel/distributed processing
– Either embarrassingly parallel jobs
– Or those using a method like MapReduce or horizontal scaling
• Things where the user need not be a distributed systems expert
Embarrassingly Parallel Jobs
• Problems where it’s really, really easy to parallelize them
• Probably because the data sets are easily divisible
• And exactly the same things are done on each piece
• So you just parcel them out among the nodes and let each go independently
• Everyone finishes at more or less same time
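A minimal sketch of an embarrassingly parallel job in Python (illustrative; the squaring workload is made up):

    from multiprocessing import Pool

    def process_chunk(chunk):
        # Exactly the same work is done on each piece
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        # The data set divides easily; parcel it out among 4 workers
        chunks = [data[i::4] for i in range(4)]
        with Pool(4) as pool:
            partials = pool.map(process_chunk, chunks)
        print(sum(partials))    # workers finish at more or less the same time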
MapReduce
• Perhaps the most common cloud computing software tool/technique
• A method of dividing large problems into compartmentalized pieces
• Each of which can be performed on a separate node
• With an eventual combined set of results
The Idea Behind MapReduce
• There is a single function you want to perform on a lot of data
– Such as searching it for a particular string
• Divide the data into disjoint pieces
• Perform the function on each piece on a separate node (map)
• Combine the results to obtain output (reduce)
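A single-process sketch of this idea for word counting (illustrative Python; real frameworks run the map and reduce functions on separate nodes):

    from collections import Counter

    def map_count(chunk):
        # Map: count words in one disjoint piece of the data
        return Counter(chunk.split())

    def reduce_counts(partials):
        # Reduce: combine the per-piece counts into one output
        total = Counter()
        for p in partials:
            total.update(p)
        return total

    chunks = ["foo bar foo", "bar baz", "foo baz baz"]
    partials = [map_count(c) for c in chunks]   # each could run on its own node
    print(reduce_counts(partials))              # Counter({'foo': 3, 'baz': 3, 'bar': 2})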
An Example
• We have 64 megabytes of text data
• Count how many times each word occurs in the text
• Divide it into 4 chunks of 16 Mbytes
• Assign each chunk to one processor
• Perform the map function of “count words” on each
The Example Continued
[Figure: each of the four map nodes emits a word-count table for its 16 MB chunk. The counts for Bar, Baz, Yes, and Too are:
Node 1: Bar 4, Baz 3, Yes 12, Too 5
Node 2: Bar 3, Baz 9, Yes 17, Too 8
Node 3: Bar 6, Baz 2, Yes 10, Too 4
Node 4: Bar 7, Baz 5, Yes 3, Too 7
Each node also emits counts for Foo and Zoo.]
That’s the map stage
On To Reduce
• We might have two more nodes assigned to doing the reduce operation
• They will each receive a share of data from a map node
• The reduce node performs a reduce operation to “combine” the shares
• Outputting its own result
Continuing the Example
[Figure: the same per-node count tables, with each map node now sending a share of its output to each reduce node: one reduce node collects the Foo, Zoo, and Bar counts, the other the Yes, Baz, and Too counts.]
The Reduce Nodes Do Their Job
Write out the results to files
And MapReduce is done!
Reduce node 1 output: Foo 14, Zoo 16, Bar 20
Reduce node 2 output: Yes 42, Baz 19, Too 24
But I Wanted A Combined List
• No problem
• Run another (slightly different) MapReduce on the outputs
• Have one reduce node that combines everything
Synchronization in MapReduce
• Each map node produces an output file for each reduce node
• It is produced atomically
• The reduce node can’t work on this data until the whole file is written
• Forcing a synchronization point between the map and reduce phases
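A sketch of that synchronization point in Python (illustrative; assumes a POSIX-style file system, where a rename within one file system is atomic):

    import os, tempfile

    def write_atomically(path, data):
        # Write to a temporary name reducers never look at...
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as f:
            f.write(data)
        # ...then rename: the output file appears whole, or not at all
        os.replace(tmp, path)

    write_atomically("map_output_for_reducer_0.txt", "foo 7\nbar 4\n")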
MapReduce vs. Embarrassing Parallelism
• Embarrassing parallelism is enough if it’s easy to divide a job into pieces
– Of the same size
• And if you don’t worry about failures
• And if you don’t need to combine the results in a non-trivial way
• MapReduce is needed if those things aren’t true
Cloud Computing and Horizontal Scaling
• An excellent match
• Rent some cloud nodes to be your web servers
• If load gets heavy, ask the cloud for another web server node
• As load lightens, release unneeded nodes
• No need to buy new machines
• No need to administer your own machines
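The resulting elasticity loop might look like this sketch. The cloud object and all four of its methods are hypothetical stand-ins for a real provider’s SDK calls:

    import time

    def autoscale(cloud, low=0.3, high=0.8):
        # Hypothetical control loop: rent nodes when load is heavy,
        # release them as load lightens; nothing to buy or administer
        while True:
            load = cloud.average_load()        # hypothetical SDK call
            if load > high:
                cloud.add_web_server()         # ask the cloud for another node
            elif load < low and cloud.node_count() > 1:
                cloud.release_web_server()     # stop paying for idle capacity
            time.sleep(60)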
Cloud Computing and Sysadmin
• Not quite as painless as it sounds
• The cloud provider will take care of lots of the problem
– Running the hardware
– Fixing broken hardware
– Loading your software onto machines
• But they won’t take care of internal administration
– E.g., updating the version of the web server you’re running
Actually, they will take care of that, too, but at an extra price and with a loss of control.
Remote Procedure Calls
• RPC, for short
• One way of building a distributed program
• Procedure calls are a fundamental paradigm
– Primary unit of computation in most languages
– Unit of information hiding in most methodologies
– Primary level of interface specification
• A natural boundary between client and server
– Turn procedure calls into message send/receives
• A few limitations
– No implicit parameters/returns (e.g., global variables)
– No call-by-reference parameters
– Much slower than procedure calls (TANSTAAFL)
Remote Procedure Call Concepts
• Interface Specification
– Methods, parameter types, return types
• eXternal Data Representation (XDR)
– Machine independent data-type representations
– May have optimizations for similar client/server
• Client stub
– Client-side proxy for a method in the API
• Server stub (or skeleton)
– Server-side recipient for API invocations
Key Features of RPC
• Client application links against local procedures
– Calls local procedures, gets results
• All RPC implementation inside those procedures
• Client application does not know about RPC
– Does not know about formats of messages
– Does not worry about sends, timeouts, resends
– Does not know about external data representation
• All of this is generated automatically by RPC tools
• The key to the tools is the interface specification
RPC At Work, Step 1
[Figure: the client program Process_list fills in a list (list[0] = 10; list[1] = 20; list[2] = 17;) and then calls max = list_max(list). list_max() is a remote procedure call!]
RPC At Work, Step 2
[Figure: the client’s call max = list_max(list) enters the client stub, which formats an RPC message (list_max(), parameter list) and sends it over the network. The server stub extracts the RPC info and calls the local procedure: local_max = list_max(list).]
RPC At Work, Step 3
[Figure: the server’s local list_max() returns, and the server stub formats and sends an RPC response (list_max(), return value 20). The client stub extracts the return value, max = list_max(list) completes, and the local program resumes at if (max > 10) { ... }.]
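For concreteness, here is the list_max() example as a sketch built on Python’s standard xmlrpc module (one real RPC implementation; the slides don’t prescribe a particular one). The proxy object plays the client stub, and SimpleXMLRPCServer plays the server stub:

    # server.py: the server-side stub ("skeleton") and the local procedure
    from xmlrpc.server import SimpleXMLRPCServer

    def list_max(lst):
        return max(lst)

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(list_max)   # expose list_max() to remote callers
    server.serve_forever()

    # client.py: the proxy formats the request, sends it, extracts the result
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    max_val = proxy.list_max([10, 20, 17])   # looks like a local call; returns 20
    if max_val > 10:
        print("local program resumes with max =", max_val)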
RPC Is Not a Complete Solution
• Requires client/server binding model
– Expects to be given a live connection
• Threading model implementation
– A single thread services requests one at a time
– So use numerous one-per-request worker threads
• Limited consistency support
– Only between calling client and called server
– What if there are multiple clients and servers working together?
• Limited failure handling
– Client must arrange for timeout and recovery
• Higher level abstractions improve RPC
– e.g. Microsoft DCOM, Java RMI, DRb, Pyro
Distributed Synchronization
• Why is it hard to synchronize distributed systems?
• What tools do we use to synchronize them?
What’s Hard About Distributed Synchronization?
• Spatial separation
– Different processes run on different systems
– No shared memory for (atomic instruction) locks
– They are controlled by different operating systems
• Temporal separation
– Can’t “totally order” spatially separated events
– Before/simultaneous/after lose their meaning
• Independent modes of failure
– One partner can die, while others continue
Leases – More Robust Locks
• Obtained from resource manager
– Gives client exclusive right to update the file
– Lease “cookie” must be passed to server on update
– Lease can be released at end of critical section
• Only valid for a limited period of time
– After which the lease cookie expires
• Updates with stale cookies are not permitted
– After which new leases can be granted
• Handles a wide range of failures
– Process, client node, server node, network
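A minimal sketch of the lease idea in Python (illustrative; a real resource manager would add persistence, clock-skew margins, and rollback of partial updates):

    import time, uuid

    class LeaseManager:
        def __init__(self, ttl=30.0):
            self.ttl = ttl
            self.holder = None                    # (cookie, good_until) or None

        def acquire(self):
            now = time.time()
            if self.holder and self.holder[1] > now:
                return None                       # a valid lease is outstanding
            cookie = str(uuid.uuid4())            # the lease "cookie"
            self.holder = (cookie, now + self.ttl)
            return cookie

        def update(self, cookie, do_write):
            # Every update must present a fresh cookie; stale cookies fail,
            # which is what makes it safe to issue a new lease after expiry
            holder = self.holder
            if holder and holder[0] == cookie and holder[1] > time.time():
                do_write()
                return True
            return False                          # expired or wrong cookie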
Lock Breaking and Recovery
• Revoking an expired lease is fairly easy
– Lease cookie includes a “good until” time
• Based on server’s clock
– Any operation involving a “stale cookie” fails
• This makes it safe to issue a new lease
– Old lease-holder can no longer access object
– But was object left in a “reasonable” state?
• Object must be restored to last “good” state
– Roll back to state prior to the aborted lease
– Implement all-or-none transactions
Distributed Consensus
• Achieving simultaneous, unanimous agreement
– Even in the presence of node & network failures
– Required: agreement, termination, validity, integrity
– Desired: bounded time
– Provably impossible in fully general case
– But can be done in useful special cases, or if some requirements are relaxed
• Consensus algorithms tend to be complex
– And may take a long time to converge
• They tend to be used sparingly
– E.g., use consensus to elect a leader
– Who makes all subsequent decisions by fiat
Typical Consensus Algorithm
1. Each interested member broadcasts its nomination.
2. All parties evaluate the received proposals according to a fixed and well known rule.
3. After allowing a reasonable time for proposals, each voter acknowledges the best proposal it has seen.
4. If a proposal has a majority of the votes, the proposing member broadcasts a claim that the question has been resolved.
5. Each party that agrees with the winner’s claim acknowledges the announced resolution.
6. Election is over when a quorum acknowledges the result.
What’s going to happen if someone lies . . . ?
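A toy, failure-free run of this election in Python (illustrative; real consensus protocols must also survive lost messages and lying participants):

    members = [3, 17, 8, 11]                 # node IDs double as proposals

    # Steps 1-2: every node broadcasts its nomination, and all nodes
    # evaluate proposals by a fixed, well-known rule (highest ID wins)
    received = {m: list(members) for m in members}

    # Step 3: each voter acknowledges the best proposal it has seen
    ballots = [max(proposals) for proposals in received.values()]

    # Steps 4-6: a proposal with a majority wins, and the election is
    # over once a quorum acknowledges the result
    winner = max(set(ballots), key=ballots.count)
    quorum = len(members) // 2 + 1
    if ballots.count(winner) >= quorum:
        print(f"node {winner} elected by {ballots.count(winner)} of {len(members)} votes")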
Security for Distributed Systems
• Security is hard in single machines
• It’s even harder in distributed systems
• Why?
Why Is Distributed Security Harder?
• Your OS cannot guarantee privacy and integrity
– Network activities happen outside of the OS
– Should you trust where they happen?
• Authentication is harder
– All possible agents may not be in local password file
• The wire connecting the user to the system is insecure
– Eavesdropping, replays, man-in-the-middle attacks
• Even with honest partners, hard to coordinate distributed security
• The Internet is an open network for all
– Many sites on the Internet try to serve all comers
– Core Internet makes no judgments on what’s acceptable
– Even supposedly private systems may be on Internet
Goals of Network Security
• Secure conversations
– Privacy: only you and your partner know what is said
– Integrity: nobody can tamper with your messages
• Positive identification of both parties
– Authentication of the identity of message sender
– Assurance that a message is not a replay or forgery
– Non-repudiation: the sender cannot claim “I didn’t say that”
• Availability
– The network and other nodes must be reachable when they need to be
Elements of Network Security
• Cryptography
– Symmetric cryptography for protecting bulk transport of data
– Public key cryptography primarily for authentication
– Cryptographic hashes to detect message alterations
• Digital signatures and public key certificates
– Powerful tools to authenticate a message’s sender
• Filtering technologies
– Firewalls and the like
– To keep bad stuff from reaching our machines
Tamper Detection: Cryptographic Hashes
• Check-sums often used to detect data corruption
– Add up all bytes in a block, send sum along with data
– Recipient adds up all the received bytes
– If check-sums agree, the data is probably OK
– Check-sum (parity, CRC, ECC) algorithms are weak
• Cryptographic hashes are very strong check-sums
– Unique: two messages vanishingly unlikely to produce same hash
• Particularly hard to find two messages with the same hash
– One way: cannot infer original input from output
– Well distributed: any change to input changes output
Using Cryptographic Hashes
• Start with a message you want to protect
• Compute a cryptographic hash for that message
– E.g., using the Secure Hash Algorithm 3 (SHA-3)
• Transmit the hash securely
• Recipient does same computation on received text
– If both hash results agree, the message is intact
– If not, the message has been corrupted/compromised
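This check is a few lines with Python’s standard hashlib (a sketch; the message is made up):

    import hashlib

    message = b"transfer $100 to account 42"
    digest = hashlib.sha3_256(message).hexdigest()   # sender computes the hash

    # ... digest travels securely; the message itself goes over the open net ...

    received = b"transfer $900 to account 42"        # tampered in transit
    if hashlib.sha3_256(received).hexdigest() != digest:
        print("message has been corrupted or compromised")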
Secure Hash Transport
• Why must the hash be transmitted securely?
– Cryptographic hashes aren’t keyed, so anyone can produce them (including a bad guy)
• How to transmit hash securely?
– Encrypt it
– Unless secrecy required, cheaper than encrypting entire message
– If you have a secure channel, could transmit it that way
• But if you have a secure channel, why not use it for everything?
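Encrypting just the digest might look like this sketch, assuming the third-party cryptography package and a symmetric key already shared out of band:

    import hashlib
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()     # in practice, pre-shared between the parties
    f = Fernet(key)

    message = b"the agenda for Tuesday"   # sent in the clear
    sealed = f.encrypt(hashlib.sha3_256(message).digest())

    # Receiver: decrypt the 32-byte digest, recompute, and compare;
    # encrypting 32 bytes is much cheaper than encrypting the whole message
    ok = f.decrypt(sealed) == hashlib.sha3_256(message).digest()
    print(ok)    # True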
A Principle of Key Use
• Both symmetric and PK cryptography rely on a secret key for their properties
• The more you use one key, the less secure
– The key stays around in various places longer
– There are more opportunities for an attacker to get it
– There is more incentive for attacker to get it
– Brute force attacks may eventually succeed
• Therefore:
– Use a given key as little as possible
– Change them often
– Within the limits of practicality and required performance
Putting It Together: Secure Socket Layer (SSL)
• A general solution for securing network communication
• Built on top of existing socket IPC
• Establishes secure link between two parties
– Privacy: nobody can snoop on conversation
– Integrity: nobody can generate fake messages
• Certificate-based authentication of server
– Typically, but not necessarily
– Client knows what server it is talking to
• Optional certificate-based authentication of client
– If server requires authentication and non-repudiation
• PK used to distribute a symmetric session key
– New key for each new socket
• Rest of data transport switches to symmetric cryptography
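A minimal client-side sketch with Python’s standard ssl module, which implements this handshake (example.com stands in for any TLS server):

    import socket, ssl

    context = ssl.create_default_context()   # loads trusted CA certificates

    with socket.create_connection(("example.com", 443)) as raw:
        # The handshake checks the server's certificate and uses public-key
        # crypto to establish a fresh symmetric session key for this socket
        with context.wrap_socket(raw, server_hostname="example.com") as tls:
            tls.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
            print(tls.recv(100))              # application data now encrypted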