COMP3221: Distributed Systems
Architectures & Processes
Dr Nguyen Tran
School of Computer Science
The University of Sydney
Page 1
Previously…
– Basic definition of a distributed system
– “A collection of independent computers that appears to its users as a single
coherent system.”
– Real-world examples for distributed systems
– Transparency helps the users observe a single coherent system
The University of Sydney Page 2
Outline – Architectures
– Software Architectures
– System Architectures
– Centralized architectures
• The Client-Server Model
• The Layered Organization
– Decentralized architectures
• The Peer-to-Peer Organization
– Hybrid architectures
• Edge server systems
• Collaborative distributed systems
The University of Sydney
Page 3
Software Architectures
Architectures & Processes Week 2, COMP3221
The University of Sydney Page 4
Software Architectures
Logical organization of components in distributed systems
– Component: A modular unit with well-defined interfaces that is replaceable
– Connector: a mechanism that mediates communication, coordination among
components.
Layer-based Architectures vs. Object-based Architectures
– In (a), requests (resp. responses) go downward (resp. upward)
– In (b), objects communicate through Remote Procedure Calls (RPCs)
The University of Sydney Page 5
Software Architectures
Event-based Architectures vs. Data-centered Architectures
– –
(a) Event-based architecture: communication through events, that optionally carry data (subscribers get their desired events delivered)
(b) Data-centered architecture: through a shared repository, that contains data (e.g., files in a distributed file system, web-based distributed system, Twitter)
The University of Sydney Page 6
Centralized Architectures
Architectures & Processes Week 2, COMP3221
The University of Sydney Page 7
Centralized Architectures
The basic client-server model
– The client/server requests/provides the services
– The client and the server can be hosted on different machines.
– The communication follows a request-reply model.
– Examples: ??
The University of Sydney Page 8
The Client-Server Model
Stateless vs Stateful server
?
Hi, I’m comp1, may I have the lines 21-40 of file 5? Sure, you have the credentials, attached are the lines
Client
Stateless server
The University of Sydney
Page 9
The Client-Server Model
Stateless vs Stateful server
Hi, I’m comp1
I know, how are you doing today?
May I have the next 20 lines of my file Sure, here they are
Client
Stateful server
The University of Sydney
Page 10
The Client-Server Model
Stateless vs Stateful server
– Stateless server: does not record the state of its clients
– Stateful server: maintains persistent information about its clients
(client->file)
Stateless server
Stateful server
State
No info kept
Persistent info
Request
Self-contained
Can be split, generally faster
Upon failure
No recovery needed
State recovery needed (explicit deletion)
Example
Network file system (NFSv3)
Andrew file system (AFS)
The University of Sydney Page 11
The Layered Organization
Architectures & Processes Week 2, COMP3221
The University of Sydney Page 12
Application layering
Traditional three-layered view:
1. Theuserinterfacelayer
– contains the feature to control the application
2. The processing layer
– contains the function of the application
3. The data layer
– contains the data of the application
The University of Sydney
Page 13
Application layering (cont’d)
Example: a search engine request spanning the traditional three layers
The University of Sydney Page 14
Multi-tiered architectures
Physical two-tiered architecture
The University of Sydney Page 15
Multi-tiered architectures (cont’d)
A single machine can act both as a client and a server
Example: Cloud computing
– Cloud computing: the delivery of computation or storage as a service to end-users.
– The servers handle most of the computation upon request and sends back the results to the client
– One server asks data to another server – Another does the computation
–…
The University of Sydney Page 16
Decentralized Architectures
Architectures & Processes Week 2, COMP3221
The University of Sydney Page 17
The peer-to-peer model
Every machine acts similarly
– Every machine is both a client and a server
– No centralized control: the responsibility is distributed evenly – Even the program executing on each machine is similar
– Forms an overlay network,
– nodes are processors,
– links possible communication channels.
The University of Sydney
Page 18
Structured Peer-to-Peer Model
The overlay network is constructed using a deterministic procedure
– Chord is an example of a Distributed Hash Table (DHT)
As a node:
– I have a successor peer
– I have a predecessor peer
– I have some shortcuts to other nodes to speedup delivery of requests
– I am responsible of a subset of the system data items (based on my unique identifier)
The University of Sydney
Page 19
Hybrid Architectures
Architectures & Processes Week 2, COMP3221
The University of Sydney Page 20
Edge-Server Systems
– Servers are placed at the edge of the network.
– Edge servers serve content, possibly after applying filtering
and transcoding functions
– One edge server acts as an origin server for all content, while taking advantage of other servers for optimized delivery of content, e.g. through replication.
The University of Sydney Page 21
Collaborative Distributed Systems
– How do you first get started ?
– Often a traditional client-server models is used.
– Once joined fully decentralized schemes can be used.
– Example: BitTorrent, a file sharing application
– 35% of world-wide internet traffic in 2015.
– Used for Linux distribution, software patches, distributing movies
– Goal: quickly replicate large files to large number of clients
The University of Sydney
Page 22
BitTorrent
– Web server hosts a .torrent file (w/ file length, hash, tracker’s URL…)
– A tracker (server or a DHT) tracks downloaders/owners of a file
– Files are divided into chunks (256kb-1MB)
– Downloaders download chunks from themselves (and owners)
– Tit-for-tat: the more one shares (server), the faster it can
download (client)
The University of Sydney Page 23
Conclusion
– Client and server are used to identify the role of communication participant
– Client and server roles may run on:
– The same machine
– Distinct machines with very different resources – Distinct machines with similar resources
– Most moderns distributed systems are hybrid architectures The University of Sydney
Page 24
Processes
Architecture & Processes Week 2, COMP3221
Dr. Nguyen Tran
School of Computer Science
The University of Sydney
Page 25
Introduction
– Previously: a distributed system gives the illusion of a single system to a user thanks to transparency
– Now: a single system gives the illusion of multiple resources to multiple users thanks to concurrency
The University of Sydney Page 26
Outline – Processes
– A Very Brief History
– UNIX Processes and Threads – Multi-threading
– Multi-threaded Server
The University of Sydney
Page 27
A Very Brief History
Architectures & Processes Week 2, COMP3221
The University of Sydney Page 28
Evolution of Operating Systems (OSes)
– The hardware evolved
– Hardware was expensive, humans were cheap (e.g., Multics 1960’s)
– Hardware was cheaper, humans were expensive (e.g., desktop computer 1980’s)
– Hardware are very cheap, humans are very expensive (e.g., handheld devices 2000’s)
– The OS interaction had to adapt
– Batch: one execution at a time
– Multiprogramming: multiple program executions simultaneously
– Timeshare: split time into slots allocated for different program execution – GUI: multiple interfaces to access a system from different end-points
– Ubiquitous devices: each user possesses her own computational device
The University of Sydney Page 29
Uniprogramming vs. Multiprogramming
– Uniprogramming: one program execution at a time – MS/DOS
– early Macintosh
– Batch processing
– No longer acceptable!
– Multiprogramming: more than one program execution at a time
– Multics, UNIX/Linux, Windows NT, Mac OS X
– Many program executions on personal computers: – Browsing the web
– Sending emails
– Typing a letter
– Burning a DVD The University of Sydney
Page 30
Concurrency
– Assume a single central processing unit (CPU)
– Problem: How to give the illusion to the users of multiple CPUs?
CPU
exec1
exec2
exec3
CPU1
CPU2
CPU3
Time
– Solution: scheduling program executions, one after the other during small fractions of the time
CPU
exec1.1
exec2.1
exec3.1
exec1.2
exec2.2
exec3.2
The University of Sydney Time Page 31
Concurrency
– Multi/many-cores on a single machine
CPU1
cache
CPU2
cache
CPU3
cache
Network on chip
shared memory
– Distributed operating systems
– This is a single system image, the system maintains a single copy of the resources
Distributed applications
Distribution service
Kernel
Network
Kernel
Kernel
The University of Sydney
Page 32
Concurrency
– Network operating systems – Machines provide resources to
other machines (e.g., UNIX rlogin)
– Middleware
– A layer over the network services providing general services to applications in a very transparent manner (systems can differ)
stributed applications
Dist. service
Kernel
Network
stributed applications Middleware
Network services
Kernel
Dist. service
Kernel
Di
Dist. service
Kernel
Di
Network services
Kernel
Network services
Kernel
The University of Sydney
Page 33
Network
Processes and Threads
Architectures & Processes Week 2, COMP3221
The University of Sydney Page 34
Processes
Process: defined as a program in execution
– Operating system creates a virtual CPU for each process – OS makes sure that processes cannot maliciously or
inadvertently affect each other’s behaviour, i.e. Secure ! – This concurrent transparency comes at a very high cost:
– For every process (virtual CPU), OS must create a completely independent address space.
The University of Sydney
Page 35
Address space
Address space: A unit of management of a process’s virtual memory – Process address space:
– Stack: temporary data extensible towards lower virtual addresses
– Heap: memory allocated dynamically extensible to higher virtual addresses
– Data section: global variable
– Text region: program code
Stack
Heap
Data
Text
The University of Sydney
Page 36
Process
Process: an abstraction representing a program execution – A process switches from a state to another:
exec1
exec1.1
exec2.1
– When there is timer interrupt goes off, explicit yield, I/O, etc.
Time
new admitted ready
interrupt
exit terminated
running
I/O or event wait
I/O or event completion
Scheduler dispatch waiting
The University of Sydney
Page 37
Process management
Process creation:
– Process allocation: the act of choosing the host of the process
– The system provides command to execute a process at an idle workstation – The system chooses a host from a pool of processors to execute it
– Execution environment: an address space w/ initialized content & open files
– Static: program text, heap and stack regions created from a list
– Dynamic: UNIX fork shares program text and copies stack and heap
regions
The University of Sydney Page 38
Process management
CPU
Time
exec1.1
exec2.1
exec3.1
exec1.2
exec2.2
exec3.2
– Switching from one process to another has become one of the most expensive operations due to Context-Switches
– CPU context: register values, program counter, stack pointer, etc.
– Modify registers of MMU (Memory management unit)
– Invalidate address translation caches such as in TLB (Translation lookaside buffer)
Inter-Process Communication (IPC):
– Pipes/Message queues/Shared memory segments – Requires costly context switches
The University of Sydney
Page 39
Context-switch
IPC requires kernel intervention
– S1. Process A switches from user to kernel mode
– Changing the memory map in the MMU (memory management unit). – Flushing the TLB (translation look aside buffer)
– S2. Process context-switch (swapping processes from A to B) – S3. Process B switches from kernel to user mode
– Again requires changing the MMU map and flushing the TLB
The University of Sydney
Page 40
Thread
Threads: smallest unit of CPU utilization – Multiple threads per process
– Portable OS Interface (POSIX) Threads
– Current activity: program counter
– Stack: temporary data
– No data, code, files (but it shares them with other threads)
code data files
registers
stack
code data files
reg reg reg
stack stack stsatacckk
The University of Sydney
Page 41
Single-threaded process
Multithreaded process
thread
Thread
– Portable OS Interface (POSIX) Threads
– Current activity: program counter
– Stack: temporary data
– No data, code, files (but it shares them with other threads)
– Communication between threads always through memory
– Does not need IPC
– No context switches required (when purely at user-level)
The University of Sydney Page 42
Thread Implementation
User-level threads library vs. kernel-scheduled thread
–
User-level thread library:
1. Cheap to create and destroy
2. Blocking system call freezes the entire process of the thread (and other
threads)
3. Cheap context-switch:
• FewinstructionstostoreandreloadCPUregistervalues • NoneedtochangememorymapinMMU
• NoneedtoflushtheTLB
Kernel-scheduled threads:
1. Costly to create and destroy
2. Do not block the current process (and other threads) upon blocking system calls
(I/O)
3. Costly context-switch (similar to process context switch)
–
The University of Sydney
Page 43
•
•
NeedstochangememorymapinMMU NeedstoflushtheTLB
Processes vs. threads
– Processes
– Isolated: prevents one process from interfering with another
– Inefficient: starting/terminating a process and context switches are costly
– Threads
– Non-isolated: avoiding incorrect interferences makes
programming harder
– Efficient: a thread is a lightweight version of a process – Multiple threads of control per process
The University of Sydney
Page 44
Client multi-threading
Example: The Google Chrome web browser
– Multithreading: each tab runs its own thread
– A JavaScript error does not crash the main Chrome process:
Only one tab freezes, the user can close it independently of others
– Takes benefit of multicore architectures, each thread runs on a separate core
The University of Sydney Page 45
one distinct thread running each tab
Conclusion
– Concurrency has been used for decades since hardware was powerful enough to address multiple human needs
– Processes are heavy but protected whereas threads are lightweight but non-protected
– Threads require synchronization to be protected from each other
The University of Sydney
Page 46
What’s Next ?
– Tutorial on Wednesday. – Multithreading
– Read Chapter 1, 2 and 3 of the textbook
– Next week: How do we communicate in distributed systems ? – See you all next week !
The University of Sydney
Page 47