
Systems, Networks & Concurrency 2019
Distributed Systems – Uwe R. Zimmer, The Australian National University

Distributed Systems
References for this chapter

[Bacon1998] Bacon, J.: Concurrent Systems. Addison Wesley Longman Ltd (2nd edition), 1998
[Ben2006] Ben-Ari, M.: Principles of Concurrent and Distributed Programming. Second edition, Prentice-Hall, 2006
[Schneider1990] Schneider, Fred: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, vol. 22 (4), pp. 299-319, 1990
[Tanenbaum2001] Tanenbaum, Andrew: Distributed Systems: Principles and Paradigms. Prentice Hall, 2001
[Tanenbaum2003] Tanenbaum, Andrew: Computer Networks. Prentice Hall, 2003

Distributed Systems
Network protocols & standards
OSI network reference model
Standardized as the Open Systems Interconnection (OSI) reference model
by the International Organization for Standardization (ISO) in 1977.
• 7 layer architecture
• Connection oriented
Hardly implemented anywhere in full …
… but its concepts and terminology are widely used when describing existing protocols and designing new ones …

[Figure: user data passed down the full OSI stack (application to physical) on the sending host, through the network / data link / physical layers of intermediate nodes, and back up the full stack on the receiving host.]
Distributed Systems
Network protocols & standards
OSI Network Layers

1: Physical Layer
• Service: Transmission of a raw bit stream over a communication channel
• Functions: Conversion of bits into electrical or optical signals
• Examples: X.21, Ethernet (cable, detectors & amplifiers)
OSI Network Layers

2: Data Link Layer
• Service: Reliable transfer of frames over a link
• Functions: Synchronization, error correction, flow control
• Examples: HDLC (high level data link control protocol), LAP-B (link access procedure, balanced), LAP-D (link access procedure, D-channel), LLC (link level control), …
OSI Network Layers

3: Network Layer
• Service: Transfer of packets inside the network
• Functions: Routing, addressing, switching, congestion control
• Examples: IP, X.25
OSI Network Layers

4: Transport Layer
• Service: Transfer of data between hosts
• Functions: Connection establishment, management, termination, flow-control, multiplexing, error detection
• Examples: TCP, UDP, ISO TP0-TP4
OSI Network Layers

5: Session Layer
• Service: Coordination of the dialogue between application programs
• Functions: Session establishment, management, termination
• Examples: RPC
OSI Network Layers

6: Presentation Layer
• Service: Provision of platform independent coding and encryption
• Functions: Code conversion, encryption, virtual devices
• Examples: ISO code conversion, PGP encryption
OSI Network Layers

7: Application Layer
• Service: Network access for application programs
• Functions: Application/OS specific
• Examples: APIs for mail, ftp, ssh, scp, discovery protocols …

Distributed Systems
Network protocols & standards
Serial Peripheral Interface (SPI)
Used by gazillions of devices … and it’s not even a formal standard!
Speed only limited by what both sides can survive.
Usually push-pull drivers, i.e. fast and reliable, yet not friendly to wrong wiring/programming.
[Photos: a 1.8” color TFT LCD display from Adafruit; an SD card (SanDisk marketing photo).]

Distributed Systems
Network protocols & standards
Serial Peripheral Interface (SPI)
Full duplex, 4-wire, flexible clock rate
[Figure: master and slave, each with transmit and receive shift registers, connected via MOSI, MISO and SCK; the master provides the clock (clock generator) and selects the slave (NSS/CS).]
Clock phase and polarity need to be agreed upon
[Figure: SCK timing diagram showing the agreed “set” and “sample” edges for the data on MOSI/MISO over time.]

Distributed Systems
Network protocols & standards (SPI)
Serial Peripheral Interface (SPI)
Concrete implementations differ: 1 shift register? FIFOs? CRC? Data connected to an internal bus? DMA? Speed?
[Figure: SPI master/slave block diagram, from the STM32L4x6 advanced ARM®-based 32-bit MCUs reference manual: Figure 420 on page 1291.]

Distributed Systems
Network protocols & standards (SPI)
Full duplex with 1 out of x slaves
[Figure: the master’s clock generator and slave selector drive SCK and individual select lines S1–S3; slaves 1–3 share MOSI and MISO, and the one selected slave communicates with the master in full duplex.]

Concurrent simplex with y out of x slaves
[Figure: the master drives MOSI and SCK to all slaves; several slaves can be selected at the same time and receive concurrently, while the MISO return path is not used (simplex).]

Concurrent daisy chaining with all slaves
[Figure: all slaves share SCK and the select line; the master’s MOSI feeds the first slave, each slave’s output feeds the next slave’s input, and the last slave’s MISO returns to the master, so the shift registers of all slaves form one long chain.]

Distributed Systems
Network protocols & standards
OSI and TCP/IP
[Figure: the TCP/IP suite mapped against the OSI reference model – Application, Transport, IP (network) and Physical layers; the TCP/IP application layer spans the OSI application, presentation and session layers.]

Distributed Systems
Network protocols & standards
OSI, TCP/IP and AppleTalk
[Figure: the AppleTalk protocol suite mapped against the OSI and TCP/IP layers – AppleTalk Filing Protocol (AFP); AT Data Stream Protocol, AT Session Protocol, Zone Info Protocol, Printer Access Protocol; Routing Table Maintenance Prot., AT Update Based Routing Protocol, Name Binding Prot., AT Transaction Protocol, AT Echo Protocol; Datagram Delivery Protocol (DDP) and AppleTalk Address Resolution Protocol (AARP); EtherTalk, LocalTalk, TokenTalk and FDDITalk Link Access Protocols over IEEE 802.3, LocalTalk, Token Ring IEEE 802.5 and FDDI.]

Distributed Systems
Network protocols & standards
OSI and AppleTalk over IP
[Figure: the same AppleTalk protocol suite, now combined with an IP network layer (AppleTalk over IP), over the same link access protocols and physical networks.]

Distributed Systems
Network protocols & standards
Ethernet / IEEE 802.3
Local area network (LAN) developed by Xerox in the 70’s
• 10 Mbps specification 1.0 by DEC, Intel & Xerox in 1980.
• First standard as IEEE 802.3 in 1983 (10 Mbps over thick co-ax cables).
• Currently 1 Gbps (802.3ab) copper cable ports used in most desktops and laptops.
• Currently standards up to 100 Gbps (IEEE 802.3ba, 2010).
• More than 85 % of current LAN lines worldwide (according to the International Data Corporation (IDC)).
Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
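As a hedged illustration of the CSMA/CD retry rule named above (standard truncated binary exponential backoff, not taken from the slides): after the n-th collision a station waits a random number of slot times drawn from 0 .. 2**min(n, 10) - 1. A small Ada sketch, with Slot_Time chosen for classic 10 Mbps Ethernet:

   with Ada.Numerics.Discrete_Random;

   procedure Backoff_Demo is
      Slot_Time : constant Duration := 0.000_051_2;   -- 512 bit times at 10 Mbps (illustrative)

      function Backoff_Slots (Collisions : Positive) return Natural is
         --  After the n-th collision, wait a random number of slots in 0 .. 2**min(n, 10) - 1.
         Exponent : constant Natural := Natural'Min (Collisions, 10);
         subtype Slot_Range is Natural range 0 .. 2 ** Exponent - 1;
         package Random_Slot is new Ada.Numerics.Discrete_Random (Slot_Range);
         Gen : Random_Slot.Generator;
      begin
         Random_Slot.Reset (Gen);
         return Random_Slot.Random (Gen);
      end Backoff_Slots;
   begin
      delay Slot_Time * Backoff_Slots (Collisions => 3);   -- e.g. the wait after a third collision
   end Backoff_Demo;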

Distributed Systems
Network protocols & standards
Ethernet / IEEE 802.3
OSI relation: PHY, MAC, MAC-client
[Figure: the IEEE 802.3 reference model next to the OSI reference model – upper-layer protocols above a MAC-client sublayer (IEEE 802-specific), the Media Access Control (MAC) sublayer (IEEE 802.3-specific) and the Physical layer (PHY, media-specific).]

OSI relation: PHY, MAC, MAC-client
[Figure: two stations, each with a MAC client, an 802.3 MAC, a physical medium-independent layer (PHY) and physical medium-dependent layers, connected via the link; the link determines link media, signal encoding and transmission rate.]
MII = Medium-independent interface
MDI = Medium-dependent interface – the link connector

Distributed Systems
Network protocols & standards
Ethernet / IEEE 802.11
Wireless local area network (WLAN) developed in the 90’s
• First standard as IEEE 802.11 in 1997 (1-2 Mbps over 2.4 GHz).
• Typical usage at 54 Mbps over 2.4 GHz carrier at 20 MHz bandwidth.
• Current standards up to 780 Mbps (802.11ac) over 5 GHz carrier at 160 MHz bandwidth.
• Future standards are designed for up to 100 Gbps over a 60 GHz carrier.
• Direct relation to IEEE 802.3 and similar OSI layer association.
Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA); Direct-Sequence Spread Spectrum (DSSS)

Distributed Systems
Network protocols & standards
Bluetooth
Wireless local area network (WLAN) developed in the 90’s with different features than 802.11:
• Lower power consumption.
• Shorter ranges.
• Lower data rates (typically < 1 Mbps).
• Ad-hoc networking (no infrastructure required).
Combinations of 802.11 and Bluetooth OSI layers are possible to achieve the required feature set.

Distributed Systems
Network protocols & standards
Token Ring / IEEE 802.5 / Fibre Distributed Data Interface (FDDI)
• “Token Ring” developed by IBM in the 70’s
• IEEE 802.5 standard is modelled after the IBM Token Ring architecture (specifications are slightly different, but basically compatible)
• IBM Token Ring requires a star topology as well as twisted pair cables, while IEEE 802.5 is unspecified in topology and medium
• Fibre Distributed Data Interface combines a token ring architecture with a dual-ring, fibre-optical, physical network.
Unlike CSMA/CD, Token Ring is deterministic (with respect to its timing behaviour).
FDDI is deterministic and failure resistant.
None of the above is currently used in performance oriented applications.

Distributed Systems
Network protocols & standards
Fibre Channel
• Developed in the late 80’s.
• ANSI standard since 1994.
• Current standards allow for 16 Gbps per link.
• Allows for three different topologies:
  Point-to-point: 2 addresses
  Arbitrated loop (similar to token ring): 127 addresses; deterministic, real-time capable
  Switched fabric: 2^24 addresses, many topologies and concurrent data links possible
• Defines OSI equivalent layers up to the session level.
Mostly used in storage arrays, but applicable to super-computers and high integrity systems as well.

Mapping of Fibre Channel to OSI layers:
[Figure: the Fibre Channel layers FC-4 (protocol mapping), FC-3 (common service), FC-2 (network), FC-1 (data link) and FC-0 (physical) mapped against the OSI stack and against FC/IP and TCP/IP stacks.]

Distributed Systems
Network protocols & standards
InfiniBand
• Developed in the late 90’s.
• Defined by the InfiniBand Trade Association (IBTA) since 1999.
• Current standards allow for 25 Gbps per link.
• Switched fabric topologies.
• Concurrent data links possible (commonly up to 12, i.e. up to 300 Gbps).
• Defines only the data-link layer and parts of the network layer.
• Existing devices use copper cables (instead of optical fibres).
Mostly used in super-computers and clusters but applicable to storage arrays as well.
Cheaper than Ethernet or FibreChannel at high data-rates.
Small packets (only up to 4 kB) and no session control.

Distributed Systems
Distribution!
Motivation
Possibly ...
... fits an existing physical distribution (e-mail system, devices in a large craft, ...).
... high performance due to potentially high degree of parallel processing.
... high reliability/integrity due to redundancy of hardware and software.
... scalable.
... integration of heterogeneous devices.
Different specifications will lead to substantially different distributed designs.

Distributed Systems
What can be distributed?
• State: common operations on distributed data
• Function: distributed operations on central data
• State & Function: client/server clusters
• None of those: pure replication, redundancy

Distributed Systems
Common design criteria
Achieve de-coupling / high degree of local autonomy
Cooperation rather than central control
Consider reliability
Consider scalability
Consider performance

Distributed Systems
Some common phenomena in distributed systems
1. Unpredictable delays (communication): are we done yet?
2. Missing or imprecise time-base: causal relation or temporal relation?
3. Partial failures: the likelihood of individual failures increases, while the likelihood of complete failure decreases (in case of a good design).

Distributed Systems
Time in distributed systems
Two alternative strategies:
Based on a shared time: synchronize clocks!
Based on sequence of events: create a virtual time!

Distributed Systems
‘Real-time’ clocks are:
• discrete – i.e. time is not dense and there is a minimal granularity
• drift affected:
  maximal clock drift $\delta$ defined as: $(1+\delta)^{-1} \le \dfrac{C(t_2)-C(t_1)}{t_2-t_1} \le (1+\delta)$
  often specified as PPM (Parts-Per-Million) (typically 20 PPM in computer applications)
[Figure: ‘measured time’ C over ‘real-time’ t – a real clock drifting away from the ideal clock.]

Distributed Systems
Synchronize a ‘real-time’ clock (bi-directional)
Resetting the clock drift by regular reference time re-synchronization:
the ‘real-time’ clock is adjusted forwards & backwards (Calendar time).
[Figure: the drifting real clock is re-set towards the ideal clock at each reference-time synchronization point.]
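As a quick, hedged illustration (my numbers, not from the slides): with a drift of $\delta = 20$ PPM a single clock can deviate from real time by up to $\delta \cdot \Delta t$ between synchronizations, and two such clocks can diverge by up to $2\delta \cdot \Delta t$:

$\delta = 20\,\text{PPM} = 2 \cdot 10^{-5}, \quad \Delta t = 1\,\text{h} = 3600\,\text{s} \;\Rightarrow\; \delta \cdot \Delta t = 72\,\text{ms}, \quad 2\delta \cdot \Delta t = 144\,\text{ms}$

So to keep any two clocks within a bound $\varepsilon$ of each other, they must be re-synchronized at least every $\varepsilon / (2\delta)$ – e.g. every 25 s for $\varepsilon = 1\,\text{ms}$.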
Distributed Systems
Synchronize a ‘real-time’ clock (forward only)
Resetting the clock drift by regular reference time re-synchronization:
the ‘real-time’ clock is adjusted forwards only (Monotonic time).
Maximal clock drift $\delta$ then bounded by: $(1+\delta)^{-1} \le \dfrac{C(t_2)-C(t_1)}{t_2-t_1} \le 1$
[Figure: the real clock runs at most at real-time rate and jumps forward at each synchronization point.]

Distributed Systems
Distributed critical regions with synchronized clocks
• At all times: ∀ received Requests: add to local RequestQueue (ordered by time);
  ∀ received Release messages: delete corresponding Requests in local RequestQueue.
1. Create OwnRequest and attach current time-stamp. Add OwnRequest to local RequestQueue (ordered by time). Send OwnRequest to all processes.
2. Delay by 2L (L being the time it takes for a message to reach all network nodes).
3. While Top (RequestQueue) ≠ OwnRequest: delay until new message.
4. Enter and leave critical region.
5. Send Release-message to all processes.

Distributed Systems
Distributed critical regions with synchronized clocks
Analysis
• No deadlock, no individual starvation, no livelock.
• Minimal request delay: 2L. Minimal release delay: L.
• Communications requirements per request: 2 (N - 1) messages (can be significantly improved by employing broadcast mechanisms).
• Clock drifts affect fairness, but not integrity of the critical region.
Assumptions:
• L is known and constant – violation leads to loss of mutual exclusion.
• No messages are lost – violation leads to loss of mutual exclusion.

Distributed Systems
Virtual (logical) time [Lamport 1978]
$a \to b \Rightarrow C(a) < C(b)$
with $a \to b$ being a causal relation between a and b, and C(a), C(b) the (virtual) times associated with a and b.
$a \to b$ iff:
• a happens earlier than b in the same sequential control-flow, or
• a denotes the sending event of message m, while b denotes the receiving event of the same message m, or
• there is a transitive causal relation between a and b: $a \to e_1 \to \dots \to e_n \to b$
Notion of concurrency: $a \parallel b \Leftrightarrow \lnot(a \to b) \land \lnot(b \to a)$

Distributed Systems
Virtual (logical) time
Implications:
$a \to b \Rightarrow C(a) < C(b)$
$C(a) < C(b) \Rightarrow \lnot(b \to a)$, i.e. $(a \to b) \lor (a \parallel b)$
$C(a) = C(b) \Rightarrow a \parallel b$, i.e. $\lnot(a \to b) \land \lnot(b \to a)$
$C(a) = C(b) < C(c) \Rightarrow \lnot(c \to a)$, i.e. $(a \to c) \lor (a \parallel c)$
$C(a) < C(b) < C(c) \Rightarrow \lnot(c \to a)$, i.e. $(a \to c) \lor (a \parallel c)$

Distributed Systems
Virtual (logical) time
Time as derived from causal relations:
[Figure: three processes P1–P3 exchanging messages; every event carries a counter value and counters jump forward when a message with a higher time-stamp is received.]
Events in concurrent control flows are not ordered. There is no global order of time.

Distributed Systems
Implementing a virtual (logical) time
1. ∀ Pi: Ci := 0
2. ∀ Pi:
   ∀ local events: Ci := Ci + 1;
   ∀ send events: Ci := Ci + 1; Send (message, Ci);
   ∀ receive events: Receive (message, Cm); Ci := max (Ci, Cm) + 1;
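A minimal sketch of these update rules as an Ada protected object (my illustration, not lecture code; the package and the names Logical_Clock, Tick and Merge are assumptions):

   package Lamport is
      protected type Logical_Clock is
         procedure Tick  (Stamp : out Natural);                        -- local or send event
         procedure Merge (Received : in Natural; Stamp : out Natural); -- receive event
      private
         C : Natural := 0;
      end Logical_Clock;
   end Lamport;

   package body Lamport is
      protected body Logical_Clock is
         procedure Tick (Stamp : out Natural) is
         begin
            C := C + 1;                           -- Ci := Ci + 1
            Stamp := C;                           -- time-stamp to attach to an outgoing message
         end Tick;
         procedure Merge (Received : in Natural; Stamp : out Natural) is
         begin
            C := Natural'Max (C, Received) + 1;   -- Ci := max (Ci, Cm) + 1
            Stamp := C;
         end Merge;
      end Logical_Clock;
   end Lamport;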
Distributed Systems
Distributed critical regions with logical clocks
• At all times: ∀ received Requests: add to local RequestQueue (ordered by time); reply with Acknowledge or OwnRequest.
• At all times: ∀ received Release messages: delete corresponding Requests in local RequestQueue.
1. Create OwnRequest and attach current time-stamp. Add OwnRequest to local RequestQueue (ordered by time). Send OwnRequest to all processes.
2. Wait for Top (RequestQueue) = OwnRequest & no outstanding replies.
3. Enter and leave critical region.
4. Send Release-message to all processes.

Distributed Systems
Distributed critical regions with logical clocks
Analysis
• No deadlock, no individual starvation, no livelock.
• Minimal request delay: N - 1 requests (or 1 broadcast) + N - 1 replies.
• Minimal release delay: N - 1 release messages (or 1 broadcast).
• Communications requirements per request: 3 (N - 1) messages (or N - 1 messages + 2 broadcasts).
• Clocks are kept recent by the exchanged messages themselves.
Assumptions:
• No messages are lost – violation leads to stall.
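A hedged back-of-the-envelope comparison (my numbers, not from the slides) for $N = 10$ processes and a single critical-region entry:

$\text{synchronized clocks: } 2(N-1) = 18 \text{ messages} \qquad \text{logical clocks: } 3(N-1) = 27 \text{ messages (or } N-1 = 9 \text{ messages} + 2 \text{ broadcasts)}$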
Distributed Systems
Distributed critical regions with a token ring structure
1. Organize all processes in a logical or physical ring topology.
2. Send one token message to one process.
3. At all times, all processes: on receiving the token message:
   1. If required, the process enters and leaves a critical section (while holding the token).
   2. The token is passed along to the next process in the ring.
Assumptions:
• Token is not lost – violation leads to stall.
  (a lost token can be recovered by a number of means – e.g. the ‘election’ scheme following)

Distributed Systems
Distributed critical regions with a central coordinator
A global, static, central coordinator
   invalidates the idea of a distributed system,
   yet enables a very simple mutual exclusion scheme.
Therefore:
• A global, central coordinator is employed in some systems ... yet ...
• ... if it fails, a system to come up with a new coordinator is provided.

Distributed Systems
Electing a central coordinator (the Bully algorithm)
Any process P which notices that the central coordinator is gone performs:
1. P sends an Election-message to all processes with higher process numbers.
2. P waits for response messages.
   If no one responds after a pre-defined amount of time: P declares itself the new coordinator and sends out a Coordinator-message to all.
   If any process responds, then the election activity for P is over and P waits for a Coordinator-message.
All processes Pi perform at all times:
• If Pi receives an Election-message from a process with a lower process number, it responds to the originating process and starts an election process itself (if not running already).

Distributed Systems
Distributed states
How to read the current state of a distributed system?
[Figure: processes P1–P3 with time-stamped events along a common time axis.]
This “god’s eye view” does in fact not exist.
Instead: some entity (an observer process P0) probes and collects local states.
What state of the global system has been accumulated? Connecting all the states to a global state.

Distributed Systems
Distributed states
A consistent global state (snapshot) is defined by a unique division into:
• “The Past” P (events before the snapshot): $(e_2 \in P) \land (e_1 \to e_2) \Rightarrow e_1 \in P$
• “The Future” F (events after the snapshot): $(e_1 \in F) \land (e_1 \to e_2) \Rightarrow e_2 \in F$
Distributed Systems
Distributed states
How to read the current state of a distributed system?
[Figure: a cut through the processes’ time-lines, sorting the events into past and future events.]
If an event in the past receives a message from the future, the division is not possible and the snapshot is inconsistent!

Distributed Systems
Snapshot algorithm
• Observer-process P0 (any process) creates a snapshot token ts and saves its local state s0.
• P0 sends ts to all other processes.
• ∀ Pi which receive ts (as an individual token-message, or as part of another message):
  • Save local state si and send si to P0.
  • Attach ts to all further messages, which are to be sent to other processes.
  • Save ts and ignore all further incoming ts‘s.
• ∀ Pi which previously received ts and receive a message m without ts:
  • Forward m to P0 (this message belongs to the snapshot).

Distributed Systems
Distributed states
Running the snapshot algorithm:
[Figure sequence: P0 creates ts, saves its state and sends ts to all processes; each process saves and reports its local state on first receipt of ts and attaches ts to all further messages; messages received without ts by a process that already holds ts are forwarded to P0; further incoming ts’s are ignored; finally P0 finalizes the snapshot.]
Sorting the events into past and future events: past and future events are uniquely separated – the collected state is consistent.

Distributed Systems
Snapshot algorithm
Termination condition?
Either
• Make assumptions about the communication delays in the system,
or
• Count the sent and received messages for each process (include this in the local state) and keep track of outstanding messages in the observer process.
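A hedged sketch (my illustration, not lecture code) of the message-counting variant: the observer sums the per-process send and receive counters reported with the local states and declares the snapshot complete once no messages are outstanding. Process_Id, Count_Array and Snapshot_Complete are assumed names:

   type Process_Id  is range 1 .. 4;                       -- illustrative number of processes
   type Count_Array is array (Process_Id) of Natural;

   function Snapshot_Complete (Sent, Received : Count_Array) return Boolean is
      Total_Sent, Total_Received : Natural := 0;
   begin
      for P in Process_Id loop
         Total_Sent     := Total_Sent     + Sent (P);      -- counters taken from each local state
         Total_Received := Total_Received + Received (P);  -- including messages forwarded to P0
      end loop;
      return Total_Sent = Total_Received;                  -- no outstanding messages: snapshot done
   end Snapshot_Complete;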
Distributed Systems
Consistent distributed states
Why would we need that?
• Find deadlocks.
• Find termination / completion conditions.
• Collect a consistent system state for system backup/restore.
• Collect a consistent system state for further processing (e.g. distributed databases).
• ... any other global safety or liveness property.

Distributed Systems
A distributed server (load balancing)
[Figure: a client facing a ring of servers, each holding the server function and part of the data.]
[Figure sequence: the client calls Send_To_Group (Job) on one of the servers; contention messages circulate around the ring to decide which server takes the job; finally the chosen server returns Job_Completed (Results) to the client.]

with Ada.Task_Identification; use Ada.Task_Identification;

task type Print_Server is
   entry Send_To_Server (Print_Job : in Job_Type; Job_Done : out Boolean);
   entry Contention     (Print_Job : in Job_Type; Server_Id : in Task_Id);
end Print_Server;

task body Print_Server is
begin
   loop
      select
         accept Send_To_Server (Print_Job : in Job_Type; Job_Done : out Boolean) do
            if Print_Job not in Turned_Down_Jobs then
               if Not_Too_Busy then
                  Applied_For_Jobs := Applied_For_Jobs + Print_Job;
                  Next_Server_On_Ring.Contention (Print_Job, Current_Task);
                  requeue Internal_Print_Server.Print_Job_Queue;
               else
                  Turned_Down_Jobs := Turned_Down_Jobs + Print_Job;
               end if;
            end if;
         end Send_To_Server;
      or
         accept Contention (Print_Job : in Job_Type; Server_Id : in Task_Id) do
            if Print_Job in Applied_For_Jobs then
               if Server_Id = Current_Task then
                  Internal_Print_Server.Start_Print (Print_Job);
               elsif Server_Id > Current_Task then
                  Internal_Print_Server.Cancel_Print (Print_Job);
                  Next_Server_On_Ring.Contention (Print_Job, Server_Id);
               else
                  null; -- removing the contention message from the ring
               end if;
            else
               Turned_Down_Jobs := Turned_Down_Jobs + Print_Job;
               Next_Server_On_Ring.Contention (Print_Job, Server_Id);
            end if;
         end Contention;
      or
         terminate;
      end select;
   end loop;
end Print_Server;
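A hedged usage sketch (my addition, not lecture code): a client simply hands its job to any one server on the ring and waits for the result; Server_1 and A_Job are illustrative names.

   declare
      Done : Boolean := False;
   begin
      Server_1.Send_To_Server (Print_Job => A_Job, Job_Done => Done);  -- any ring member will do
      --  The contention protocol above decides which server actually prints the job.
   end;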

Distributed Systems
Distributed Systems
Transactions
Concurrency and distribution in systems with multiple, interdependent interactions?
Concurrent and distributed client/server interactions
beyond single remote procedure calls?

Definition (ACID properties):
Distributed Systems
Distributed Systems
Transactions
• Atomicity: All or none of the sub-operations are performed.
Atomicity helps achieve crash resilience. If a crash occurs, then it is possible to roll back the system to the state before the transaction was invoked.
• Consistency: Transforms the system from one consistent state to another consistent state.
• Isolation: Results (including partial results) are not revealed unless and until the transaction commits. If the operation accesses a shared data object, invocation does not interfere with other operations on the same object.
• Durability: After a commit, results are guaranteed to persist, even after a subsequent system failure.

Open questions:
• Shadow copies?
• How to ensure consistency in a distributed system?
• Actual isolation or the appearance of isolation?
• Actual isolation and efficient concurrency?
• Atomic operations spanning multiple processes?
• What hardware do we need to assume?

Distributed Systems
Transactions
A closer look inside transactions:
• Transactions consist of a sequence of operations.
• If two operations out of two transactions can be performed in any order with the same final effect, they are commutative and not critical for our purposes.
• Idempotent and side-effect free operations are by definition commutative.
• All non-commutative operations are considered critical operations.
• Two critical operations as part of two different transactions while affecting the same object are called a conflicting pair of operations.
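A small illustration (my example, not from the slides): let transaction $T_1$ perform $read(A)$ then $write(A)$, and let transaction $T_2$ perform $write(A)$ on the same object $A$. Then

$(write_{T_1}(A),\, write_{T_2}(A)) \quad \text{and} \quad (read_{T_1}(A),\, write_{T_2}(A))$

are conflicting pairs: exchanging their order changes the final value of $A$ or the value read by $T_1$. Two reads of $A$, in contrast, commute and are not critical.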

Distributed Systems
Transactions
A closer look at multiple transactions:
• Any sequential execution of multiple transactions will fulfil the ACID-properties, by definition of a single transaction.
• A concurrent execution (or ‘interleaving’) of multiple transactions might fulfil the ACID-properties.
If a specific concurrent execution can be shown to be equivalent to a specific sequential execution of the involved transactions, then this specific interleaving is called ‘serializable’.
If a concurrent execution (‘interleaving’) ensures that no transaction ever encounters an inconsistent state, then it is said to ensure the appearance of isolation.

Distributed Systems
Achieving serializability
For the serializability of two transactions it is necessary and sufficient
for the order of their invocations of all conflicting pairs of operations to be the same
for all the objects which are invoked by both transactions.
(Determining order in distributed systems requires logical clocks.)

Distributed Systems
Serializability
[Figure: time-lines of processes P1–P3 with Write (A), Write (B), Read (A) and Write (C) operations, and the resulting order between the two transactions.]
• Two conflicting pairs of operations with the same order of execution.
Serializable

Distributed Systems
Serializability
[Figure: time-lines of processes P1–P3 with Write (A), Write (B), Read (A) and Write (C) operations.]
• Two conflicting pairs of operations with different orders of execution.
Not serializable

Distributed Systems
Serializability
[Figure: time-lines of processes P1–P3 with Write (A), Read (C), Write (B), Read (A) and Write (C) operations.]
• Three conflicting pairs of operations with the same order of execution (pair-wise between processes).
• The order between processes also leads to a global order of processes (here: P3, P1, P2).
Serializable

Distributed Systems
Serializability
[Figure: time-lines of processes P1–P3 with Write (A), Read (C), Write (B), Read (A) and Write (C) operations.]
• Three conflicting pairs of operations with the same order of execution (pair-wise between processes).
• The order between processes no longer leads to a global order of processes.
Not serializable

Distributed Systems
Achieving serializability
For the serializability of two transactions it is necessary and sufficient
for the order of their invocations of all conflicting pairs of operations to be the same
for all the objects which are invoked by both transactions.
• Define: Serialization graph: a directed graph;
  vertices i represent transactions Ti;
  edges Ti → Tj represent an established global order dependency
  between all conflicting pairs of operations of those two transactions.
For the serializability of multiple transactions it is necessary and sufficient
that the serialization graph is acyclic.

Distributed Systems
Serializability
[Figure: time-lines of processes P1–P3 with Write (A), Read (C), Write (B), Read (A) and Write (C) operations; established order of processes: P3, P1, P2.]
• Three conflicting pairs of operations with the same order of execution (pair-wise between processes).
Serialization graph is acyclic.
Serializable

Distributed Systems
Serializability
[Figure: time-lines of processes P1–P3 with Write (A), Read (C), Write (B), Read (A) and Write (C) operations; the established order dependencies between P1, P2 and P3 form a cycle.]
• Three conflicting pairs of operations with the same order of execution (pair-wise between processes).
Serialization graph is cyclic.
Not serializable

Distributed Systems
Transaction schedulers
Three major designs:
• Locking methods: impose strict mutual exclusion on all critical sections.
• Time-stamp ordering: note relative starting times and keep order dependencies consistent.
• “Optimistic” methods: go ahead until a conflict is observed – then roll back.

Distributed Systems
Transaction schedulers – Locking methods
Locking methods include the possibility of deadlocks: careful from here on out …
• Complete resource allocation before the start and release at the end of every transaction:
  this will impose a strict sequential execution of all critical transactions.
• (Strict) two-phase locking: each transaction follows the following two-phase pattern during its operation:
  • Growing phase: locks can be acquired, but not released.
  • Shrinking phase: locks can be released anytime, but not acquired (two-phase locking),
    or locks are released on commit only (strict two-phase locking).
  Possible deadlocks
  Serializable interleavings
  Strict isolation (in case of strict two-phase locking)
• Semantic locking: allow for separate read-only and write-locks
  Higher level of concurrency (see also: use of functions in protected objects)
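A minimal sketch (my illustration, not lecture code) of a lock as an Ada protected object, and of the two phases of a strict two-phase-locking transaction; Lock_A, Lock_B and Commit are illustrative names:

   package Locks is
      protected type Lock is
         entry Acquire;              -- blocks while the lock is held by another transaction
         procedure Release;
      private
         Held : Boolean := False;
      end Lock;
   end Locks;

   package body Locks is
      protected body Lock is
         entry Acquire when not Held is
         begin
            Held := True;
         end Acquire;
         procedure Release is
         begin
            Held := False;
         end Release;
      end Lock;
   end Locks;

   --  A transaction under strict two-phase locking (illustrative outline):
   --     Lock_A.Acquire; Lock_B.Acquire;    -- growing phase: acquire only
   --     ... operate on A and B ...
   --     Commit;                            -- commit point
   --     Lock_A.Release; Lock_B.Release;    -- shrinking phase: release only, after the commit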

Distributed Systems
Transaction schedulers – Time-stamp ordering
Add a unique time-stamp (any global order criterion) on every transaction upon start.
Each involved object can inspect the time-stamps of all requesting transactions.
• Case 1: A transaction with a time-stamp later than all currently active transactions applies:
  the request is accepted and the transaction can go ahead.
• Alternative case 1 (strict time-stamp ordering):
  the request is delayed until the currently active earlier transaction has committed.
• Case 2: A transaction with a time-stamp earlier than all currently active transactions applies:
  the request is not accepted and the applying transaction is to be aborted.
Collision detection rather than collision avoidance. No isolation. Cascading aborts are possible.
Simple implementation, high degree of concurrency
– also in a distributed environment, as long as a global event order (time) can be supplied.
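A hedged sketch (my illustration, not lecture code) of the decision an object could take under time-stamp ordering; Latest_Active_TS stands for the largest time-stamp among the transactions currently active on this object:

   type Decision is (Go_Ahead, Wait_For_Commit, Abort_Transaction);

   function Decide (Requesting_TS, Latest_Active_TS : Positive;
                    Strict : Boolean) return Decision is
   begin
      if Requesting_TS > Latest_Active_TS then
         --  Case 1: the requester is later than all currently active transactions.
         if Strict then
            return Wait_For_Commit;     -- strict ordering: wait for the earlier ones to commit
         else
            return Go_Ahead;            -- accept the request
         end if;
      else
         --  Case 2 (and, conservatively, anything in between): an earlier transaction applies.
         return Abort_Transaction;      -- collision detected: abort and restart the requester
      end if;
   end Decide;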

Distributed Systems
Transaction schedulers – Optimistic control
Three sequential phases:
1. Read & execute:
   Create a shadow copy of all involved objects and perform all required operations on the shadow copy and locally (i.e. in isolation).
2. Validate:
   After local commit, check all occurred interleavings for serializability.
3. Update or abort:
   3a. If serializability could be ensured in step 2, then all results of involved transactions are written to all involved objects – in dependency order of the transactions.
   3b. Otherwise: destroy shadow copies and start over with the failed transactions.

Aborts happen after everything has been committed locally.
Full isolation and maximal concurrency!
How to create a consistent copy?
How to update all objects consistently?

Distributed Systems
Distributed transaction schedulers
Three major designs:
• Locking methods: no aborts.
  Impose strict mutual exclusion on all critical sections.
• Time-stamp ordering: potential aborts along the way.
  Note relative starting times and keep order dependencies consistent.
• “Optimistic” methods: aborts or commits at the very end.
  Go ahead until a conflict is observed – then roll back.
How to implement “commit” and “abort” operations in a distributed environment?

Two phase commit protocol

[The slides on the following pages animate this protocol on a figure showing a client and a data ring of servers; only the step captions are reproduced here.]

Start up (initialization) phase:
• The client submits a distributed transaction to the data ring of servers.
• Determine coordinator.
• Setup & start operations (shadow copies appear on the involved servers).

Phase 1: Determine result state
• Coordinator requests and assembles votes: “Commit” or “Abort”.

Phase 2: Implement results
• Coordinator instructs everybody to “Commit”.
• Everybody commits.
• Everybody destroys shadows.
• Everybody reports “Committed”.

or Phase 2: Global roll back
• Coordinator instructs everybody to “Abort”.
• Everybody destroys shadows.

Phase 2: Report result of distributed transaction
• Coordinator reports to client: “Committed” or “Aborted”.
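The message flow above can be sketched with goroutines standing in for the servers. The vote/decision channels and the scenario in which every local vote is "Commit" are assumptions for this illustration, not the lecture's implementation:

package main

import "fmt"

type vote bool     // true = "Commit", false = "Abort"
type decision bool // true = global commit, false = global roll back

// A participant votes on its local shadow copy and then waits for the
// coordinator's decision (phase 2: implement results or roll back).
func participant(id int, localOK bool, votes chan<- vote,
	decide <-chan decision, done chan<- string) {
	votes <- vote(localOK) // phase 1: cast the local vote
	if <-decide {
		done <- fmt.Sprintf("server %d: committed, shadow destroyed", id)
	} else {
		done <- fmt.Sprintf("server %d: rolled back, shadow destroyed", id)
	}
}

// The coordinator assembles all votes and broadcasts the global decision.
func coordinator(n int, votes <-chan vote, decide []chan decision) decision {
	global := decision(true)
	for i := 0; i < n; i++ {
		if v := <-votes; !v { // a single "Abort" vote aborts the whole transaction
			global = false
		}
	}
	for _, ch := range decide { // phase 2: instruct everybody
		ch <- global
	}
	return global
}

func main() {
	const n = 4
	votes := make(chan vote, n)
	done := make(chan string, n)
	decide := make([]chan decision, n)
	for i := range decide {
		decide[i] = make(chan decision, 1)
		go participant(i, true, votes, decide[i], done) // here: everybody votes "Commit"
	}
	global := coordinator(n, votes, decide)
	for i := 0; i < n; i++ {
		fmt.Println(<-done)
	}
	if global {
		fmt.Println("coordinator reports to client: Committed")
	} else {
		fmt.Println("coordinator reports to client: Aborted")
	}
}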

Distributed transaction schedulers
Evaluating the three major design methods in a distributed environment:
• Locking methods: no aborts.
  Large overheads; deadlock detection/prevention required.
• Time-stamp ordering: potential aborts along the way.
  Recommends itself for distributed applications, since decisions are taken locally and the communication overhead is relatively small.
• “Optimistic” methods: aborts or commits at the very end.
  Maximizes concurrency, but also data replication.
Side aspect “data replication”: there is a large body of literature on this topic
(see: distributed databases / operating systems / shared memory / cache management, …).

Redundancy (replicated servers)
Premise:
A crashing server computer should not compromise the functionality of the system (full fault tolerance).
Assumptions & Means:
• k computers inside the server cluster might crash without losing functionality.
  Replication: at least k + 1 servers.
• The server cluster can reorganize at any time (and specifically after the loss of a computer).
  Hot stand-by components, dynamic server group management.
• The server is described fully by the current state and the sequence of messages received.
  State machines: we have to implement consistent state adjustments (re-organization) and consistent message passing (order needs to be preserved).
[Schneider1990]

Redundancy (replicated servers)
Stages of each server:
• Received: job message received locally.
• Deliverable: job message received by all active servers.
• Processed: job processed locally.
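The three stages can also be written down as a small state type; the Go names and helper functions below are assumptions for illustration only (the deck gives the stages as a diagram):

package main

import "fmt"

// stage of a job message on one replicated server.
type stage int

const (
	Received    stage = iota // job message received locally
	Deliverable              // job message received by all active servers
	Processed                // job processed locally
)

type job struct {
	id    int
	state stage
}

// seenByAll promotes a job from Received to Deliverable; in the ring protocol
// below this corresponds to having seen the job message from both directions.
func (j *job) seenByAll() {
	if j.state == Received {
		j.state = Deliverable
	}
}

// process may only run once the job is known to be deliverable everywhere.
func (j *job) process() {
	if j.state == Deliverable {
		j.state = Processed
	}
}

func main() {
	j := &job{id: 1, state: Received}
	j.seenByAll()
	j.process()
	fmt.Println("job", j.id, "is in stage", j.state) // 2, i.e. Processed
}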

Redundancy (replicated servers)

[The slides on the following pages animate this protocol on a figure showing a client and a ring of identical servers; only the step captions are reproduced here.]

Start-up (initialization) phase:
• Determine coordinator.
• Coordinator determined.

Coordinator receives job message:
• The client sends the job to the coordinator.

Distribute job:
• Coordinator sends the job both ways around the ring.
• Everybody received the job (but nobody knows that).

Processing starts:
• First server detects two job-messages ☞ processes the job.

Everybody (besides coordinator) processes:
• All servers detect two job-messages ☞ everybody processes the job.

Coordinator processes:
• Coordinator also received two messages and processes the job.

Result delivery:
• Coordinator delivers its local result.
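A sketch of the delivery rule illustrated above: the coordinator sends the job both ways around the ring, every other server forwards each copy, and a server processes once it has seen the job message from both directions (i.e. twice). The goroutine-per-server ring and the channel layout are assumptions made for this sketch:

package main

import (
	"fmt"
	"sync"
)

const n = 5 // servers in the ring; server 0 acts as coordinator

func main() {
	// cw[i] is the link from server i to server (i+1)%n (clockwise),
	// ccw[i] is the link from server i to server (i-1+n)%n (counter-clockwise).
	cw := make([]chan int, n)
	ccw := make([]chan int, n)
	for i := range cw {
		cw[i] = make(chan int, 1)
		ccw[i] = make(chan int, 1)
	}

	var wg sync.WaitGroup
	for i := 1; i < n; i++ { // the replicated (non-coordinator) servers
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for copies := 0; copies < 2; {
				select {
				case job := <-cw[(id-1+n)%n]: // clockwise copy arrives
					cw[id] <- job // keep it travelling clockwise
					copies++
				case job := <-ccw[(id+1)%n]: // counter-clockwise copy arrives
					ccw[id] <- job // keep it travelling counter-clockwise
					copies++
				}
			}
			// Seen the job from both directions: everybody has received it.
			fmt.Printf("server %d: job deliverable, processing\n", id)
		}(i)
	}

	// Coordinator: receives the job from the client and sends it both ways.
	job := 42
	cw[0] <- job
	ccw[0] <- job
	<-cw[n-1] // the clockwise copy returns after a full round
	<-ccw[1]  // the counter-clockwise copy returns after a full round
	fmt.Println("coordinator: job deliverable, processing and delivering the result")
	wg.Wait()
}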

Redundancy (replicated servers)
Event: server crash, new servers joining, or current servers leaving.
Server re-configuration is triggered by a message to all (this is assumed to be supported by the distributed operating system).
Each server, on reception of a re-configuration message:
1. Wait for the local job to complete or time out.
2. Store the local consistent state S_i.
3. Re-organize the server ring, send the local state around the ring.
4. If a state S_j with j > i is received, then S_i ← S_j (i.e. adopt the more recent state; sketched below).
5. Elect coordinator.
6. Enter ‘Coordinator-’ or ‘Replicate-mode’.
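Step 4 is the rule that makes the surviving servers consistent again; a minimal sketch (the state struct with a seq field standing in for the index j, and the adopt helper, are assumed for this illustration):

package main

import "fmt"

// state of one replicated server, tagged with a sequence number that
// counts the jobs processed so far (assumed representation).
type state struct {
	seq  int
	data string
}

// adopt implements step 4: if a state with a higher sequence number
// arrives from the ring, it replaces the local state.
func adopt(local *state, received state) {
	if received.seq > local.seq {
		*local = received
	}
}

func main() {
	local := state{seq: 3, data: "after job 3"}
	adopt(&local, state{seq: 5, data: "after job 5"}) // newer state is adopted
	adopt(&local, state{seq: 4, data: "stale"})       // older state is ignored
	fmt.Println(local.seq, local.data) // 5 after job 5
}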

Summary
• Networks
  • OSI, topologies
  • Practical network standards
• Time
  • Synchronized clocks, virtual (logical) times
  • Distributed critical regions (synchronized, logical, token ring)
• Distributed systems
  • Elections
  • Distributed states, consistent snapshots
  • Distributed servers (replicates, distributed processing, distributed commits)
  • Transactions (ACID properties, serializable interleavings, transaction schedulers)