Phil Factor
SimpleTalk columnist
The Eight Fallacies of Distributed Computing
Published 30 December 2014 7:01 pm
There are eight fallacies of distributed computing: common misconceptions that were first identified at Sun Microsystems in the 1990s, but well known even before then. With the passage of time, awareness of these fallacies may have faded amongst IT people, so I'd like to remind you of them. They are:
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
I have personal experience of major projects where these strange assumptions were made by senior management, and the result was expensive disaster. Many application designers have a touching faith in the network they use, and reality tends to come as a shock.
One might think that, because networks are now faster and cheaper than they were in those far-off days when we first caught on to the idea of distributed systems, these assumptions are no longer fallacies, since technology has turned them into reality. However, despite the marketing spin, the reality of networks continues to trip up the unwary who design distributed systems on the basis of the eight fallacies.
Distributed systems work well within special networks designed and specified for the purpose, and subject to rigorous controls and monitoring, but the chances are that the production environment for your distributed system won't cut the mustard. Why not? Read on.
The network is reliable
Within a domain, networks probably seem rock-solid. After all, how often does a network component fail nowadays? Even if a single component should fail, there is plenty of redundancy, surely? Well, as networks get more complex, network administration becomes more prone to error, mostly mistakes in configuration. In some cases, up to a third of network changes lead to errors that affect the reliability of the network. Both software and hardware can fail, especially routers, which account for around a quarter of all failures. Uninterruptible power supplies can get interrupted, people can make ill-advised device configuration changes, and there can be network congestion, denial-of-service (DoS) attacks, and failed software and firmware upgrades or patches. Networks are subject to disaster both natural and unnatural, and it takes skill to design a network that is resilient to this sort of thing. The wide-area links are outside your control and can easily go wrong.
The incidents on Azure alone, in recent months, make painful reading, and this rate of failure is typical of the major cloud-service providers. For mobile applications, all sorts of things can go wrong: network requests will fail at unpredictable intervals, the destination will be unavailable, data will reach the destination but the confirmation will fail to come back, data will be corrupted in transmission or arrive incomplete. Mobile applications must be resilient at the scary end of the reliability spectrum of networks, but all distributed applications must cope with these eventualities, and network nodes must be able to cope with server failures.
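To give a flavour of what coping with these eventualities means in practice, here is a minimal sketch, in Python, of retrying a failing request with exponential backoff and jitter. The `flaky_request` function is a hypothetical stand-in for a real network call; the delays are illustrative.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1):
    """Call a network operation, retrying transient failures with
    exponential backoff plus jitter. `operation` is any zero-argument
    callable that raises ConnectionError on failure (a hypothetical
    stand-in for a real network request)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up: the caller must handle the outage
            # Back off 0.1 s, 0.2 s, 0.4 s... plus random jitter, so that
            # many clients don't retry in lock-step and create a storm.
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay))


# Simulate a flaky endpoint that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network failure")
    return "OK"

print(call_with_retries(flaky_request, base_delay=0.01))  # prints "OK"
```

The jitter matters as much as the backoff: without it, a brief outage turns into synchronized waves of retries that can themselves congest the recovering network.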
Latency is zero
Latency is not the same as bandwidth. Latency is the time spent waiting for a response. The cause? As well as the obvious processing delay, there will be network latency, consisting of propagation delay, node delay and congestion delay. Propagation delay increases with distance: it is around 30 ms between Europe and the US. The number of nodes in the path determines the node delay.
Often, developers build distributed systems within an intranet that has insignificant latency, so there is almost no penalty for making frequent, fine-grained network calls. This design mistake only becomes apparent when the system goes live.
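A back-of-the-envelope model makes the penalty of chatty designs obvious: each round trip pays the full latency, however tiny the payload. The figures below are illustrative, not measurements.

```python
def total_time_ms(requests, latency_ms, per_request_work_ms=1.0):
    """Rough model of wall-clock time for sequential network calls:
    every request pays the round-trip latency plus some server work."""
    return requests * (latency_ms + per_request_work_ms)


# 500 fine-grained calls, on an intranet versus a wide-area network,
# against one batched call that does all the work server-side.
lan, wan = 0.5, 80.0  # illustrative round-trip latencies in milliseconds
print(total_time_ms(500, lan))                        # 750.0 ms: barely noticed
print(total_time_ms(500, wan))                        # 40500.0 ms: unusable
print(total_time_ms(1, wan, per_request_work_ms=500.0))  # 580.0 ms batched
```

The same 500 units of work take under a second on the intranet and the better part of a minute over the WAN, yet a single batched call pays the latency only once. This is why coarse-grained interfaces survive contact with real networks and fine-grained ones do not.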
One of the disconcerting effects of high latency is that it isn't constant. On a poor network, it can occasionally be counted in seconds. By their nature, networks give no guarantee of the order in which individual packets will be serviced, or even that the requesting process still exists. Latency just makes things worse. Moreover, where applications compensate by sending several simultaneous requests, temporarily high latency can be exacerbated by the very response to it.
Bandwidth is infinite
Whereas most modern cables can carry almost unlimited bandwidth, we've yet to work out how to build the interconnection devices (hubs, switches, routers and so on) that are fast enough to guarantee high bandwidth to all connected users. The typical corporate intranet will still have areas where bandwidth is throttled.
As fast as bandwidth increases over public networks, so too does the use of the network for services carrying video and audio, which once employed broadcast technologies. New uses, such as social media, tend to soak up the increasing bandwidth. There are also the constraints of the "last mile" in many locations outside the major cities, and the increasing likelihood of packet loss.
In general, we need to be careful about assuming that high bandwidth is a universal experience. However impressive the network bandwidth, it is not going to get close to the speed at which co-hosted processes can communicate.
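The gap is easy to estimate with a simple transfer-time model: latency plus payload size divided by bandwidth. The link speeds below are illustrative assumptions, not measurements of any particular network.

```python
def transfer_time_s(payload_mb, bandwidth_mbps, latency_ms):
    """Time in seconds to deliver a payload: one-off latency plus
    serialization time. payload_mb is megabytes; bandwidth_mbps is
    megabits per second, hence the factor of 8."""
    return latency_ms / 1000 + (payload_mb * 8) / bandwidth_mbps


# Moving a 100 MB payload over two very different links.
print(round(transfer_time_s(100, 1000, 0.1), 3))  # gigabit LAN: 0.8 s
print(round(transfer_time_s(100, 20, 40), 1))     # throttled WAN: 40.0 s
```

Note that for large payloads the bandwidth term dominates and latency barely matters, while for the chatty small requests of the previous section the opposite is true; the two fallacies fail in different ways.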
The network is secure
It is strange to still come across network-based systems that have fundamental security weaknesses. Network attacks have increased year on year, and have moved way beyond their original roots in curiosity, malice and crime to be adopted as part of international conflict and political action. Network attacks are part of life in IT: boring to developers, but essential to prevent. Part of the problem is that network-intrusion detection tends to be a low priority, so we just aren't always aware of successful network breaches.
Traditionally, breaches were generally the consequence of poorly configured firewalls. Most firewalls are routinely probed for weaknesses, as you'll immediately find if you foolishly disable one. However, this is just one of a number of ways of breaching a network, and a firewall is only part of the defense in depth. Wi-Fi is often a weakness; bring-your-own-device (BYOD) policies can allow intrusion via a compromised device, as can virtualization and software-defined networking (SDN). The increasing DevOps demand for rapidly changing infrastructure has made it harder to keep essential controls in step. Botnets within enterprise networks are a continuing problem, as are intrusions via business partners.
You need to assume that the network is hostile, and that security must be in depth. This means building security into the basic design of distributed applications and their hosts.
With defense in depth, any part of a distributed system will need to have secure ways of accessing other networked resources.
Security brings its own complications, which come from the administrative overhead of maintaining the different user accounts, privileges, certificates and so on. One major cloud-network outage was caused by a permission expiring before it could be renewed.
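As one small example of building security into the design rather than bolting it on afterwards, Python's standard library makes the verified configuration the default: a context created with `ssl.create_default_context()` checks certificates and hostnames, and disabling that is an explicit, visible act in the code.

```python
import ssl

# The stdlib default context refuses unverified connections:
# peer certificates are validated against the system trust store,
# and the certificate's name must match the host you asked for.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```

A distributed application whose components all insist on verified, encrypted channels between themselves is applying exactly the defense-in-depth the previous paragraphs describe: no hop trusts the network just because it is "internal".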
Topology doesnt change
Network topologies change constantly, and with remarkable speed. This is inevitable, due to the increasing pressure for network agility to keep the network in step with rapidly changing business requirements.
Wherever you deploy an application, you must assume that much of the network topology is likely to be out of your control. Network admins will make changes at a time, and for reasons, that may not suit your interests. They will move servers and change the network topology to gain performance or security, and make routing changes as a consequence of server and network faults.
It is therefore a mistake to rely on the permanence of specific endpoints or routes. The physical structure of the network must always be abstracted out of any distributed design.
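One common way of abstracting the physical network out of the design is a service-lookup layer: application code asks for a logical service, never a hard-coded host or route. This is a deliberately naive sketch; the service names and hosts are invented, and a real system would use DNS, a configuration service or a service registry behind the same interface.

```python
# Hypothetical registry mapping logical service names to endpoints.
# In production this would be backed by DNS, config, or a registry
# service, so that topology changes never touch application code.
REGISTRY = {
    "orders": ("orders.internal.example.com", 8443),
    "billing": ("billing.internal.example.com", 8443),
}


def endpoint_for(service):
    """Resolve a logical service name to its current (host, port).
    When admins move a server, only the registry entry changes."""
    try:
        return REGISTRY[service]
    except KeyError:
        raise LookupError(f"unknown service: {service}")


host, port = endpoint_for("orders")
print(host, port)
```

The point is the indirection, not the dictionary: every caller that names a machine directly is a caller that breaks when the topology changes.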
There is one administrator
Unless the system exists entirely within a small LAN, there will be different administrators associated with the various components of the network. They will have different degrees of expertise, and different responsibilities and priorities.
This will matter if something goes wrong that causes your service to fail. Your service-level agreement will require a response within a definite time. The first stage will be to identify the problem. This may not be easy unless the administrator for the part of the network that has problems is part of your development team. Unfortunately, this isn't likely. In many networks, the problem could be the responsibility of a different organization entirely. If a cloud component is an essential part of your application, and the cloud has an outage, you are helpless to assert your priority. All you can do is wait.
If there are many administrators of the network, then it is more difficult to coordinate upgrades to either networks or applications, especially when several busy people are involved. Upgrades and deployments have to be done in coordination, and the larger the number of people involved, the more difficult this becomes!
Transport cost is zero
By transport cost, we mean the overall cost of moving data across the network. We can refer to time and computing resources, or to financial cost.
Transferring data from the application layer to the transport layer requires CPU and other resources. Structured information needs to be serialized (marshalled) or parsed to get data onto the wire. The performance impact of this can be greater than that of bandwidth and latency hiccoughs, with XML taking twice as long as JSON because of its verbosity and complexity.
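The only trustworthy way to know the serialization cost for your own payloads is to measure it. This sketch builds equivalent JSON and XML representations of one record with Python's standard library, so they can be sized or timed; the record itself is invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

record = {"transactionType": "withdraw", "amount": 100, "balance": 100}

# JSON: one stdlib call each way.
as_json = json.dumps(record)

# An equivalent element-style XML document, built with the stdlib.
root = ET.Element("withdraw")
for key in ("amount", "balance"):
    ET.SubElement(root, key).text = str(record[key])
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
print(len(as_json), len(as_xml))  # byte counts to compare for yourself
```

Wrap the `dumps`/`tostring` calls in `timeit` with your real payloads before believing any general claim about which format is cheaper; the answer depends heavily on payload shape and on the parser you use.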
The financial transport cost includes not only the hardware and installation costs of creating the network, but also the cost of monitoring and maintaining network servers, services and infrastructure, and the cost of upgrading the network if you discover that the bandwidth is insufficient, or that your servers can't actually handle enough concurrent requests. We also need to consider the cost of leased lines and cloud services, which are paid for by the bandwidth used.
The network is homogeneous
A homogeneous network today is rare, even rarer than when the fallacy was first identified! A network is likely to connect computers and other devices, each with different operating systems, different data transfer protocols, and all connected with network components from a variety of suppliers.
There is nothing particularly wrong with heterogeneous networks, though, unless they involve proprietary data-transfer protocols that require specialized support, devices or drivers. From an application perspective, it helps a lot if data is transferred in an open-standard format such as CSV, XML or JSON, and if industry-standard methods for querying data, such as ODBC, are used.
Where all components come from one supplier, there is a better chance of reliability because test coverage can be greater, but the reality is a rich mix of components. This means that interoperability should be built in from the start of the design of any distributed system.
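As a small illustration of open-standard interchange, here is a CSV round trip using only the Python standard library; the device names are invented. Any platform on a heterogeneous network can consume this output without shared code or a proprietary driver.

```python
import csv
import io

rows = [{"device": "router-1", "status": "up"},
        {"device": "switch-2", "status": "down"}]

# Write the records as CSV with a header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["device", "status"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# Round-trip: parse the text back into records, as any other
# platform could, using nothing but the CSV standard.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed == rows)  # True
```

CSV flattens everything to strings, of course; for nested structures you would reach for JSON or XML, but the principle is the same: an open format is the contract between heterogeneous components.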
Conclusions
Data moves more slowly, and less reliably, outside a server, even with modern gigabit networks. This is why we have traditionally preferred to scale up the hardware rather than scale out onto network-based commodity hardware. The trouble is that this preference ossified into a golden rule. When it proved possible to control the network to the point where the eight fallacies came closer to reality, the best practice could be turned on its head, and the distributed model became more attractive. The problem is that distributed systems, service-oriented architectures and microservices are only effective where the network has been tamed of all its vices. Even today, the breakdown or outage of a cloud service happens surprisingly frequently.
When you are planning or developing a distributed application, it is a bad idea to assume attributes and qualities in your network that aren't necessarily there: far better to plan on the assumption that your network will be costly, and will occasionally be unreliable and insecure. Assume that you'll face high latency, insufficient bandwidth and a changing topology. When you need changes, you'll be likely to face the prospect of having to contact many administrators, and a bewildering range of network servers and protocols. You may then be pleasantly surprised, but don't count on it!
3 Responses to The Eight Fallacies of Distributed Computing
1. rharding64 says: January 12, 2015 at 7:15 am
To be complete, any discussion involving distributed computing should include DDS (Data Distribution Service). Also, for the Internet of Things (IoT) industry being implemented worldwide, it will be critical that the interaction of distributed devices with enterprise servers all along the chain be modeled and accounted for. I am an EE and I create tools that interact directly with devices. I have written databases for test systems, local and deployed, that record measurement results and OEM test-equipment configuration for any given test setup on a device under test (DUT).
The common view of what is going on with big data is the following:
1. In the past, before IoT, data collected from sensors and other devices was stored on servers.
2. That same data was analyzed at some point in time after it had been pushed to the big-data servers.
Point to take home: the longer the data sits on the server without being used, the more business value is lost.
The goals of the Internet of Things, which has its own protocols:
1. Create a two-way street, so that not only is sensor data captured, but there is also interaction with the process. The closer the measurement is made to the actual process, the better.
2. Provide these services to interact with existing systems as well as to create new ones.
2. dbacher says: January 13, 2015 at 2:12 am
While I agree with the article, I'd be really careful with comparing XML to JSON.
{"transactionType":"withdraw", "amount":100, "balance":100}
Against:
<withdraw amount="100" balance="100"/>
Against:
<withdraw><amount>100</amount><balance>100</balance></withdraw>
And then with the old high-bit and shift-left-7 until it's clear, three bytes for the transaction.
The point is you have to think about it; you can't just grab some library.
3. Phil Factor says: January 13, 2015 at 9:27 am
dbacher: I have a lot of sympathy with your view, but for this article I was relying mainly on a study by Nurseitov, Paulson, Reynolds and Izurieta of Montana State University (http://www.cs.montana.edu/izurieta/pubs/caine2009.pdf), and this is corroborated by other studies. I suspect that it is possible to get the benchmarks closer with good clean code. Also, JSON is a moving target. There is a very good discussion on StackOverflow about the relative merits of XML and JSON: http://stackoverflow.com/questions/4862310/json-and-xml-comparison