ECS781P
CLOUD COMPUTING
INTRODUCTION TO CLOUD APPLICATIONS
Lecturers: Dr. Sukhpal Singh Gill and Dr. Ignacio Castro
School of Electronic Engineering and Computer Science
Contents
• Challenges in Cloud Networking
• Evolution of server-side applications
• Further Reading
Post-Brexit – Data Sharing
U.K. Citizens Can’t Access Data Held In A European Cloud
Source: EU’s General Data Protection Regulation
Internet infrastructure outages
Giotsas, V., Dietzel, C., Smaragdakis, G., Feldmann, A., Berger, A., & Aben, E. (2017, August). Detecting peering infrastructure outages in the wild. In Proceedings of the conference of the ACM special interest group on data communication (pp. 446-459).
BGP prefix hijacking
BGP route hijacking, also called prefix hijacking, route hijacking or IP hijacking, is the illegitimate takeover of groups of IP addresses by corrupting Internet routing tables maintained using the Border Gateway Protocol (BGP).
https://www.zdnet.com/article/russian-telco-hijacks-internet-traffic-for-google-aws-cloudflare-and-others/
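One reason a hijack can succeed is longest-prefix matching: when several routes cover a destination, routers prefer the most specific one, so a bogus announcement for a more specific prefix attracts the traffic. The sketch below is a simplified model of that rule, not a BGP implementation; the prefixes are taken from documentation address ranges and the origin names are hypothetical.

```python
import ipaddress

def best_route(routes, destination):
    """Return the (prefix, origin) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in routes if addr in net]
    if not matches:
        return None
    # The most specific prefix (largest prefix length) wins
    return max(matches, key=lambda item: item[0].prefixlen)

# Hypothetical routing table: the owner announces a /24,
# the hijacker announces a more specific /25 inside it.
routes = [
    (ipaddress.ip_network("203.0.113.0/24"), "legitimate-AS"),
    (ipaddress.ip_network("203.0.113.0/25"), "hijacker-AS"),
]

hijacked = best_route(routes, "203.0.113.10")     # covered by both prefixes
unaffected = best_route(routes, "203.0.113.200")  # covered by the /24 only
print(hijacked[1], unaffected[1])
```

Addresses in the lower half of the /24 follow the hijacker's route, while the upper half still reaches the legitimate origin.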
DDoS attacks
https://blog.cloudflare.com/the-ddos-that-almost-broke-the-internet/
Internet in the time of COVID-19
Traffic changes during 2020 at multiple vantage points—daily traffic averaged per week, normalized by 3rd week of Jan.
Source: Feldmann A, Gasser O, Lichtblau F, Pujol E, Poese I, Dietzel C, Wagner D, Wichtlhuber M, Tapidor J, Vallina-Rodriguez N, Hohlfeld O. The Lockdown Effect: Implications of the COVID-19 Pandemic on Internet Traffic. ACM IMC 2020. https://arxiv.org/pdf/2008.10959.pdf
Internet in the time of COVID-19
ECDF of link utilization before and during the lockdown
All curves are shifted to the right. This indicates an overall increase in port usages which harmonizes well with the observed increase in total traffic volume at the IXP-CE.
Source: Feldmann A, Gasser O, Lichtblau F, Pujol E, Poese I, Dietzel C, Wagner D, Wichtlhuber M, Tapidor J, Vallina-Rodriguez N, Hohlfeld O. The Lockdown Effect: Implications of the COVID-19 Pandemic on Internet Traffic. ACM IMC 2020. https://arxiv.org/pdf/2008.10959.pdf
Front End Side
Server Side
1. Servers: The Machinery
2. Databases: The Brains
3. Middleware: The Plumbing
4. Programming Languages & Frameworks: The Nuts & Bolts
Five Aspects of Server-Side Development
• Servers
  • Static and Dynamic
  • Amazon AWS and Microsoft Azure
• Databases
  • Relational and Non-relational
  • NoSQL, SQL Server, MySQL, MongoDB, DynamoDB
• Networks
  • The communication protocols
  • HTTP, Transport Layer Security (TLS), Secure Sockets Layer (SSL)
• Message Queuing
  • RabbitMQ
  • supports multiple messaging protocols, can be deployed in distributed configurations to meet high-scale, high-availability requirements
• Frameworks
  • Django for Python
Client-server architecture
1 server:
• always-on host
• permanent IP address
• serves multiple clients
• data centres/cloud for scaling
Many clients:
• communicate with server
• may be intermittently connected
• may have dynamic IP addresses
• do not communicate directly with each other
HTTP (HyperText Transfer Protocol)
• The Web's application layer protocol
• Tim Berners-Lee: WWW & HTTP
• Vint Cerf: TCP/IP
• client/server model
  • client: browser that requests, receives (using the HTTP protocol), and “displays” Web objects
  • server: Web server that sends (using the HTTP protocol) objects in response to requests
• Examples: a PC running the Firefox browser and an iPhone running the Safari browser (clients); a server running the Apache Web server
Web and HTTP
• web page consists of objects
• object can be HTML file, XML/JSON data, JS client-side code, JPEG image, audio file, …
• web page consists of base HTML file which includes several referenced objects
• each object is addressable by a URL, e.g., www.someschool.edu/someDept/pic.gif
  • host name: www.someschool.edu
  • path name: /someDept/pic.gif
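As a small illustration of the host name / path name split, the standard library can take the example URL apart. This is a minimal sketch; the slide's URL has no scheme, so "http://" is prepended here as an assumption.

```python
from urllib.parse import urlsplit

# "http://" is an assumption: the URL on the slide is written without a scheme
url = "http://www.someschool.edu/someDept/pic.gif"

parts = urlsplit(url)
host_name = parts.hostname  # the server to contact
path_name = parts.path      # the object requested from that server

print(host_name)
print(path_name)
```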
HTTP response message
status line (protocol, status code, status phrase):
HTTP/1.1 200 OK\r\n
header lines:
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data, e.g., requested HTML file:
data data data data data …
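The layout of a response message (status line, header lines, blank line, data) can be made concrete with a few lines of parsing code. The function below is a minimal sketch for a simple HTTP/1.x response; `parse_response` and the sample message are illustrative, and the sketch deliberately ignores chunked transfer encoding and repeated headers.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line parts, headers, and body."""
    # The first blank line (CRLF CRLF) separates the header section from the data
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol, status code, status phrase
    protocol, status_code, status_phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return protocol, int(status_code), status_phrase, headers, body

# Shortened, made-up response in the same format as the slide's example
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
)

protocol, code, phrase, headers, body = parse_response(raw)
print(protocol, code, phrase, headers["content-length"], len(body))
```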
HTTP request message
• ASCII (human-readable format)
request line (GET, POST, HEAD commands):
GET /index.html HTTP/1.1\r\n
header lines:
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
carriage return and line feed at the start of a line indicate the end of the header lines (\r is the carriage return character, \n is the line-feed character):
\r\n
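Because the request is plain ASCII text, it can be assembled with ordinary string operations. The helper below is a minimal sketch; `build_request` is a hypothetical name and only covers requests without a body.

```python
def build_request(method, path, host, headers=None):
    """Assemble an HTTP/1.1 request message without a body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; one extra CRLF marks the end of the header lines
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request(
    "GET",
    "/index.html",
    "www-net.cs.umass.edu",
    {"Connection": "keep-alive"},
)
print(request.decode("ascii"))
```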
HTTP Status Codes
200 OK
• request succeeded, requested object embedded in message body
301 Moved Permanently
• requested object moved, new location specified later in this msg (Location:)
• client can send a new request to the updated location
400 Bad Request
• request msg not understood by server
404 Not Found
• requested document not found on this server
505 HTTP Version Not Supported
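Python's standard library ships these codes and phrases as an enumeration, which is a convenient way to look them up. The sketch below also derives the status class from the first digit; the `describe` and `status_class` helpers are illustrative names.

```python
from http import HTTPStatus

CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}

def describe(code: int) -> str:
    """Return the code followed by its standard status phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

def status_class(code: int) -> str:
    """The first digit of the code identifies the class of response."""
    return CLASSES[code // 100]

for code in (200, 301, 400, 404, 505):
    print(describe(code), "-", status_class(code))
```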
Web Server
• Web Server: application that serves content (HTML pages, images, applications) via the HTTP protocol.
• HTTP Protocol: stateless, request-response
• Plain-text messages
• Default TCP port: 80
• Static Web Server (Apache)
• Requests are handled by locating and serving a stored static resource (html, jpeg, gif, pdf, …)
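The lookup a static server performs can be sketched without any networking: map the request path to a file under a document root and return its bytes unchanged. This is a simplified illustration, not how Apache is implemented; `serve_static`, the temporary document root, and the file name are assumptions made for the example.

```python
import tempfile
from pathlib import Path

def serve_static(doc_root: Path, request_path: str):
    """Return (status_code, body) for a GET of request_path under doc_root."""
    root = doc_root.resolve()
    target = (root / request_path.lstrip("/")).resolve()
    # Refuse anything outside the document root (e.g. "/../secret") or not a file
    if root not in target.parents or not target.is_file():
        return 404, b"Not Found"
    # Static content: the stored bytes are returned as they are
    return 200, target.read_bytes()

# Hypothetical document root holding a single page
doc_root = Path(tempfile.mkdtemp())
(doc_root / "index.html").write_bytes(b"<h1>Hello</h1>")

print(serve_static(doc_root, "/index.html"))
print(serve_static(doc_root, "/missing.html"))
```

Every request for the same path yields the same response until someone replaces the file on disk.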
Static Web Server
[Figure: a user agent (web browser) sends an HTTP request, GET /index.html, to a static HTTP server, which returns an HTTP response containing index.html (HTML).]
Dynamic Web Server
[Figure: a user agent (web browser) sends an HTTP request, POST /signup.do, to a dynamic HTTP server; the server logic consults a database (DB) and returns an HTTP response.]
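In a dynamic server the response is computed by server logic, often using a database, so the same request can produce different responses. The sketch below imitates the handling of POST /signup.do with an in-memory SQLite database; the table layout, the `handle_signup` function, and the form field `name` are assumptions made for illustration.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs

# In-memory database standing in for the "DB" box in the figure
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")

def handle_signup(form_body: str):
    """Server logic for POST /signup.do: returns (status_code, html)."""
    name = parse_qs(form_body).get("name", [""])[0].strip()
    if not name:
        return 400, "<p>Missing name</p>"
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO users (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        # Same request, different response: the name is already stored
        return 409, f"<p>{escape(name)} is already registered</p>"
    return 200, f"<p>Welcome, {escape(name)}!</p>"

first = handle_signup("name=alice")
second = handle_signup("name=alice")
print(first)
print(second)
```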
Static Web Server | Dynamic Web Server
Prebuilt content is the same every time the page is loaded. | Content is generated on the fly and changes regularly.
It uses HTML code for developing a website. | It uses server-side languages such as PHP, Servlets, JSP, and ASP.NET for developing a website.
It sends exactly the same response for every request. | It may generate different HTML for each request.
The content only changes when someone updates and publishes the file (sends it to the web server). | The page contains “server-side” code which allows the server to generate unique content when the page is loaded.
Flexibility is the main advantage of a static website. | A Content Management System (CMS) is the main advantage of a dynamic website.
The three-tier logical services architecture
Dynamic WWW: 3-tier logical architecture
Presentation:
• Handles user requests
• Controls navigation flows
• Creates the visual elements comprising the response
The business logic:
• Processes domain-specific information (domain objects)
• Keeps relationships between service elements
• Executes domain-specific functions
Data access:
• Automatic ORM (Object-Relational Mapping)
• Database communications
• Provides query functionality
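The separation between the three tiers can be sketched in a few lines. This is an illustrative example under stated assumptions: the class and function names, the products table, and the 20% tax rate are hypothetical, and the data access tier uses hand-written SQL on SQLite rather than an automatic ORM.

```python
import sqlite3

class ProductRepository:
    """Data access tier: database communications and query functionality."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE products (name TEXT, price REAL)")

    def add(self, name, price):
        with self.conn:
            self.conn.execute("INSERT INTO products VALUES (?, ?)", (name, price))

    def all(self):
        query = "SELECT name, price FROM products ORDER BY name"
        return self.conn.execute(query).fetchall()

class PricingService:
    """Business logic tier: domain-specific functions on domain objects."""
    TAX_RATE = 0.20  # hypothetical rate, for illustration only

    def __init__(self, repository):
        self.repository = repository

    def price_list(self):
        return [(name, round(price * (1 + self.TAX_RATE), 2))
                for name, price in self.repository.all()]

def render_price_list(service):
    """Presentation tier: creates the visual elements of the response."""
    items = "".join(f"<li>{name}: {price:.2f}</li>"
                    for name, price in service.price_list())
    return f"<ul>{items}</ul>"

repo = ProductRepository()
repo.add("book", 10.0)
repo.add("pen", 1.5)
page = render_price_list(PricingService(repo))
print(page)
```

Each tier only talks to the one below it, so the storage engine or the page layout can change without touching the pricing rule.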
Modern cloud applications
• Agile
• Cloud: elasticity, scale
• Continuously integrated and delivered
• Rapid evolution
• Data-intensive
• Analytics-infused and user experience-centric.
Modern cloud applications
• Agile-based Cloud Application Development
• Quick delivery of recognizable value
• Consistent deployment of working code
• Low operational cost
• Shorter development cycles
• High degree of reusability
Modern Cloud Applications
Data-intensive applications
• Store data to find it later (databases)
• Periodically process large amounts of data (batch processing)
• Remember the result of an expensive operation (cache)
• Send a message to another application, to be handled asynchronously (event-driven, stream processing)
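As one concrete instance of these building blocks, the caching bullet can be demonstrated with the standard library's memoization decorator. This is a minimal sketch; `expensive_square` stands in for any costly operation and the call counter exists only to make the effect visible.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive operation really runs

@lru_cache(maxsize=128)
def expensive_square(n: int) -> int:
    """Stand-in for an expensive operation whose result is worth remembering."""
    global calls
    calls += 1
    return n * n

first = expensive_square(12)   # computed
second = expensive_square(12)  # served from the cache
print(first, second, calls)
```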
Compute-intensive applications
• Compute-intensive refers to applications and workloads that require a great deal of computation and therefore need sufficient resources to handle that demand efficiently.
CPU-intensive applications
• Sorting, searching, graph traversal, and matrix multiplication are all CPU operations; a process dominated by such operations is CPU-intensive
• This classification does not depend on how much or how frequently these operations are executed.
Load Balancing for Web Applications
Adel Nadjaran Toosi, Chenhao Qu, Marcos Dias de Assuncao, and Rajkumar Buyya, Renewable-aware Geographical Load Balancing of Web Applications for Sustainable Data Centers, Journal of Network and Computer Applications (JNCA), Vol. 83, pp. 155-168, Apr. 2017.
Overall System Architecture
Challenges in Load Balancing for Web Applications
• Geographical load balancing for other types of workloads/applications
• Bag of tasks, scientific workflows, map-reduce
• Demand response and capping the brown power consumption
• To promote carbon neutrality
• “Sticky load balancing” policies
• Once a connection between a client and an application server is established, all subsequent requests from that session are directed to the same server
• Network proximity of the user
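A sticky policy can be sketched by deriving the server from a hash of the session identifier, so that a session always maps to the same server. This is only one possible approach (load balancers also use cookies or session tables), and the server names and `pick_server` function are hypothetical.

```python
import hashlib

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical application servers

def pick_server(session_id: str, servers=SERVERS) -> str:
    """Map a session to a server so that repeated requests land on the same one."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Every request carrying the same session id reaches the same server
choices = {pick_server("session-42") for _ in range(5)}
print(choices)
```

With plain modulo hashing, adding or removing a server remaps many sessions; consistent hashing is a common way to limit that effect.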
Further Reading
• Evolution of Computing Paradigms
• Important factors related to Cloud Applications
Evolution of Computing Paradigms
Sukhpal Singh Gill et al. Transformative Effects of IoT, Blockchain and Artificial Intelligence on Cloud Computing: Evolution, Vision, Trends and Open Challenges. Internet of Things, Vol. 8, 1-26, (2019).
Evolution of Computing Paradigms
[Figure: timeline of computing paradigms along an axis from centralization to decentralization.]
• Paradigms shown: Mainframe (1955), Cluster (1962), Network Computing (1967), Home Computer (1978), WWW (1994), Grid Computing (1999), P2P (1999), Mobile Computing (2004), Cloud Computing (2006), IoT (2008), Fog Computing (2009), Edge Computing (2009), SOA (2009)
• Technologies shown: ARPANET, Datagram; TCP/IP, UDP, Unix; HTTP, HTML
Source: Dominic Lindsay, Sukhpal Singh Gill, Daria Smirnova & Peter Garraghan, The evolution of distributed computing systems: from fundamental to new frontiers. Computing (2021).
Important factors related to Cloud Applications
• Execution models
• Programming models
• Scheduling
• Platform
• Objectives
• Evaluation Methods
• Constraints (Parameters)
Execution Models in Distributed Systems
• Batch Processing
• Interactive Processing (Online Processing)
• Stream Processing
• Real-time Processing
• Parallel Processing
Batch Processing
• Batch Processing: allows users to submit a series of programs (jobs), which are executed to completion without further user input or manual intervention.
• Is Hadoop a batch processing framework?
• More precisely, Hadoop is an open-source distributed processing framework.
• Hadoop Map-Reduce is best suited for batch processing.
• Spark and Storm can be used for real-time and stream processing.
• Storm is the Hadoop of real-time processing.
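The Map-Reduce model behind Hadoop's batch jobs can be illustrated with a word count in plain Python. This is a single-process sketch of the map, shuffle, and reduce phases, not Hadoop's API; the function names and sample documents are made up for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {word: sum(counts) for word, counts in groups.items()}

# The whole batch of input is available before the job starts
documents = ["the cloud scales", "the cloud stores data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

In a real cluster the map and reduce calls run on many machines, and the shuffle moves data between them.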
Batch Processing
• Bag of Tasks
• HPC jobs
• HTC jobs
• Scientific Workflows
• Parameter Sweep Tasks
• Map-Reduce Tasks
• Graph Processing
Important Terms
• HPC tasks are characterized as needing large amounts of computing power for short periods of time, whereas HTC tasks also require large amounts of computing, but for much longer times (months and years, rather than hours and days).
• A Parametric Sweep runs a command a specified number of times (indicated by start, end, and increment values), generally across indexed input and output files. The steps of the sweep may or may not run in parallel, depending on the resources that are available on the cluster when the task is running.
• A scientific workflow is described in terms of tasks and their dependencies; the tasks are computational steps for scientific simulations or data analysis.
• Bag-of-tasks refers to parallel jobs whose tasks have no dependencies among them.
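A parameter sweep over independent tasks can be sketched with a thread pool: the start, end, and increment values define the parameter list, and each task runs with no dependency on the others, which also makes it a bag of tasks. The `simulate` function is a hypothetical stand-in for a real command, and the pool size is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(x: int) -> int:
    """Hypothetical task: evaluate a model for one parameter value."""
    return x * x - 2 * x

def sweep(start: int, end: int, increment: int):
    """Parameter values from start to end (inclusive) in steps of increment."""
    return list(range(start, end + 1, increment))

params = sweep(0, 10, 2)

# Tasks are independent, so they may run in any order or side by side;
# pool.map still returns the results in the order of the parameters.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, params))

print(dict(zip(params, results)))
```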
Interactive (Online) Processing
• Interactive Processing: interactive computing refers to applications that accept input from humans, e.g., Web applications, Massively Multiplayer Online (MMO) games.
• Online Processing: another term for interactive processing.
• Interactive or online processing requires a user to supply an input.
• Examples: bar code scanning, online analytical processing (OLAP), online transaction processing (OLTP)
Stream Processing
• Stream Processing: record-by-record analysis of machine data in motion, e.g., sensor network analytics, IoT applications, online video processing.
• Characteristics
  • Compute Intensity
  • Data Parallelism
  • Data Locality
• Data Locality
• Spark is a batch processing system at heart, but Spark Streaming is a stream processing system.
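Record-by-record processing can be sketched with a generator that updates its result as each record arrives, without ever holding the whole data set. This is a plain-Python illustration, not Spark Streaming or Storm code; the sensor readings are made up.

```python
def running_mean(readings):
    """Yield the mean of all readings seen so far, one output per input record."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count

# Hypothetical sensor stream; in practice this could be an unbounded source
sensor_stream = iter([20.0, 22.0, 21.0, 25.0])
means = list(running_mean(sensor_stream))
print(means)
```

Only the count and the running total are kept in memory, which is what allows the same code to handle an endless stream.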
Real-time Processing
• Real-time Processing: real-time data processing involves continual input, processing, and output of data. Data must be processed within a short time period (or in near real time).
• Hard real-time
• Nuclear systems, avionics
• Firm real-time
• Sound system
• Soft real-time
• Weather stations
• Real-time Processing vs. Stream Processing: stream processing has no compulsory time limits, while real-time processing requires a short, guaranteed deadline
• Is Storm a stream or a real-time processing system?
• Other examples: airline ticket reservations, stock market, fly-by-wire, antilock brakes, videoconference applications, VoIP
Parallel Processing
• Parallel Processing: the processing of program instructions by dividing them among multiple processors, with the objective of running a program in less time.
• Concurrent computing vs. Parallel Processing
• It is possible to have parallelism without concurrency (such as bit-level parallelism)
• Concurrent and parallel programming are different. For instance, two threads (or processes) can execute concurrently on the same core through context switching. When the two threads (or processes) execute on two different cores (or processors), you have parallelism.
• Examples: parallel programs in MPI and OpenMP.
• Types of parallel processing: bit-level, instruction-level, task parallelism
Other types
• Data Warehouse
• Transaction Processing
• ?
Application Programming Models in Distributed Systems
• Thread
• Task
• Map-Reduce
• Message Passing
• Data Flow
• Workflow
• Parameter Sweep
• Bag of Tasks
Scheduling
• Scheduling is the process of arranging, controlling and optimizing work and workloads by assigning them to resources.
• Three main components of any scheduling problem:
  • Consumer, e.g., processes, threads, cloud clients
  • Resource, e.g., CPU, I/O, VMs
  • Policy
• Allocation vs. Scheduling
  • The distinction between the terms is often left implicit in the literature, but, in general, it can be said that:
  • Allocation is from the resources' point of view, while scheduling is from the consumers' point of view.
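The three components can be seen together in a small example: tasks are the consumers, virtual machines are the resources, and the policy is a greedy heuristic that gives the longest remaining task to the least-loaded resource. This is one simple policy among many and is not guaranteed to be optimal in general; the task names, durations, and VM names are hypothetical.

```python
import heapq

def schedule(tasks, resources):
    """Assign each (name, duration) task to the currently least-loaded resource."""
    heap = [(0, resource) for resource in resources]  # (current load, resource)
    heapq.heapify(heap)
    assignment = {resource: [] for resource in resources}
    # Policy: consider the longest tasks first
    for name, duration in sorted(tasks, key=lambda task: -task[1]):
        load, resource = heapq.heappop(heap)
        assignment[resource].append(name)
        heapq.heappush(heap, (load + duration, resource))
    makespan = max(load for load, _ in heap)  # finish time of the busiest resource
    return assignment, makespan

tasks = [("t1", 4), ("t2", 3), ("t3", 2), ("t4", 2), ("t5", 1)]
assignment, makespan = schedule(tasks, ["vm-a", "vm-b"])
print(assignment, makespan)
```

Because all tasks are known before the plan is made, this is a static (offline) scheme; a dynamic scheduler would make the same decision as each task arrives.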
Scheduling (taxonomy)
• Planning scheme: dynamic (online), static (deterministic, offline)
• Decision making: local, global
• Optimality: optimal, approximate, sub-optimal, heuristics, meta-heuristics
• Goal: load balancing, admission control, mapping, resource provisioning, selection
• Architecture: centralized, decentralized, hierarchical, peer-to-peer, hybrid
Platforms
• Cluster
• Grid
• Cloud
• Peer-to-Peer Systems
• Supercomputers
• Mobile Computing
• Sensor Networks
• Internet of Things (IoT)
• Content Delivery Networks (CDN)
• Software Defined Networks (SDN)
• …
Objectives
• Categories: energy-related, cost-related, time-related, others
• Objectives shown: monetary cost, utility, welfare, response time, throughput, delay, availability, accuracy, ease of use, utilization, security, privacy, reliability, robustness, interoperability
Evaluation Methods
• Analytical Modeling
• Measurement
• Empirical Analysis
• Emulation
• Simulation
Constraints (Parameters)
• Budget
• Deadline
• Accuracy
• Capacity
• Regulation
Cloud Applications topics
• Application communications: REST
• Cloud-scale data: distributed challenges
• Cloud data management
• Security
• Designing cloud applications: microservices
• The edge of the cloud: CDNs, IoT