What is the Cloud?
Cloud Computing in a Nutshell
Cloud Computing is the transformation of computer hardware, software, and networks into a utility, just like your electric, water, or gas company.
1 instance running for 1000 h costs the same as 1000 instances running for 1 h (both are 1000 instance-hours)
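The equality holds because utility pricing bills instance-hours, not instances. Here is a minimal Python sketch. It assumes a hypothetical flat hourly rate: `HOURLY_RATE` is illustrative and is not a real provider price.

```python
# Hypothetical flat price per instance-hour (illustrative only; real prices
# vary by provider, region, and instance type).
HOURLY_RATE = 0.10


def pay_per_use_cost(instances: int, hours: float, rate: float = HOURLY_RATE) -> float:
    """Utility-style billing: pay for the instance-hours consumed, nothing more."""
    return instances * hours * rate


long_job = pay_per_use_cost(instances=1, hours=1000)   # 1 instance for 1000 h
wide_job = pay_per_use_cost(instances=1000, hours=1)   # 1000 instances for 1 h

# Both consume 1000 instance-hours, so they cost the same; the second one
# finishes 1000x sooner, assuming the work parallelizes perfectly.
print(f"1 instance x 1000 h:  ${long_job:.2f}")
print(f"1000 instances x 1 h: ${wide_job:.2f}")
```

In practice the speedup depends on how well the job splits across machines. The bill does not change either way.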
Defining the Cloud
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” (NIST SP 800-145)
Characteristics
On-demand – Resources should always be available when you need them, and you control turning them on or off, so that there is neither a shortage nor a waste of resources.
Scalable – You should be able to scale (increase or decrease resources) when necessary. Cloud providers should have sufficient capacity to meet customers’ needs.
Multi-tenant – Sometimes you may share the same resource (e.g., hardware) with other tenants, transparently to you. The cloud provider is responsible for security, ensuring that one tenant cannot access another tenant’s data.
Self-service computation and storage resources – Related processes, including billing, resource provisioning, and deployment, should be self-service and automated, involving much less manual processing. If a machine hosting your service fails, the cloud provider should be able to fail over your service immediately.
Reliability – The cloud provider should provide reliable service, committing to uptime guarantees.
Utility-based subscription – You pay the cloud provider on a utility basis, just like paying your electricity bill, without any upfront investment.
[Figure: the six characteristics: On Demand, Scalable / Rapid Elasticity, Multi-Tenant, Self Service, Reliability, Utility Based Subscription]
Defining the Cloud
Service Models
Deployment Models
Public Cloud
A public cloud is a cloud platform that targets any type of customer, whether an independent consumer, an enterprise, or the public sector. Public cloud providers are normally major players that have invested huge amounts of capital. Windows Azure Platform by Microsoft, AWS by Amazon, and App Engine and Gmail by Google are all examples of public cloud services. Customers who possess sensitive data and applications are often uncomfortable using a public cloud because of privacy, policy, and security concerns. Remember: in a public cloud, the application and data are stored in the provider’s data center.
Private Cloud
A private cloud is infrastructure hosted internally, serving specific customers or sometimes a single organization exclusively. Setting up a private cloud normally costs less than building a public cloud. In fact, many organizations have implemented their own private cloud using products from vendors such as IBM, HP, and Microsoft. Customers who possess sensitive data and applications feel more comfortable with this approach, since the data and applications are hosted privately.
Hybrid Cloud
A hybrid cloud combines public and private clouds, and sometimes on-premises services. Customers who choose this solution generally want the scalability and cost-competitiveness that public cloud providers offer, but also want to keep their sensitive data on-premises or in a private cloud. Because it draws on the benefits of both deployment models, the hybrid model has become increasingly popular.
IaaS (Infrastructure as a Service)
With IaaS, the provider takes care of the lower part of the stack, from networking up to provisioning the OS. You are responsible for the middleware, runtime, data, and applications. Some IaaS vendors provide the OS but do not manage updates or patches for you. You basically rent a virtual machine (VM) with your preferred OS installed, and what you do with the VM is up to you.
Examples of IaaS market players: Amazon Web Services, Rackspace, and VMware vCloud.
PaaS (Platform as a Service)
PaaS is one level up from IaaS. Cloud providers take care of the components that IaaS covers and also manage platform-level components such as middleware and runtime. Middleware such as application/web servers (IIS, JBoss, Tomcat, etc.) and runtimes (.NET Framework, Java runtime) are pre-installed. As a customer, you only need to focus on managing your application and data.
Examples of PaaS market players: Google App Engine, Windows Azure Platform, and Force.com.
SaaS (Software as a Service)
SaaS is probably the most common model. Many of us have been using it without realizing it is a cloud service. SaaS takes care of the whole stack, from networking to the application level. You don’t even manage the application or data storage; all you need to do is use the system.
Examples of SaaS market players: Gmail, Office 365, and Google Docs.
[Figure: service models (Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS)) and deployment models (Private Cloud, Public Cloud, Hybrid Cloud)]
Cloud deployment models
Public Cloud
Providers let clients access the cloud via the Internet
Made available to the general public
Public Cloud
Multi-tenant virtualization, global-scale infrastructure
Functions and pricing vary
Copyright: Google
Private Cloud
The cloud is used solely by an organization (e.g. HKUST, Facebook, HSBC)
May reside in-house or off-premise
Private Cloud
Secure, dedicated infrastructure with the benefits of on-demand provisioning
Not burdened by network bandwidth and availability issues and security threats associated with public clouds.
Greater control, security, and resilience.
Hybrid Cloud
Composed of multiple clouds (private, public, etc.) that remain independent entities, but interoperate using standard or proprietary protocols
Examples: banks, hospitals, government
Hybrid Cloud
Allows applications and data to flow across clouds
Copyright: iWeb
Cloud Service Models
Cloud computing stack
Infrastructure-as-a-Service
Providers give you the computing infrastructure made available as a service. You get “bare-metal” machines.
Providers manage a large pool of resources and use virtualization to allocate them dynamically
Customers “rent” these physical resources to customize their own infrastructure
Full control of OS, storage, applications, and some networking components (e.g., firewalls)
Infrastructure-as-a-Service
IaaS use case
Netflix rents thousands of servers, terabytes of storage from Amazon Web Services (AWS)
Develop and deploy specialized software for transcoding, storage, streaming, analytics, etc. on top of it
Able to support tens of millions of connected devices, used by 40+ million users in 40+ countries
Platform-as-a-Service (PaaS)
Providers give you a software platform, or middleware, where applications run
You develop and maintain and deploy your own software on top of the platform
The hardware needed for running the software is automatically managed by the platform. You can’t explicitly ask for resources.
PaaS
Automatic scalability: the platform handles increases and decreases in request load for you
No control of OS, storage, or network, but you can control the deployed applications and hosting environment
Best for web apps
Language and API support: Python, Java, PHP, and Go
Software-as-a-Service (SaaS)
Providers give you a piece of software/application. They take care of updating, and maintaining it.
You simply use the software through the Internet.
SaaS use case
HKUST uses Google Apps and Office 365 for student and staff email, calendar, etc.
Don’t know how much they charge HKUST though…
Source: K. Remde, “SaaS, PaaS, and IaaS.. Oh my!” TechNet Blog, 2011
A comparison
[Figure: IaaS, PaaS, and SaaS on a spectrum running from flexibility/customization (IaaS) to convenience/ease of management (SaaS)]
Tradeoff between flexibility and “built-in” functionality
Why the Cloud?
Cloud Computing Patterns
[Figure: compute usage over time compared with average usage, for two demand patterns]
“Unpredictable Bursting”
Unexpected/unplanned peak in demand
Sudden spike impacts performance
Can’t over-provision for extreme cases
“Predictable Bursting”
Services with micro-seasonality trends
Peaks due to periodic increased demand
IT complexity and wasted capacity
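The bursting patterns leave fixed capacity with a dilemma: size it for the peak and waste capacity, or size it for the average and miss the spikes. The Python sketch below quantifies both outcomes. The hourly `demand` series is entirely hypothetical. An elastic cloud deployment would instead follow demand hour by hour and pay for exactly `used_server_hours`.

```python
import math

# Hypothetical hourly demand (servers needed) with one predictable daily peak.
demand = [20, 18, 15, 15, 22, 40, 75, 90, 60, 35, 25, 20]

used_server_hours = sum(demand)
average = used_server_hours / len(demand)

# Option 1: fixed capacity sized for the peak -> always enough, but idle most of the time.
peak_capacity = max(demand)
wasted_server_hours = sum(peak_capacity - d for d in demand)
utilization = used_server_hours / (peak_capacity * len(demand))

# Option 2: fixed capacity sized for the (rounded-up) average -> cheaper, but spikes go unserved.
avg_capacity = math.ceil(average)
unmet_server_hours = sum(max(d - avg_capacity, 0) for d in demand)

print(f"Peak provisioning: {peak_capacity} servers, {utilization:.0%} utilized, "
      f"{wasted_server_hours} idle server-hours")
print(f"Average provisioning: {avg_capacity} servers, "
      f"{unmet_server_hours} server-hours of unmet demand")
```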
Business Benefits of the Cloud
Top five advantages of cloud computing:
Pay only for what you use
Easy/fast deployment to end users
Monthly payments
Encourages standard systems
Requires fewer in-house staff and lower costs
Chief Objections to the Cloud
Top Objections to the Cloud:
Privacy Issues
Security
Control
What about the cloud provider?
Resource pooling
From the provider’s perspective
Resource pooling
The provider’s resources are pooled to serve consumers using a multi-tenant model, with different physical and virtual resources dynamically allocated according to consumer demand.
Location independence: the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
Advantage for providers: more efficient resource utilization
Economy of scale
A medium-sized datacenter (~1k servers) vs. a large datacenter (~50k servers) in 2006
5–7x decrease in cost!
Source: Armbrust et al., “Above the Clouds: A Berkeley View of Cloud Computing,” 2009.
Statistical multiplexing
Number of servers requested per day, by user (time in days):
Day       1     2     3     4     5
User 1    127   36    72    13    102
User 2    55    143   19    120   70
User 3    27    90    110   45    87
Highly profitable business for Cloud providers
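Statistical multiplexing pays off because the users’ peaks fall on different days. The provider therefore needs capacity only for the peak of the combined load, not for the sum of the individual peaks. The Python sketch below uses the chart values and assumes they are listed in day order (days 1 to 5). It ignores any safety margin a real provider would add.

```python
# Servers requested per day (days 1-5), read from the chart above.
requests = {
    "User 1": [127, 36, 72, 13, 102],
    "User 2": [55, 143, 19, 120, 70],
    "User 3": [27, 90, 110, 45, 87],
}

# Without pooling: each user gets dedicated capacity for their own peak.
sum_of_peaks = sum(max(series) for series in requests.values())

# With pooling: capacity only has to cover the busiest day of the combined load.
daily_totals = [sum(day) for day in zip(*requests.values())]
peak_of_sum = max(daily_totals)

print("Daily totals:", daily_totals)
print("Sum of individual peaks:", sum_of_peaks)
print("Peak of pooled demand:", peak_of_sum)
print("Servers saved by pooling:", sum_of_peaks - peak_of_sum)
```

For these numbers, pooling needs 269 servers instead of 380.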
Summary: Why cloud?
Better capital utilization
The unit cost of on-demand capacity may be higher than that of fixed capacity, but this is offset by paying nothing when capacity is not being used
Elasticity, easy to scale up and down
Access to complex infrastructure and resources without internal resources
Providers: better resource utilization, lower cost
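The capital-utilization tradeoff in the first bullet can be made concrete with a break-even calculation. If an owned server costs `F` per hour regardless of use, and an on-demand instance costs `D` per hour only while busy, on-demand is cheaper whenever utilization is below `F / D`. A minimal Python sketch follows. Both prices are hypothetical and kept in integer cents to avoid rounding noise.

```python
# Hypothetical prices in cents per server-hour (illustrative only).
FIXED_CENTS_PER_HOUR = 6       # owned server: paid every hour, busy or idle
ON_DEMAND_CENTS_PER_HOUR = 10  # cloud instance: higher unit price, paid only while used


def total_cost_cents(busy_hours: int, total_hours: int) -> dict:
    return {
        "fixed": FIXED_CENTS_PER_HOUR * total_hours,
        "on-demand": ON_DEMAND_CENTS_PER_HOUR * busy_hours,
    }


def cheaper_option(busy_hours: int, total_hours: int) -> str:
    costs = total_cost_cents(busy_hours, total_hours)
    return min(costs, key=costs.get)  # ties go to "fixed" (first key)


# On-demand wins whenever utilization (busy / total) is below this price ratio.
break_even_utilization = FIXED_CENTS_PER_HOUR / ON_DEMAND_CENTS_PER_HOUR

print(f"Break-even utilization: {break_even_utilization:.0%}")
print("30% utilized:", cheaper_option(busy_hours=300, total_hours=1000))
print("90% utilized:", cheaper_option(busy_hours=900, total_hours=1000))
```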
Virtual Machines
Infrastructure as a Service (IaaS)
Provides an operating system, storage, and networking
User needs to maintain the software on the VM
Examples of when to use Virtual Machines
Test and Development
Running applications in the cloud
Disaster recovery
Virtualization
Virtualization is an enabling technology for IaaS Cloud
Suppose an IaaS provider owns a large cluster and wants to provision cloud services for its users
Users require
Different machines with diverse computing capabilities, e.g., CPU, memory, networking, storage, etc.
Different OS, e.g., CentOS, Ubuntu, Windows, etc.
Different software and libraries pre-installed, e.g., Python, Java, vim, git, etc.
What is virtualization?
Virtualization is a broad term. It can be applied to all types of resources (CPU, memory, network, etc.)
Allows one computer to “look like” multiple computers, doing multiple jobs, by sharing the resources of a single machine across multiple environments.
History
Believe it or not:
virtualization started in the 1960s on IBM mainframes
Virtualization
[Figure: applications (App) running on a single OS, compared with applications running on Win, Ubuntu, and CentOS on top of a virtualization layer]
The old model
A server for every application
Software and hardware are tightly coupled
The old model
Big disadvantage: low utilization
[Figure: dedicated servers running at 10%, 12%, 18%, and 20% utilization]
The new model
Physical resources are virtualized. The OS and applications are encapsulated together as a single unit, a virtual machine
Applications are separated from hardware
The new model
Big advantage: improved utilization
[Figure: the same workloads (10%, 12%, 18%, 20%) consolidated as VMs onto one server, which runs at 60% utilization]
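The consolidation shown with the utilization figures (10% + 12% + 18% + 20% = 60% on one host) is a packing problem. The Python sketch below packs workloads onto hosts with first-fit decreasing, a common bin-packing heuristic. It is not necessarily what any particular hypervisor or provider uses. It also assumes utilizations simply add up, ignoring virtualization overhead.

```python
def first_fit_decreasing(loads, capacity=100):
    """Pack workloads (in % of one server) onto hosts using first-fit decreasing."""
    hosts = []  # each host is a list of the workload loads placed on it
    for load in sorted(loads, reverse=True):
        for host in hosts:
            if sum(host) + load <= capacity:
                host.append(load)
                break
        else:  # no existing host had room: start a new one
            hosts.append([load])
    return hosts


old_model = [10, 12, 18, 20]  # utilization (%) of the four dedicated servers in the figure
hosts = first_fit_decreasing(old_model)
for i, host in enumerate(hosts, start=1):
    print(f"Host {i}: VMs {host} -> {sum(host)}% utilized")
```

Lowering `capacity` (for example, to keep headroom for load spikes) trades some of the utilization gain for safety. With `capacity=50`, the same workloads need two hosts.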
Virtual networks
You can create multiple VMs in the same virtual network
Called a Virtual Private Cloud (VPC) in Alibaba Cloud
Each VM has two IP addresses:
Public IP addresses allow VMs to communicate with the Internet
Private IP addresses allow communication between VMs in a virtual network
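Private addresses inside a VPC come from reserved ranges, such as the RFC 1918 block 172.16.0.0/12, and are not routable on the public Internet. Python’s standard `ipaddress` module can check this. The VPC CIDR block below (`172.31.0.0/16`) is a hypothetical example; use the range you chose when creating your VPC. 172.31.11.116 is the second VM’s private address used in the HDFS setup.

```python
import ipaddress

# The second VM's private address from the HDFS setup steps.
private_ip = ipaddress.ip_address("172.31.11.116")
public_ip = ipaddress.ip_address("8.8.8.8")  # a well-known public address, for contrast

# Hypothetical VPC address range; replace with your VPC's CIDR block.
vpc = ipaddress.ip_network("172.31.0.0/16")

print(private_ip, "private:", private_ip.is_private, "| in VPC:", private_ip in vpc)
print(public_ip, "private:", public_ip.is_private, "| in VPC:", public_ip in vpc)
```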
Demo: Creating a VM on Alibaba Cloud
Select appropriate configurations
Choose region
Choose between subscription-based and pay-as-you-go billing
Set firewall rules
Installing HDFS
Create a few VMs
Install Java
apt update
apt install default-jre
apt install openjdk-11-jdk-headless
Distribute Authentication
On the master node, generate an ssh-key:
ssh-keygen -b 4096
Copy the key to the other nodes:
ssh-copy-id -i .ssh/id_rsa.pub localhost
ssh-copy-id -i .ssh/id_rsa.pub 172.31.11.116   (private IP address of the second VM)
Download and Unpack Hadoop Binaries:
wget http://apache.01link.hk/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar xf hadoop-2.8.5.tar.gz
mv hadoop-2.8.5 hadoop
Set Environment Variables
Add the following to .profile (re-login needed)
PATH=$HOME/hadoop/bin:$HOME/hadoop/sbin:$PATH
Installing HDFS
Configure the Master Node
Configuration is done on the master node and then replicated to the other nodes.
Set JAVA_HOME
Get your Java installation path.
update-alternatives --display java
Take the value of the current link and remove the trailing /bin/java
Edit ~/hadoop/etc/hadoop/hadoop-env.sh and, in the line export JAVA_HOME=${JAVA_HOME}, replace ${JAVA_HOME} with your actual Java installation path
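The JAVA_HOME step is simple string surgery: drop the trailing `/bin/java` from the path the `java` link points to. The Python sketch below shows this. The example path is illustrative (a typical location for OpenJDK 11 on Ubuntu). The optional check on the local machine simply resolves the symlinks of whatever `java` is on the `PATH`.

```python
import os
import shutil


def java_home_from_binary(java_binary: str) -> str:
    """Strip the trailing /bin/java from a Java binary path to get JAVA_HOME."""
    suffix = "/bin/java"
    if not java_binary.endswith(suffix):
        raise ValueError(f"expected a path ending in {suffix}: {java_binary}")
    return java_binary[: -len(suffix)]


# Illustrative value of the "link currently points to" line.
example = "/usr/lib/jvm/java-11-openjdk-amd64/bin/java"
print("export JAVA_HOME=" + java_home_from_binary(example))

# On the VM itself, resolving the java symlink chain gives the same kind of path.
java_on_path = shutil.which("java")
if java_on_path and os.path.realpath(java_on_path).endswith("/bin/java"):
    print("Detected: export JAVA_HOME=" + java_home_from_binary(os.path.realpath(java_on_path)))
```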
Try running Hadoop:
cd
hadoop/bin/hadoop
Set NameNode Location (this setup uses port 9000)
Update ~/hadoop/etc/hadoop/core-site.xml:
Installing HDFS
Set path for HDFS
Edit ~/hadoop/etc/hadoop/hdfs-site.xml:
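The file contents for the core-site.xml and hdfs-site.xml steps are not reproduced in these notes. The Python sketch below renders both files in Hadoop’s `<configuration>`/`<property>` format using the standard Hadoop 2.x property names `fs.defaultFS`, `dfs.namenode.name.dir`, `dfs.datanode.data.dir`, and `dfs.replication`. All values are hypothetical: the hostname `node-master`, the data directories under /root, and a replication factor of 2 (one copy per DataNode in a two-worker cluster). Adjust them to your cluster before pasting the output into the files.

```python
import xml.etree.ElementTree as ET


def hadoop_config(properties: dict) -> str:
    """Render a Hadoop *-site.xml <configuration> from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # ET.indent exists in Python 3.9+
        ET.indent(root)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


# Hypothetical values: adjust the host, paths, and replication to your cluster.
MASTER = "node-master"

core_site = hadoop_config({
    "fs.defaultFS": f"hdfs://{MASTER}:9000",  # where clients find the NameNode
})

hdfs_site = hadoop_config({
    "dfs.namenode.name.dir": "/root/data/nameNode",  # NameNode metadata directory
    "dfs.datanode.data.dir": "/root/data/dataNode",  # DataNode block storage directory
    "dfs.replication": 2,                            # copies kept of each block
})

print("core-site.xml:\n" + core_site)
print("hdfs-site.xml:\n" + hdfs_site)
```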
Installing HDFS
Configure Slaves
Edit ~/hadoop/etc/hadoop/slaves to be:
vm1
vm2
…
Duplicate Config Files on Each Node:
scp hadoop-2.8.5.tar.gz 172.31.11.116:~/
ssh 172.31.11.116
tar xf hadoop-2.8.5.tar.gz
mv hadoop-2.8.5 hadoop
exit
scp ~/hadoop/etc/hadoop/* vm2:~/hadoop/etc/hadoop/
Installing HDFS
Format HDFS
On the master node, run
hdfs namenode -format
Start and Stop HDFS
On the master node, run
start-dfs.sh
Check that every process is running with the jps command on each node. On node-master you should get (PIDs will differ):
21922 Jps
21603 NameNode
21787 SecondaryNameNode
On a data node, you should get:
12186 Jps
11918 DataNode
To stop HDFS on master and slave nodes, run the following command from node-master:
stop-dfs.sh
Use your HDFS Cluster
Monitor your HDFS Cluster
http://master-public-IP:50070
Port 50070 must be opened in the firewall rules
Put and Get Data to HDFS
Writing to and reading from HDFS is done with the hdfs dfs command.
Create a directory:
hdfs dfs -mkdir -p test
Put some files into HDFS:
hdfs dfs -put LICENSE.txt test/
hdfs dfs -put spark-2.2.0-bin-hadoop2.7.tgz
Get files from HDFS:
hdfs dfs -get test/LICENSE.txt
Setting up a Spark Cluster
Get Spark:
wget http://apache.website-solution.net/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
tar xf spark-2.4.0-bin-hadoop2.7.tgz
mv spark-2.4.0-bin-hadoop2.7 spark
Set environment variables.
Add the following to .profile:
export SPARK_HOME=/root/spark
export SPARK_MASTER_HOST=<master private IP>
Start the Spark master:
spark/sbin/start-master.sh
Start the slaves:
Create conf/slaves listing the hostnames of all slave nodes, one per line
sbin/start-slaves.sh
Setting up a Spark Cluster
Check the cluster status in the master’s web UI at MasterPublicIP:8080
Security rules must allow this inbound traffic
Run pyspark:
bin/pyspark --master spark://MasterPrivateIP:7077
Load files from HDFS:
rdd = sc.textFile('hdfs://MasterPrivateIP:9000/user/yike/test/LICENSE.txt')
Check the status of the Spark application at port 4040, 4041, …
Stop slaves and master:
sbin/stop-slaves.sh
sbin/stop-master.sh
Monitor Your Bill
Shut down VMs when not needed (unless your VM is subscription-based)
Public IP addresses may change the next time you start a VM
Private IPs won’t change
Data on VMs won’t be deleted
Stopped VMs don’t incur compute charges
But storage still costs money
Releasing VMs will also delete data
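The billing rules above (stopped pay-as-you-go VMs stop compute charges, while their disks are still billed) can be sketched as a monthly estimate. All prices below are hypothetical and are not actual Alibaba Cloud rates. The rule that a released VM stops all charges for that VM is also an assumption; separately purchased resources may still be billed.

```python
# Hypothetical prices (not actual Alibaba Cloud rates).
COMPUTE_PER_HOUR = 0.05   # pay-as-you-go instance, billed only while running
DISK_PER_GB_MONTH = 0.05  # disk, billed for as long as it exists, running or stopped


def monthly_cost(running_hours: float, disk_gb: float, released: bool = False) -> float:
    if released:
        # Assumed: releasing deletes the VM together with its disk,
        # so nothing more is billed for it, but the data is gone.
        return 0.0
    return running_hours * COMPUTE_PER_HOUR + disk_gb * DISK_PER_GB_MONTH


always_on = monthly_cost(running_hours=720, disk_gb=40)  # never shut down (30 days)
lab_only = monthly_cost(running_hours=40, disk_gb=40)    # shut down between lab sessions
stopped = monthly_cost(running_hours=0, disk_gb=40)      # stopped all month: storage still billed

print(f"Always on: ${always_on:.2f}  Lab only: ${lab_only:.2f}  Stopped: ${stopped:.2f}")
```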
Cloud computing