COMP5349 – Cloud Computing
Week 3: Container Technology
Dr. Ying Zhou
School of Computer Science
Outline
Container Brief Intro
Docker Overview
Images
Containers
Storage
Networking
Security
COMP5349 “Cloud Computing” – 2020 (Y. Zhou)
Administrative: Lab Arrangement
University’s data on special cohort students may not be accurate
We have 9 labs in three different settings
Online only labs using Zoom
R16H: Thursday 4pm on SIT457
F16E: Friday 4pm on SIT457
Face-to-face only labs
R16C: Thursday 4pm on SIT 117
F16A: Friday 4pm on SIT 118
Mixed labs with tutor in room and Zoom session running
Thursday: SIT114, SIT115 and SIT130B
Friday: SIT117 and SIT 116
If you are on campus but are assigned to an online only lab, please attend any lab with a tutor in the room
If you are not on campus but are assigned to a face-to-face only lab
Please contact Chenhao Huang
class with zoom session
Administrative: AWS Educate Update
AWS changed the way credit is distributed to students late last year (2019)
There are three types of account for using AWS
Regular AWS account
Anyone with a credit card can apply
Full service support
Educate starter account
Any university student can apply; the credit amount depends on whether the university is a member institution
Our students will have $100 credit per year
Lasts until you graduate
With limited services
Educate classroom account
By invitation only, with $50 credit
Lasts only for the semester
With limited services
They are not related in any way
Administrative: Covid-19 health management
What do I do if a staff member or student is visibly unwell?
Politely ask the person to excuse themselves from the work area or class and seek medical attention.
What should I do if I feel unwell?
If you feel unwell and experience symptoms (including a fever, a cough, sore throat or shortness of breath):
Do not attend work if you feel unwell. Speak to your manager to make alternative arrangements.
Isolate yourself from others immediately – the University has support measures in place for self-isolation of staff and students.
Phone (do not visit) a local general practitioner (GP) or the closest hospital emergency department for instructions on what to do next.
The University Health Service can be reached on (02) 9351 3484. The Royal Prince Alfred emergency department, the closest to our main campus in Camperdown, can be reached on (02) 9515 6111.
What if I come to University feeling well, but become unwell while on campus?
If symptoms appear while at the University, please excuse yourself from the class or work area and seek medical attention.
Last Week: Virtualization
The key motivation for virtualization
Maintain isolation, increase utilization
Two components in server virtualization
Hypervisor and virtual machine
The design principle of hypervisor
Similar to OS design principle
Kernel and other modules
Can be implemented in different ways
Different options for managing critical instructions
Full virtualization: keep OS unchanged
With software emulation
With hardware-assisted execution mode
Modify OS
Paravirtualization
Containerization
“Operating-system-level virtualization, also known as containerization, refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances.”
“Such instances, called containers, partitions, virtualization engines (VEs) or jails (FreeBSD jail or chroot jail), may look like real computers from the point of view of programs running in them.”
https://www.ibm.com/developerworks/library/l-linux-kernel/index.html
https://en.wikipedia.org/wiki/Operating-system-level_virtualization#Implementations
Container vs. System Virtual Machine
All containers use the kernel of the host machine
Each VM contains a full OS
You cannot run a Windows container on a Linux machine
All Linux distros share more or less the “same” upstream kernel, so running an Ubuntu container on Red Hat would not cause any drama
Linux Kernel, System and Distribution
The Linux kernel is the most basic component of a Linux operating system. It does four basic jobs:
Memory Management
Process Management
Device Drivers
System Calls and Security
Linux System
Kernel + system libraries and tools
E.g. GNU tools like gcc
Linux distribution
Pre-packaged Linux system + more applications
E.g. news servers, web browsers, text-processing and editing tools, etc.
Red Hat, Debian, Amazon Linux, Android(?)
https://upsilon.cc/~zack/talks/2011/20111031-uds.pdf
Container vs. System Virtual Machine
Operating Systems and Resources
Running a full-fledged OS inside a VM takes up a certain amount of resources even if no app is running in the VM
Starting an OS takes some time
A container exits when the process inside it finishes
A container is much faster to start, similar to starting an application
Isolation for performance and security
VMs have very good isolation and security
They enjoy hardware support and mature technology
Containers offer a reasonable level of isolation using kernel techniques like namespaces and control groups
Containers can run inside a VM
Linux Kernel Feature: Namespace
To create the illusion of a full computer system for each container, we need to give each a “copy” of the kernel resources
An independent file system starting with a root directory: /
An independent set of process IDs, with ID 1 assigned to the init process
An independent set of user IDs (UIDs), with ID 0 assigned to the root user
Etc.
To avoid conflicts, the OS provides a feature called namespaces
A namespace gives a container its own view of the system
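On a Linux host, the namespaces a process belongs to can be inspected through the symbolic links under /proc/<pid>/ns. The following is a minimal sketch, not part of the original slides; the helper name list_namespaces is hypothetical, and the function simply returns an empty result on systems without /proc.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return namespaces  # not Linux, /proc not mounted, or no such process
    for name in entries:
        try:
            # Each entry is a symlink whose target looks like "pid:[4026531836]".
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries we are not allowed to read
    return namespaces


for ns_type, identifier in list_namespaces().items():
    print(ns_type, identifier)
```

Two processes that print the same identifier for a given type (e.g. mnt or pid) share that namespace; a process inside a container would typically show different identifiers from a process on the host.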
Namespace Kinds
Early Linux had a single namespace for each resource type
Gradually, namespaces were added for different resources
Mount (mnt)
This deals with file system
Each container can have its own rootfs
Each container manages its own mount points
Process ID (pid)
Each container has its own numbering starting at 1
When PID 1 exits, all other processes in that namespace are terminated immediately
PID namespaces are nested. The same process may have different PIDs in different namespaces
User ID (user)
Provides user segregation and privilege isolation
There is a mapping from container UIDs to host OS UIDs
UID 0 (root) in a container may have a different UID on the host
Net, IPC, etc.
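The UID mapping of a user namespace is exposed by the kernel in /proc/<pid>/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. A minimal sketch of the translation follows; the sample mapping and the helper names parse_id_map and to_host_id are hypothetical and chosen only for illustration.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an ID seen inside the namespace to the ID seen on the host."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None  # the ID is not mapped in this namespace


# Hypothetical mapping: container IDs 0..65535 map to host IDs 100000..165535.
SAMPLE_UID_MAP = "0 100000 65536\n"
sample_ranges = parse_id_map(SAMPLE_UID_MAP)

print(to_host_id(0, sample_ranges))      # 100000: container root is an ordinary host user
print(to_host_id(1000, sample_ranges))   # 101000
print(to_host_id(70000, sample_ranges))  # None: outside the mapped range
```

With such a mapping, a process that is root inside the container holds no special privilege on the host, which is the privilege isolation the slide refers to.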
Linux Kernel Feature: Cgroup
Control groups (cgroups) are another kernel feature that enables isolated containers
They are used to control the kernel resources allocated to each container/process
Metering and limiting
The resources include:
Memory
CPU
I/O (File and Network)
Cgroups are organized hierarchically for each resource type
Each process belongs to one node in each hierarchy
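The membership of a process in each hierarchy is listed in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. The sketch below parses such text; the sample content and the container name abc123 are hypothetical, and the helper name parse_proc_cgroup is an assumption made for this example.

```python
def parse_proc_cgroup(text):
    """Map each controller to the cgroup path the process belongs to."""
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path
        _hierarchy_id, controllers, path = line.strip().split(":", 2)
        for controller in controllers.split(","):
            # cgroup v2 uses an empty controller list for its single unified hierarchy.
            membership[controller or "unified"] = path
    return membership


# Hypothetical cgroup v1 style content for a process inside a container.
SAMPLE = """\
12:memory:/docker/abc123
11:cpu,cpuacct:/docker/abc123
10:blkio:/docker/abc123
"""

for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
    print(controller, path)
```

Each controller maps to exactly one path, which mirrors the rule that a process belongs to one node in each hierarchy.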
Cgroups Hierarchy Example
Container Runtimes
Runtime
Life cycle of a program, e.g. runtime error
Language-specific environment to support its execution, e.g. JRE
A container runtime has responsibilities similar to those of a Java Runtime Environment (JRE)
It enables containers to run by setting up namespaces and cgroups
It can also be viewed as the counterpart of a hypervisor
Low-level container runtimes focus on just running containers
LXC, systemd-nspawn, OpenVZ, Sandboxie, etc.
High-level container runtimes contain a lot of other features, like defining image formats, managing images, etc.
Docker
Outline
Container Brief Intro
Docker Overview
Images
Containers
Storage
Networking
Docker Overview
Docker is the most “famous” container technology
It is not just a container; it is a packaging and deployment system built on container technology
There is a large ecosystem of various components
The container part is usually presented as a black box, so Docker users do not need to know a lot about how it works
For most users, dependency management and the ability to deploy everywhere are the most prominent features
Its success partly relies on good use cases
The dependency hell
Modularization and layering are key principles in computer science.
But they have an unwanted by-product: “dependency hell”
APIs may change across versions
Conflicting dependencies
Some apps run on Python 2 while others run on Python 3, and some may be very particular about the exact version
Alternative solution?
Missing dependencies
You may install some app successfully, but may encounter problems in execution
Alternative solution?
Platform differences
Development and production have different environments
Docker promises to solve all those problems
Docker Components
The Docker daemon is a long-running process that provides overall control of images, containers, networks and data volumes
The daemon exposes a REST API to allow remote control
A command line interface (CLI) is used to interact with the Docker daemon
Any similarity with system VM?
Any similarity with JVM?
Which component is managing access to the critical resources like CPU and memory?
https://docs.docker.com/engine/docker-overview/
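To illustrate that the CLI is only one client of the daemon's REST API, the sketch below sends a plain HTTP request to the daemon using the Python standard library. It assumes the daemon listens on the UNIX socket /var/run/docker.sock (the usual default on Linux, but an assumption here) and uses the Engine API endpoint GET /containers/json; the class and function names are hypothetical.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def list_containers(socket_path="/var/run/docker.sock"):
    """Return the daemon's list of running containers, or None if unreachable."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        response = conn.getresponse()
        if response.status != 200:
            return None
        return json.loads(response.read())
    except (OSError, http.client.HTTPException, ValueError):
        return None  # no daemon, no permission, or an unexpected reply
    finally:
        conn.close()


containers = list_containers()
print("daemon not reachable" if containers is None else f"{len(containers)} running container(s)")
```

This is roughly the request that docker ps issues on the user's behalf; access to the socket usually requires root or membership of the docker group.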
Docker Architecture
https://docs.docker.com/engine/docker-overview/
The client and daemon can run on the same or different systems
The Docker daemon has commands for managing the life cycle of a container
Docker registries are where Docker images are stored and from where they can be retrieved for use
Docker Objects
Container
A relatively isolated running environment for a user’s applications, e.g. a web application. A container uses kernel technologies to manage resource allocation and isolation.
Images
A read-only template with instructions for building and executing some application inside a container
Network
Mechanisms for connection among Docker and non-Docker workloads
Data Volume
The standard mechanism for persisting data for a container
Docker image
Docker images are defined in a Dockerfile
A text document containing a sequence of instructions/commands to build and run your application
If you have ever written something like a makefile, Ant build file, Maven POM file, etc., the Dockerfile is designed for a similar purpose
You declare dependencies, set environment variables and other configurations in it
The images are created by calling the command docker build
The images are stored in a local image store and can be published in a public registry
Example Dockerfile
https://docs.docker.com/get-started/part2/#dockerfile
Image Layering
Docker uses a copy-on-write strategy to organize storage inside containers, which keeps them lightweight
Small space requirement and fast start-up time
Various drivers can be used: aufs, Btrfs, ZFS, etc.
Copy-on-Write
Storage is organized into multiple logical layers
If a file or directory exists in the lower layers, the upper layers can use it
If an upper layer needs to modify (write) anything in a lower layer, it creates a copy in its own layer and modifies that copy
Common lower layers can be shared by many images
Docker images are built on top of each other; upper images do not need to copy lower image files if nothing is changed.
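The copy-on-write rule can be sketched with a toy in-memory model. This is only an illustration of the idea, not how any real storage driver is implemented; the Layer class, the layer names and the file contents are all hypothetical.

```python
class Layer:
    """A toy storage layer: a dict of files plus a link to the layer below."""

    def __init__(self, name, files=None, parent=None):
        self.name = name
        self.files = dict(files or {})
        self.parent = parent

    def find(self, path):
        """Return the nearest layer (this one or below) holding path, or None."""
        layer = self
        while layer is not None:
            if path in layer.files:
                return layer
            layer = layer.parent
        return None

    def read(self, path):
        owner = self.find(path)
        if owner is None:
            raise FileNotFoundError(path)
        return owner.files[path]

    def append(self, path, data):
        """Modify a file, copying it up into this layer first if it lives below."""
        owner = self.find(path)
        if owner is None:
            raise FileNotFoundError(path)
        if owner is not self:
            self.files[path] = owner.files[path]  # copy-up from the lower layer
        self.files[path] += data


base = Layer("base-os", {"/etc/motd": "hello\n"})
app = Layer("my-app", {"/app/run.py": "print('hi')\n"}, parent=base)

app.append("/etc/motd", "from my-app\n")
print(repr(app.read("/etc/motd")))   # 'hello\nfrom my-app\n'
print(repr(base.read("/etc/motd")))  # 'hello\n' (lower layer untouched)
```

Reads fall through to the lower layer at no storage cost; only the modified file is duplicated in the upper layer, so the base layer can still be shared by other images.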
Image Layers
Obtain some layers from parent image
Create a layer each
https://docs.docker.com/v17.09/engine/userguide/storagedriver/imagesandcontainers/
Container and image layers
Images are read-only templates; all image layers are read-only
When an image is loaded into a container to run, the container adds a thin writable layer on top of it
All writes to the container are stored in this layer
It is deleted when the container is deleted (not merely stopped)
Should be used as temporary storage
If multiple images use the same base image, only one copy of the base image is required
Small space
When a container starts, only a new writable layer is added on top of the existing image layers
Fast start-up
Application persistence should be handled differently
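The relationship between shared read-only image layers and per-container writable layers can be mimicked with collections.ChainMap, which looks keys up through a list of mappings but sends all writes to the first one. This is a toy analogy under that assumption, not Docker's implementation; the file names and the start_container helper are hypothetical.

```python
from collections import ChainMap
from types import MappingProxyType

# Read-only image layers, listed from the top layer down to the base layer.
image_layers = [
    MappingProxyType({"/app/server.py": "v1"}),
    MappingProxyType({"/etc/os-release": "base-os"}),
]


def start_container(layers):
    """Starting a container only adds one empty writable layer on top."""
    writable_layer = {}
    return ChainMap(writable_layer, *layers)


c1 = start_container(image_layers)
c2 = start_container(image_layers)

c1["/tmp/log"] = "written by c1"          # lands in c1's writable layer only
c1["/app/server.py"] = "patched in c1"    # shadows the image file for c1 only

print(c1["/app/server.py"])   # patched in c1
print(c2["/app/server.py"])   # v1
print("/tmp/log" in c2)       # False
print(c1.maps[1] is c2.maps[1])  # True: both containers share the same image layer
```

Discarding c1 discards its writable layer and everything written to it, which is why data that must outlive the container needs a different mechanism.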
Multiple containers sharing the same image
Docker Container
The actual container uses many OS technologies to provide isolation and to allocate resources: CPU, memory, I/O
This is achieved by utilizing various features provided by the Linux kernel
Namespaces
Cgroups
Others
Linux Containers (LXC) was originally used as the execution driver
The current driver, libcontainer, is more general, allowing Docker to run on platforms other than Linux
Data Volume
Apart from storing data within the writable layer of a container, there are other preferred ways to persist data
A bind mount is an early storage option; it allows a client to mount any file or directory on the host machine into a container
Volumes use a designated location on the host machine as container storage; they are completely managed by Docker
tmpfs is a rarely used option, which uses host memory to simulate storage
Networking
Docker provides a few options for how a container connects with the outside world
Host
Bridge: default/user defined
None
Overlay
Others…
These are specified when the container is started; if nothing is specified, the default bridge driver is used.
Networking: Host
The host driver is the simplest option
It removes the isolation between container and host
The container is treated in the same way as a process on the host
It connects directly to the host NIC (host networking namespace)
docker run -d --name nginx-1 --net=host nginx
https://mesosphere.com/blog/networking-docker-containers/
Networking: Bridge
Docker manages its own private network
Adding a software bridge to the host
The mapping from internal/private address to public/host based address is done through Network Address Translation (NAT)
docker run -d --name nginx-1 -p 10000:80 nginx
docker run -d --name nginx-2 -p 10001:80 nginx
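The effect of publishing ports with -p can be pictured as a small translation table from host ports to private container addresses. The sketch below is a toy model of that table, not Docker's NAT implementation; the BridgeNat class and the 172.17.0.x addresses are assumptions made for illustration.

```python
class BridgeNat:
    """Toy port-mapping table: host port -> (container IP, container port)."""

    def __init__(self):
        self.port_map = {}

    def publish(self, host_port, container_ip, container_port):
        """Record a mapping, like -p host_port:container_port."""
        if host_port in self.port_map:
            raise ValueError(f"host port {host_port} is already published")
        self.port_map[host_port] = (container_ip, container_port)

    def translate(self, host_port):
        """Return the private destination for traffic arriving on host_port."""
        return self.port_map.get(host_port)


nat = BridgeNat()
# Hypothetical private addresses assigned to the two nginx containers.
nat.publish(10000, "172.17.0.2", 80)
nat.publish(10001, "172.17.0.3", 80)

print(nat.translate(10000))  # ('172.17.0.2', 80)
print(nat.translate(10001))  # ('172.17.0.3', 80)
print(nat.translate(8080))   # None: nothing published on this port
```

Both containers can listen on port 80 because each has its own network namespace; only the host ports must be unique.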
Docker Security Issues
Namespaces provide the first and most straightforward form of isolation
But for various reasons, a container might not follow strict namespace-based isolation
E.g. the host networking driver puts container and host in the same namespace
Even though each container has its own file system, the host file system is not completely invisible to the container
The requirement to run the Docker daemon as root largely compromises security
Docker containers are started with restricted capabilities and can be further restricted, but the default profile does not provide complete isolation
It is relatively easy for a user to escalate to some level of ‘root’ privilege through container.
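The restricted capabilities of a process can be read from the CapEff line of /proc/<pid>/status, which is a hexadecimal bitmask with one bit per Linux capability. The sketch below decodes such a mask. The value 00000000a80425fb is a mask commonly reported inside containers started with Docker's default settings; treat it as an assumption here rather than a guarantee, and note that the helper names are hypothetical.

```python
# Capability names indexed by bit number (bits 0..31 of linux/capability.h).
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
]


def decode_capabilities(mask_hex):
    """List the capabilities set in a hex mask (bits above 31 are ignored here)."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAPABILITY_NAMES) if mask & (1 << bit)]


def effective_capabilities(status_path="/proc/self/status"):
    """Decode the CapEff line of a status file, or return None if unavailable."""
    try:
        with open(status_path) as status:
            for line in status:
                if line.startswith("CapEff:"):
                    return decode_capabilities(line.split()[1])
    except OSError:
        pass
    return None


assumed_default = decode_capabilities("00000000a80425fb")
print(len(assumed_default))                  # 14
print("CAP_SYS_ADMIN" in assumed_default)    # False
print("CAP_NET_BIND_SERVICE" in assumed_default)  # True
```

Under this assumed mask the container's root user lacks powerful capabilities such as CAP_SYS_ADMIN, yet still holds others such as CAP_CHOWN and CAP_SETUID, which is consistent with the point that the default profile restricts but does not completely isolate.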
Next Week
This is the end of cloud enabling technology content
Starting from next week, we will focus on “big data analytics” frameworks
The topic for next week is basic MapReduce programming
Jeffrey Dean, Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters. In OSDI’04
Tom White, Hadoop: The Definitive Guide, O’Reilly, 2009
References
Container’s Anatomy
https://www.slideshare.net/jpetazzo/anatomy-of-a-container-namespaces-cgroups-some-filesystem-magic-linuxcon
Dirk Merkel, Docker: Lightweight Linux Containers for Consistent Development and Deployment, Linux Journal, May 19, 2014
https://www.linuxjournal.com/content/docker-lightweight-linux-containers-consistent-development-and-deployment
Docker Documentation: Docker Overview
https://docs.docker.com/engine/docker-overview/#the-docker-platform
Docker File Systems
https://docs.docker.com/storage/storagedriver/#images-and-layers
https://docs.docker.com/storage/volumes/
Docker Networking
https://mesosphere.com/blog/networking-docker-containers/
Docker Security
https://docs.docker.com/engine/security/security/