High Performance Computing
Course Notes
Cluster Technologies -I
Dr Ligang He
Computer Science, University of Warwick
History and Evolution of HPC Systems (revisit)
1960s: Scalar processor
1970s: Vector processor
Late 1980s: Massively Parallel Processing (MPP)
Late 1990s: Cluster
Connecting stand-alone computers with a high-speed network
(over-cable networks)
• Commodity off-the-shelf computers
• High-speed networks: Gigabit Ethernet, InfiniBand
• Over-cable network vs. on-board network
Not a new concept, but one that attracted renewed interest due to:
• Performance improvements in CPUs and networking
• Advantage over custom-designed mainframe computers: good portability
Why did clusters gain popularity then?
Clustering gained a new wave of interest
when three technologies converged:
1. Very high performance microprocessors
• PC performance today matches that of older supercomputers
2. High-speed communication
3. Standard tools for parallel/distributed programming
Cluster Architectures
Cluster Components…1
Nodes
• Multiple High Performance Components:
PCs
Workstations
SMPs
• They can be based on different architectures and
run different operating systems
• But usually, the nodes in a cluster are
homogeneous
They have the same architecture and performance, and
run the same OS
Cluster Components…2
OS
• Operating systems used in various cluster systems:
Linux (Beowulf)
Microsoft Windows NT (Illinois HPVM)
Sun Solaris (Berkeley NOW)
IBM AIX (IBM SP2)
HP-UX (Illinois PANDA)
Mach (microkernel-based OS) (CMU)
Cluster Components…3
High Performance Networks
Ethernet (10 Mbps)
FDDI (100 Mbps): Fibre Distributed Data Interface
Fast Ethernet (100 Mbps)
Gigabit Ethernet (1 Gbps)
Myrinet (10 Gbps)
10 Gigabit Ethernet (10 Gbps)
InfiniBand (24-290 Gbps)
Cluster Components…4
Network Interfaces
Network Interface Card
Myrinet NIC
Ethernet NIC
InfiniBand NIC
…
Cluster Components…5
Communication Software
Traditional OS-supported protocols (heavyweight
due to protocol processing)
7-layer OSI reference model
Sockets (TCP/IP), pipes, etc.
Lightweight protocols (user level)
Active Messages (Berkeley): used in the NOW system
Fast Messages (Illinois): used in HPVM
U-Net (Cornell)
XTP (Virginia)
Active Messages:
http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-92-675.pdf
Fast Messages:
https://courseware.ee.calpoly.edu/~jharris/3comproject/Reference/high%20performance%20messaging%20on%20workstations.pdf
Cluster Components…6
Cluster Middleware
• Provides workload and resource management
• Presents a single system image (SSI) of the cluster
• Examples:
Moab
SLURM
PBS
Condor
Cluster Components…8
Development Tools
Processes: MPI, PVM, DSM (distributed shared memory)
Threads (multicore computers)
OpenMP, POSIX Threads, Java Threads
Compilers
C/C++/Java
mpicc (MPI compiler wrapper)
Debuggers for sequential programs: gdb and dbx
Debuggers for parallel programs: Buster
Performance analysis and visualization tools: e.g. VampirTrace
Vampir Visualizes Communication Pattern
Vampir displays output from several parallel performance profilers, e.g. VampirTrace.
The example shows nearest-neighbor communication for a 1-D data
decomposition (each PE sends to PE+1 and PE-1).
The pattern is symmetric iff the data flow between sub-grids is equal in both directions.
Communication patterns can be identified from the matrix.
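To make the matrix concrete, here is a minimal Python sketch of the sender × receiver matrix behind such a display. It assumes a non-periodic 1-D decomposition in which every PE exchanges one equal-sized message with each neighbor; the function names and the `bytes_per_msg` parameter are hypothetical, not part of Vampir. Non-zero entries appear only on the two diagonals adjacent to the main diagonal.

```python
# Sketch only: function names and bytes_per_msg are hypothetical illustrations.

def comm_matrix_1d(num_pes, bytes_per_msg=1):
    """Return m where m[src][dst] is the volume PE src sends to PE dst
    in a non-periodic 1-D decomposition (halo exchange with PE-1 and PE+1)."""
    m = [[0] * num_pes for _ in range(num_pes)]
    for pe in range(num_pes):
        for nbr in (pe - 1, pe + 1):
            if 0 <= nbr < num_pes:  # the two end PEs have only one neighbor
                m[pe][nbr] += bytes_per_msg
    return m


def is_symmetric(m):
    """True iff every pair of PEs exchanges equal volumes in both directions."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


if __name__ == "__main__":
    for row in comm_matrix_1d(4):
        print(row)
    # [0, 1, 0, 0]
    # [1, 0, 1, 0]
    # [0, 1, 0, 1]
    # [0, 0, 1, 0]
```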
Vampir Visualizes Communication Pattern
Communication pattern typical of a 2-D decomposition.
Equal amounts of traffic (and messages) occur in the shaded
locations (in this example).
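The same idea extends to 2-D. This sketch assumes PEs are laid out row by row on a non-periodic `rows × cols` grid (rank = row * cols + col). The per-direction message sizes `msg_x` and `msg_y` are hypothetical parameters. East/west partners sit at a rank offset of ±1 and north/south partners at ±cols, which gives the banded pattern. The ±1 band has gaps at row boundaries, because consecutive ranks at the end of one row and the start of the next are not grid neighbors.

```python
# Sketch only: the row-major rank layout and msg_x/msg_y are assumptions.

def comm_matrix_2d(rows, cols, msg_x=1, msg_y=1):
    """m[src][dst] = volume sent from src to dst on a non-periodic rows x cols grid.
    Ranks are numbered row by row: rank = r * cols + c."""
    n = rows * cols
    m = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            src = r * cols + c
            for dc in (-1, 1):  # west/east neighbors: rank offset -1/+1
                if 0 <= c + dc < cols:
                    m[src][src + dc] += msg_x
            for dr in (-1, 1):  # north/south neighbors: rank offset -cols/+cols
                if 0 <= r + dr < rows:
                    m[src][src + dr * cols] += msg_y
    return m


if __name__ == "__main__":
    for row in comm_matrix_2d(2, 3):
        print(row)
```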
Vampir Visualizes Communication Pattern
Communication pattern for a 3-D decomposition.
Level of traffic in X > Y > Z (in this example).
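One way to see why traffic can differ by direction is to look at the sub-domain shape. This sketch assumes each PE owns an `nx × ny × nz` block of 8-byte values and exchanges a halo one cell deep; both are hypothetical choices. A message to an X neighbor carries a whole Y-Z face, so a block that is thinnest in X sends the most data in X.

```python
# Sketch only: block shape, halo depth and value size are illustrative assumptions.

def halo_bytes_per_direction(nx, ny, nz, halo=1, bytes_per_value=8):
    """Bytes a PE owning an nx x ny x nz block sends to ONE neighbor
    in each direction during a halo exchange `halo` cells deep."""
    return {
        "X": ny * nz * halo * bytes_per_value,  # face normal to X is ny x nz
        "Y": nx * nz * halo * bytes_per_value,  # face normal to Y is nx x nz
        "Z": nx * ny * halo * bytes_per_value,  # face normal to Z is nx x ny
    }


if __name__ == "__main__":
    print(halo_bytes_per_direction(16, 32, 64))
    # {'X': 16384, 'Y': 8192, 'Z': 4096}
```

With this block shape the ordering is X > Y > Z. Other shapes would reorder it.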
Cluster Components…9
Applications
Sequential application
Parallel / Distributed application
Scientific applications: each is computation-intensive
Weather Forecasting
Computational Fluid Dynamics
Molecular Biology Modeling
Engineering Analysis (CAD/CAM)
…
Service applications: high arrival rate of service requests
eBay
Amazon
Cluster Architectures
What is a Single System Image (SSI)?
A single system image is the illusion, created by cluster
management software (middleware), that presents a
collection of resources as a single powerful resource.
Single entry point:
ssh cluster.my_institute.edu √
ssh node1.cluster.institute.edu ×
Benefits of SSI or Middleware
Simplified system management
Use system resources transparently
Users need not be aware of the detailed resource
information and underlying system architecture to use
these machines effectively
Transparent load balancing and process migration
across nodes.
Improved reliability and availability
Improved system-oriented performance
• The middleware has a global view of the system, whereas a user has only a local view
Cluster Management Software
Goal: help allocate resources to jobs, given the jobs’
resource requirements and local policy restrictions
Three parties in a cluster environment
Users: supply the jobs and their requirements
Administrators: describe local usage policies
Cluster management software: monitors the state of the cluster,
schedules the jobs and tracks resource usage
Typical activities performed by cluster management
software
Queuing
Scheduling
Monitoring
Resource management
Accounting
Queuing
Job submission usually consists of two primary parts:
Job description (e.g. job name, the location of the required
input files)
Resource requirements (e.g. the amount of memory, the
number of CPUs needed)
Once submitted, a job is held in the queue until it
reaches the head of the queue and matching
resources are available
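The queuing rule above can be sketched in Python as follows. The `Job` and `FifoQueue` classes and their fields are hypothetical and not taken from any real batch system. `Job` bundles a job description with its resource requirements. Only the job at the head may start, so a job that would fit still waits behind a head job that does not.

```python
# Sketch only: Job and FifoQueue are hypothetical, not a real scheduler's API.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    # Job description
    name: str
    input_files: list = field(default_factory=list)
    # Resource requirements
    cpus: int = 1
    memory_gb: int = 1


class FifoQueue:
    """Holds submitted jobs; only the job at the head may start."""

    def __init__(self, total_cpus, total_memory_gb):
        self.free_cpus = total_cpus
        self.free_memory_gb = total_memory_gb
        self.waiting = deque()

    def submit(self, job):
        self.waiting.append(job)

    def dispatch(self):
        """Start head-of-queue jobs while their requirements can be met.
        Returns the names of the jobs started."""
        started = []
        while self.waiting:
            head = self.waiting[0]
            if head.cpus > self.free_cpus or head.memory_gb > self.free_memory_gb:
                break  # the head waits, and so does every job behind it
            self.waiting.popleft()
            self.free_cpus -= head.cpus
            self.free_memory_gb -= head.memory_gb
            started.append(head.name)
        return started


if __name__ == "__main__":
    q = FifoQueue(total_cpus=8, total_memory_gb=32)
    q.submit(Job("a", ["a.in"], cpus=4, memory_gb=8))
    q.submit(Job("b", cpus=6, memory_gb=8))
    q.submit(Job("c", cpus=2, memory_gb=4))
    print(q.dispatch())  # ['a']: "b" does not fit, so "c" waits behind it
```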
Scheduling
Determining when, and on which resources, each job should
be put into execution
There are a variety of metrics to measure scheduling
performance
System-oriented metrics (e.g. throughput, utilisation, average
response time of all jobs)
User-oriented metrics (e.g. response time of a job submitted by
a user)
These metrics can conflict with each other, so a balance must be struck
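The metrics named above can be computed from a finished schedule. This sketch makes two assumptions: response time is completion time minus submission time, and throughput and utilisation are measured over the window from the first submission to the last completion. Other definitions exist. The tuple layout is hypothetical.

```python
# Sketch only: metric definitions and the (submit, start, end, cpus) layout are assumptions.

def schedule_metrics(jobs, total_cpus):
    """jobs: list of (submit, start, end, cpus) for completed jobs, same time unit.
    Metrics are measured from the first submission to the last completion."""
    first = min(submit for submit, _, _, _ in jobs)
    last = max(end for _, _, end, _ in jobs)
    span = last - first
    busy = sum(cpus * (end - start) for _, start, end, cpus in jobs)
    # Response time of each job (user-oriented): completion - submission
    responses = [end - submit for submit, _, end, _ in jobs]
    return {
        "throughput": len(jobs) / span,             # jobs completed per time unit
        "utilisation": busy / (total_cpus * span),  # fraction of CPU time used
        "avg_response": sum(responses) / len(responses),
        "responses": responses,
    }


if __name__ == "__main__":
    print(schedule_metrics([(0, 0, 10, 2), (0, 10, 20, 4)], total_cpus=4))
    # {'throughput': 0.1, 'utilisation': 0.75, 'avg_response': 15.0, 'responses': [10, 20]}
```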
Monitoring
Providing information to administrators, users and the
cluster manager on the status of jobs and resources
The method of collecting the information may differ between
cluster management systems, but the general
purpose is the same
Resource management
Handling the details of
Starting the job execution on the resources
Stopping a job
Cleaning up the temporary files generated by the jobs
after the jobs are completed or aborted
Removing or adding resources
In a batch system, jobs are put into execution
in such a way that users don’t have to be present
during execution
In an interactive system, users have to be present
to supply arguments or information during the
execution of their jobs.
Accounting
Accounting for which users are using what resources for
how long
Collecting resource usage data (e.g. job owner, resources
requested by the job, resource consumption by the job)
Accounting data can be used for:
Producing system usage and user usage reports
Tuning the scheduling policy
Anticipating future resource requirements by users
Calculating future resource allocations
Determining areas for improvement within the cluster
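A basic usage report can be produced by aggregating per-job records. The sketch below assumes each record carries the job owner, the CPUs allocated and start/end times in hours. These field names are hypothetical, and real systems record far more detail. It totals the CPU-hours consumed by each user.

```python
# Sketch only: the record fields (owner, cpus, start, end) are hypothetical.
from collections import defaultdict


def cpu_hours_by_owner(records):
    """records: iterable of dicts with keys 'owner', 'cpus', 'start', 'end'
    (times in hours). Returns {owner: CPU-hours consumed}, largest first."""
    totals = defaultdict(float)
    for r in records:
        totals[r["owner"]] += r["cpus"] * (r["end"] - r["start"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    usage = [
        {"owner": "alice", "cpus": 4, "start": 0, "end": 2},
        {"owner": "bob", "cpus": 2, "start": 1, "end": 4},
        {"owner": "alice", "cpus": 1, "start": 3, "end": 5},
    ]
    print(cpu_hours_by_owner(usage))  # {'alice': 10.0, 'bob': 6.0}
```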
Scheduling Policies
The simplest policy:
First-Come First-Served
Jobs are run in the same order as they are submitted.
Does not require prior knowledge about jobs (e.g.
runtime).
Problem: jobs can block other jobs from starting,
even though the blocking brings no performance benefit to either user.
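FCFS can be sketched as a small simulation. The assumptions are that all jobs are queued at time 0 in submission order, that CPUs are the only resource, and that the `(name, cpus, runtime)` tuples are a hypothetical format. A job never starts before an earlier-submitted job.

```python
# Sketch only: all jobs queued at t=0; the (name, cpus, runtime) format is hypothetical.

def fcfs(jobs, total_cpus):
    """jobs: list of (name, cpus, runtime) in submission order.
    Returns {name: (start, end)}. No job starts before an earlier-submitted job."""
    running = []   # (end_time, cpus) of started jobs
    schedule = {}
    t = 0          # start time of the previous job; FCFS never goes back in time
    for name, cpus, runtime in jobs:
        if cpus > total_cpus:
            raise ValueError(f"{name} requests more CPUs than the cluster has")
        # Wait for running jobs to finish until enough CPUs are free
        while total_cpus - sum(c for end, c in running if end > t) < cpus:
            t = min(end for end, _ in running if end > t)  # next completion
        running.append((t + runtime, cpus))
        schedule[name] = (t, t + runtime)
    return schedule


if __name__ == "__main__":
    print(fcfs([("A", 2, 10), ("B", 4, 5), ("C", 1, 2)], total_cpus=4))
    # {'A': (0, 10), 'B': (10, 15), 'C': (15, 17)}
```

Job C needs only one CPU, and two CPUs sit idle from time 0 to 10. Even so, C waits until time 15 behind B.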
First-Come First-Served
Backfilling
The problem with FCFS is that idle time (sum of unused
processing intervals) can be significant.
One improvement is to “backfill”.
Backfilling allows a later job to start early if doing so does not delay
the start of the first job in the queue.
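The rule above, in which only the first queued job is protected, is commonly known as EASY backfilling. Conservative backfilling instead protects every queued job. The sketch below uses the same hypothetical job format and assumptions as a plain FCFS simulation: all jobs are queued at time 0, CPUs are the only resource, and runtime estimates are exact. It computes the "shadow time" at which the blocked head job can start. A later job may jump ahead if it fits now and either ends by the shadow time or uses only CPUs the head job will not need.

```python
# Sketch of EASY backfilling; job format and exact runtimes are assumptions.

def easy_backfill(jobs, total_cpus):
    """jobs: list of (name, cpus, runtime) in submission order, all queued at t=0.
    Runtimes are assumed to be exact estimates. Returns {name: (start, end)}."""
    if any(cpus > total_cpus for _, cpus, _ in jobs):
        raise ValueError("a job requests more CPUs than the cluster has")
    queue = list(jobs)
    running = []   # (end_time, cpus) of jobs still executing
    schedule = {}
    t = 0
    while queue:
        running = [(e, c) for e, c in running if e > t]  # drop finished jobs
        free = total_cpus - sum(c for _, c in running)
        # 1. FCFS part: start jobs from the head of the queue while they fit.
        while queue and queue[0][1] <= free:
            name, cpus, runtime = queue.pop(0)
            running.append((t + runtime, cpus))
            schedule[name] = (t, t + runtime)
            free -= cpus
        if not queue:
            break
        # 2. Find the shadow time: when enough running jobs have finished
        #    for the blocked head job to start.
        head_cpus = queue[0][1]
        avail, shadow = free, t
        for end, c in sorted(running):
            if avail >= head_cpus:
                break
            avail, shadow = avail + c, end
        extra = avail - head_cpus  # CPUs still spare once the head job starts
        # 3. Backfill: a later job may start now if it fits in the idle CPUs
        #    and either ends by the shadow time or uses only spare CPUs.
        remaining = [queue[0]]
        for name, cpus, runtime in queue[1:]:
            if cpus <= free and (t + runtime <= shadow or cpus <= extra):
                running.append((t + runtime, cpus))
                schedule[name] = (t, t + runtime)
                free -= cpus
                if t + runtime > shadow:
                    extra -= cpus  # it will still hold these CPUs at shadow time
            else:
                remaining.append((name, cpus, runtime))
        queue = remaining
        # 4. Advance to the next job completion and repeat.
        t = min(e for e, _ in running)
    return schedule


if __name__ == "__main__":
    jobs = [("A", 2, 10), ("B", 4, 5), ("C", 1, 2), ("D", 2, 20)]
    print(easy_backfill(jobs, total_cpus=4))
```

In the usage example, C fills the idle CPUs at time 0 and B still starts at time 10. D is not backfilled, because it would still hold CPUs at B's shadow time.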
Backfilling
Backfilling
Advantages:
• Utilisation is improved.
Disadvantages:
• Information about the job execution time is required.
• User estimates are usually inaccurate.
• What to do if a job overruns is a policy decision;
many administrators terminate a job that exceeds
its allocated execution time, since otherwise some users may
deliberately underestimate the job length to get an earlier
start time.
Backfilling
A problem arises if the predicted runtime is wrong.
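Here is a deliberately tiny model of this problem. It assumes one backfilled job holds CPUs that the blocked head job needs from its reserved shadow time onward; all names are hypothetical. If the overrunning job is allowed to continue, the head job's start slips. If the job is terminated at its requested limit, the head job starts on time.

```python
# Toy model only: a single backfilled job and hypothetical parameter names.

def head_job_start(shadow_time, bf_start, bf_estimate, bf_actual, kill_on_overrun):
    """Start time of the blocked head job when one backfilled job holds CPUs
    the head job needs. The backfilled job was admitted because
    bf_start + bf_estimate <= shadow_time."""
    if bf_start + bf_estimate > shadow_time:
        raise ValueError("this job would not have been backfilled")
    if kill_on_overrun:
        runtime = min(bf_actual, bf_estimate)  # terminated at its requested limit
    else:
        runtime = bf_actual                    # allowed to run to completion
    return max(shadow_time, bf_start + runtime)


if __name__ == "__main__":
    # Estimated 8 time units, actually needs 14; head job reserved for t = 10.
    print(head_job_start(10, 0, 8, 14, kill_on_overrun=True))   # 10
    print(head_job_start(10, 0, 8, 14, kill_on_overrun=False))  # 14
```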
Scheduling Policies
Reservation:
User-based quality of service (QoS) is an increasingly
important scheduling metric.
In addition to normal scheduling, reservation services can
be used to plan resource allocation.
Users are able to
set up a reserved block of processing capability that
they can use at some point in the future
reserve part of the cluster's resources to be
dedicated to a certain group of users
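Before placing an ordinary job, a scheduler has to check it against existing reservations. This sketch makes several simplifying assumptions: CPUs are the only resource, already-running jobs are ignored, and reservations are hypothetical `(start, end, cpus)` tuples. It computes the fewest unreserved CPUs at any moment of a candidate interval. Reserved load can only rise when a reservation begins, so checking the interval start and every reservation start inside the interval is enough.

```python
# Sketch only: CPU-only model; the (start, end, cpus) reservation format is hypothetical.

def unreserved_cpus(total_cpus, reservations, start, end):
    """Smallest number of CPUs NOT held by any reservation at any instant in
    [start, end). reservations: list of (res_start, res_end, cpus)."""
    # Reserved load rises only at reservation start times, so check the
    # interval start plus every reservation start strictly inside the interval.
    check_points = [start] + [rs for rs, _, _ in reservations if start < rs < end]
    peak = max(
        sum(c for rs, r_end, c in reservations if rs <= p < r_end)
        for p in check_points
    )
    return total_cpus - peak


def can_place(job_cpus, runtime, start, total_cpus, reservations):
    """Could a job outside the reserving group run on [start, start + runtime)?"""
    return job_cpus <= unreserved_cpus(total_cpus, reservations, start, start + runtime)


if __name__ == "__main__":
    res = [(10, 20, 8), (15, 30, 4)]  # two reservations on a 16-CPU cluster
    print(unreserved_cpus(16, res, 5, 16))  # 4: both reservations overlap at t = 15
    print(can_place(6, 11, 5, 16, res))     # False
    print(can_place(6, 5, 20, 16, res))     # True
```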