The Spartan HPC System at the University of Melbourne
COMP90024 Cluster and Cloud Computing
University of Melbourne, March 23, 2021
Outline of Lecture
“This is an advanced course but we get mixed bag: students that have 5+ years of MPI programming on supercomputers, to students that have only done Java on Windows.”
Some background on supercomputing, high performance computing, parallel computing, scientific computing (there is overlap, but they’re not the same thing).
An introduction to Spartan, the University of Melbourne’s general-purpose HPC system. Logging in, help, and environment modules.
Job submission with Slurm workload manager; simple submissions, multicore, job arrays, job dependencies, interactive jobs.
Parallel programming with shared memory and threads (OpenMP) and distributed memory and message passing (OpenMPI)
Tantalising hints about more advanced material on message passing routines.
Why Supercomputers?
Data size and complexity is growing much faster than the capacity of personal computational devices to process that data [1][2], primarily due to the heat issues associated with increased clock speeds (the end of Dennard scaling).
Various technologies mitigate this problem: multiprocessor systems, multicore processors, shared and distributed memory parallel programming, general-purpose graphics processing units, non-volatile memory (the PDP-11 gets the last laugh), RoCE/RDMA and network topologies, and so forth.
All of these are incorporated in various HPC systems, and at scale. Increased research output [3], and exceptional return on investment (44:1 in profits or cost savings [4]).
Supercomputers are critical for the calculations involved in climate and meteorological models, geophysics simulations, biomolecular behaviour, aeronautics and aerospace engineering, radio telescope data processing, particle physics, brain science, etc.
Some Local Examples
Researchers from Monash University, the Institute in Melbourne, Birkbeck College in London, and VPAC in 2010 unravelled the structure of the protein perforin to determine how pathogenic cells are attacked by white blood cells [5].
In 2015 researchers from VLSCI announced how natural antifreeze proteins bind to ice to prevent it growing which has important implications for extending donated organs and protecting crops from frost damage [6].
In 2016 CSIRO researchers successfully manipulated the behaviour of metal-organic frameworks to control their structure and alignment, which provides opportunities for real-time and implantable medical electric devices [7].
In 2020 a research team, including University of Melbourne researchers using Spartan, broke the Zodiac 340 cipher, which had eluded criminal and legal teams in the United States for over fifty years.
Supercomputers
“Supercomputer” is an arbitrary term. In general use it means any single computer system (itself a contested term) that has exceptional processing power for its time. One metric is the number of floating-point operations per second (FLOPS) such a system can carry out.
Supercomputers, like any other computing system, have improved significantly over time. The Top500 list is based on FLOPS as measured by LINPACK; the HPC Challenge is a broader, more interesting metric. The current #1 system is Fugaku at the RIKEN Center for Computational Science in Japan.
Performance of the #1 system over time:
1994: 170.40 GFLOPS; 1996: 368.20 GFLOPS; 1997: 1.338 TFLOPS; 1999: 2.3796 TFLOPS; 2000: 7.226 TFLOPS; 2004: 70.72 TFLOPS; 2005: 280.6 TFLOPS; 2007: 478.2 TFLOPS
2008: 1.105 PFLOPS; 2009: 1.759 PFLOPS; 2010: 2.566 PFLOPS; 2011: 10.51 PFLOPS; 2012: 17.59 PFLOPS; 2013: 33.86 PFLOPS; 2014: 33.86 PFLOPS; 2015: 33.86 PFLOPS
2016: 93.01 PFLOPS; 2017: 93.01 PFLOPS; 2018: 143.00 PFLOPS; 2019: 148.60 PFLOPS; 2020: 442.01 PFLOPS; 2021: 442.01 PFLOPS
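As a rough back-of-the-envelope sketch of where such numbers come from (the node configuration below is hypothetical, not a description of any particular machine), theoretical peak performance can be estimated as:

Peak FLOPS = sockets per node × cores per socket × clock speed × FLOPs per cycle
e.g., 2 sockets × 18 cores × 2.4 GHz × 32 FLOPs/cycle (AVX-512 with FMA) ≈ 2.76 TFLOPS per node

Measured LINPACK results always fall below this theoretical peak, which is one reason broader benchmarks such as the HPC Challenge are of interest.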
High Performance Computing
High-performance computing (HPC) is any computer system whose architecture allows for above-average performance. A system that is one of the most powerful in the world, but that is poorly designed, could still be a “supercomputer”.
Clustered computing is when two or more computers serve a single resource. This improves performance and provides redundancy; typically a collection of smaller computers strapped together with a high-speed local network (e.g., Myrinet, InfiniBand, 10 Gigabit Ethernet). Even a cluster of Raspberry Pi computers in a Lego chassis (University of Southampton, 2012)!
Consider a horse and cart as the computer system and the load as the computing tasks: do you arrange the load more efficiently, get a bigger horse and cart, or hire a teamster to drive a team of horses? The clustered HPC approach is the most efficient, economical, and scalable method, and for that reason it dominates supercomputing.
Parallel and Research Programming
With a cluster architecture, applications can be more easily parallelised across nodes. Parallel computing refers to the submission of jobs or processes over multiple processors, splitting up the data or tasks between them (random number generation as data parallel, driving a vehicle as task parallel).
Research computing is the software applications used by a research community to aid research, along with the skills needed to use them; many researchers lack those skills. This skills gap is a major problem and must be addressed, because as the volume, velocity, and variety of datasets increases, researchers will need to be able to process this data.
Computational capacity does have priority (the system must exist prior to use), but in order for that capacity to be realised in terms of usage, a skillset competence must also exist. The core issue is that high performance compute clusters are not just about speed and power, but also usage, productivity, correctness, and reproducibility.
Reproducibility in science is a huge issue! Many of the problems relate to inattentiveness to software versions, compilers and options, etc., all of which can be very site-specific in HPC facilities. See: The Ten Year Reproducibility Challenge (https://www.nature.com/articles/d41586-020-02462-7). Also note the limitations of the number systems used in computing.
HPC Cluster Design
It’s A GNU/Linux World
From November 2017 onwards, every single machine of the Top 500 supercomputers worldwide used Linux.
The commandline interface provides a great deal more power and is very resource efficient.
GNU/Linux scales and does so with stability and efficiency. Critical software such as the Message Passing Interface (MPI)
and nearly all scientific programs are designed to work with GNU/Linux.
The operating system and many applications are provided as “free and open source”, which means that not only are there some financial savings, we are also much better placed to improve, optimise, and maintain specific programs.
Free or open source software (not always the same thing) can be compiled from source for the specific hardware and operating system configuration, and can be optimised according to compiler flags. This is necessary where every clock cycle is important.
Flynn’s Taxonomy and Multicore Systems
It is possible to illustrate the degree of parallelisation by using Flynn’s Taxonomy of Computer Systems (1966), where each process is considered as the execution of a pool of instructions (instruction stream) on a pool of data (data stream): single or multiple instruction streams acting on single or multiple data streams (SISD, SIMD, MISD, MIMD).
Over time computing systems have moved towards multiprocessor, multicore, and often multithreaded and multinode systems.
The engineering imperative behind these systems comes down to heat. From the mid-2000s, clock speeds on CPUs have largely stalled.
Some trends include GPGPU development, massive multicore systems (e.g., the Angstrom Project, the Tile CPU with 1000 cores), and massive network connectivity and shared resources (e.g., the Plan 9 operating system).
(Image from Dr. , Canisius College)
Limitations of Parallel Computation
Parallel programming and multicore systems should mean better performance. This can be expressed as a ratio called speedup:
Speedup (p) = Time (serial)/ Time (parallel)
Correctness in parallelisation requires synchronisation. Synchronisation and atomic operations cause a loss of performance and add communication latency.
Amdahl’s law establishes the maximum improvement to a system when only part of the system has been improved. Gustafson and Barsis noted that Amdahl’s law assumed a computation problem of fixed data set size.
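As a sketch, the standard formulation of Amdahl’s law for a program in which a proportion P of the work can be parallelised over N processors is:

Speedup(N) = 1 / ((1 - P) + P/N)
e.g., with P = 0.90 and N = 64: Speedup ≈ 1 / (0.10 + 0.014) ≈ 8.8

The Gustafson-Barsis reformulation, which scales the problem size with the processor count, gives Speedup(N) = N - (1 - P)(N - 1) instead.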
UniMelb’s HPC System: Spartan
A detailed review was conducted in 2016 looking at the infrastructure of the Melbourne Research Cloud, High Performance Computing, and Research Data Storage Services. The University desired a ‘more unified experience to access compute services’.
The recommended solution, based on technology and usage, was to make use of the existing NeCTAR Research Cloud with an expansion of general cloud compute provisioning, and use of a smaller “true HPC” system on bare-metal nodes.
Since then Spartan has taken up a large GPGPU partition, moving from a small, experimental system to a world-class facility. We have also moved all our partitions to physical nodes.
Complete list of current partitions and storage at: https://dashboard.hpc.unimelb.edu.au/status_specs/
Spartan is Small but Important
Spartan as a model of an HPC-Cloud Hybrid has been featured at Multicore World, Wellington, 2016 and 2017; eResearchAustralasia 2016; several European HPC centres, including the European Organization for Nuclear Research (CERN), 2016; and the OpenStack Summit, Barcelona, 2016.
Also featured in OpenStack and HPC Workload Management in (ed), The Crossroads of Cloud and HPC: OpenStack for Scientific Research, Open Stack, 2016 http://openstack.org/assets/science/OpenStackCloudandHPC6x9Bookletv4online.pdf
Architecture also featured in:
Spartan and NEMO: Two HPC-Cloud Hybrid Implementations.
2017 IEEE 13th International Conference on eScience, DOI: 10.1109/eScience.2017.70
The Chimera and the Cyborg, Hybrid Compute, Advances in Science, Technology and Engineering Systems Journal, Vol. 4, No. 2, 01-07, 2019
Other presentations on Spartan include use of the GPGPU partition at eResearch 2018, its development path at eResearchAU 2020, and interactive HPC at eResearchNZ 2021.
Over 60 papers cite Spartan as a contributing factor in their research.
Setting Up An Account and Training
Spartan uses its own authentication that is tied to the University’s Security Assertion Markup Language (SAML) identity. The login URL is https://dashboard.hpc.unimelb.edu.au/karaage
Users on Spartan must belong to a project. Projects must be led by a University of Melbourne researcher (the “Principal Investigator”) and are subject to approval by the Head of Research Compute Services.
Participants in a project can be researchers or research support staff from anywhere.
The University, through Research Platforms, has an extensive training programme for researchers who wish to use Spartan. This includes daylong courses in “Introduction to Linux and HPC Using Spartan”, “Advanced Linux and Shell Scripting for High Performance Computing”, “Parallel Programming On Spartan”, “GPU Programming with OpenACC and CUDA”. Coming soon “Linux Regular Expressions” and “From Spartan to NCI”.
University of Melbourne is a major contributor to the International HPC Certification Program.
https://www.hpccertification.org/
University of Melbourne also contributes to the Easybuild software build system repository https://easybuild.io/
Logging In and Help
To log on to an HPC system, you will need a user account and password and a Secure Shell (ssh) client. Linux distributions almost always include SSH as part of the default installation, as does Mac OS 10.x, although you may also wish to use the Fugu SSH client. For MS-Windows users, the free PuTTY client is recommended. To transfer files use scp, WinSCP, Filezilla, and especially rsync.
Logins to Spartan are based on POSIX identity for the system, via ssh. Consider making an SSH config file, and using passwordless SSH. See: https://dashboard.hpc.unimelb.edu.au/ssh/
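A minimal ~/.ssh/config sketch (the hostname shown is an assumption; confirm the actual login node address and your username on the dashboard page above):

# ~/.ssh/config -- afterwards, connect with: ssh spartan
Host spartan
    # Assumed login node address; confirm on the Spartan dashboard
    HostName spartan.hpc.unimelb.edu.au
    # Replace with your Spartan username
    User your-username
    # Key generated beforehand with: ssh-keygen -t ed25519
    IdentityFile ~/.ssh/id_ed25519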
For help go to http://dashboard.hpc.unimelb.edu.au or check man spartan. Lots of example scripts at /usr/local/common
Need more help? If there are problems with submitting a job, a new application or an extension to an existing application needs to be installed, a job generated unexpected errors, etc., an email can be sent to:
The Linux Environment and Modules
Assumption here is that everyone has had exposure to the GNU/Linux command line. If not, you’d better get some! At least learn the twenty or so basic environment commands to navigate the environment, manipulate files, manage processes. Plenty of good online material available (e.g., my book “Supercomputing with Linux”, https://github.com/VPAC/superlinux)
Environment modules provide for the dynamic modification of the user’s environment (e.g., paths) via module files. Each module contains the necessary configuration information for the user’s session to operate according to the modules loaded, such as the location of the application’s executables, its manual path, the library path, and so forth.
Modulefiles also have the advantages of being shared by many users on a system and easily allowing multiple installations of the same application but with different versions and compilation options. Sometimes users want the latest and greatest version of an application for the feature set it offers. In other cases, such as someone who is participating in a research project, a consistent version of an application is desired. Having multiple versions of applications available on a system is essential in research computing.
Modules Commands
Some basic module commands include the following:
module help
The command module help, by itself, provides a list of the switches, subcommands, and subcommand arguments that are available through the environment modules package.
module avail
This option lists all the modules which are available to be loaded.
module whatis
This option provides a description of the module listed.
module display
Use this command to see exactly what a given modulefile will do to your environment, such as what will be added to the PATH, MANPATH, etc. environment variables.
More Modules Commands
module load
This adds one or more modulefiles to the user’s current environment (some modulefiles load other modulefiles).
module unload
This removes any listed modules from the user’s current environment.
module switch
This unloads one modulefile (modulefile1) and loads another (modulefile2).
module purge
This removes all modules from the user’s environment.
In the lmod system as used on Spartan there is also “module spider” which will search for all possible modules and not just those in the existing module path and provide descriptions.
(Image from NASA, Apollo 9 “spider module”)
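As a brief sketch of a typical session (the module names and versions are illustrative only; run module avail on Spartan to see what is actually installed):

module avail                  # list every module in the current module path
module spider GCC             # lmod: search all possible modules and toolchains
module load GCC/10.2.0        # illustrative compiler module; version is an assumption
module load Python/3.8.6      # illustrative application module
module list                   # show what is currently loaded
module purge                  # return to a clean environment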
Batch Systems and Workload Managers
The Portable Batch System (or simply PBS) is utility software that performs job scheduling by assigning unattended background tasks, expressed as batch jobs, among the available resources.
The original Portable Batch System was developed by MRJ Technology Solutions under contract to NASA in the early 1990s. In 1998 the original version of PBS was released as an open-source product, OpenPBS. This was forked by Adaptive Computing (formerly Cluster Resources), who developed TORQUE (Terascale Open-source Resource and QUEue Manager). Many of the original engineering team are now part of Altair Engineering, who have their own version, PBSPro.
In addition, the popular job scheduler Slurm (originally the “Simple Linux Utility for Resource Management”), now simply called the Slurm Workload Manager, also uses batch scripts which are very similar in intent and style to PBS scripts.
Spartan uses the Slurm Workload Manager. A job script written for one scheduler needs to be translated for the other (a handy script, pbs2slurm, is available: https://github.com/bjpop/pbs2slurm).
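As a rough sketch of the kind of translation involved (the resource values are illustrative only), a PBS/TORQUE header such as:

#PBS -N myjob
#PBS -l walltime=01:00:00
#PBS -l nodes=1:ppn=8
#PBS -q physical

corresponds approximately to the following Slurm header:

#SBATCH --job-name=myjob
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --partition=physical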
In addition to this variety of implementations of PBS, different institutions may also make further elaborations and specifications to their submission filters (e.g., site-specific queues, user projects for accounting). (Image from the otherwise dry IBM ‘Red Book’ on Queue Management)
Submitting and Running Jobs
Submitting and running jobs is a relatively straight-forward process consisting of:
1) Setup and launch
2) Job control, monitor results
3) Retrieve results and analyse
Don’t run jobs on the login node! Use the queuing system to submit jobs.
1. Setup and launch consists of writing a short script that first makes resource requests and then lists the commands to be executed, and optionally checking the queueing system.
Core command for checking the queue: squeue | less
Alternative command for checking the queue: showq -p cloud | less
Core command for job submission: sbatch [jobscript]
2. Check job status (by ID or user), cancel job.
Core command for checking a job in Slurm: squeue -j [jobid]
Detailed command in Slurm: scontrol show job [jobid]
Core command for deleting a job in Slurm: scancel [jobid]
3. Slurm provides error and output files for each job. Jobs may also have files for post-job processing. Graphic visualisation is best done on the desktop.
Simple Script Example
#!/bin/bash
#SBATCH --partition=physical
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
module load myapp-compiler/version
myapp data
The script first invokes a shell environment, followed by the partition the job will run on (the default is ‘physical’ for Spartan). The #SBATCH lines that follow are resource requests, here for a one-hour walltime, one compute node, and one task. Note that default values don’t need to be included.
After these requests are allocated, the script loads a module and then runs the executable against the dataset specified. Slurm also automatically exports your environment variables when you launch your job, including the directory you launched the job from. If your data is in a different location, this has to be specified in the path!
After the script is written it can be submitted to the scheduler.
sbatch myfirstjob.slurm
Multithreaded, Multicore, and Multinode Examples
Modifying resource allocation requests can improve job efficiency.
For example, for shared-memory multithreaded jobs on Spartan (e.g., OpenMP), modify --cpus-per-task up to a maximum of 8, which is the maximum number of cores on a single instance.
#SBATCH --cpus-per-task=8
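Putting this together, a minimal sketch of an OpenMP job script (the module, executable, and data names are placeholders, not actual Spartan software):

#!/bin/bash
#SBATCH --partition=physical
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# Load the (placeholder) application module and match the thread count to the allocation
module load myapp-compiler/version
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my-openmp-app data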
For a distributed-memory multicore job using message passing, the multinode partition has to be invoked and the resource requests altered, e.g.:
#!/bin/bash
#SBATCH --partition=physical
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
module load myapp-compiler/version
srun mympiapp
Note that multithreaded jobs cannot span nodes in a distributed memory model. They can, however, be combined with distributed memory jobs that include a shared memory component (hybrid OpenMP-MPI jobs), as sketched below.
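A minimal sketch of such a hybrid job, assuming a hypothetical executable my-hybrid-app built with both MPI and OpenMP support:

#!/bin/bash
#SBATCH --partition=physical
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
# Two MPI ranks per node, each running four OpenMP threads that share memory
module load myapp-compiler/version
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun my-hybrid-app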
Arrays and Dependencies
Alternative job submissions include specifying batch arrays, and batch dependencies.
In the first case, the same batch script, and therefore the same resource requests, is used multiple times. A typical example is to apply the same task across multiple datasets. The following example submits 10 batch jobs with myapp running against datasets dataset1.csv, dataset2.csv, … dataset10.csv
#SBATCH --array=1-10
myapp dataset${SLURM_ARRAY_TASK_ID}.csv
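Put together, a complete sketch of such an array job script (the module and executable names are placeholders, and the datasets are assumed to sit in the submission directory):

#!/bin/bash
#SBATCH --partition=physical
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --array=1-10
# Each of the ten array tasks processes its own dataset
module load myapp-compiler/version
myapp dataset${SLURM_ARRAY_TASK_ID}.csv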
In the second case a dependency condition is established on which the launching of a batch script depends, creating a conditional pipeline. The dependency directives consist of `after`, `afterok`, `afternotok`, `before`, `beforeok`, `beforenotok`. A typical use case is where the output of one job is required as the input of the next job.
sbatch --dependency=afterok:myfirstjobid mysecondjob
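A small sketch of chaining two jobs from the shell, using sbatch’s --parsable flag so that only the job ID is printed (the script names are placeholders):

# Submit the first job and capture its job ID
first_jobid=$(sbatch --parsable myfirstjob.slurm)
# The second job starts only if the first completes successfully
sbatch --dependency=afterok:${first_jobid} mysecondjob.slurm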
Interactive Jobs
For real-time interaction, with resource requests made on the command line, an interactive job is called. This puts the user on to a compute node.
This is typically done if the user wants to test or debug code, or to explore data interactively, before scaling up to batch jobs.
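A minimal sketch of requesting an interactive session with generic Slurm (the partition and resource values are assumptions; check the Spartan documentation for any site-specific wrapper):

# Request one core on one node for an hour and open a shell on the allocated compute node
srun --partition=physical --nodes=1 --ntasks=1 --time=01:00:00 --pty /bin/bash
# 'exit' from the shell releases the allocation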