Introduction to
aka “Where to begin…”
CS402/922 High Performance Computing
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/
10/01/2022
What is High Performance Computing (HPC)?
Big machines?
• Study of performing huge calculations on massive machines as fast as possible
• Often done by increasing the number of Floating Point Operations per second (FLOP/s)
• PS5 → 10.3 TFLOP/s
• ASCI Red (SNL) → 1.068 TFLOP/s (first teraflop computer, around 1997)
• Roadrunner (LANL) → 1.7 PFLOP/s (first petaflop computer, built in 2008)
• Fugaku → 0.442 EFLOP/s (fastest machine in the world as of November 2021)
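To put the prefixes in the list above side by side, the following minimal sketch converts each quoted figure to plain FLOP/s and estimates how long a workload would take at that rate. The workload of 10^18 operations is hypothetical, and the sketch assumes each machine sustains its quoted rate, which real applications rarely do.

```python
# SI prefixes used for FLOP/s figures.
PREFIX = {"T": 1e12, "P": 1e15, "E": 1e18}

# Figures quoted on the slide: (value, prefix).
MACHINES = {
    "PS5": (10.3, "T"),
    "ASCI Red": (1.068, "T"),
    "Roadrunner": (1.7, "P"),
    "Fugaku": (0.442, "E"),
}


def flops(value, prefix):
    """Convert a prefixed figure such as (1.7, 'P') to plain FLOP/s."""
    return value * PREFIX[prefix]


def seconds_for(workload_flop, value, prefix):
    """Idealised time to finish workload_flop operations at the quoted rate."""
    return workload_flop / flops(value, prefix)


if __name__ == "__main__":
    workload = 1e18  # hypothetical workload: 10^18 floating-point operations
    for name, (value, prefix) in MACHINES.items():
        rate = flops(value, prefix)
        time_s = seconds_for(workload, value, prefix)
        print(f"{name:10s} {rate:.3e} FLOP/s  {time_s:12.1f} s")
```

At these rates the same workload goes from days on a teraflop-class machine to a few seconds on an exaflop-class one.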
[Figure: Supercomputer FLOP/s per year, from the 1980s to 2021/22 and beyond]
What is High Performance Computing (HPC)?
Big machines?
• Most HPC machines consist of a large collection of smaller machines
• To increase performance:
• Build/Improve highly parallel code
• Make transistors smaller (so we can add more in the same space)
• Add more cores to processors (so we can parallelise programs more)
• Add more smaller machines to the overall machine
Moore’s Law
Speedy thing gets smaller, speedy thing gets faster
• Gordon Moore stated in 1965 that:
For a given size of chip, the number of components
would double every year
• Gordon Moore would later co-found a small chip company: Intel…
• This held largely true, but was revised to a doubling every two years
• Current processors use process nodes from 14 nm down to 5 nm
About 10 silicon atoms!
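The difference between the original and the revised doubling period compounds quickly. As a minimal sketch, assuming idealised exponential growth (the function name and the ten-year horizon are illustrative choices, not from the slides):

```python
def growth_factor(years, doubling_period_years):
    """Factor by which the component count grows after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    horizon = 10  # illustrative time span in years
    for period in (1, 2):
        factor = growth_factor(horizon, period)
        print(f"doubling every {period} year(s): x{factor:.0f} after {horizon} years")
```

Over ten years, doubling every year gives a factor of 1024, while doubling every two years gives a factor of 32.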
Amdahl’s Law
More processors, more speed!
• Proposed in 1967 by Gene Amdahl
• Shows the relation between serial portions and parallel portions in a program
• Serial improvements → Increase clock speed and/or core complexity
• Parallel improvements → Increase number of cores
$$\text{Speedup} = \frac{1}{(1 - p) + \frac{p}{n}}$$
• $(1 - p)$ is the proportion of time spent in the serial parts of the program
• $\frac{p}{n}$ is the proportion of time spent in the parallel part of the program
• $p$ is the proportion of the program that can be made parallel
• $n$ is the number of processors when running the parallel code
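A minimal sketch of this relation, assuming the usual statement of Amdahl's Law in which the speedup on n processors is 1 / ((1 - p) + p / n). The function name and the sample value p = 0.95 are illustrative choices.

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's Law.

    p: proportion of the program that can be made parallel (0 <= p <= 1)
    n: number of processors used for the parallel part (n >= 1)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1.0 - p   # proportion of time spent in the serial parts
    parallel = p / n   # proportion of time spent in the parallel part
    return 1.0 / (serial + parallel)


if __name__ == "__main__":
    p = 0.95  # illustrative: 95% of the program can be made parallel
    for n in (1, 2, 8, 64, 1024):
        print(f"n = {n:5d}  speedup = {amdahl_speedup(p, n):6.2f}")
```

However many processors are added, the speedup never exceeds 1 / (1 - p), so the serial portion sets the ceiling.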
Flynn’s Taxonomy
There’s more than one way to parallelise code‽
• Michael Flynn (in 1966) generalised a computer to two streams:
• Instruction Stream → Provides a list of operations
• Data Stream → Provides a list of data
• Each of these streams can be accessed by a single processing unit or by multiple processing units
• Therefore we have 4 different paradigms
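The four paradigms follow from two independent choices, single or multiple, one per stream. A minimal sketch of that enumeration (the function name and the use of integer stream counts are assumptions made for illustration):

```python
from itertools import product


def flynn_class(instruction_streams, data_streams):
    """Name the Flynn category for the given stream counts (each >= 1)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    # Try every combination of one or several streams.
    for instr, data in product((1, 4), repeat=2):
        print(f"{instr} instruction stream(s), {data} data stream(s): "
              f"{flynn_class(instr, data)}")
```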
Single Instruction, Single Data (SISD)
Think one person focusing on one thing at a time…
• Simplest concept → serial processors
• One instruction is performed on a single piece of data
• Mainly seen in older machines
• To improve these processors:
• Increase clock speed (requires higher voltages and more cooling)
• Decrease transistor size to increase number of transistors
• Increase complexity (they were already complex!)
• Very expensive to design and improve these days…
Hit(ting) the limit…
Single Instruction, Multiple Data (SIMD)
Think one person doing multiple things at the same time…
• One instruction is applied to multiple pieces of data in the same clock cycle
• Often achieved through vectorisation on CPUs
• Most common implementation is Intel’s Streaming SIMD Extensions (SSE) and its many versions
• Advanced Vector Extensions (AVX) allowed for larger registers and more complex instructions
• Latest version → AVX-512
• Also seen with modern GPUs
For more info, have a look at the CS257 Advanced Computer Architecture module
Multiple Instructions, Single Data (MISD)
Think multiple people doing different stuff on the same thing… at the same time…
• One piece of data is operated upon by multiple instructions within the same clock cycle
• Very few algorithms fit this requirement
• Usually very bespoke computers used in managing
redundancy
• Not often used in HPC
Multiple Instructions, Multiple Data (MIMD)
Think lots of people each doing their own thing…
• Multiple instructions are issued to multiple pieces of data within the same clock cycle
• The most common parallelisation methodology used in HPC systems
• Allows for independence across unrelated instructions
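As a loose software analogy of the MIMD idea, the sketch below runs two unrelated functions, each on its own data, as independent concurrent tasks. It only illustrates the programming model: the helper names are hypothetical, and in CPython the threads share an interpreter lock, so this is not a demonstration of hardware-level MIMD or of any speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def total(values):
    """First instruction stream: add the values up."""
    return sum(values)


def largest(values):
    """Second instruction stream: find the maximum."""
    return max(values)


def run_mimd(tasks):
    """Run each (function, data) pair as its own concurrent task.

    Results are returned in the same order as the tasks were given.
    """
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, data) for fn, data in tasks]
        return [future.result() for future in futures]


if __name__ == "__main__":
    results = run_mimd([(total, [1, 2, 3]), (largest, [7, 4, 9])])
    print(results)
```

Neither task depends on the other, which is the independence across unrelated instructions that MIMD allows.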
Levels of Parallelism
Like layers of an onion…
[Diagram: levels of parallelism (Instruction, Thread, Process, Job), ordered by granularity and complexity]
• Pipelining
• Vectorisation
• GPU programming
• Independent codes
• Queue-based
Some levels are covered in the CS257 Advanced Computer Architecture module; the rest are covered in this module
Why do we need HPC?
Met Office – Weather
Surely my laptop can do all of this stuff!
• Modelling the weather with increased accuracy
• Simulate natural disasters and their effects
• Cray XC40 → Xeon E5-2695v4 processors
• 241,920 compute cores with 116,032 GB of RAM
• Ranked 58th in the world (June 2021)
• Processes over 215 billion weather observations each day
https://www.metoffice.gov.uk/binaries/content/gallery/metofficegovuk/images/about-us/what-we-do/01202-supercomputer-benefits-tiles-v8-01_web.jpg
Why do we need HPC?
Surely my laptop can do all of this stuff!
Rolls-Royce – ASiMoV project
• Need to gain more efficiency out of jet engines
• Increase power-to-weight ratio
• Increase thrust
• Reduce amount of fuel required, allowing for greener flights
• Aim is to simulate an entire engine with very high levels of fidelity
Current State of HPC
In the year 2022…
• 4 of the top 10 supercomputers are in the US
• 2 of the top 10 supercomputers are in China
• Top machines in the UK → Met Office @ 58 and University of Cambridge @ 100
• Number of GPUs being included in supercomputers is increasing
• Increasing diversity in CPUs, GPUs and memory layouts
Key Information about the module
• 2 lectures a week
• Monday 3pm, Room MS.05
• Tuesday 1pm, Room MS.04
Everyone loves admin …
• 2 assessments
• 1 exam at the end of the academic year
https://campus.warwick.ac.uk/?cmsid=2498&project_id=1
https://campus.warwick.ac.uk/?cmsid=2500&project_id=1
Labs & Assessments
• 1 lab session a week
• Check Tabula for your particular lab
• Two parts to the module’s assessment
• 2 assignments
• Week 3 Term 2 – OpenMP
• 10% of the module mark
• Deadline: 8th February
• Week 6 Term 2 – MPI
• 20% of the module mark
• Deadline: 17th March
*rolls up sleeves*
Possible lab dates
Monday 4pm, Room MSB3.17
Monday 5pm, Room MSB3.17
https://campus.warwick.ac.uk/?cmsid=17306&project_id=1
• 2 hour exam (Term 3) – 70% of the module mark
Structure of the Module
Everyone loves a plan …
Lecture Topics
Fundamentals of HPC
Programming Models
Lab Topics
We are here!
This may change, keep an eye on the module webpage for latest info!
Deadline for Coursework 1
Thread-level Parallelism
Coursework 1
Coursework 2
Calculating Improvements
C and C++ programming
DEQN (Lab for CW1)
Troubleshooting CW1
MPI & Intro to Clusters
Karman (Lab for CW2)
Advanced MPI
Troubleshooting CW2
Deadline for Coursework 2
Cluster Networking
Interesting related reads
Some of this might even be fun…
• Post Moore’s Law
• T. N. Theis and H.-S. P. Wong. The End of Moore’s Law: A New Beginning for Information Technology. Computing in Science & Engineering, 19(2):41–50, 2017
• R. S. Williams. What’s Next? [The end of Moore’s law]. Computing in Science & Engineering, 19(2):7–13, 2017
• Zettascale and Exascale Papers
• X.-K. Liao, K. Lu, C.-Q. Yang, J.-W. Li, Y. Yuan, M.-C. Lai, L.-B. Huang, P.-J. Lu, J.-B. Fang, J. Ren, and J. Shen. Moving from exascale to zettascale computing: challenges and techniques. Frontiers of Information Technology & Electronic Engineering, 19(10):1236–1244, October 2018
• L. A. Parnell, D. W. Demetriou, V. Kamath, and E. Y. Zhang. Trends in High Performance Computing: Exascale Systems and Facilities Beyond the First Wave. In 2019 18th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), pages 167–176, Las Vegas, NV, May 2019. IEEE Computer Society, Los Alamitos, CA
• Top500 → https://www.top500.org/
Next lecture: Further Fundamentals of HPC