
High Performance Computing
Course Notes

HPC Fundamentals


Contact details

Dr. Ligang He

Home page: http://www.dcs.warwick.ac.uk/~liganghe

Email: ligang.he@warwick.ac.uk

Office: Room 205


Course Administration

Course Format

Monday: 1100-1200 lecture in CS104,

1200-1300 lab session in CS001 and CS003: 1) practise the
knowledge learned in lectures; 2) gain foundation skills for
completing the assignments; 3) use the Tinis cluster; 4)
troubleshoot the assignments

Thursday: 1000-1100 Lecture in CS104

Assessment:

15 CATs

70% examined, 30% Assignments

2-hour final exam in Term 3


Learning Objectives

• Commonly used models (e.g., OpenMP, MPI, GPU) for
writing HPC applications (mainly parallel programs)

• Commonly used HPC platforms (e.g., clusters)

• The means by which to measure, analyse and predict
the performance of HPC applications running on their
supporting HPC platforms

• The role of administration, scheduling and data
management in HPC management software


Materials

•The slides will be made available online after each
lecture

•Relevant reference books, papers and online resources
will be announced throughout the course


Lab sessions

• Practising C/C++ programming

• OpenMP programming

• MPI programming

• GPU programming

• Using the Tinis Cluster

• Troubleshooting


Assignments

- Two assignments count for 30% of the final mark

- The first assignment counts for 10%

- The second assignment counts for 20%

- The first assignment involves using OpenMP to write a
parallel program

- The second assignment involves developing a
parallel application using the Message Passing Interface (MPI)

- Deadlines:

- Assignment 1: 12pm, Feb 5th, 2018; Assignment 2: 12pm, Mar 14th, 2018


Introduction

•What is High Performance Computing (HPC)?

•Difficult to answer – it’s a moving target.

• In the late 1980s, a supercomputer performed about 100 million FLOPS

• Today, a typical desktop/laptop performs tens of gigaFLOPS
(e.g., an i7 core is about 70 gigaFLOPS)

• Today, a supercomputer typically performs hundreds of
teraFLOPS

• Sunway TaihuLight, No. 1 in the Top 500 list, 93 petaFLOPS – China

• TianHe-2: No. 2, 33.8 petaFLOPS – China

• Titan: No. 3 in the Top 500 list, 17.6 petaFLOPS – US (No. 1 in 2012)

• The entry level in the current Top 500 list is 548.7 teraFLOPS

• The entry level last year was 349.3 teraFLOPS

• The entry level in Nov 2012 was 76.5 teraFLOPS
Note: mega (10^6), giga (10^9), tera (10^12), peta (10^15), exa (10^18)


•What is High Performance Computing (HPC)?

O(1000) times more powerful than the latest desktops

If we take an i7 core as the baseline, which is about 70 gigaFLOPS,

an HPC system should then deliver a performance of about 70 teraFLOPS


Growth of performance in Top500

– Performance increases ten-fold every four years

– Moore’s law (doubling every 18 months): is this better or worse?
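As a quick back-of-the-envelope comparison (my own arithmetic, not from the slides): doubling every 18 months gives, over four years (48 months),

2^(48/18) ≈ 2^2.67 ≈ 6.3×

which is less than the ten-fold growth above, i.e., aggregate Top500 performance has grown faster than Moore’s law alone would predict.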


Applications of HPC

•HPC is driven by the demand for
computation-intensive
applications from various areas

• Weather forecast

• A weather model captures the
relations among weather parameters


Governing Equation of Weather Forecast

Momentum equations:

∂u/∂t + u ∂u/∂x + v ∂u/∂y + w ∂u/∂z = −(1/ρ) ∂p/∂x + fv

∂v/∂t + u ∂v/∂x + v ∂v/∂y + w ∂v/∂z = −(1/ρ) ∂p/∂y − fu

∂w/∂t + u ∂w/∂x + v ∂w/∂y + w ∂w/∂z = −(1/ρ) ∂p/∂z − g

Thermodynamic equation:

∂θ/∂t + u ∂θ/∂x + v ∂θ/∂y + w ∂θ/∂z = Q

Mass continuity equation:

∂ρ/∂t + u ∂ρ/∂x + v ∂ρ/∂y + w ∂ρ/∂z = −ρ ∇·V, where V = (u, v, w)

Ideal gas law:

p = ρRT

Moisture equation:

∂q/∂t + u ∂q/∂x + v ∂q/∂y + w ∂q/∂z = micro(q)

– Impossible to solve these equations analytically by mathematical derivation;
– numerical methods must be used instead
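To make the numerical method concrete, here is a minimal, illustrative sketch (my own toy example, not taken from the course): discretise a field on a grid and march it forward in time with finite differences. It advects a single 1-D moisture-like field q with a constant wind u, a drastically simplified relative of the moisture equation above.

#include <stdio.h>

#define N 100      /* number of grid points */
#define STEPS 50   /* number of time steps */

int main(void) {
    double q[N], q_new[N];
    const double u = 1.0, dx = 1.0, dt = 0.5;   /* wind speed, grid spacing, time step */

    /* Initial condition: a "bump" of moisture in the middle of the domain. */
    for (int i = 0; i < N; i++)
        q[i] = (i >= 40 && i < 60) ? 1.0 : 0.0;

    /* Upwind finite-difference time stepping of  dq/dt + u dq/dx = 0. */
    for (int step = 0; step < STEPS; step++) {
        q_new[0] = q[0];                                  /* fixed inflow boundary */
        for (int i = 1; i < N; i++)
            q_new[i] = q[i] - u * (dt / dx) * (q[i] - q[i - 1]);
        for (int i = 0; i < N; i++)
            q[i] = q_new[i];
    }

    for (int i = 0; i < N; i++)
        printf("%d %f\n", i, q[i]);
    return 0;
}

A real forecast model does this for all of the coupled 3-D fields above over millions of grid points and many time steps, which is exactly where the demand for HPC comes from.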


Applications of HPC

•HPC is driven by the demand for
computation-intensive
applications from various areas

• Weather forecast

• Finance (e.g. predict the trend
of the stock market)

• Biology, neuroscience (e.g.
simulation of brains)


An HPC application in neuroscience

– Project: Blue Brain

– Aim: construct a virtual brain

– Building blocks of a brain are neocortical columns

– A column consists of about 60,000 neurons, interacting with each other

– First step: simulate a single column (each processor acting as
one neuron)

– Then: simulate a small network of columns

– Ultimate goal: simulate the whole human brain

– Scale of the problem:

– Human brain contains millions of such columns


Applications of HPC

•HPC is driven by the demand for
computation-intensive
applications from various areas

• Weather forecast

• Finance (e.g. modelling the
trend of the stock market)

• Biology, neuroscience (e.g.
simulation of brains)

• Engineering (e.g. simulations
of a car crash)


Simulation of Car Crash


Applications of HPC

•HPC is driven by the demand for
computation-intensive
applications from various areas

• Weather forecast

• Finance (e.g. modelling the
trend of the stock market)

• Biology, neuroscience (e.g.
simulation of brains)

• Engineering (e.g. simulations
of a car crash)

• Military and Defence (e.g.
modelling explosion of nuclear
bombs)


Related Technologies

•HPC covers a wide range of technologies:

• Computer architecture

• CPU, memory,

• VLSI: transistors

• increasingly difficult (density and heat)

• multicore,


Related Technologies

•HPC covers a wide range of technologies:

• Computer architecture

• CPU, memory,

• VLSI: transistors

• increasingly difficult (density and heat)

• multicore,

• GPU


Related Technologies

•HPC covers a wide range of technologies:

• Computer architecture

• Networking

• bandwidth, latency,

• communication protocols,

• Network topology


Related Technologies

•HPC covers a wide range of technologies:

• Computer architecture

• Networking

• Compilers

• Identify inefficient implementations

• Make use of the characteristics of the computer architecture

• Choose a suitable compiler for a given architecture


Related Technologies

•HPC covers a wide range of technologies:

• Computer architecture

• Networking

• Compilers

• Algorithms

• Design the algorithm -> choose the language and write the program to implement it

• Design a parallel algorithm: partition the task into sub-tasks and organise the collaboration among
multiple CPUs

• Choose the parallel programming paradigm and implement the algorithm


Related Technologies

•HPC covers a wide range of technologies:

• Computer architecture

• Networking

• Compilers

• Algorithms

• Workload and resource manager

• A big HPC system handles many parallel programs from different users

• Task scheduling and resource allocation

• metrics: system throughput, resource utilization, mean response time



History and Evolution of HPC Systems

1960s: Scalar processor

• Process one data item at a time


Scalar processor


History and Evolution of HPC Systems

1960s: Scalar processor

1970s: Vector processor

• Can process an array of data items in one go

• Architecture: one master processor and many math co-processors (ALUs)

• Each time, the master processor fetches an instruction and a
vector of data items and feeds them to the ALUs

• Overhead: more complicated address decoding and data
fetching procedure

• Difference between vector processor and scalar processor (see the sketch below)
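To illustrate the difference, a minimal sketch (my own example, assuming an AVX-capable x86 CPU and compilation with -mavx): the scalar loop performs one addition per instruction, while the vector version uses 256-bit AVX intrinsics to add eight floats per instruction, which is essentially what a vector processor does in hardware.

#include <immintrin.h>

/* Scalar processor style: one addition per instruction. */
void add_scalar(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Vector processor style: eight additions per instruction using 256-bit registers.
   Assumes n is a multiple of 8 to keep the sketch short. */
void add_vector(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* fetch a vector of data items */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
}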


GPU (Vector processor)

GPU: Graphical Processing Unit

GPU is treated as a PCIe device by the main CPU


Data processing on GPU

– CUDA: programming on the GPU

– Fetch arrays A and B in one memory access operation

– Different threads process different data items (see the sketch below)

– If there is not much parallel processing, running on the GPU can be slower than on the CPU due to the overhead
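A minimal CUDA sketch of this idea (my own illustrative example, not from the lab materials): each GPU thread computes one element of C = A + B, and the host must copy the data across the PCIe bus before and after the kernel launch, which is part of the overhead mentioned above.

#include <cuda_runtime.h>

/* Each thread handles one data item: thread i computes C[i] = A[i] + B[i]. */
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] + B[i];
}

/* Host side: copy inputs to the GPU, launch the kernel, copy the result back. */
void gpu_add(const float *hA, const float *hB, float *hC, int n) {
    float *dA, *dB, *dC;
    size_t bytes = n * sizeof(float);
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);   /* 256 threads per block */
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}

For small n the host-device transfers dominate, which is why the GPU can lose to the CPU when there is little parallel work.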


History and Evolution of HPC Systems

1960s: Scalar processor

1970s: Vector processor

Late 1980s: Massively Parallel Processing (MPP)

• Up to thousands of processors, each with its own memory

• Processors can fetch and run instructions in parallel

• Break down the workload in a parallel program

• Workload balance and processor communications

• Difference between MPP and vector processor


Architecture of BlueGene/L (MPP)

• Pioneered the philosophy of using a massive number of low-performance
processors to construct supercomputers


History and Evolution of HPC Systems

1960s: Scalar processor

1970s: Vector processor

Late 1980s: Massively Parallel Processing (MPP)

Late 1990s: Cluster

• Connecting stand-alone computers with a high-speed network
(over-cable networks)

• Commodity off-the-shelf computers

• High-speed network: Gigabit Ethernet, InfiniBand

• Over-cable network vs. on-board network

• Not a new idea in itself, but one attracting renewed interest

• Performance improvements in CPUs and networking

• Advantage over custom-designed mainframe computers: good
portability


Cluster Architecture


History and Evolution of HPC Systems

1960s: Scalar processor

1970s: Vector processor

Late 1980s: Massively Parallel Processing (MPP)

Late 1990s: Cluster

Late 1990s: Grid

• Integrates geographically distributed resources

• A further evolution of cluster computing

• Draws an analogy with the power grid


History and Evolution of HPC Systems

1960s: Scalar processor

1970s: Vector processor

Late 1980s: Massively Parallel Processing (MPP)

Late 1990s: Cluster

Late 1990s: Grid

Since 2000s: Multicore computing

– Relieves the pressure of further increasing the transistor density

– Multiple cores reside on one CPU chip (processor)

– There can be multiple CPU chips (processors) in one computer

– Multicore computers are often interconnected to form a cluster

– On-board communication and over-cable communication


Architecture Types

• All previous HPC systems can be divided into two
architecture types

• Shared memory system

• Distributed memory system


Architecture Types

• Shared memory (uniform memory access – SMP)

• Multiple CPU cores, single memory, shared I/O (multicore CPU)

• All resources in an SMP machine are equally available to each
core

• Due to resource contention, uniform memory access systems do
not scale well

• CPU cores share access to a common memory space

• Implemented over a shared system bus or switch

• Support for critical sections is required (see the sketch after this list)

• Local cache is critical:

• If not, bus/switch contention (or network traffic) reduces the system's
efficiency

• Caching introduces the problem of coherency (ensuring that stale cache
lines are invalidated when other processors alter shared memory)
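A minimal OpenMP sketch of a critical section on a shared-memory machine (my own example; for this particular sum an OpenMP reduction clause would be the efficient choice): all threads share the variable sum, so the read-modify-write must be serialised to avoid a race.

#include <omp.h>
#include <stdio.h>

int main(void) {
    long long sum = 0;   /* shared by all threads */

    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++) {
        /* Critical section: only one thread at a time may update 'sum'. */
        #pragma omp critical
        sum += i;
    }

    printf("sum = %lld\n", sum);
    return 0;
}

Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.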


Architecture Types

•Shared memory (Non-Uniform Memory Access: NUMA)

• Multiple CPUs

• Each CPU has fast access to its local area of the memory, but slower
access to other areas

• Scales well to a large number of processors due to the hierarchical
memory access

• Complicated memory access pattern: local and remote memory addresses

• Global address space.

(Diagrams: node of a NUMA machine; a NUMA machine)


Architecture Types

• Distributed Memory (MPP, cluster)

• Each processor has its own independent memory

• Interconnected through over-cable networks

• When processors need to exchange (or share) data, they must
do this through explicit communication

• Message passing (e.g., MPI – see the sketch after this list)

• Typically larger latencies between processors

• Scalability is good if the task to be computed can be divided
properly
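A minimal MPI sketch of such explicit communication (my own example; run with at least two processes, e.g. mpirun -np 2): process 0 owns the value and must send it to process 1, because there is no shared memory for process 1 to read it from.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = 0;
    if (rank == 0) {
        data = 42;
        /* Explicit communication: the data travels over the network. */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}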


Parallel computing vs. distributed computing

•Parallel Computing

• Breaking the problem to be computed into parts that can
be run simultaneously on different processors

• Example: an MPI program to perform matrix multiplication

• Solve tightly coupled problems

•Distributed Computing

• Parts of the work to be computed are computed in
different places (Note: does not necessarily imply
simultaneous processing)

• An example: running a workflow in a Grid

• Solve loosely-coupled problems (not much
communication)


Lab session today – Practising C/C++

• Write a “Hello World” program

• Calculate factorials

• Work with pointers

• Allocate memory

• Classes in C++

• Use gdb for debugging

Download the lab session sheet today from this link:
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/cs402_seminar1-C.pdf
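As a warm-up for the first two exercises, a minimal sketch (the lab sheet linked above is the authoritative version):

#include <stdio.h>

/* Compute n! iteratively; unsigned long long overflows beyond 20!. */
unsigned long long factorial(int n) {
    unsigned long long result = 1;
    for (int i = 2; i <= n; i++)
        result *= i;
    return result;
}

int main(void) {
    printf("Hello, World!\n");
    printf("10! = %llu\n", factorial(10));
    return 0;
}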


Let's move down to Labs CS001 and CS003 now!