Overview Admin The need for parallel programming Remainder of the module
XJCO3221 Parallel Computation
University of Leeds
Lecture 1: Introduction
This lecture
This lecture we will cover:
Materials available for this module.
Assessments (3× coursework plus exam), with deadlines.
How historical trends in computer architectures have led to the current ubiquity of parallel machines.
The three classes of parallel architecture we will look at.
Module overview.
Module admin
Programming language and books Objectives and syllabus
Module administration
Lecture slides
All lectures will be made available on Minerva at least 24 hours in advance of the timetabled slots.
Worksheets
Formative, i.e. not assessed.
A new worksheet will appear on Minerva prior to lectures 2, 8 and 14 respectively.
Specimen answers will appear on Minerva roughly 1 week after the corresponding lecture.
Computer resources
All computer assignments will be undertaken using the cloud. Accounts have been created for you: ONLY use them for work relating to THIS module.
Access details provided as part of worksheets/assignments
Other support
There is a discussion forum for each part of the module. Post your queries relating to lectures, worksheets, assessments, etc.
Ensure everyone sees the same guidance.
Check first that your query has not already been answered.
Practical sessions are for you to practise coding exercises (worksheets) and courseworks. Please see the Joint School for locations/times.
Assessments (summative)
Assessment is by both coursework and exam:
50% spread over 3 items of coursework.
50% in a closed-book exam.
Each coursework is a code-only submission via Minerva.
Please test your submissions on the cloud service, even if you develop and run on your own computer!
Each worksheet covers similar material to the corresponding coursework, so attempting the current worksheet prior to the assignment will help you significantly with that coursework.
Coursework schedule and deadlines
Coursework  Weight  Release      Deadline
1           15%     15th March   Tuesday 29th March
2           20%     5th          19th April
3           15%     19th         3rd

Before attempting the courseworks you should familiarise yourself with the relevant material:

Coursework  Up to and including
1           Lecture 6
2           Lecture 11
3           Lecture 16
Programming language and books
For this module we will use C.
We will cover three different parallel libraries/APIs, and the only languages that cover all of them are C/C++.
Since our codes are short, we will just use C, not C++.
Starting code in C will be provided for each coursework.
Coursework submissions must be in C.
If you have not programmed in C for a while you may like to revise XJCO1711 Procedural Programming.
We will mostly use loops, conditionals, arrays and pointers.
For additional information on parallel programming in general:
Parallel Programming, Wilkinson and Allen (Pearson). Old (2nd ed. 2005), covers CPU architectures but not GPU.
Many examples, though some only schematic. Multiple copies in UoL libraries (1st and 2nd eds.)
Structured Parallel Programming, McCool, Robison and Reinders (Morgan Kaufmann, 2012).
Modern, focuses on patterns of parallel algorithm design. Few code examples, mainly for shared memory systems. eBook available via the library.
Books for specific architectures will also be mentioned when introduced. You do not need to buy any of these books.
Why this module?
Almost all¹ modern computers and devices fall into one of three classes of parallel architecture.
Software must be parallelised to use these resources.
This is the job of the programmer, i.e. you.
Popular APIs/frameworks are constantly changing.
There is no point focusing on any one, as it may not last.
We need to develop portable skills in parallel algorithm design that can be applied to current and future APIs, frameworks and architectures.
¹ With very few exceptions, e.g. feature phones.
Objectives and learning outcomes
Objectives: This module will introduce the fundamental skills and knowledge required to develop parallel computer software.
Learning outcomes: On successful completion of this module a student will have demonstrated the ability to:
Recall key concepts of parallel software and hardware.
Apply parallel design paradigms to serial algorithms.
Evaluate and select appropriate parallel solutions for real world problems.
Generalise parallel concepts to future hardware and software developments.
Skills outcomes: Programming, design, performance measurement, evaluation.
This module covers the following 3 topic areas:
Parallel programming design patterns: Work pools, data parallelism, synchronisation, locks, MapReduce and atomic instructions.
Parallel computation models: shared memory parallelism (SMP), distributed memory parallelism and general purpose graphics processing unit (GPGPU).
Common frameworks: OpenMP, Message passing interface (MPI) and OpenCL.
Trends in computing technology
Architectural improvements
Classes of parallel architecture
Key concepts in parallel architectures
Background and motivation
Early computers followed the so-called von Neumann architecture (1946):
Based on Turing’s universal machine (1936).
Fundamentally sequential, i.e. processes a series of instructions.
[Diagram: the von Neumann architecture: a Central Processing Unit (Control Unit and Arithmetic/Logic Unit) connected to a Memory Unit, an input device and an output device.]
Turing (top), von Neumann (bottom) (from Wikipedia)
Moore’s law
In 1965 Gordon Moore made the empirical observation that the number of transistors on a chip doubles every 18-24 months.
This is known as Moore’s law and holds to this day.
[Figure: component cost versus number of components, with projected 1970 data, showing an exponential increase in the most cost-effective number of components.]
Processor speeds also used to follow Moore’s law, but stopped around 15 years ago at ≈ 3.3 GHz (ignoring overclocking).
[Figure: “After Moore’s law”, Technology Quarterly, The Economist, http://www.economist.com/technology-quarterly/2016-03-12/after-moores-law]
Limitations on clock speed
Increased frequencies result in greater leakage and greater power consumption¹:

P = α C_L V² f

P is the processor’s power consumption.
C_L is a load capacitance.
V is the supply voltage.
f is the frequency.

However, V ∝ f, so P ∝ f³.
This rapid increase exceeds 100 W for f ≈ 3.3 GHz.
Unsustainable even with sophisticated cooling technology.

¹ You don’t need to learn this equation [which was taken from Parallel Programming, 2nd ed., Rauber and Rünger (Springer, 2013)].
ILP: Instruction Level Parallelism
Chip designers have tried various architectural improvements to increase performance (memory cache, speculative execution etc.)
One is pipelining, where different stages of subsequent instructions are overlapped.
Only one fetch, decode etc. at any given time.
[Diagram: successive instructions overlapping in the pipeline, each passing through fetch, decode, execute and write-back stages.]
This instruction level parallelism (ILP) is limited to around 10-20 instructions.
We say it does not scale.
Multi-core CPUs (Lectures 2-7)
These architectural improvements did not require changes to code. Legacy sequential code automatically benefited.
Each improvement has limitations that have not been overcome.
Starting around 2005, chips for consumer machines have been multi-core, where each core has a distinct control flow.
For a few cores, we can run separate applications simultaneously.
With new chips having many cores (6, 8, 12, 16, 24, 28, ...), running one application per core is not feasible.
Single applications need to use multiple cores.
This requires new program logic.
Clusters / Supercomputers (Lectures 8-13)
Even before clock speeds plateaued, some applications used multiple machines:
Scientific computing.
Weather forecasting.
…
PFlops = petaflops = 1015 floating point operations per second;
EFlops = exaflops = 1018 floating point operations per second.
[Figure: performance development of supercomputers over time, https://www.top500.org/statistics/perfdevel]
GPGPU (Lectures 14-19)
In the mid-1990s, the rise of graphical applications (especially games) drove the development of graphics accelerators:
Chips specialised to computations for 2D/3D graphics.
In 2006 Nvidia released the first graphics card capable of general purpose calculations, using its CUDA architecture.
GPGPU = General Purpose Graphics Processing Unit.
Now supported by most manufacturers via OpenCL.
Suitable for other applications including machine learning. GPUs are part of the deep learning revolution.
Now have dedicated neural processing units (NPUs).
Precedent from nature
Arguably the most complex system known is the human brain. If regarded as a computer, it would be massively parallel:
Synapse speeds are about 5 ms, so the ‘clock speed’ would be less than 1kHz.
We have about 1011 neurons, each connected to 104 others.
The current fastest supercomputer has ≈ 107 cores.
[Image: http://scitechconnect.elsevier.com]
Parallel versus concurrent
Two or more applications run concurrently if they both execute ‘in the same time frame.’
i.e. a multi-tasking OS, where processes are swapped in and out without the user noticing.
Possible on a single-core architecture.
Whereas parallel applications actually perform calculations simultaneously on a parallel architecture.
Parallelism implies concurrency, but not vice versa, i.e.
Parallel ⊂ Concurrent
Shared versus distributed memory
Shared memory
All cores can ‘see’ whole of main memory.
e.g. multi-core CPU.
Distributed memory
Cores only see a fraction of the total memory.
e.g. cluster, supercomputer.
[Diagrams: a shared memory system, where all cores attach to one main memory, versus a distributed memory system, where each CPU has its own local memory and the CPUs are connected by a network.]
Computation versus communication
Moving data to and from the cores also affects performance:
Registers for each processing unit.
Cache memory to registers.
Main memory to cache memory.
Fast communication in high-performance clusters (e.g. InfiniBand, Gigabit Ethernet).
Local area network communication (e.g. Ethernet).
File I/O.
Wide area network communication (e.g. the internet).
Flynn’s taxonomy¹
Characterises parallel architectures by data and control flows.

Instruction/data streams                    Example
Single Instruction, Single Data (SISD)      Single-core CPU
Single Instruction, Multiple Data (SIMD)    GPU (also SIMT; c.f. Lecture 14)
Multiple Instruction, Multiple Data (MIMD)  Multi-core CPU; cluster/supercomputer
Multiple Instruction, Single Data (MISD)    Specialist hardware only

¹ Flynn, IEEE Transactions on Computers 21, 948 (1972).
Module overview
Introduction
Shared memory parallelism: C with OpenMP (Worksheet 1 and Coursework 1)
Distributed memory parallelism: MPI (Worksheet 2 and Coursework 2)
General purpose GPU: OpenCL, a C-based language (Worksheet 3 and Coursework 3)
Module review
Next lecture
Next lecture is the first of six on shared memory parallelism:
Relevant to multi-core architectures, such as on modern laptops, desktops, tablets and phones.
Overview of the typical hardware architecture, including memory.
Look at some language and library support.
How to install and run OpenMP programs.