Parallel Computing – Module Introduction
Dr Massoud Zolgharni
mzolgharni@lincoln.ac.uk
Room SLB1004, SLB
Dr Grzegorz Cielniak
gcielniak@lincoln.ac.uk
Room INB2221, INB
Dr Grzegorz Cielniak (module coordinator)
• lecturer, researcher in autonomous systems
(robotics, machine vision)
• weeks 1-6
Dr Massoud Zolgharni
• lecturer, expertise in medical imaging
• weeks 7-13
Demonstrators
• Jacob Carse, Cheng Hu, Jacobus Lock
Lecture
• Theoretical aspects of parallel
computing
• Friday, 10:00-11:00, AAD0W25
Workshops
• Tutorials & Assignment
• Targeted specifically to GPGPU &
OpenCL
• 2 hours/week
• Group A: Monday, 9:00-11:00, MC3204
• Group B: Monday, 11:00-13:00, MC3204
• Group C: Friday, 15:00-17:00, MC3204
Assessment Item 1
• Coursework – programming assignment, 30%
• Released in week 6
• Code-only submission (no report)
• In-class demonstration (weeks B11-B13)
Assessment Item 2
• Exam, 70%
• Paper-based, covering theory (mock paper provided after Easter)
Check Blackboard for all hand-in dates!
Week  W/C    Lecture                           Workshop
1     23/01  Introduction                      –
2     30/01  Architectures                     Tutorial-1
3     06/02  Patterns 1
4     13/02  Patterns 2                        Tutorial-2
5     20/02  Patterns 3
6     27/02  Patterns 4                        Tutorial-3
7     06/03  Communication & Synchronisation
8     13/03  Algorithms 1                      Assessment support
9     20/03  Algorithms 2
10    27/03  Algorithms 3
11    03/04  Performance & Optimisation        Tutorial-4 & Demonstration
12    24/04  Parallel Libraries
13    01/05  –
Essential:
• Structured Parallel Programming: Patterns for Efficient Computation, McCool et al. (e-book)
• An Introduction to Parallel Programming, P. Pacheco
Recommended:
• Heterogeneous Computing with OpenCL: Revised OpenCL 1.2 Edition, B. Gaster et al. (e-book)
• OpenCL in Action: How to Accelerate Graphics and Computation, M. Scarpino
Why parallelism? Motivation
Applications
Overview of module
Why do we parallelise at all?
• to solve bigger problems (more complex, more accurate,
more realistic)
• to solve more problems (faster)
• to reduce power consumption related to the
computation (cheaper)
Parallelism = Optimisation!
Analogy: teamwork
Parallel machines already popular in the 1970s
• long development time
• expensive
Moore's Law driving optimisation through increasing speed of serial processors
• much easier/cheaper to simply wait a few years for technology to catch up rather than invest in complex/expensive architectures
ILLIAC IV
the number of transistors on a chip doubles approximately every two years
Transistor clock speed (clock frequency, clock rate)
how fast transistors switch between on and off → speed of operation
Smaller transistors →
• shorter critical path, quicker charging/discharging of the capacitance → faster clock
• more transistors fit on the chip
Transistor count (number of transistors on chip)
• deeper instruction pipelining, superscalar processor
• more operations per time period
• perform more complicated instructions
increase in frequency → decrease in runtime
Power consumption: P = C · V² · F
• C – capacitance
• V – voltage
• F – frequency
The faster we switch transistors on and off, the more heat is generated.
• In reality, increasing F requires increasing V for a given transistor size, and hence power rises much faster than linearly with clock frequency.
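A short worked example of that scaling (illustrative numbers; it assumes the common rule of thumb that supply voltage must rise roughly in proportion to clock frequency):

P = C\,V^2 F, \qquad V \propto F \;\Rightarrow\; P \propto F^3, \qquad \frac{P_2}{P_1} \approx \left(\frac{F_2}{F_1}\right)^3 = 2^3 = 8 \quad (F_2 = 2F_1)

So doubling the clock rate costs roughly eight times the dynamic power and heat, which is why clock frequency could not keep climbing indefinitely.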
Transistor count still rising, but clock rate flattening sharply
Frequency scaling was the dominant reason for improvements in computer performance until around 2004
"Free lunch" of automatically faster serial applications through faster microprocessors has ended
Industry-wide shift to parallel computing in the form of multi-core processors
All computers nowadays are Parallel Computers!
Serial processing: increasing the clock rate ≃ simply making one person work faster
Parallel processing: multiple processing cores ≃ a group workforce, which needs organising, strategy and communication
Example applications: autonomous cars, augmented reality, video games, weather forecasting
Source: nvidia.com
Different solutions and programmer
support
• pipelines, vector instructions in CPU
• limited access by a programmer, built-in
or through compiler
• multi-core CPUs and multi-processors
• OS-level multithreading libraries (e.g. Boost.Thread; see the sketch after this list)
• dedicated parallel processing units
(e.g. GPU)
• libraries with different levels of granularity (e.g. OpenCL, Boost.Compute)
• distributed systems
• distributed parallel libraries (e.g. MPI)
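As referenced above, a minimal sketch of the OS-level multithreading option using std::thread (Boost.Thread exposes a nearly identical interface; the thread count, problem size and helper name add_chunk are illustrative, not part of the module code):

#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Each thread adds its own contiguous chunk of the vectors: C = A + B.
void add_chunk(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        C[i] = A[i] + B[i];
}

int main() {
    const std::size_t N = 1 << 20;          // illustrative problem size
    std::vector<float> A(N, 1.0f), B(N, 2.0f), C(N);

    const std::size_t num_threads = 4;      // illustrative; query std::thread::hardware_concurrency() in practice
    const std::size_t chunk = N / num_threads;
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t + 1 == num_threads) ? N : begin + chunk;
        workers.emplace_back(add_chunk, std::cref(A), std::cref(B),
                             std::ref(C), begin, end);
    }
    for (auto& w : workers) w.join();       // wait for all chunks to finish
}

The work is split into independent chunks precisely because element-wise addition has no dependencies between iterations; the same idea underlies the GPU approach used later in the module.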
Serial programming is straightforward
• and we know how to teach this subject
Parallel programming
• less intuitive than serial programming due to non-trivial
communication & synchronisation
• many different hardware solutions and software
libraries
• relatively recent adoption by the programming
community
In this module
• focus on general programming patterns and algorithms
• practical programming aspect using a popular/open
specification (OpenCL)
Task: C = A + B
A, B, C – vectors, N – number of elements
serial:
for (int i = 0; i < N; i++) C[i] = A[i] + B[i];

parallel:
par_for (int i = 0; i < N; i++) C[i] = A[i] + B[i];

No dependencies, so each element can be processed separately (an OpenCL version of this loop is sketched at the end of this section).

serial:
for (int i = 0; i < N; i++) b = b + A[i];

parallel? (here each iteration depends on the result of the previous one)

Why OpenCL?
Standard C++
• no special compiler/extensions
Library-based solution
• no special build-system
Vendor-neutral
• open standard managed by the Khronos group
Multi-platform, portable performance
Heterogeneous computing (CPU/GPU/FPGA)

Parallelism is the future of computing
• Multi-core and many-core era is here to stay due to technology trends
• It is a rare skill and is likely to be in demand in your future jobs
• Understanding parallel techniques makes you a better programmer

Reading
• Structured Parallel Programming: Patterns for Efficient Computation – Sections 1.1-1.3
• An Introduction to Parallel Programming – Chapter 1
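As referenced above, a minimal sketch of how the data-parallel par_for maps onto an OpenCL kernel (the kernel name vector_add and the float element type are illustrative; the host-side setup that launches the work-items is covered in the workshop tutorials):

// OpenCL C device code: each work-item executes this body once,
// playing the role of a single iteration of the par_for loop.
__kernel void vector_add(__global const float* A,
                         __global const float* B,
                         __global float* C) {
    int i = get_global_id(0);   // this work-item's global index, 0..N-1
    C[i] = A[i] + B[i];
}

There is no explicit loop: the host enqueues N work-items and the OpenCL runtime is free to run them in parallel, which is valid precisely because the iterations are independent.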