18-646 – How to Write Fast Code II
1
Carnegie Mellon University
How to Write Fast Code?
Fast Platforms
Multicore platforms Manycore platforms Cloud platforms
Good Techniques
Data structures
Algorithms
Software Architecture
Course Goals
To write fast code for your research/application, you should:
1. Feel comfortable hacking up a solution
2. Leverage existing software building blocks
3. Indicate which platform is the best one to use
4. Reason about why a piece of existing code is slow
5. Take care of potential performance bottlenecks
18-646 – How to Write Fast Code II 2
Course Staff
Instructor: Ian Lane
Preferred contact method: Piazza
Office Hours: Wed 4:30-5:30pm ET Email: ianlane@andrew.cmu.edu
Teaching Assistants
Deepak Arumugam Sankara Subramanianh
Yunfan Jiang (Silicon Valley)
18-646 – How to Write Fast Code II 3
Course Information
Lectures:
Tuesday and Thursday 6:00pm-7:20pm ET
Office Hours:
Instructor Office Hours: Wednesdays 4:30pm-5:30pm ET TA Office Hours: TBD
Grading:
10% – Homeworks 30% – Mini-Projects 30% – Term Project 30% – Final Exam
Course Links:
Canvas: https://canvas.cmu.edu/courses/21510/pages/course-schedule Piazza: https://piazza.com/class/kkmp02yc92h598
Gradescope: https://www.gradescope.com/courses/241050
18-646 – How to Write Fast Code II 4
https://canvas.cmu.edu/courses/21510/pages/course-schedule
18-646 – How to Write Fast Code II 5
https://piazza.com/class/kkmp02yc92h598 Sign Up Link: https://piazza.com/cmu/spring2021/18646
18-646 – How to Write Fast Code II 6
https://www.gradescope.com/courses/241050
18-646 – How to Write Fast Code II 7
Outline
Why Fast Code Now?
What does 100x speedup mean? Course structure and organization Problem solving when writing fast code
18-646 – How to Write Fast Code II 8
Why Fast Code?
Need is driven by the applications… …NOT by the availability of the platforms.
18-646 – How to Write Fast Code II 9
RMS Appears Everywhere
Recognition: Interpretations of the world (with models)
Mining: Understanding of the world (discover hidden patterns with model) Synthesis: Anticipate outcomes in the world (using the models to predict)
18-646 – How to Write Fast Code II 10
What is being valued today?
int main(void) { mykernel<<<1,1>>>(); printf(“Hello World!\n”); return 0;
}
Hardware Software
Data
Commoditized Open Sourced
King!
18-646 – How to Write Fast Code II
11
Outline
Why Fast Code Now?
What does 100x speedup mean? Course structure and organization Problem solving when writing fast code
18-646 – How to Write Fast Code II 12
What does 100x Speedup Mean?
12 hours: 10x speedup à 1.2 100x speedup à 7.2 1000x speedupà 43.2
Game changing technology advances Overnight jobs becomes interactive
Speech Analytics Medical Imaging Image Recognition
hours
minutes
seconds
18-646 – How to Write Fast Code II
13
Computational Finance
Fast, Robust Pediatric MRI
100X faster reconstruction Higher-quality, faster MRI
This image: 8 month-old patient with cancerous mass in liver
256 x 84 x 154 x 8 data size
Serial Recon: 1 hour
Parallel Recon: less than 1 minute
Fast enough for clinical use
Software currently deployed at Lucile Packard Children’s Hospital for clinical study of the reconstruction technique
18-646 – How to Write Fast Code II 14
Support-Vector Machines
Algorithmic changes and parallel implementation lead to performance speedup: Core 2 Duo versus G80
Computation
LIBSVM
Our algorithm2- core Parallel CPU
Our algorithm, 16- core Parallel GPU
SVM Training (geo-mean)
771.8 s
— 38.5 s
SVM 41.42 s 4.21 s 0.38 s Classification
100X speed-up
896 downloads since release in 10/2008
18-646 – How to Write Fast Code II
15
Image Contours Detection
1 0.8 0.6 0.4 0.2 0
We achieve equivalent accuracy on the Berkeley Segmentation Dataset
Comparing to human segmented “ground truth”
F-measure 0.70 for both
Human agreement = 0.79
3.8 minutes to 1.8 seconds: 126x speedup
616 downloads since release in October 2009
CVPR 2008
Damascene
0 0.2 0.4 0.6 0.8 1
Recall
18-646 – How to Write Fast Code II
16
Precision
Computational Finance
Value-at-Risk Computation with Monte Carlo Method
Summarizes a portfolio’s vulnerabilities to market movements
Important to algorithmic trading, derivative usage
Improved implementation to run
60x faster on a parallel microprocessor
Four Steps of Monte Carlo Method in Finance
18-646 – How to Write Fast Code II
17
Uniform Random Number Generation
Market Parameter Transformation
f(x)
Instrument Pricing
Data Assimilation
NY Times TIFF to PDF
In 2007, the New York Times decided to make all the public domain articles from 1851-1922 available free of charge
Needed to convert from TIFF to PDF
Data set:
Compute Instances:
Time taken: Cost:
4TB of raw image TIFF to 11 million PDF 100 Amazon EC2 instances
24 hours $240
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
18-646 – How to Write Fast Code II 18
18-646 – How to Write Fast Code II 19
Philosophy on Platforms
Provide theoretical background and hands-on practices…
…to innovate with multicore/manycore/cloud-based platforms.
Intel Sandy Bridge Multicore Processor (Core i7-2600K)
NVIDIA Fermi Manycore Processor – GTX580
Yahoo! Hadoop Cluster ~2000 nodes in one cluster
Significant scientific advances will be empowered by multicore/manycore/cloud- based platforms over the next decade
Knowledge of these new computing capabilities will give you an advantage over your peers in developing innovative techniques to solve challenging problems
18-646 – How to Write Fast Code II 20
Philosophy on Techniques
Efficient software architecture is the most important to designing fast code
Software design patterns
Understanding the implementation platform will help reason about application performance bottlenecks
Hands-on experience provide confidence for you to effectively use these technologies for your research and development needs
Software Architecture Optimization
Algorithm Optimization
Data structure Optimizations
20-100x 10-40x
1.5-8x
18-646 – How to Write Fast Code II
21
Outline
Why Fast Code Now?
What does 100x speedup mean? Course structure and organization Problem solving when writing fast code
18-646 – How to Write Fast Code II 22
https://canvas.cmu.edu/courses/21510/quizzes/55032
18-646 – How to Write Fast Code II 23
Outline
Why Fast Code Now?
What does 100x speedup mean? Course structure and organization Problem solving when writing fast code
18-646 – How to Write Fast Code II 24
Organization of This Course
Lectures
Tuesdays and Thursdays
All mini-project introductions held on Tuesdays, mini-project reviews on Thursdays Keep an eye on the course schedule for changes
Canvas – https://canvas.cmu.edu/courses/21510/pages/course-schedule Course content (slides and videos)
Homework and Mini-Project Descriptions with links to Gradescope
Piazza – https://piazza.com/class/kkmp02yc92h598 Course Announcements
Course Questions and Discussions
Gradescope – https://www.gradescope.com/courses/241050 Assignment Submissions
Final Exam
18-646 – How to Write Fast Code II 25
Organization of This Course
During class:
Lectures and Q&A sessions
Outside class:
3 homework assignments
3 mini homework projects
1 term project (focused on your own research or interests)
Lots of coding 😉
18-646 – How to Write Fast Code II 26
Grading and Expectations
GRADING
10% Homework assignments
30% Homework projects 30% Term Project
30% Final examination
EXPECTATIONS
Attend majority of lectures
Hand in all assignments and mini-projects Complete a term project
Complete the final exam
18-646 – How to Write Fast Code II 27
Software Reuse and Plagiarism
Question: Can I learn to use code from the web?
Answer: Yes! But only under certain circumstances…
Plagiarism
Acceptable SW Reuse
Collusion Unacknowledged code from a third-party
File-level Reuse Third-party code factored out to a separate file
Reverse Engineering Unacknowledged re-use of some code abstraction
Acknowledgement in Documentation Third-party code clearly distinguished in documentation
Translation
Unacknowledged re-use by translating functions from one language to another
Adequate Testing
Third-party code must be tested to one’s own requirements
Code Generation Unacknowledged assistance by using some code generators
Reuse Without Testing
Reuse third-party code without testing against one’s own requirements
Gibson, “Software Reuse and Plagiarism: A Code of Practice”, In Proceedings of ITiCSE’2009. pp.55-59
18-646 – How to Write Fast Code II 28
Structure of Lectures
Module 1: Background and Multicore Programming Hardware architectures, applications
Application design with OpenMP
Module 23: Manycore Programming Manycore architectures
Application design with CUDA
Module 3: Cluster Programming Distributed architectures
Application design with Hadoop
18-646 – How to Write Fast Code II 29
Homework & Mini-Projects
Three mini projects throughout the semester
1. Multicore project with OpenMP
2. Manycore project with CUDA
3. Cloud project with Hadoop
Projects are done in teams of two or three students
(40%)
Programming assignments will start with template code
Accelerate them with techniques discussed in class
Will submit code, a project write-up and selected teams will present during class
Homeworks are done in preparation for the Mini-Projects
Internalize concepts discussed in class
Set up the project environment
Discussion among peers is allowed and encouraged. However, tasks and write-ups must be completed individually
18-646 – How to Write Fast Code II 30
Homework & Mini-Project Schedule
Homeworks (Project Setup) Module 1 – Multicore
Module 2 – Manycore
Module 3 – Cluster Computing
10% Thursday, Feburary 25th
Thursday, March 18th Thursday, April 15th
Mini-Projects (Coding and Evaluation) 30%
Module 1 – Multicore
Module 2 – Manycore
Module 3 – Cluster Computing
Monday, March 8th Monday, April 5th Monday, April 26th
18-646 – How to Write Fast Code II
31
Term Project (30%)
Term project
An application area of your choice
Term project introduction on Tuesday next week (1/9)
Projects are done in teams
Does not need to be the same team as for the Mini-Project
18-646 – How to Write Fast Code II 32
Term Project Schedule
Term (Team) Project Project Proposal (10%)
Poster Presentations (40%) Final Report (50%)
30%
Tuesday, March 9th
Tuesday, May 4th Friday, May 14th
18-646 – How to Write Fast Code II
33
Exam (30%)
Final Exam – TBD ( Week of May 10th-14th )
18-646 – How to Write Fast Code II
34
Outline
Why Fast Code Now?
What does 100x speedup mean? Course structure and organization Problem solving when writing fast code
18-646 – How to Write Fast Code II 35
Problem Solving for Fast Code
Writing fast code is a process coherent with “general problem solving behavior”
– Newell and Simon, Human Problem Solving (1972), pp. 72-73
The process of problem solving involves:
1. Understand the current state
2. Observe the internal representation
3. Search among alternatives
4. Select from a set of choices
18-646 – How to Write Fast Code II 36
The k-means Problem
Find k cluster centers that minimize the distance from each data
Important algorithm in machine learning:
Statistical data analysis
Vector quantization (Speech Recognition)
NP-hard for arbitrary input
k-means algorithm frequently finds a reasonable
solutions quickly
Issues:
Worst case running time is super-polynomial Approximation can be arbitrarily bad
point to a cluster center
18-646 – How to Write Fast Code II 37
The k-means Problem
Find k cluster centers that minimize the distance from each data
.
. .
point to a cluster center
.
Cluster
Cluster center (centroid)
k: Number of clusters (defined a-priori) Cluster: Assignment of data points to a class Cluster Center: μ of data points in a cluster
.
18-646 – How to Write Fast Code II
38
k-means Algorithm (“Lloyd’s algorithm”) Given an initial set of k means
Expectation Step: Assign each observation to the cluster with the closest mean
Maximization Step: Calculate the new means to be the centroid of the observations in the cluster.
Iterate until convergence or stopping criteria met
18-646 – How to Write Fast Code II 39
The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
18-646 – How to Write Fast Code II 40
The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
18-646 – How to Write Fast Code II 41
The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
2. Assign each data point to closest Center
18-646 – How to Write Fast Code II 42
The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
4. Re-iterate steps 2-3 until convergence or stopping criteria met
3. Update Centers based on assignments from (2)
18-646 – How to Write Fast Code II 43
The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
18-646 – How to Write Fast Code II 44
The Phases
1. 2.
3. 4.
Initialization: Randomly select k cluster centers
Select k samples from data as initial centers [Forgy Partition]
Expectation: Assign each data point go closest center Compare each data point (N) to each cluster center (k) Distance Metric: Euclidean distance (D dimensions)
Maximization: Update centers based on assignments
For each cluster (k) compute mean (D dimensions) from data
points assigned to that cluster
Evaluate: Re-iterate steps 2-3 until convergence or stopping criteria met
Percentage of data points re-assigned
Number of iterations (2-3)
18-646 – How to Write Fast Code II 45
18-646 – How to Write Fast Code II 46
A Fast Implementation of k-means Writing fast code is a process coherent with
“general problem solving behavior”
– Newell and Simon, Human Problem Solving (1972), pp. 72-73
The process of problem solving involves:
1. Understand the current state
2. Observe the internal representation
3. Search among alternatives
4. Select from a set of choices
18-646 – How to Write Fast Code II 47
A Fast Implementation of k-means Following the process of problem solving with k-means:
1.
Understand the current state
Running on a platform
Using a specific set of resources
Achieving a specific performance
Meeting a specific criteria/requirement
Observe the internal representation Search among alternatives
Select from a set of choices
Assumption:
Starting from a functionally correct reference implementation
Implication:
Must observe the current state and implementation requirements before starting to solve a problem
2.
3.
4.
18-646 – How to Write Fast Code II 48
A Fast Implementation of k-means Following the process of problem solving with k-means:
1. 2.
Understand the current state
Observe the internal representation Application structure
Identified four phases of execution Implementation concerns
Task considerations
Data representations
Concurrency opportunities
Search among alternatives Select from a set of choices
3. 4.
18-646 – How to Write Fast Code II 49
A Fast Implementation of k-means Following the process of problem solving with k-means:
1. 2.
Understand the current state
Observe the internal representation Application structure
Identified four phases of execution Implementation concerns
Task considerations
Data representations
Concurrency opportunities
Search among alternatives Select from a set of choices
3. 4.
18-646 – How to Write Fast Code II 50
A Fast Implementation of k-means Following the process of problem solving with k-means:
1. 2.
Understand the current state
Observe the internal representation Application structure
Identified four phases of execution Implementation concerns
Task considerations
Data representations
Concurrency opportunities
Search among alternatives Select from a set of choices
3. 4.
18-646 – How to Write Fast Code II 51
18-646 – How to Write Fast Code II 52