CS计算机代考程序代写 concurrency GPU algorithm finance hadoop cuda data structure 18-646 – How to Write Fast Code II

18-646 – How to Write Fast Code II
1
Carnegie Mellon University

How to Write Fast Code?
Fast Platforms
— Multicore platforms — Manycore platforms — Cloud platforms
Good Techniques
— Data structures
— Algorithms
— Software Architecture
— Course Goals
— To write fast code for your research/application, you should:
1. Feel comfortable hacking up a solution
2. Leverage existing software building blocks
3. Indicate which platform is the best one to use
4. Reason about why a piece of existing code is slow
5. Take care of potential performance bottlenecks
18-646 – How to Write Fast Code II 2

Course Staff
—Instructor: Ian Lane
— Preferred contact method: Piazza
— Office Hours: Wed 4:30-5:30pm ET — Email: ianlane@andrew.cmu.edu
—Teaching Assistants
— Deepak Arumugam Sankara Subramanianh
— Yunfan Jiang (Silicon Valley)
18-646 – How to Write Fast Code II 3

Course Information
— Lectures:
— Tuesday and Thursday 6:00pm-7:20pm ET
— Office Hours:
— Instructor Office Hours: Wednesdays 4:30pm-5:30pm ET — TA Office Hours: TBD
— Grading:
— 10% – Homeworks — 30% – Mini-Projects — 30% – Term Project — 30% – Final Exam
— Course Links:
— Canvas: https://canvas.cmu.edu/courses/21510/pages/course-schedule — Piazza: https://piazza.com/class/kkmp02yc92h598
— Gradescope: https://www.gradescope.com/courses/241050
18-646 – How to Write Fast Code II 4

https://canvas.cmu.edu/courses/21510/pages/course-schedule
18-646 – How to Write Fast Code II 5

https://piazza.com/class/kkmp02yc92h598 Sign Up Link: https://piazza.com/cmu/spring2021/18646
18-646 – How to Write Fast Code II 6

https://www.gradescope.com/courses/241050
18-646 – How to Write Fast Code II 7

Outline
—Why Fast Code Now?
—What does 100x speedup mean? —Course structure and organization —Problem solving when writing fast code
18-646 – How to Write Fast Code II 8

Why Fast Code?
Need is driven by the applications… …NOT by the availability of the platforms.
18-646 – How to Write Fast Code II 9

RMS Appears Everywhere
— Recognition: Interpretations of the world (with models)
— Mining: Understanding of the world (discover hidden patterns with model) — Synthesis: Anticipate outcomes in the world (using the models to predict)
18-646 – How to Write Fast Code II 10

What is being valued today?
int main(void) { mykernel<<<1,1>>>(); printf(“Hello World!\n”); return 0;
}
Hardware Software
Data
Commoditized Open Sourced
King!
18-646 – How to Write Fast Code II
11

Outline
—Why Fast Code Now?
—What does 100x speedup mean? —Course structure and organization —Problem solving when writing fast code
18-646 – How to Write Fast Code II 12

What does 100x Speedup Mean?
12 hours: 10x speedup à 1.2 100x speedup à 7.2 1000x speedupà 43.2
— Game changing technology advances — Overnight jobs becomes interactive
Speech Analytics Medical Imaging Image Recognition
hours
minutes
seconds
18-646 – How to Write Fast Code II
13
Computational Finance

Fast, Robust Pediatric MRI
—100X faster reconstruction — Higher-quality, faster MRI
— This image: 8 month-old patient with cancerous mass in liver
— 256 x 84 x 154 x 8 data size
— Serial Recon: 1 hour
— Parallel Recon: less than 1 minute
— Fast enough for clinical use
— Software currently deployed at Lucile Packard Children’s Hospital for clinical study of the reconstruction technique
18-646 – How to Write Fast Code II 14

Support-Vector Machines
— Algorithmic changes and parallel implementation lead to performance speedup: Core 2 Duo versus G80
Computation
LIBSVM
Our algorithm2- core Parallel CPU
Our algorithm, 16- core Parallel GPU
SVM Training (geo-mean)
771.8 s
— 38.5 s
SVM 41.42 s 4.21 s 0.38 s Classification
100X speed-up
896 downloads since release in 10/2008
18-646 – How to Write Fast Code II
15

Image Contours Detection
1 0.8 0.6 0.4 0.2 0
— We achieve equivalent accuracy on the Berkeley Segmentation Dataset
— Comparing to human segmented “ground truth”
— F-measure 0.70 for both
— Human agreement = 0.79
— 3.8 minutes to 1.8 seconds: 126x speedup
— 616 downloads since release in October 2009
CVPR 2008
Damascene
0 0.2 0.4 0.6 0.8 1
Recall
18-646 – How to Write Fast Code II
16
Precision

Computational Finance
— Value-at-Risk Computation with Monte Carlo Method
— Summarizes a portfolio’s vulnerabilities to market movements
— Important to algorithmic trading, derivative usage
— Improved implementation to run
60x faster on a parallel microprocessor
Four Steps of Monte Carlo Method in Finance
18-646 – How to Write Fast Code II
17
Uniform Random Number Generation
Market Parameter Transformation
f(x)
Instrument Pricing
Data Assimilation

NY Times TIFF to PDF
— In 2007, the New York Times decided to make all the public domain articles from 1851-1922 available free of charge
— Needed to convert from TIFF to PDF
Data set:
Compute Instances:
Time taken: Cost:
4TB of raw image TIFF to 11 million PDF 100 Amazon EC2 instances
24 hours $240
http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/
18-646 – How to Write Fast Code II 18

18-646 – How to Write Fast Code II 19

Philosophy on Platforms
Provide theoretical background and hands-on practices…
…to innovate with multicore/manycore/cloud-based platforms.
Intel Sandy Bridge Multicore Processor (Core i7-2600K)
NVIDIA Fermi Manycore Processor – GTX580
Yahoo! Hadoop Cluster ~2000 nodes in one cluster
— Significant scientific advances will be empowered by multicore/manycore/cloud- based platforms over the next decade
— Knowledge of these new computing capabilities will give you an advantage over your peers in developing innovative techniques to solve challenging problems
18-646 – How to Write Fast Code II 20

Philosophy on Techniques
— Efficient software architecture is the most important to designing fast code
— Software design patterns
— Understanding the implementation platform will help reason about application performance bottlenecks
— Hands-on experience provide confidence for you to effectively use these technologies for your research and development needs
Software Architecture Optimization
Algorithm Optimization
Data structure Optimizations
20-100x 10-40x
1.5-8x
18-646 – How to Write Fast Code II
21

Outline
—Why Fast Code Now?
—What does 100x speedup mean? —Course structure and organization —Problem solving when writing fast code
18-646 – How to Write Fast Code II 22

https://canvas.cmu.edu/courses/21510/quizzes/55032
18-646 – How to Write Fast Code II 23

Outline
—Why Fast Code Now?
—What does 100x speedup mean? —Course structure and organization —Problem solving when writing fast code
18-646 – How to Write Fast Code II 24

Organization of This Course
— Lectures
— Tuesdays and Thursdays
— All mini-project introductions held on Tuesdays, mini-project reviews on Thursdays — Keep an eye on the course schedule for changes
— Canvas – https://canvas.cmu.edu/courses/21510/pages/course-schedule — Course content (slides and videos)
— Homework and Mini-Project Descriptions with links to Gradescope
— Piazza – https://piazza.com/class/kkmp02yc92h598 — Course Announcements
— Course Questions and Discussions
— Gradescope – https://www.gradescope.com/courses/241050 — Assignment Submissions
— Final Exam
18-646 – How to Write Fast Code II 25

Organization of This Course
— During class:
— Lectures and Q&A sessions
— Outside class:
— 3 homework assignments
— 3 mini homework projects
— 1 term project (focused on your own research or interests)
— Lots of coding 😉
18-646 – How to Write Fast Code II 26

Grading and Expectations
— GRADING
— 10% Homework assignments
— 30% Homework projects — 30% Term Project
— 30% Final examination
— EXPECTATIONS
— Attend majority of lectures
— Hand in all assignments and mini-projects — Complete a term project
— Complete the final exam
18-646 – How to Write Fast Code II 27

Software Reuse and Plagiarism
— Question: Can I learn to use code from the web?
— Answer: Yes! But only under certain circumstances…
Plagiarism
Acceptable SW Reuse
Collusion Unacknowledged code from a third-party
File-level Reuse Third-party code factored out to a separate file
Reverse Engineering Unacknowledged re-use of some code abstraction
Acknowledgement in Documentation Third-party code clearly distinguished in documentation
Translation
Unacknowledged re-use by translating functions from one language to another
Adequate Testing
Third-party code must be tested to one’s own requirements
Code Generation Unacknowledged assistance by using some code generators
Reuse Without Testing
Reuse third-party code without testing against one’s own requirements
Gibson, “Software Reuse and Plagiarism: A Code of Practice”, In Proceedings of ITiCSE’2009. pp.55-59
18-646 – How to Write Fast Code II 28

Structure of Lectures
— Module 1: Background and Multicore Programming — Hardware architectures, applications
— Application design with OpenMP
— Module 23: Manycore Programming — Manycore architectures
— Application design with CUDA
— Module 3: Cluster Programming — Distributed architectures
— Application design with Hadoop
18-646 – How to Write Fast Code II 29

Homework & Mini-Projects
— Three mini projects throughout the semester
1. Multicore project with OpenMP
2. Manycore project with CUDA
3. Cloud project with Hadoop
— Projects are done in teams of two or three students
(40%)
— Programming assignments will start with template code
— Accelerate them with techniques discussed in class
— Will submit code, a project write-up and selected teams will present during class
— Homeworks are done in preparation for the Mini-Projects
— Internalize concepts discussed in class
— Set up the project environment
— Discussion among peers is allowed and encouraged. However, tasks and write-ups must be completed individually
18-646 – How to Write Fast Code II 30

Homework & Mini-Project Schedule
— Homeworks (Project Setup) — Module 1 – Multicore
— Module 2 – Manycore
— Module 3 – Cluster Computing
10% Thursday, Feburary 25th
Thursday, March 18th Thursday, April 15th
— Mini-Projects (Coding and Evaluation) 30%
— Module 1 – Multicore
— Module 2 – Manycore
— Module 3 – Cluster Computing
Monday, March 8th Monday, April 5th Monday, April 26th
18-646 – How to Write Fast Code II
31

Term Project (30%)
— Term project
— An application area of your choice
— Term project introduction on Tuesday next week (1/9)
— Projects are done in teams
— Does not need to be the same team as for the Mini-Project
18-646 – How to Write Fast Code II 32

Term Project Schedule
— Term (Team) Project — Project Proposal (10%)
— Poster Presentations (40%) — Final Report (50%)
30%
Tuesday, March 9th
Tuesday, May 4th Friday, May 14th
18-646 – How to Write Fast Code II
33

Exam (30%)
— Final Exam – TBD ( Week of May 10th-14th )
18-646 – How to Write Fast Code II
34

Outline
—Why Fast Code Now?
—What does 100x speedup mean? —Course structure and organization —Problem solving when writing fast code
18-646 – How to Write Fast Code II 35

Problem Solving for Fast Code
— Writing fast code is a process coherent with “general problem solving behavior”
– Newell and Simon, Human Problem Solving (1972), pp. 72-73
— The process of problem solving involves:
1. Understand the current state
2. Observe the internal representation
3. Search among alternatives
4. Select from a set of choices
18-646 – How to Write Fast Code II 36

The k-means Problem
— Find k cluster centers that minimize the distance from each data
— Important algorithm in machine learning:
— Statistical data analysis
— Vector quantization (Speech Recognition)
— NP-hard for arbitrary input
— k-means algorithm frequently finds a reasonable
solutions quickly
— Issues:
— Worst case running time is super-polynomial — Approximation can be arbitrarily bad
point to a cluster center
18-646 – How to Write Fast Code II 37

The k-means Problem
— Find k cluster centers that minimize the distance from each data
.
. .
point to a cluster center
.
Cluster
Cluster center (centroid)
k: Number of clusters (defined a-priori) Cluster: Assignment of data points to a class Cluster Center: μ of data points in a cluster
.
18-646 – How to Write Fast Code II
38

k-means Algorithm (“Lloyd’s algorithm”) — Given an initial set of k means
— Expectation Step: Assign each observation to the cluster with the closest mean
— Maximization Step: Calculate the new means to be the centroid of the observations in the cluster.
— Iterate until convergence or stopping criteria met
18-646 – How to Write Fast Code II 39

The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
18-646 – How to Write Fast Code II 40

The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
18-646 – How to Write Fast Code II 41

The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
2. Assign each data point to closest Center
18-646 – How to Write Fast Code II 42

The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
4. Re-iterate steps 2-3 until convergence or stopping criteria met
3. Update Centers based on assignments from (2)
18-646 – How to Write Fast Code II 43

The Algorithm
Example: k=5
Distance metric=euclidean Dimensions=2
1. Randomly select k cluster Centers
2. Assign closest Center to each data point
3. Update Centers based on assignments from (2)
4. Re-iterate steps 2-3 until convergence or stopping criteria met
18-646 – How to Write Fast Code II 44

The Phases
1. 2.
3. 4.
Initialization: Randomly select k cluster centers
— Select k samples from data as initial centers [Forgy Partition]
Expectation: Assign each data point go closest center — Compare each data point (N) to each cluster center (k) — Distance Metric: Euclidean distance (D dimensions)
Maximization: Update centers based on assignments
— For each cluster (k) compute mean (D dimensions) from data
points assigned to that cluster
Evaluate: Re-iterate steps 2-3 until convergence or stopping criteria met
— Percentage of data points re-assigned
— Number of iterations (2-3)
18-646 – How to Write Fast Code II 45

18-646 – How to Write Fast Code II 46

A Fast Implementation of k-means — Writing fast code is a process coherent with
“general problem solving behavior”
– Newell and Simon, Human Problem Solving (1972), pp. 72-73
— The process of problem solving involves:
1. Understand the current state
2. Observe the internal representation
3. Search among alternatives
4. Select from a set of choices
18-646 – How to Write Fast Code II 47

A Fast Implementation of k-means — Following the process of problem solving with k-means:
1.
Understand the current state
— Running on a platform
— Using a specific set of resources
— Achieving a specific performance
— Meeting a specific criteria/requirement
Observe the internal representation Search among alternatives
Select from a set of choices
Assumption:
Starting from a functionally correct reference implementation
Implication:
Must observe the current state and implementation requirements before starting to solve a problem
2.
3.
4.
18-646 – How to Write Fast Code II 48

A Fast Implementation of k-means — Following the process of problem solving with k-means:
1. 2.
Understand the current state
Observe the internal representation — Application structure
— Identified four phases of execution — Implementation concerns
— Task considerations
— Data representations
— Concurrency opportunities
Search among alternatives Select from a set of choices
3. 4.
18-646 – How to Write Fast Code II 49

A Fast Implementation of k-means — Following the process of problem solving with k-means:
1. 2.
Understand the current state
Observe the internal representation — Application structure
— Identified four phases of execution — Implementation concerns
— Task considerations
— Data representations
— Concurrency opportunities
Search among alternatives Select from a set of choices
3. 4.
18-646 – How to Write Fast Code II 50

A Fast Implementation of k-means — Following the process of problem solving with k-means:
1. 2.
Understand the current state
Observe the internal representation — Application structure
— Identified four phases of execution — Implementation concerns
— Task considerations
— Data representations
— Concurrency opportunities
Search among alternatives Select from a set of choices
3. 4.
18-646 – How to Write Fast Code II 51

18-646 – How to Write Fast Code II 52