
CMPSC 450
Concurrent Scientific Programming
Introduction

Welcome to the class!
• Class meets MWF 8-8:50AM on Zoom
• Office hours
• Tuesdays and Thursdays, 8PM – 9PM, on Zoom
• By appointment
• Email: use Canvas
• About me:
• Master of Engineering, CSE, Penn State 2001
• 20 years industry experience
• Focus on implementing/optimizing algorithms

Class Overview and Topics:
• Week 1: Introduction, Modern Computer Architecture, ACI-ICDS Introduction
• Week 2: Serial Optimization and Data Access Optimization
• Week 3 & 4: Shared Memory Parallel Algorithm Theory, Exam 1 (2/12)
• Week 4 & 5: Shared Memory Parallel Algorithm Practice: std::threads
• Week 6 & 7: OpenMP
• Week 8: Prefix Sums, Pointer Jumping, Exam 2 (3/10 or 3/15 TBD)
• Week 9 & 10: Networks, Distributed Memory Theory
• Week 11 & 12: MPI
• Week 12 & 13: SUMMA & Cannon Matrix Multiplication, Exam 3 (4/5)
• Week 14: Simulations and Locality
• Week 15: CUDA Overview
• Finals Week: Final (Time TBD).

Class Material
• Textbook: Georg Hager and Gerhard Wellein, Introduction to High Performance Computing for Scientists and Engineers
• ISBN: 978-1-4398-1192-4
• Useful for terminology, practical programming recommendations, analysis of scientific codes
• Available as an e-book (See Library Resources on Canvas page)
• Lecture slides available in Canvas
• Miscellaneous Reading
• Class discussions
This is a new edition!

Grading
• Programming assignments*, 40%
• Class contribution*, 10%
• Mini project, 10%
• 3 In-Class Exams, 30%
• Final exam, 10%
* A score of less than 50% in this section results in an F for the course.
Concerns regarding earned grades must be raised with the instructor within 2 weeks of the date the grade is posted.

Programming Assignments
• All programming assignments shall run on the ACI-ICDS cluster (Roar Supercomputer).
• All programming assignments shall be written in C/C++.
• All source code shall have:
• Consistent formatting
• Author header
• Makefiles
• All write-ups submitted shall be in pdf format following the “How to Write Technical Papers” format.
• Late penalty: 50% (unless prior approval granted)
• Example Topics:
• Benchmarking – memory, functions/subroutines, algorithms
• Code optimization
• Threading
• OpenMP
• MPI
• Advanced parallel algorithms

Generic Rubric (available on Canvas)
CMPSC450 Generic Assignment Rubric
Note: Any Unsatisfactory criterion will result in a zero grade for the assignment, and no further grading will be done.
Code Performance, 50% (10 pts)
Unsatisfactory (0%):
• Code does not compile.
• Code does not meet functional requirements.
• Code execution results in segmentation fault.
• No Makefile provided.
• No code provided.
• No Author header.
Poor (40%):
• Code is not optimized.
• Parallelism (if applicable) does not work.
• Code does not scale in parallelization.
• Code does not scale with input data size (where applicable).
• Code is commented poorly (no comments).
• Poor code structure.
Satisfactory (75%):
• Some optimizations missed.
• Code execution time does not meet times specified in assignment.
• Code scales appropriately with parallelization (where applicable).
• Code scales to medium data sizes, but does not scale fully.
Excellent (100%):
• Code is fully optimized.
• Execution times meet or exceed times specified in assignment.
• Code execution scales to appropriately large data sizes.
• Code is easy to read and commented appropriately.
Documentation (writeup), 50% (10 pts)
Unsatisfactory (0%):
• No writeup provided.
• No author name on writeup.
• Writeup not in pdf format.
Poor (40%):
• Write-up is missing insight into performance results or is just a text description of a graph.
• Write-up missing discussion on code implementation or just includes a copy of the source code.
• Performance details presented poorly (table).
• Data presentation (graph) is difficult to read.
Satisfactory (75%):
• Data presentation provides too few data points or does not cover full range of performance.
• Writeup could use more details regarding code implementation.
• Writeup could use more insight into performance results.
Excellent (100%):
• Excellent writeup.
• Excellent presentation of performance results (graph).
• Efficiently explains critical code implementation details.
• Accurately explains insight into performance results.

Mini project (due date April TBD)
• Go and learn good things!
• Learn something about High Performance Computing or apply what you have learned to your research and then tell us about what you learned.
• Groups of up to 2.
• More info to come…

Class Contribution
• A short paragraph due every Friday (midnight) highlighting how your participation in class contributed to the success of the class.
• Evaluated out of 2 points.
• Acceptable contributions:
• Answer questions in class
• Ask relevant questions in class
• Point out code errors during live coding demos
• Point out math errors in execution time calculations
• Participate in discussions in Canvas
• Unacceptable contributions:
• Read the book.
• Took notes during class.
• Did not disrupt class with questions.

Attendance
• University policy: attendance is mandatory.
• Zoom attendance will be recorded to validate class contribution reports.
• Tips for morning classes:
• Have breakfast before class.
• Have a morning beverage… I prefer coffee.
• Assume it is your job.

Academic Integrity
• The work submitted is assumed to be YOUR work.
• Homework will be scanned for plagiarism.
• If the data in your reports doesn’t match the code you provided, that is also a violation!
• If you use code found online, citations in the comments are required!
• Discussion of assignments is encouraged. Writing source code and reports must be done individually!
• Penalty: zero on assignment, 1 letter grade drop on final grade.
• Ask for help!

You should take this class if …
• You think you will like writing high-performance code
• You are comfortable with foundational CS material (Algorithms, Computer Organization and Architecture)
• You are proficient in C/C++:
• for loops
• variable declarations (int, float, double)
• arrays
• memory allocation
float *A = new float[N];
float *B = new float[N];
float *C = new float[N];
for (int ii = 0; ii < N; ii++)
    C[ii] = A[ii] * B[ii];

Today’s tasks
• Review material posted on Canvas
• Read Chapter 1 of the HPC book for Wednesday
• Read “How To Write Technical Papers” found in Canvas
• Read Chapters 2 & 3 for next week
• Register for an account on the ACI cluster

Questions?

High Performance Computing Today
• https://www.hpcwire.com/2020/04/08/supercomputer-modeling-tests-how-covid-19-spreads-in-grocery-stores/

Why C?
https://insights.stackoverflow.com/survey/2018#most-popular-technologies
• C is still one of the fastest languages around
• Assembly is faster but much more complex!

Matrix Multiply: relative speedup to a Python version (18 core Intel)
From: “There’s Plenty of Room at the Top,” Leiserson et al., to appear.
63,000!

ACI Cluster
• 26,000 cores, 2.4MW of redundant power
• Dual 10/12 core Xeon E5-2680, Quad 10 core Xeon E7-4830
• 128GB / 256GB / 1TB memory
• Utilize ACI open allocation.
• Limited to 100 cores, 48-hour jobs
• User Guide:
• https://ics.psu.edu/computing-services/icds-aci-user-guide/
• Sign up for an account:
• Affiliation: -- Class --
• Role: Undergrad/Grad Student
• Sponsor: wrs122
• Computational and Data Requirements: 1 GB storage, Open Allocation

What should you get out of this class?
In-depth understanding of:
• When is parallel computing useful?
• Programming models (software) and tools, and experience using some of them
• Performance analysis and tuning
• Important parallel applications and commonly-used algorithms
• Current computing hardware and trends
• Exposure to various open research questions

Why parallel computing?
• Fastest processor overclocked to 8.8GHz (2011)
• 25MHz in 1993
• 75MHz in 1995
• 1GHz in 1999
• Power density of a CPU is somewhere between a hotplate and a nuclear reactor.
• They’re already here...
• Intel Xeon Phi 7250 – 68 cores.

Parallel computing
• Parallel processors are everywhere
• Including servers, desktops, laptops, smartphones, refrigerators
• We require powerful computers to solve large-scale science and engineering problems
• Big problems, big data require better use of parallel hardware
• Writing good parallel programs is hard
• ‘Efficient’ programs require heroic coding

Writing fast code: practical issues
In-depth understanding of architecture and performance programming
• Optimizations for serial code
• Data access optimizations
• Locality, memory hierarchy
• Shared memory programming
• Distributed memory programming

Definitions from Wikipedia (petascale, exascale)
• In computing, petascale refers to a computer system capable of reaching performance in excess of one petaflops, i.e. one quadrillion floating point operations per second. The standard benchmark tool is LINPACK and Top500.org is the organization which tracks the fastest supercomputers.
• Exascale computing refers to computing systems capable of at least one exaFLOPS, or a billion billion calculations per second. Such capacity represents a thousandfold increase over the first petascale computer that came into operation in 2008. (One exaflops is a thousand petaflops or a quintillion, 10^18, floating point operations per second.) At a supercomputing conference in 2009, Computerworld projected exascale implementation by 2018. This proved accurate, as Oak Ridge National Labs performed a 1.8×10^18 flop calculation on the Summit Supercomputer while analyzing genomic information in 2018. They are Gordon Bell Finalists at Supercomputing 2018.
• Exascale computing would be considered as a significant achievement in computer engineering, for it is believed to be the order of processing power of the human brain at neural level (functional might be lower).
It is, for instance, the target power of the Human Brain Project.

Today’s petascale systems are solving challenging research problems . . .
• Physics of high-temperature superconducting cuprates
• Protein structure and function for cellulose-to-ethanol conversion
• Fundamental instability of supernova shocks
• Global simulation of CO2 dynamics
• Next-generation combustion devices burning alternative fuels
• Optimization of plasma heating systems for fusion experiments

Many compelling problems become tractable only at the exascale
Fundamental science
• Decipher and comprehend the core laws governing the universe and unravel its origins
Materials science
• Design, characterize, and manufacture materials tailored and optimized for specific applications
Earth science
• Understand the complex biogeochemical cycles that underpin global ecosystems and control the sustainability of life on Earth
Biology and medicine
• Understand relationships from individual proteins through whole cells into ecosystems and environments
Energy assurance
• Attain, without costly disruption, the energy required by the nation in guaranteed, economically viable, and environmentally benign ways to satisfy residential, commercial, and transportation requirements
National security
• Analyze, design, test, and optimize critical systems for communications, homeland security, and defense
• Understand and uncover human behavioral systems underlying asymmetric operation environments
Engineering design
• Design, deploy, and operate safe and economical structures, machines, processes, and systems with reduced concept-to-deployment time

What do commercial and computational science applications have in common?
[Figure: table of the 13 computational patterns below against application areas: Embed, SPEC, DB, Games, ML, HPC, Health, Image, Speech, Music, Browser]
1 Finite State Machine
2 Combinational
3 Graph Traversal
4 Structured Grid
5 Dense Matrix
6 Sparse Matrix
7 Spectral (FFT)
8 Dynamic Programming
9 N-Body
10 MapReduce
11 Backtrack / Branch & Bound
12 Graphical Models
13 Unstructured Grid

‘Big Data’ analysis requires new computational methods and powerful computers
• Recommendation Systems
• Web Search
• Targeted Advertising
• Viral Marketing

Why is writing parallel code difficult?
• Finding enough parallelism (Amdahl’s Law)
• Granularity – how big should each parallel task be
• Locality – moving data costs more than arithmetic
• Load balance – don’t want 1000 processors to wait for one slow one
• Coordination and synchronization – sharing data safely
• Performance modeling/debugging/tuning
Impediments to parallelization:
• Legacy serial code
• Rapidly-evolving architectures
• Potential speedup only a ‘constant factor’ on current platforms, especially for memory-intensive computations

Pre-existing Parallelism in Modern Machines
• Bit level parallelism
• within operations, etc.
• Instruction level parallelism (ILP)
• multiple instructions execute per clock cycle
• Memory system parallelism
• overlap of memory operations with computation
• OS parallelism
• multiple jobs run in parallel on commodity SMPs
Limits to all of these: for very high performance, need user to identify, schedule and coordinate parallel tasks

Automatic Performance Tuning: Motivation
• Writing high-performance software is hard
• Make programming easier while getting high speed
• Ideal case: program in your favorite high-level language (MATLAB, Python, Java) and get a high fraction of peak performance
• Reality: Best algorithm (and its implementation) can depend strongly on the problem, computer architecture, compiler, ...
• How much of this can we automate?
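
One common answer to "how much of this can we automate?" is empirical tuning: run several variants of a kernel on the target machine and keep the fastest. The sketch below is only an illustration, assuming a blocked matrix transpose as the kernel and the tile size as the single tunable parameter. The names `transpose_blocked` and `pick_block_size` are hypothetical and do not come from the course material or from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Transpose an n x n row-major matrix, working on block x block tiles
// so that reads and writes stay within a small region of memory.
void transpose_blocked(const std::vector<double>& in, std::vector<double>& out,
                       std::size_t n, std::size_t block) {
    assert(block > 0 && in.size() == n * n && out.size() == n * n);
    for (std::size_t ib = 0; ib < n; ib += block)
        for (std::size_t jb = 0; jb < n; jb += block)
            for (std::size_t i = ib; i < std::min(ib + block, n); ++i)
                for (std::size_t j = jb; j < std::min(jb + block, n); ++j)
                    out[j * n + i] = in[i * n + j];
}

// Time each candidate tile size once and return the fastest one.
// A single measurement is noisy; this is the simplest possible tuner.
std::size_t pick_block_size(std::size_t n,
                            const std::vector<std::size_t>& candidates) {
    assert(!candidates.empty());
    std::vector<double> in(n * n, 1.0), out(n * n, 0.0);
    std::size_t best = candidates.front();
    double best_seconds = -1.0;  // negative means "no measurement yet"
    for (std::size_t block : candidates) {
        auto start = std::chrono::steady_clock::now();
        transpose_blocked(in, out, n, block);
        auto stop = std::chrono::steady_clock::now();
        double seconds = std::chrono::duration<double>(stop - start).count();
        if (best_seconds < 0.0 || seconds < best_seconds) {
            best_seconds = seconds;
            best = block;
        }
    }
    return best;
}
```

A call such as `pick_block_size(2048, {8, 16, 32, 64, 128})` returns whichever tile size ran fastest on the current machine, so the answer can differ between architectures and compilers. A more careful tuner would repeat each measurement and discard warm-up runs.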
Parallel software development: an emerging view
• Efficiency layer (10% of today’s programmers)
• Expert programmers build libraries, frameworks, OS
• Highest fraction of peak performance
• Productivity layer (90% of today’s programmers)
• Domain experts develop parallel applications by composing frameworks and libraries
• Emphasis on productivity, willing to sacrifice some performance for productive programming

Algorithm Design
• Efficient algorithm

Algorithm ‘Engineering’
[Diagram labels: Applications, Real inputs, Realistic models, Design, Experimentation, Implementation, Alg. libraries, Analysis, Perf. Guarantees]

Increasing hardware complexity
• AMD Magny-Cours
• NVIDIA Fermi

Models designed by computer architects are too complex

Additional Resources
• See Canvas
• Some parallel computing textbooks

Parallel computing

Software tools, Languages, Libraries for mini project
• NAMD, LAMMPS, ParMETIS, SCOTCH, NWChem, ScaLAPACK, PLASMA, MAGMA, SLEPc
• Intel MKL, PETSc, Trilinos, Hypre
• SPIRAL, FFTW, OSKI, ROSE, Orio
• VisIt, FastBit
• RAxML, ABySS, VASP, BLAST
• UPC, Cilk, Chapel, Habanero, X10, OpenACC, OpenCL

Tunnel Vision by Experts
• “I think there is a world market for maybe five computers.” Thomas Watson, chairman of IBM, 1943.
• “There is no reason for any individual to have a computer in their home.” Ken Olsen, president and founder of Digital Equipment Corporation, 1977.
• “640K [of memory] ought to be enough for anybody.” Bill Gates, chairman of Microsoft, 1981.
• “On several recent occasions, I have been asked whether parallel computing will soon be relegated to the trash heap reserved for promising technologies that never quite make it.” Ken Kennedy, CRPC Director, 1994

Parallel Algorithm Design and Analysis
• Costs of computation: Asymptotics, Time, Space, Power, Energy, Speedup, Tradeoffs
• Scalability
• Parallel models of computations: PRAM, BSP
• Scheduling: Task graphs, work, span
• Analysis

Algorithmic Design Strategies
• Divide and conquer
• Parallel prefix
• Stencil-based iterations
• Pipelining
• Randomization

Parallel Programming
• Target machine model
• Shared memory (threading: OpenMP)
• Distributed memory (message passing: MPI)
• Heterogeneous
• Hello world → Proficient use of parallel libraries
• Synchronization
• Load balancing
• Data layout, locality, memory management

Resources
• The HPC book.
• The Internet
• Course Instructor and TA.
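
Speedup and scalability, listed under Parallel Algorithm Design and Analysis, are bounded by Amdahl's Law, which the slide on why parallel code is difficult cites as the limit on finding enough parallelism. As a rough sketch, assuming the standard form of the law (a fraction p of the serial run time parallelizes perfectly over n processors and the rest stays serial), the bound can be computed as shown here. The function names are illustrative only.

```cpp
#include <cassert>

// Amdahl's Law: upper bound on speedup when a fraction p of the serial
// run time is spread perfectly over n processors and (1 - p) stays serial.
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}

// Parallel efficiency: achieved speedup divided by processors used.
double parallel_efficiency(double p, int n) {
    return amdahl_speedup(p, n) / n;
}
```

For example, with p = 0.9 the bound is about 5.3 on 10 processors, and it stays below 10 no matter how many processors are added, because the serial 10% never shrinks.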