FLOATING POINT BENCHMARKING
• In this exercise we look at floating point performance and how precision affects it
• Application code is provided (float-bench.c) for you to compile and run on Cirrus
• Use the Intel compilers to start with
• You can run on the login nodes for basic benchmarking
• Compile with a command like this:
• icc -O2 -o float-bench float-bench.c
• Run with a command like this:
• ./float-bench 1000000
• The benchmark currently uses double precision variables
• It produces timing data for a number of basic operations
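To make the idea of timing a basic operation concrete, here is a minimal sketch of how one double precision addition loop could be timed. It is not taken from float-bench.c: the function name time_double_add and the use of clock() are assumptions for illustration, and the provided benchmark may use a different timer and different kernels.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not from float-bench.c: adds two double arrays
   element by element and returns the elapsed CPU time in seconds. */
double time_double_add(const double *a, const double *b, double *c, size_t n)
{
    clock_t start = clock();          /* CPU time before the loop */
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];           /* one addition per iteration */
    }
    clock_t end = clock();            /* CPU time after the loop */

    /* clock() counts in ticks; CLOCKS_PER_SEC converts ticks to seconds. */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Writing the results into the caller's array c gives the loop an observable effect, which makes it harder for the optimiser to remove the work being timed.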
Investigate performance
• Check the performance of double precision floating point number operations
• 64-bit number
• Different operations have different costs
• Ensure you’re using sufficient iterations to get accurate timings
• See how the timings change with different iteration counts
• Try different compiler flags
• Different optimisation settings
• Without vectorisation or with vectorisation
• What can you see about the different operations?
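One way to judge whether an iteration count is sufficient is to keep doubling it until a single run lasts comfortably longer than the timer resolution. The sketch below illustrates that idea under stated assumptions: add_kernel, calibrate_iterations, the starting count of 1000 and the safety cap are all hypothetical choices, not part of float-bench.c.

```c
#include <stddef.h>
#include <time.h>

/* Results are written here so the compiler cannot discard the timed loop. */
static volatile double sink;

/* Hypothetical kernel (not from float-bench.c): n dependent additions. */
static double add_kernel(size_t n)
{
    volatile double step = 1.0;   /* volatile load stops the loop being folded away */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += step;
    }
    return sum;
}

/* Doubles the iteration count until one run lasts at least min_seconds,
   or until a safety cap is reached. Returns the final count and stores
   the elapsed CPU time of the final run in *elapsed. */
size_t calibrate_iterations(double min_seconds, double *elapsed)
{
    const size_t cap = (size_t)1 << 30;   /* arbitrary safety limit */
    size_t n = 1000;                      /* arbitrary starting count */
    for (;;) {
        clock_t start = clock();
        sink = add_kernel(n);
        clock_t end = clock();
        *elapsed = (double)(end - start) / CLOCKS_PER_SEC;
        if (*elapsed >= min_seconds || n >= cap) {
            return n;
        }
        n *= 2;
    }
}
```

Very short runs tend to give noisy or zero timings, so the per-operation cost (elapsed time divided by the iteration count) only becomes meaningful once the run is long enough. Note that the volatile variable also restricts vectorisation, so this kernel is suited to exploring timer behaviour rather than to comparing compiler flags.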
Add single precision functionality
• Extend the benchmark to run a single precision benchmark as well
• Copy the bench_double routine
• Convert it to use the float datatype instead of double
• Run both to see the performance difference
• Single precision is 32 bits, double precision is 64 bits
• Savings in memory traffic and storage requirements
• Potential improvement from vectorisation as well (twice as many values fit in each vector register)
• Compare performance with the different compiler flags
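As a rough illustration of the conversion, the sketch below shows a double precision kernel next to its single precision copy. The contents of bench_double are not shown here, so add_double, add_float, scale_float and bytes_moved are hypothetical stand-ins rather than the routines in float-bench.c.

```c
#include <stddef.h>

/* Hypothetical double precision kernel standing in for part of bench_double. */
void add_double(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Single precision copy: only the datatype changes. */
void add_float(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Constants need the f suffix; a plain 0.5 is a double and would
   promote the multiplication to double precision. */
void scale_float(float *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        x[i] = 0.5f * x[i];
    }
}

/* Bytes read and written by one add kernel call: two inputs, one output. */
size_t bytes_moved(size_t n, size_t elem_size)
{
    return 3 * n * elem_size;
}
```

On the usual x86-64 platforms, bytes_moved(n, sizeof(float)) is half of bytes_moved(n, sizeof(double)), which is where the memory traffic saving comes from. When converting a real routine, it is also worth checking that any maths library calls use the float variants (for example sqrtf rather than sqrt), otherwise the work is silently done in double precision.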
Further work
• Check compiler optimisation reports to ensure optimisations are happening as expected
• e.g. -qopt-report=5
• Check the assembler produced
• The -S flag produces assembler output with most compilers
• Check additional compilers
• gcc
• Nvidia (PGI) compilers are also available (module load nvidia/compilers-21.9)