COMP3600/6466 – Algorithms
Empirical Analysis [Lev sec. 2.6]
Hanna Kurniawati
https://cs.anu.edu.au/courses/comp3600/ comp_3600_6466@anu.edu.au
Empirical analysis
• Empirical analysis of an algorithm: Analyse an algorithm by measuring properties of interest (e.g., run-time, space) when the implementation of the algorithm is executed on sample input(s), and analyzing these measurements
• Why do this?
• We have not figured out the theoretical analysis of the
algorithm
• Assess the quality of the implementation
This class
Empirical analysis
• To assess the quality of the implementation
• Compute the (theoretical) asymptotic bound of
the algorithm
• Run the algorithms to gather data about actual resources (time/space) used to solve problems with different sizes
• Fit data to a function
• Compared with the asymptotic bound
What to Measure?
• Count the number of main components being called
• flops in matlab
• Use static variable to count the number of times a
function is called
• Measure the time/space directly
• Capture time before and after a function is called, the difference is the time
• Use profiler, e.g., gprof (usually for time), valgrind (usually for memory usage)
Different “time”
• Wall clock time: The actual time taken for the program to finish
• Includes time taken by other programs, by memory swap, etc.
• Not a good measure if you want to know the run-time complexity of your program. But, might be useful if you want to know the actual run-time under “typical” use case
• CPU time: The amount of time CPU spends
processing the program, directly or indirectly.
• Consists of 2 types:
• User time: The amount of time CPU spends processing the program,
e.g., looping over arrays.
• System time: The amount of time CPU spends on Operating System functionalities on behalf of the program, e.g., system calls (e.g., file read/write)
Measuring CPU time in C++
• From inside C++
:
:
clock_t s = clock();
(Part of the) Program being analysed clock_t e = clock();
double cputimeInSec = ((double) (e-s)) / CLOCKS_PER_SEC; • Using profiler, e.g., gprof
Measuring Memory in C++
• Using profiler, e.g., valgrind + massif- visualizer
• From inside C++
• In Linux, based on /proc/self/stat file
Analysis
• Once we have a time/space used when an implementation of an algorithm is executed, we basically have a single data point
• Usually, we’re interested more on the trend
in run-time/space complexity
• This means we need to run on multiple input sizeàmultiple data points
• Fit a function to the data pointsàregression
A note on fitting functions to data points
• Many regression methods
• Outside the scope of this class
• Can use functions for regression in spreadsheet (e.g., excel) or other mathematics s/w (e.g., matlab) or use math library for C++ (e.g., GSL)
• But just so it’s not totally a black box and you can use regression functionalities available in those libraries properly, let’s see simple regression a bit.
Simple Regression: Linear Least Square
• Given a set of data points
{ 𝑥!,𝑦! , 𝑥”,𝑦” ,…, 𝑥#,𝑦# }, we want to find a
linear combination of functions that best fit the
data, where best means it minimizes the least
square error. This means,
• We want to find the weights 𝑤!, 𝑤”, … , 𝑤# for a function 𝑦=𝑓𝑥 =𝑤!𝑓! 𝑥 +𝑤”𝑓” 𝑥 +⋯+𝑤#𝑓# 𝑥 that
minimises𝑒=∑& 𝑦 −𝑓 𝑥 ” $%!$ $
• Note that 𝑓$ 𝑥 for 𝑖 ∈ [1, 𝑘] can be any function (e.g., 𝑥”, log 𝑥, etc.) because once we insert the data, the equation𝑦=𝑤!𝑓! 𝑥 +𝑤”𝑓” 𝑥 +⋯+𝑤#𝑓# 𝑥 remains a system of linear equations
Finding the weights
• Let’s rewrite 𝑓 𝑥 for each data point 𝑦!=𝑤!𝑓!𝑥! +𝑤”𝑓”𝑥! +⋯+𝑤#𝑓#𝑥! 𝑦”=𝑤!𝑓!𝑥” +𝑤”𝑓”𝑥” +⋯+𝑤#𝑓#𝑥”
: :
𝑦&=𝑤!𝑓!𝑥& +𝑤”𝑓”𝑥& +⋯+𝑤#𝑓#𝑥&
• We can write the above more compactly as: 𝑓! 𝑥! ⋯ 𝑓# 𝑥! 𝑤! 𝑦!
⋮ ⋱ ⋮ 𝑤⋮=𝑦⋮ 𝑓!𝑥& ⋯𝑓#𝑥& # &
𝑋𝑤 = 𝑦
• Using linear algebra, the error 𝑒 will be minimised when 𝑋$𝑋𝑤 = 𝑋$𝑦
Next: More on sorting