SOFT3410 Tutorial 4 Measuring, Graphing and Techniques
We will be looking into performance analysis of applications. We are mostly concerned with the cpu usage but we will check out memory usage.
Question 1: Time Command
To get a rough idea of the real time implications of our application, we can utilise a unix command to run with our program. The time command provides indication of the real world clock run time, amount of time it spent in userspace and the amount of time it spent in kernel space (via a system call).
You can use the code below.
void resize_nn(int* pixels_in, size_t in_width, size_t in_height, int* pixels_out, size_t out_width, size_t out_height) {
if(pixels_in
&& pixels_out && out_width && out_height && in_width && in_height) {
} }
double x_ratio = in_width / (double) out_width; double y_ratio = in_height / (double) out_height;
for(size_t y = 0; y < out_height; y++) { for(size_t x = 0; x < out_width; x++) {
} }
double adj_x = floor(x*x_ratio); double adj_y = floor(y*y_ratio);
pixels_out[(y*out_width)+x] = pixels_in[(size_t) ((adj_y*in_width) + adj_x)];
Note: For anyone that is adventurous, use bitmap image as data for this function. 1
Question 2: GnuPlot, Timing and Collecting Data
Using the code from exercise 1, construct a few test cases and examine different techniques such as unrolling or constructing the images as a jagged array.
Use the following gnuplot configuration.
1 # Output Settings
2 set terminal png size 800,300
3 set output 'out.png'
4
5 # Labels, Title and Data
6 set key inside bottom right
7 set xlabel 'Dimensions'
8 set ylabel 'Time (s)'
9 set title 'NN-Resize Comparison'
10 plot "data.txt" using 1:2 title 'Column-Major Order' with linespoints, \
11 "data.txt" using 1:3 title 'Jagged Array' with linespoints
Generate and construct the data in the following form
1 #Dimensions Jagged Array Column-Major Order
2 1024 0.2 0.1
3 2048 0.4 0.2
4 4096 0.8 0.3
5 ...
To clarify, the # symbol is used for commenting with gnuplot, use this to convey the information of
your plot configuration and data.
• Why do we want to use visualisation software for this task?
• What information do we gain from graphing the different configurations? • What other configurations could we create to visualise this data and why?
You are not restricted to using gnuplot for your homework or assignment tasks. However, you much present a repeatable experiment that should show a similar graph and survive scrutiny from your tutor.
It isn’t enough to graph how your application scales over time, you will need to identify hotspots in your code. We are able to collect data using perf that will show how much time the application spent in each function.
SOFT3410 Measuring, Graphing and Techniques
Concurrency Page 2 of 7
Question 3: Perf
It isn’t enough to retrieve timing information via the time utility. During your performance analysis, you will need to capture and profile your application. We are able to do this using a linux utility called perf (occasionally known as perf_events).
Given the following program, profile it and find out what functions are taking the majority of execu- tion time.
#include
#include
#include
//filter
void filter_by_value(int* pixels, size_t width, size_t height, int pixel_threshold, int negative_val, int positive_val) { for(size_t i = 0; i < height; i++) {
for(size_t j = 0; j < width; j++) {
if(pixels[(i * width) + j] >= pixel_threshold) {
} }
}
pixels[(i * width) + j] = positive_val; } else {
pixels[(i * width) + j] = negative_val;
}
//Initialise some data and filter by value inbetween
int main(int argc, char** argv) {
size_t width = 0; /* Select Width */
size_t height = 0; /* Select height */
size_t npixels = width*height;
unsigned int* data = malloc(sizeof(int)*npixels); for(size_t i = 0; i < n_pixels; i++) {
data[i] =rand();
}
unsigned int pos_val = 0xffffffff; unsigned int neg_val = 0x22222222;
filter_by_value(data, width, height,
(pos_val - neg_val)/2, neg_val, pos_val);
}
SOFT3410 Measuring, Graphing and Techniques
Concurrency
Page 3 of 7
SOFT3410 Measuring, Graphing and Techniques Use the following script to profile this program, you may need to run this script as a super user:
1 perf record -F 100 -g ./program
2 perf script > stackdata.perf
To help with visualising the data, we recommend you use FlameGraph from Github. This should visualise the execution stack and make it easier to to diagnose what function is executing for the longest time.
Once you have cloned the FlameGraph repository, you can use the stackcollapse script to fold the stack samples for flamegraph to process.
1 ./stackcollapse-perf.pl stackdata.perf > stackdata.folded
Afterwards, you can construct an SVG from the folded stack data using flamegraph.pl.
1 ./flamegraph.pl stackdata.folded > output.svg
Question 4: Loop Unrolling
A micro-optimisation that is more applicable on processors that typically halt on conditions. Loop unrolling is a technique employed to reduce the number of checks required on an iterative operation.
//Regular loop
for(int i = 0; i < N; i++) { c[i] = a[i]+b[i];
}
//Unrolled
for(int i = 0; i < N; i+= 4) { c[i] = a[i]+b[i];
c[i+1] = a[i+1]+b[i+1]; c[i+2] = a[i+2]+b[i+2]; c[i+3] = a[i+3]+b[i+3];
}
• What is the advantage of unrolling the loop?
• Does this change the time complexity?
• Whatissuewillariseifthelengthofa,bandcarenotdivisibleby4,ifwecontroltheallocation scheme, how could we address this?
Concurrency Page 4 of 7
Question 5: Tiling
Tiling is a technique that, on top of breaking data into divided sets, we are able to optimise the usage on the cache by iterating over the size of the cache line, proportional to the size of the data. Consider the following function which operates on blocks of data for each pixels. Can we already see some optimisation of tiling with this function? Discuss with your tutor and peers about the following function.
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16
blur_filter(pixels, width, height, blur_dist) { for(i = 0; i < height; i++) {
} }
SOFT3410 Measuring, Graphing and Techniques
for(j = 0; j < width; j++) { avg=0;
count = 0;
for(ii = i-(blur_dist/2); ii < (blur_dist/2); ii++) {
for(jj = j-(blur_dist/2); jj < (blur_dist/2); jj++) { avg += pixels[(ii * width) + jj];
count++;
} }
avg = avg / count;
pixels[(i*width)+j] = avg; }
Concurrency
Page 5 of 7
} }
}
}
SOFT3410 Measuring, Graphing and Techniques
Question 6: Matrix Multiplication - Homework
You have been tasked with performance analysis of the following function. You will need to construct a suite of test cases along with a report. Use the information and tools you have learned from this tutorial and from the previous lecture to analyse the performance of the matrix multiplication function.
void multiply(const float* mata, size_t mata_width, size_t mata_height, const float* matb, size_t matb_width, size_t matb_height,
float** result_mat, size_t* res_width, size_t* res_height) {
if(result_mat) {
*res_width = matb_width;
*res_height = mata_height);
*result_mat = calloc(mata_height * matb_width,
sizeof(float));
for(size_t y = 0; y < mata_height; y++) { for(size_t x = 0; x < matb_width; x++) {
for(size_t k = 0; k < mata_width; k++) { (*result_mat)[(y * matb_width) + x] +=
(mata[(y * mata_width) + k] *
matb[(k * matb_width) + x]);
}
Given the above matrix multiplication function, you must do the following as part of your analysis and testing.
• Construct a set of test cases of different sizes (try 32x32, 64x64, 128x128 and larger)
• Using the test cases, measure and record the execution of the function above
• Record these results and use gnuplot or software of your choosing to graph the data
• Using techniques from the previous lecture and tutorial, apply them, benchmark, graph and compare a with baseline and other combinations of techniques
• Write a conclusion and explain your observations
You will need to consider how you could reorder the access of memory in the function and utilise techniques such as unrolling and tiling. Explain if your changes increase or decrease the performance of the function.
For your submission, create a github repository with the following name soft3410hw4 and invite your tutor to your repository. Your report must be submitted to canvas via TurnitIn.
Concurrency Page 6 of 7
You must make a code submission to the master branch by 11:59pm, 27th September, 2020. Please make your last commit and push occurs before this time.
You must submit your report to canvas portal by 11:59pm, 27th September, 2020.
SOFT3410 Measuring, Graphing and Techniques
Concurrency Page 7 of 7