程序代写CS代考 scheme concurrency SOFT3410 Tutorial 4 Measuring and Graphing

SOFT3410 Tutorial 4 Measuring and Graphing
We will be looking into performance analysis of applications. We are mostly concerned with the cpu usage but we will check out memory usage.
Question 1: Time Command
To get a rough idea of the real time implications of our application, we can utilise a unix command to run with our program. The time command provides indication of the real world clock run time, amount of time it spent in userspace and the amount of time it spent in kernel space (via a system call).
You can use the code below.
void resize_nn(int* pixels_in, size_t in_width, size_t in_height, int* pixels_out, size_t out_width, size_t out_height) {
if(pixels_in
&& pixels_out && out_width && out_height && in_width && in_height) {
} }
double x_ratio = in_width / (double) out_width; double y_ratio = in_height / (double) out_height;
for(size_t y = 0; y < out_height; y++) { for(size_t x = 0; x < out_width; x++) { } } double adj_x = floor(x*x_ratio); double adj_y = floor(y*y_ratio); pixels_out[(y*out_width)+x] = pixels_in[(size_t) ((adj_y*in_width) + adj_x)]; Note: For anyone that is adventurous, use bitmap image as data for this function. 1 Question 2: GnuPlot, Timing and Collecting Data Using the code from exercise 1, construct a few test cases and examine different techniques such as unrolling or constructing the images as a jagged array. Use the following gnuplot configuration. 1 # Output Settings 2 set terminal png size 800,300 3 set output 'out.png' 4 5 # Labels, Title and Data 6 set key inside bottom right 7 set xlabel 'Dimensions' 8 set ylabel 'Time (s)' 9 set title 'NN-Resize Comparison' 10 plot "data.txt" using 1:2 title 'Column-Major Order' with linespoints, \ 11 "data.txt" using 1:3 title 'Jagged Array' with linespoints Generate and construct the data in the following form 1 #Dimensions Jagged Array Column-Major Order 2 1024x1024 0.2 0.1 3 2048x2048 0.4 0.2 4 4096x4096 0.8 0.3 5 ... To clarify, the # symbol is used for commenting with gnuplot, use this to convey the information of your plot configuration and data. • Why do we want to use visualisation software for this task? • What information do we gain from graphing the different configurations? • What other configurations could we create to visualise this data and why? You are not restricted to using gnuplot for your homework or assignment tasks. However, you must present a repeatable experiment that should show a similar graph and survive scrutiny from your tutor. It isn’t enough to graph how your application scales over time, you will need to identify hotspots in your code. We are able to collect data using perf that will show how much time the application spent in each function. SOFT3410 Measuring and Graphing Concurrency Page 2 of 5 Question 3: Perf Breaking down your code is necessary in discovering hotspots in your program. During your perfor- mance analysis, you will can capture and profile your application. We are able to do this using a linux utility called perf (occasionally known as perf_events. Given the following program, profile it and find out what functions are taking the majority of execu- tion time. void multiply(const float* mata, size_t mata_width, size_t mata_height, const float* matb, size_t matb_width, size_t matb_height, float** result_mat, size_t* res_width, size_t* res_height) { if(result_mat) { *res_width = matb_width; *res_height = mata_height; *result_mat = calloc(mata_height * matb_width, sizeof(float)); for(size_t y = 0; y < mata_height; y++) { for(size_t x = 0; x < matb_width; x++) { for(size_t k = 0; k < mata_width; k++) { (*result_mat)[(y * matb_width) + x] += (mata[(y * mata_width) + k] * matb[(k * matb_width) + x]); } } float* hadamard(void) { assert(((WIDTH - 1) & WIDTH) == 0); size_t w = WIDTH; size_t quad_size = 1; float* result = malloc(WIDTH * WIDTH * sizeof(float)); result[0] = 1; while ((w >>= 1) != 0) {
for (size_t y = 0; y < quad_size; ++y) { for (size_t x = 0; x < quad_size; ++x) { } } } } } quad_size *= 2; } return result; } const float v = result[IDX(x, y)]; result[IDX(x + quad_size, y)] = v; result[IDX(x, y + quad_size)] = v; result[IDX(x + quad_size, y + quad_size)] = -v; SOFT3410 Measuring and Graphing Concurrency Page 3 of 5 // Displays a matrix. void display(const float* matrix) { for (size_t y = 0; y < WIDTH; y++) { for (size_t x = 0; x < WIDTH; x++) { printf("%6.2f ", matrix[IDX(x, y)]); } printf("\n"); } } int main(void) { // Construct the matrices float* a = hadamard(); float* b = hadamard(); // Compute the result size_t rwidth = 0; size_t rheight = 0; float* result = NULL; multiply(a, WIDTH, WIDTH, b, WIDTH, WIDTH, &result, &rwidth, &rheight); // Verify the result for (size_t y = 0; y < WIDTH; y++) { for (size_t x = 0; x < WIDTH; x++) { assert(x == y ? result[IDX(x, y)] == WIDTH : result[IDX(x, y)] } } puts("done"); free(a); free(b); free(result); return 0; } Use the following script to profile this program: 1 perf record -F 100 -a -g -- ./program 2 perf script > stackdata.perf
To help with visualising the data, we recommend you use FlameGraph from Github. This should help visualise the execution stack and make it easier to to diagnose what function is executing for the longest time.
Once you have cloned the FlameGraph repository, you can use the stackcollapse script to fold the Concurrency Page 4 of 5
SOFT3410 Measuring and Graphing
=

• What is the advantage of unrolling the loop?
• Does this change the time complexity?
• Whatissuewillariseifthelengthofa,bandcarenotdivisibleby4,ifwecontroltheallocation scheme, how could we address this?
• What are disadvantages to loop unrolling?
SOFT3410 Measuring and Graphing
stack samples for flamegraph to process.
1 ./stackcollapse-perf.pl stackdata.perf > stackdata.folded
Afterwards, you can construct an SVG from the folded stack data using flamegraph.pl.
1 ./flamegraph.pl stackdata.folded > output.svg
Question 4: Loop Unrolling
A micro-optimisation that is more applicable on processors that typically halt on conditions. Loop unrolling is a technique employed to reduce the number of checks required on an iterative operation.
//Regular loop
for(int i = 0; i < N; i++) { c[i] = a[i]+b[i]; } //Unrolled for(int i = 0; i < N; i+= 4) { c[i] = a[i]+b[i]; c[i+1] = a[i+1]+b[i+1]; c[i+2] = a[i+2]+b[i+2]; c[i+3] = a[i+3]+b[i+3]; } Concurrency Page 5 of 5