parallel-demo
Compute min, max and average values, and standard deviation.
Use float values for temperature.
Report memory transfer, kernel execution and total program execution times.
The sum is computed with the reduceadd3 method from tutorial 3 of the workshop. This method uses
local memory to store the partial sums produced by each reduction step. The procedure first copies the data
from global to local memory, performs all reduction steps in local memory, and then copies the result to
the output variable. The host adds all the partial sums to get the total and divides it by the number of
values to get the average.
Because global memory is read only once, at the beginning, the barriers between the steps now apply to
local memory, which is much faster. The size of local memory is defined in the host code
and passed as an additional kernel parameter. It corresponds to the number of
work items in a workgroup. I set the size of local memory to 64 in the code.
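The reduction described above can be sketched on the CPU as follows. This is not the reduceadd3 kernel itself, only a plain C++ illustration of the same idea: the `scratch` vector stands in for local memory, and the function names `group_partial_sums` and `average` are hypothetical. It assumes the group size is a power of two, as 64 is.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side illustration only: each "work-group" of group_size items is reduced
// in a scratch buffer that stands in for OpenCL local memory.
std::vector<float> group_partial_sums(const std::vector<float>& input,
                                      std::size_t group_size) {
    // The stride-doubling scheme below assumes a power-of-two group size.
    assert(group_size > 0 && (group_size & (group_size - 1)) == 0);

    std::vector<float> partial;
    std::vector<float> scratch(group_size);  // stand-in for local memory

    for (std::size_t base = 0; base < input.size(); base += group_size) {
        // Step 1: copy this group's slice from "global" to "local" memory,
        // padding with 0 (the identity for addition) past the end of the input.
        for (std::size_t lid = 0; lid < group_size; ++lid) {
            std::size_t gid = base + lid;
            scratch[lid] = gid < input.size() ? input[gid] : 0.0f;
        }
        // Step 2: reduction steps; on a device a barrier would separate them.
        for (std::size_t stride = 1; stride < group_size; stride *= 2) {
            for (std::size_t lid = 0; lid + stride < group_size; lid += 2 * stride) {
                scratch[lid] += scratch[lid + stride];
            }
        }
        // Step 3: item 0 holds the group's result, which goes to the output.
        partial.push_back(scratch[0]);
    }
    return partial;
}

// Host side: add the partial sums and divide by the number of values.
float average(const std::vector<float>& input, std::size_t group_size = 64) {
    assert(!input.empty());
    float total = 0.0f;
    for (float p : group_partial_sums(input, group_size)) total += p;
    return total / static_cast<float>(input.size());
}
```

With 100 input values and a group size of 64, this produces two partial sums, which the host then combines.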
Computing min and max is similar to computing the average and requires modifying only
a few lines.
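To illustrate why only a few lines change, here is a CPU-side sketch in which the combining operator and its identity value are parameters. The names `group_reduce`, `reduce_min` and `reduce_max` are hypothetical and the code is plain C++ rather than the project's kernels; it assumes a power-of-two group size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Same group-wise reduction pattern as for the sum, but with the operator
// (op) and its identity element passed in.
template <typename Op>
float group_reduce(const std::vector<float>& input, std::size_t group_size,
                   float identity, Op op) {
    assert(group_size > 0 && (group_size & (group_size - 1)) == 0);
    std::vector<float> scratch(group_size);  // stand-in for local memory
    float result = identity;

    for (std::size_t base = 0; base < input.size(); base += group_size) {
        // Load the group's slice, padding with the identity value.
        for (std::size_t lid = 0; lid < group_size; ++lid) {
            std::size_t gid = base + lid;
            scratch[lid] = gid < input.size() ? input[gid] : identity;
        }
        // Reduction steps within the group.
        for (std::size_t stride = 1; stride < group_size; stride *= 2) {
            for (std::size_t lid = 0; lid + stride < group_size; lid += 2 * stride) {
                scratch[lid] = op(scratch[lid], scratch[lid + stride]);
            }
        }
        // The host combines the per-group results with the same operator.
        result = op(result, scratch[0]);
    }
    return result;
}

float reduce_min(const std::vector<float>& v, std::size_t group_size = 64) {
    return group_reduce(v, group_size, std::numeric_limits<float>::infinity(),
                        [](float a, float b) { return std::min(a, b); });
}

float reduce_max(const std::vector<float>& v, std::size_t group_size = 64) {
    return group_reduce(v, group_size, -std::numeric_limits<float>::infinity(),
                        [](float a, float b) { return std::max(a, b); });
}
```

The only differences from the sum are the operator and the padding value: +infinity for min and -infinity for max, so that padded slots never win the comparison.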
The standard deviation is computed as follows. First, map_square computes B[i] =
pow(A[i] - avg, 2), where A holds the temperature values and avg is the average temperature computed previously.
Then the average of B is computed with the averaging method described above; the square
root of this average is the standard deviation.
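The two steps above can be sketched in plain C++ as shown below. Here `map_square` is a CPU stand-in for the step of the same name, `mean` is a simple sequential replacement for the reduction-based average, and `standard_deviation` is a hypothetical wrapper. Because the squared deviations are divided by the number of values, this is the population standard deviation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sequential stand-in for the reduction-based average.
float mean(const std::vector<float>& v) {
    assert(!v.empty());
    float total = 0.0f;
    for (float x : v) total += x;
    return total / static_cast<float>(v.size());
}

// Map step: B[i] = pow(A[i] - avg, 2).
std::vector<float> map_square(const std::vector<float>& a, float avg) {
    std::vector<float> b(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        b[i] = std::pow(a[i] - avg, 2.0f);
    }
    return b;
}

// Standard deviation = square root of the average of B.
float standard_deviation(const std::vector<float>& a) {
    float avg = mean(a);
    return std::sqrt(mean(map_square(a, avg)));
}
```

For example, the values 2, 4, 4, 4, 5, 5, 7, 9 have average 5 and squared deviations summing to 32, so the result is sqrt(32 / 8) = 2.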
Achieved
Method for Statistics
Computing average
Computing min and max
Computing standard deviation
Method for Time Measurement
Measure Overall Runtime
At the beginning of program execution, record the start time using clock(); at the end, record
the end time using clock() again. The difference is the total execution time.
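A minimal sketch of this measurement, assuming the work to be timed is wrapped in a callable (the helper name `timed_seconds` is hypothetical):

```cpp
#include <ctime>

// Runs `work` and returns the time reported by clock() for it, in seconds.
template <typename Work>
double timed_seconds(Work work) {
    std::clock_t start = std::clock();  // start time
    work();
    std::clock_t end = std::clock();    // end time
    // clock() counts in ticks; CLOCKS_PER_SEC converts ticks to seconds.
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

Note that the C standard defines clock() as processor time used by the program, so on some platforms the result can differ from elapsed wall-clock time when the program is waiting rather than computing.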
Use the getProfilingInfo method to obtain the execution time of each event.
Sum the times of all memory transfer events, such as those from
enqueueWriteBuffer and enqueueReadBuffer, to get the memory transfer time.
Sum the times of all kernel execution events, such as those from
enqueueNDRangeKernel, to get the kernel execution time.
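The summing step can be sketched as follows. To keep the example free of OpenCL dependencies, `ProfiledEvent` is a hypothetical stand-in for an event whose timestamps have already been read. In the OpenCL C++ wrapper these would come from `getProfilingInfo<CL_PROFILING_COMMAND_START>()` and `getProfilingInfo<CL_PROFILING_COMMAND_END>()` on each event, which report nanoseconds and require a command queue created with `CL_QUEUE_PROFILING_ENABLE`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical classification of the enqueued commands.
enum class EventKind { MemoryTransfer, KernelExecution };

// Hypothetical record of one profiled event.
struct ProfiledEvent {
    EventKind kind;
    std::uint64_t start_ns;  // value of CL_PROFILING_COMMAND_START
    std::uint64_t end_ns;    // value of CL_PROFILING_COMMAND_END
};

// Sums the durations (end - start) of all events of the given kind.
std::uint64_t total_ns(const std::vector<ProfiledEvent>& events, EventKind kind) {
    std::uint64_t total = 0;
    for (const ProfiledEvent& e : events) {
        if (e.kind == kind) total += e.end_ns - e.start_ns;
    }
    return total;
}
```

Calling `total_ns` once with `EventKind::MemoryTransfer` and once with `EventKind::KernelExecution` gives the two reported figures.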
Most of the time is spent reading from the file. The computation itself is very quick.
Measure Memory Transfer and Kernel Execution Time
Observation