代写 R C algorithm math MPI openmp scala parallel statistic CMDA 3634 SP2019 Image Statistics and Distributing Data Homework 03

CMDA 3634 SP2019 Image Statistics and Distributing Data Homework 03
Homework 03: Image Statistics and Distributing Data
Assigned: 2019-03-27 12:00:00
Due: 2019-04-03 23:59:00
Points: 100 pts is full credit (25 pts extra available from any questions)
Instructions: All of this assignment will be submitted via Canvas. Handwritten solutions will receive 0 credit. Show your work and derivations.
Deliverables: For this assignment, you are expected to upload a PDF file to canvas along with all figures and .tex files used to generate that pdf. A space will be provided in the assignments repository on code.vt.edu for you to upload your materials to, as well.
Collaboration: This assignment is to be completed by yourself, however, you make seek assistance from your classmates. In your submission you must indicate from whom you received assistance.
Honor Code: By submitting this assignment, you acknowledge that you have adhered to the Virginia Tech Honor Code and attest to the following:
I have neither given nor received unauthorized assistance on this assignment. The work I am pre- senting is ultimately my own.
References
• Writing pseudo-code in Latex: https://en.wikibooks.org/wiki/LaTeX/Algorithms
• Making tables in Latex: https://en.wikibooks.org/wiki/LaTeX/Tables
• Formula for calculating the standard deviation: https://www.khanacademy.org/math/probability/ data-distributions-a1/summarizing-spread-distributions/a/calculating-standard-deviation-step-by-st
• Matrix-matrix multiplication review: https://www.khanacademy.org/math/precalculus/precalc-matrices/ multiplying-matrices-by-matrices/a/multiplying-matrices
Assigned: 2019-03-27 12:00:00 1 Due: 2019-04-03 23:59:00

CMDA 3634 SP2019 Image Statistics and Distributing Data Homework 03
Problems
1. Distributing an image: Consider an MImageF image, as in Projects 1 and 2, which is N columns ×M rows, so the image data D is an M × N floating point array. To process the image in parallel, the data array needs to be distributed over P MPI processes. Each process will operate on a contiguous set of the rows. Specifically, if a typical process is assigned R rows, then process 0 is responsible for rows 0, 1, . . . , R − 1, process 1 responsible for rows R,R+1,…,2R−1, process 2 is responsible for rows 2R,2R+1,…,3R−1, and so on.
(a) (8 points) In general, M may not be evenly divisible by P. Write a formula for Mp, the number of rows managed by process p as a function of M, P and p. This will will be a piecewise formula. No process should have more rows than the process immediately before it, that is, Mp ≥ Mp+1, ∀p, and no process should have more than one row more than any other process, that is, max |Mp − Mq | = 1, ∀p, q.
(b) (3 points) Using the formula you developed above, let M = 13 and P = 5. For each process, give the number of rows it owns, Mp, and the first and last row index of those rows. Give your answer as a table.
(c) (4 points) Compare the strategy for distributing the rows in parts (a) and (b) to the strategy for dividing up the array in Lab 9. Why is this strategy in parts (a) and (b) better? This should be a 1 sentence answer.
In the next two parts, you will be asked to write pseudocode for the following function, which distributes an image seen initially only by process 0 to many sub-images, each owned by a different process, following the division you outlined in parts (a) and (b). Let Dp be the chunk of rows of D assigned to process p.
1: 2: 3: 4: 5: 6: 7: 8: 9:
10: 11: 12: 13: 14: 15: 16:
function load image(image filename) p ← get rank(COMMUNICATOR)
P ← get size(COMMUNICATOR)
if p = 0 then
M, N ← read image size(image filename) allocate image(M, N, D)
D ← read image(image filename)
send broadcast(COMMUNICATOR, (M,N))
else
receive broadcast(COMMUNICATOR, (M,N)) end if
Mp,jp ← subimage rows(M, P, p)
allocate image(Mp, N, Dp)
distribute subimages(M, N, D, P, p, Mp, Dp) return M,N,Mp,Dp
end function
(d) (8 points) Write pseudocode for a function called subimage rows that, when called by rank p, calculates the number of rows Mp of the original image it is responsible and the first row index jp of that set of rows.
1: 2: 3: 4:
function subimage rows(M, P, p) ***your pseudocode here**** return Mp, jp
end function
(e) (8 points) Write pseudocode for a function called distribute subimage that, when called by each process, copies the relevant portion the image D from process 0 to process p.
1: 2: 3: 4:
function distribute subimages(M, N, D, P, p, Mp, Dp) ***your pseudocode here****
return Dp
end function
Assigned: 2019-03-27 12:00:00 2 Due: 2019-04-03 23:59:00

CMDA 3634 SP2019 Image Statistics and Distributing Data Homework 03
(f) (3 points) Assume that we only have a limited amount of memory available to each process. Why is it bad for rank 0 to have to load the entire array before sending data to other ranks? This should be a 1 sentence answer.
(g) (8 points) Assume that we have a limited amount of memory for each process. In a new function, load image efficiently, rewrite the pseudocode for load image so that it does not read, and store, the entire image from disk at once. Other than the portion of the image that it will own, process 0 should never have more than Mmax = ⌈M/P⌉ rows of the image in memory at any time.
1: function load image efficiently(image filename)
2: ****your pseudocode here****
3: return M,N,Mp,Dp
4: end function
2. Calculate basic image statistics: Consider the following pseudocode, which gathers statistics of an M ×N MImageF image D that is distributed over P processes using the algorithms developed above.
1: function calc image stats(image filename)
2: p ← get rank(COMMUNICATOR)
3: P ← get size(COMMUNICATOR)
4: M, N, Mp, Dp ← load image efficiently(image filename)
5: min ← calc min(M, N, P, p, Mp, Dp)
6: max ← calc max(M, N, P, p, Mp, Dp)
7: mean ← calc mean(M, N, P, p, Mp, Dp)
8: stddev ← calc stddev(M, N, P, p, Mp, Dp)
9: return min, max, mean, stddev
10: end function
(a) (10 points) Write pseudocode for the parallel function calc min to calculate the minimum pixel
value over all of D. As in Problem 1, each process p owns Mp of the total image rows in an array Dp.
(b) (3 points) Modify the calc min pseudocode to implement calc max and calculate the maximum
pixel value over D.
(c) (10 points) Write pseudocode for the parallel function calc mean to calculate the mean pixel value over all entries of D. Hint: There are multiple approaches to this problem. Consider whether you want to calculate sums or means over each process and what this means when you reduce the result over all process. Think about how you will account for different processes having different size subsets of D.
(d) (3 points) Why does the mean need to be calculated before the standard deviation? This should be a 1 sentence answer.
(e) (8 points) Write pseudocode for the parallel function calc stddev to calculate the standard devi- ation of D. Hint: You may use the previous functions in your implementation of calc stdev.
(f) (3 points) Why is it more difficult to compute, in parallel, the global median than the global mean?
3. Computing image histograms: An image histogram gives information about the distribution of tones in the image. Image histograms can be used to interpret image properties such as the exposure and contrast in an image. A histogram divides the data space into K bins and counts the pixels whose data falls into those bins. An example photo and its histogram are given in Figure 1.
For example, for the MImageF format, the data space is on the range [0, 1]. If K = 4, the 4 bins would be b0 = [0, 1), b1 = [1, 1), b2 = [1, 3), and b3 = [3,1]. The histogram measures raw pixel counts. Thus, if
,
442244
we have a 3×4 image,
1.0
0.4
0.3
0.8
0.5
1.0
0.0
0.0
0.2
0.1
0.0
0.9
Assigned: 2019-03-27 12:00:00 3
Due: 2019-04-03 23:59:00

CMDA 3634 SP2019 Image Statistics and Distributing Data Homework 03
Figure 1: A greyscale image and its histogram from Adobe Lightroom 6. then the histogram values would be
.
(a) (10 pts) Design a sequential algorithm for computing the histogram of a monochrome image, in MImageF format, with K bins. Assume that the image is M rows × N columns. Give pseudocode for your algorithm. You may assume that the array for the histogram is pre-allocated.
(b) (10 pts) Design a parallel algorithm for computing the histogram of a monochrome image, in MImageF format, with K bins. Assume that the image is M rows × N columns and that the data is distributed over P processors, with each processor having MP rows of the image. At the end of the algorithm, each processor should have a copy of the histogram. You may use high-level parallel primitives (e.g., reductions or broadcasts), but justify your selection. Give pseudocode for your algorithm. If you like, you may use your sequential algorithm, as a function, to simplify your notation. You may assume that the array for the histogram is pre-allocated.
4. Weak scalability studies: In Lab 07 we performed a weak scaling study on the matrix-vector product between an N × N matrix and an N × 1 vector. This operation required O(N2) floating point operations. In a weak scaling study, increasing the processors requires us to increase the amount of work proportionally. Thus, for the matrix-vector product, if N1 = 10,000 on a single processor (P = 1), doubling the work for two processors (P = 2) required that N2 = 14,142. Similarly, for 8 times the work and P = 8, N8 = 28, 284.
Consider the matrix-matrix product, C = AB, which produces an N × N matrix C from two N × N dense matrices A and B. This operation requires O(N3) floating point operations. Assume we have a parallel algorithm for the matrix-matrix product. In this problem, we will design a weak scalability study for this algorithm.
(a) (5 pts) For the matrix-vector product, give a general formula for NP as a function of N1 and P.
(b) (5 pts) For the matrix-matrix product, give a general formula for NP as a function of N1 and P.
(c) (15 pts) Assume that we can execute the program in parallel with MPI (using OpenMPI) and that the program is executed by ./time matmat NP . Write an SLURM sbatch submission script which submits one job to Cascades to perform a weak scaling study on this algorithm. This study should test the algorithm for P = 2, 4, 8, 16, 32, 64, 128, and 256 processors for N1 = 10, 000.
5. (1 pts) Who did you collaborate with on this assignments? What references did you use?
5
2
1
4
b0
b1
b2
b3
Assigned: 2019-03-27 12:00:00 4 Due: 2019-04-03 23:59:00