Project #2 – LU Decomposition using OpenMP
SWE3021 Multicore Computing – Fall 2018
Due date: October 19 (Fri) 17:00pm
1 Notes
2
1. For this project, please use swui, swye, and swji server. Since this project will use lots of CPU cycles, I do not want you to mess up swin that is used for OS and System Programming classes.
2. Please do not run more than 16 threads. And, please do not run a job that runs for more than 5 minutes.
Parallel LU Decomposition
LU Decomposition is a classical method for transforming an N x N matrix A into the product of a lower-triangular matrix L and an upper-triangular matrix U,
A = LU.
You will use a process known as Gaussian elimination to create an LU decomposition of a square matrix. By performing elementary row operations, Gaussian elimination transforms a square matrix A into an equivalent upper-triangular matrix U. The lower-triangular matrix L consists of the row multipliers used in Gaussian elimination.
In this assignment, you will develop an OpenMP parallel implementation of LU decomposition that use Gaussian elimination to factor a dense N x N matrix into an upper-triangular one and a lower-triangular one. In matrix computations, pivoting involves finding the largest magnitude value in a row, column, or both and then interchanging rows and/or columns in the matrix for the next step in the algorithm. The purpose of pivoting is to reduce round-off error, which enhances numerical stability. In your assignment, you will use row pivoting, a form of pivoting involves interchanging rows of a trailing submatrix based on the largest value in the current column. To perform LU decomposition with row pivoting, you will compute a permutation matrix P such that PA = LU. The permutation matrix keeps track of row exchanges performed. Below is pseudocode for a sequential implementation of LU decomposition with row pivoting.
inputs : a(n,n)
outputs : π(n) , l (n,n) , and u(n,n)
initialize π as a vector of length n
initialize u as an n x n matrix with 0s below the diagonal initialize l as an n x n matrix with 1s on the diagonal and 0s
above the diagonal for i = 1 to n
π[i]=i
for k = 1 to n
max = 0
for i = k to n
if max< |a(i,k)| max= |a(i,k)| k’=i
if max==0
error (singular matrix)
swap π[k] and π[k’]
1
swap a(k,:) and a(k’ ,:)
swap l(k,1:k−1) and l(k’ ,1:k−1) u(k,k) = a(k,k)
for i = k+1 to n
l(i,k) = a(i,k)/u(k,k)
u(k,i) = a(k,i) for i = k+1 to n
for j = k+1 to n
a(i,j) = a(i,j) − l(i,k)∗u(k,j)
Here, the vector π is a compact representation of a permutation matrix p(n,n) , which is very sparse . For the ith row of p, π( i ) stores the column index of the sole position that contains a 1.
You will write a shared-memory parallel program that performs LU decomposition using row pivoting. You will develop the solution using OpenMP. Your LU decomposition implementation should accept four arguments:
1. n – size of a matrix
2. r – random seed number
3. t – number of threads (smaller than 16)
4. p – print flag (if 1, print matirx L, U, and A. if 0, print nothing)
Your output format must be as follows. Note that you should print out floating point values using “%.4e” format specifier in printf.
$ a.out 4 1 8 1
L:
1.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 1.0851e+00 1.0000e+00 0.0000e+00 0.0000e+00 3.3061e-01 -1.8387e+00 1.0000e+00 0.0000e+00 4.3417e-01 -1.4853e+00 2.0883e-01 1.0000e+00 U:
1.8043e+09 8.4693e+08 1.6817e+09 1.7146e+09 0.0000e+00 -4.9473e+08 -1.1048e+09 -2.1071e+08 0.0000e+00 0.0000e+00 -1.5622e+09 3.9619e+08 0.0000e+00 0.0000e+00 0.0000e+00 8.2737e+08
A:
1.8043e+09 8.4693e+08 1.6817e+09 1.7146e+09 1.9577e+09 4.2424e+08 7.1989e+08 1.6498e+09 5.9652e+08 1.1896e+09 1.0252e+09 1.3505e+09 7.8337e+08 1.1025e+09 2.0449e+09 1.9675e+09
Your programs will allocate an n x n matrix A of double precision (64-bit) floating point variables.
You should initialize the matrix with uniform random numbers computed using a random number generator rand(). Note that you must call srand(r) before you initialize the matrix. The srand() function sets the starting point for producing a series of pseudo-random integers. If srand() is not called, the rand() seed is set as if srand(1) were called at program start. Any other value for seed sets the generator to a different starting point. Then, you should apply LU decomposition with partial pivoting to factor the matrix into an upper-triangular one and a lower-triangular one. Have your program time the LU decomposition phase by reading the real-time clock before and after and printing the difference.
Note: To evaluate the performance, you should use a problem size larger than n = 1000. But do not run with too large parameter as in-ui-ye-ji cluster is shared by many students.
2
3 How to Submit
To submit your project, you must rename your code to project2.cpp and run multicore_submit command in your project directory in swji.skku.edu server.
$ multicore_submit project2 project2.cpp
This command will submit your code to the instructor’s account.
For any questions, please post them in Piazza so that we can share your questions and answers with other students. Please feel free to raise any issues and post any questions. Also, if you can answer other students’ questions, please don’t hesitate to post your answer. You would get some credits for posting questions and answers in Piazza.
3