CS计算机代考程序代写 cuda GPU Assignment 3 – Sparse matrix formats on GPUs

Assignment 3 – Sparse matrix formats on GPUs
Submission Deadline: Mo 29 November, 10am

Task 1: So far we learned about the CSR format. On CPUs this is a widely used

standard format. However, it has some severe disadvantages on GPUs, but also on

modern vector extensions (AVX, etc.) of CPUs. The paper Improving the

performance of the sparse matrix vector product with GPUs by Vazquez et. al.

describes these difficulties and also the alternative Ellpack and Ellpack-R formats

that improve on the shortcomings of CSR on GPUs and offer more performance on

these devices. Summarize the difficulties of the CSR format and how Ellpack and

Ellpack-R solve these difficultues. For a very high mark look for some more modern

papers on GPU based matvecs and describe what other modern developments have

happened since that paper.

Task 2: Given a sparse matrix in CSR format. Write a CPU based Numba routine that

converts this matrix into the Ellpack-R format.

Implement a new class EllpackMatrix derived from

scipy.sparse.linalg.LinearOperator, which in its constructor takes a Scipy

sparse matrix in CSR format, converts it to Ellpack-R and provides a routine for

matrix-vector product in the Ellpack-R format. At the end the following prototype

commands should be possible with your class.

The sparse-matrix vector product at the end shall be executed in the Ellpack-R

format.

For your implementation you can either use CPU based Numba code using prange

for multithreading or write an implementation in Numba-Cuda. For an overall mark of

the assignment beyond 80% we would expect a Numba-Cuda implementation

(assuming also all other parts are of high standard).

Demonstrate in your solution that your code provides the correct result by verifying

for a sparse random matrix with the standard CSR matvec of sparse

matrices and showing for 3 random vectors that the relative distance of your

Ellpack-R matvec to the CSR matvec result is in the order of machine precision.

Use the discretise_poisson method from the lecture notes to generate the sparse

matrix for the Poisson discretization and plot the times for a single matvec for

growing matrix-sizes (go as high as you think is reasonable) using the standard

Scipy csr matvec and your own Ellpack-R implemementation.

my_sparse_mat = EllpackMatrix(csr_mat)
x = numpy.random.randn(my_sparse_mat.shape[1])
y = my_sparse_mat @ x

1000 × 1000

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5577904

By Timo Betcke

© Copyright 2020.

Previous
Assignment 2 – GPU
Accelerated solution of
Poisson problems

Your implementation may not be faster since the matrix derives from a very simple

2d problem with few elements per row. This is not the situation where the additional

complexitiy is usually needed.

Finally, go shopping in the Matrix Market and try to find two matrices that better

show off the Ellpack-R format and do timing comparisions for your chosen matrices.

https://tbetcke.github.io/hpc_lecture_notes/assignment_2.html
https://math.nist.gov/MatrixMarket/