程序代写 COMP5112 / MSBD5009 Parallel Programming (Spring 2022)

COMP5112 / MSBD5009 Parallel Programming (Spring 2022)
Assignment 3: CUDA Programming
Submission deadline: 23:59 (pm) on May 4, 2022
I. General Notes

Copyright By PowCoder代写 加微信 powcoder

1. This assignment counts for 15 points.
2. This is an individual assignment. You can discuss with others and search online resources, but
your submission must be your own code. Plagiarism checking will be enforced, and penalty will
be given to all students involved in an incident of academic dishonesty.
3. Add your name, student id, and email at the first two lines of comments in your submission.
4. Submit your assignment through Canvas before the deadline.
5. All submission will be compiled and tested uniformly on given machines of the course (CSE
Lab 2 machines for COMP5112 and Azure virtual machines for MSBD 5009).
6. Please direct any inquiries about this assignment to the designated TAs listed in the CANVAS
assignment page.
7. No late submission will be accepted!
II. Problem Description
This assignment is on the CUDA parallelization of a super-mer generation algorithm given an input list of DNA fragments, called reads. Each read is represented as a string of length 𝑛, and the characters in the string include ‘A’, ‘T’, ‘C’, and ‘G’. A k-mer is a substring of length 𝒌 of a read, so a read of length 𝑛 contains 𝑛 − 𝑘 + 1 k-mers. A minimizer of length 𝒑 of a k-mer is the lexicographically smallest length-𝑝 substring of the k-mer. Finally, a super-mer is generated by merging consecutive k-mers in the read that have the same minimizer. Two k-mers are consecutive in a read if their starting positions in the read are consecutive. Figure 1 illustrates the super-mer generation algorithm (Algorithm 1) with a simple example.
Input: 𝑘 = 9, 𝑝 = 5, 𝑛 = 14
𝑅𝑒𝑎𝑑 = CAAATTACTGCATA
Super-mer Generation (mimimizers are underlined)
𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 ←AAATT,𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ← 1,𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ← 0 no update on minimizer
𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 ← 𝑛𝑒𝑤_𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 ←AATTA, 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ← 2 generate current super-mer, 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ← 2
i=0 (k-mer #1)
i=1 (k-mer #2)
i=2 (k-mer #3)
i=2 (super-mer #1) CAAATTACTG
super-mer #1 is made up of k-mer #1 and #2, minimizer = AAATT
i=3 (k-mer #4) ATTACTGCA 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 ← 𝑛𝑒𝑤_𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 ←ACTGC, 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ← 7
i=3 (super-mer #2) AATTACTGC generate current supermer, 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ← 3 super-mer #2 is made up of k-mer #3 only, minimizer = AATTA
𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ≠ 𝑛 − 𝑘, generate the last super-mer super-mer #3 is made up of k-mer #4 #5 #6, minimizer = ACTGC
i=4 (k-mer #5)
i=5 (k-mer #6)
(super-mer #3)
ATTACTGCATA
Figure 1: Illustration of super-mer generation (The minimizers are underlined) 1/5

III. Sequential Algorithm
Algorithm 1: Super-mer Generation (sequential) Input: readstring 𝑺 = 𝑠!,𝑠”,𝑠#,…,𝑠$%”
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24.
k-mer length 𝒌, minimizer length 𝒑 Procedure GenerateSupermer (𝑺, 𝒌, 𝒑):
𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 = the minimum 𝒑-substring of 𝑺[𝟎, 𝒌 − 𝟏] 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 = the starting position of minimizer in 𝑺 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 = 0
for𝑖from𝟏to𝑛−𝑘do:
if 𝑖 > 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 then:
𝑛𝑒𝑤_𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 =theminimum 𝒑-substringof 𝑆[𝑖,𝑖+𝑘−1] update 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠
if 𝑆[𝑖+𝑘−𝑝,𝑖+𝑘−1] ≤ 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 then:
𝑛𝑒𝑤_𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 = 𝑆[𝑖+𝑘−𝑝,𝑖+𝑘−1]
update 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 end if
if 𝑛𝑒𝑤_𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 ≠ 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 then:
𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟 = 𝑛𝑒𝑤_𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟
savecurrentsuper-mer: 𝑺[𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠,𝑖+𝑘−1] 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 = 𝑖
end if end for
if 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠 ≠ 𝑛 − 𝑘 then: savethelastsuper-mer: 𝑺[𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑏𝑒𝑔_𝑝𝑜𝑠,𝑛]
end if procedure
IV. Your Implementation Task
The code skeleton “gensupermer_cuda.cu” is provided, in which the function “GenerateSupermer_GPU” calls the two kernel functions that you will implement:
1. The first kernel function “GPU_GenMinimizer”:
__global__ void GPU_GenMinimizer(_in_ _out_ T_GPU_data data, int K_kmer, int P_minimizer)
In this function, 𝑑𝑎𝑡𝑎.𝑟𝑒𝑎𝑑𝑠 is given in the CSR format, and the function will output 𝑑𝑎𝑡𝑎. 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟𝑠. You can choose to use a 2bits-compressed format to store each minimizer (See Notes at the end of Section IV) or use your own method.
2. The second kernel function “GPU_GenSKM”:
__global__ void GPU_GenSKM(_in_ _out_ T_GPU_data data, int K_kmer, int P_minimizer)
In this function, 𝑑𝑎𝑡𝑎.𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑟𝑠 is given (generated by calling the kernel 𝑮𝑷𝑼_𝑮𝒆𝒏𝑴𝒊𝒏𝒊𝒎𝒊𝒛𝒆𝒓 ) and the function will output 𝑑𝑎𝑡𝑎. 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑜𝑓𝑓𝑠 . The starting

positions of all supermers in their corresponding reads are stored in 𝑑𝑎𝑡𝑎. 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑜𝑓𝑓𝑠. The last element of 𝑑𝑎𝑡𝑎. 𝑠𝑢𝑝𝑒𝑟𝑚𝑒𝑟_𝑜𝑓𝑓𝑠 will be n-k+1 to indicate the end of this array, where n is the length of the read and k is the length of kmer.
1. The CSR Format: The CSR (compressed sparse row) format stores all rows in a single data
array and has an offset array to point to the beginning position of each row in the data array. For example, we store all reads in the array 𝑟𝑒𝑎𝑑_𝐶𝑆𝑅, and the index of each read in the array 𝑟𝑒𝑎𝑑_𝐶𝑆𝑅_𝑜𝑓𝑓𝑠𝑒𝑡. Figure 2 shows an example of four reads AC, ACT, AG, and GCT stored in CSR.
Figure 2: An Example of CSR Format
2. 2bits-compression of minimizers: You can store a minimizer in a 32-bit unsigned integer for efficient comparison and access. In this format, each base (‘A’, ‘C’, ‘T’, ‘G’) in the minimizer is represented as a 2-bit integer (0,1,2,3 or 0b00, 0b01, 0b10, 0b11 accordingly). We provide the “d_basemap” constant array in the GPU memory for you to convert a minimizer string into an unsigned integer in the GPU kernel program GPU_GenMinimizer. For example, a minimizer (p=16) ACACACAGAGAGATTT can be represented as an unsigned int 0b00010001000100100010001000111111, and a minimizer (p=5) AACGT as an unsigned int 0b0000011011.
3. The struct “T_GPU_data” contains the following members: the “reads” array is the data array in the CSR format of the reads; the “reads_offs” array is the offset array in the CSR format of the reads; the “minimizers” array will be output in the kernel “GPU_GenMinimizer” and used as input in the kernel “GPU_GenSKM”; the “supermer_offs” array is the output of the kernel “GPU_GenSKM” and stores the starting positions of all supermers in their corresponding reads. In the following example, the minimizer of the second read “CAAACCTTGG” is “1,1,5” because the minimizer of the first kmer “CAAACCTT” is “AAAC”, which is expressed in 2bits- compression as “0b00000001” = 1. The minimizer of the third kmer “AACCTTGG” is “AACC”: “0b00000101” = 5. The supermer_offs in read 2 is 0, 2, 3. The first two elements 0 and 2 are the starting positions of the two supermers, and the last element 3 is n -k+1, where n is the length of the read, 10, and k is 8.

gensupermer _cuda.cu gensupermer.hpp main.cpp gensupermer _sequential.cpp dataset/*.txt result/ utilities.hpp
V. The Given Package
The CUDA program skeleton for you to fill in the missing code of the two GPU kernel functions.
The header file for gensupermer_cuda.cu
The main function of the CUDA version program.
The sequential version of super-mer generation. It is for your reference and result comparison.
Each text file is a test dataset with each line containing a read.
An empty folder for super-mer output.
A header file containing utility functions such as file loading, result output, and result comparison.
VI. Compile and Run
1. We will not apply any compiler optimization during the assessment (e.g., -O2 -O3 …).
2. No external library or header file is allowed (e.g., Boost or other third-party libraries).
An example of input file and corresponding output file:
The input file contains multiple lines of reads. The following is an example containing 5 reads.
Example input: dataset1.txt
The output will be the super-mers generated from the reads. The order of the super-mers in the output file is lexicographic. For example, the 7th and the last super-mers in the example output file are generated from the first read in the example input file. The 7th super-mer contains 7 consecutive k-mers. These 7 k-mers share the same minimizer AAAACTG.
GATAACGAGTTCTAGAAAACTGGGTCCC
AGATCAGCAAACCCTGAGAAAAA AGAAAAATTAGCAATAATTAGCAGTGTTCATAACA AGAAGACAACTGGGCCCGGGGGAC GAGGGAGAACCTGATTTCCAGAGT
AAAATTAGCAATAATTAGCAG AAATTAGCAATAATTAGCAGT AATTAGCAATAATTAGCAGTGTTCATAA AGAAAAATTAGCAATAATTAGCA AGAAGACAACTGGGCCCGGGGGAC AGATCAGCAAACCCTGAGAAAAA ATAACGAGTTCTAGAAAACTGGGTCCC ATAATTAGCAGTGTTCATAACA GAGGGAGAACCTGATTTCCAGAGT GATAACGAGTTCTAGAAAACT
Example output (k=21, p=7): my_gs_output.txt

Example of compiling and running the sequential program:
# Compile:
g++ gensupermer_sequential.cpp -o seq_gs -std=c++11
# Run and save super-mers to text file:
# (e.g., k=21, p=7, data file is ./dataset/dataset1.txt, and result file is ./result/seq_gs_output.txt):
./seq_gs 21 7 ./dataset/dataset1.txt ./result/
# Run without saving super-mers to text file:
./seq_gs 21 7 ./dataset/dataset1.txt
Example of compiling and running your CUDA program:
# Compile:
nvcc -std=c++11 gensupermer_cuda.cu main.cpp -o cuda
./cuda 21 7 8 512 ./dataset/dataset1.txt
# can be 0 or 1. If 1, the program will automatically check the correctness by comparing all_supermers with the sequential version. is optional; if not provided, the super-mers will not be saved to any file.
VII. Submission and Evaluation
You only need to submit your completed gensupemer_cuda.cu to Canvas before the deadline. You can add your code only in the specified section. Your program should not have any extra output to the result file or the standard output.
We will evaluate your program in different parameter settings. We will vary 𝑝 and 𝑘 (10 ≤ 2𝑝 ≤ 𝑘 < 23). The length of each read is greater than 22. Both the total length of all reads and that of the generated super-mers will be less than INT_MAX (0x7FFFFFFF). We will test with different numbers of threads (1 ≤ num_blocks_per_grid ≤ 32 and 32 ≤ num_threads_per_block ≤ 512). We will prepare multiple sets of configurations and datasets. Your grade will be marked based on both correctness and running time. 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com