程序代写代做代考 cuda kernel GPU algorithm cache CSC3150 Assignment 3

CSC3150 Assignment 3
In Assignment 3, you are required to simulate a mechanism of virtual memory via GPU’s memory.
Background:
 Virtual memory is a technique that allows the execution of processes that are not completely in memory. One major advantage of this scheme is that programs can be larger than physical memory.
 In this project, you should implement simple virtual memory in a kernel function of GPU that have single thread, limit shared memory and global memory.
 We use CUDA API to access GPU. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model.
 We don’t consider any parallel computing technique in this project, only use single thread to serial access that let us focus our virtual memory implementation.
 There are many kinds of memory in CUDA GPU, we only introduce two memories (global memory and shared memory) which relate to our project.
 Global memory
– Typically implemented in DRAM
– High access latency: 400-800 cycles
 Shared memory
– Extremely fast
– Configurable cache
– Memory size is small (16 or 48 KB)
The GPU Virtual Memory we need to design:
 Because the shared memory in GPU with small size and low latency access, we take the shared memory as the traditional CPU physical memory.
 Take the global memory as the disk storage (secondary memory).
 In CUDA, the function executed on GPU that defined by programmer, is
called kernel function.
 A kernel function would no longer be constrained by the amount of
shared memory that is available. Users would be able to write kernel functions for an extremely large virtual address space, simplifying the programming task.

 Implement a paging system with swapping where the thread access data in shared memory and retrieves data from global memory (secondary memory).
 We only implement the data swap when page fault occurs (not in the instruction level).
Specification of the GPU Virtual Memory we designed:
 Secondary memory (global memory) – 128KB (131072 bytes)
 Physical memory (share memory)
– 48KB (32768 bytes)
 32KB for data access
 16KB for page table setting  Memory replacement policy for page fault:
– If shared memory space is available, place data to the available page, otherwise, replace the LRU set. Pick the least indexed set to be the victim page in case of tie.
 We have to map virtual address (VA) to physical address (PA).
 The valid bit of each page table block is initialized as false before first
data access in shared memory.
 Page size
– 32 bytes
 Page table entries
– 1024 (32KB / 32 bytes)

Template structure:
 At first, load the binary file, named “data.bin” to input buffer before
kernel launch and return the size of input buffer: input_size.
 Launch to GPU kernel with single thread, and dynamically allocate 16KB of share memory, which will be used for variables declared as “extern __shared__”
 Initialize the virtual memory.
 Initialize the page table
(Considering the page entries is limited here, we’re using invert page table).

 Under vm_write, you should implement the function to write data into vm buffer.
 Under vm_read, you should implement the function to read data from vm buffer.
 Under vm_snapshot, together with vm_read, you should implement the program to load the elements of vm buffer (in shared memory, as physical memory) to results buffer (in global memory).
 For user_program (operations on vm_read/vm_write/vm_snapshot), you should strictly follow the name and input parameters as:
– user_program(VirtualMemory *vm, uchar *input, uchar *results, int input_size)
We will replace user_program for testing, please do not change any symbol of these parameters.
 Count the page fault number when executing paging replacement.
 In Host, dump the contents of binary file into “snapshot.bin”.
 Print out page fault number when the program finish execution.

Functional Requirements (90 points):
 Implement vm_write to write data to vm buffer (shared memory, as physical memory) (10 points)
 Implement vm_read to read data from vm buffer (shared memory, as physical memory) (10 points)
 Implement vm_snapshot together with vm_read to load the elements of vm buffer (in shared memory, as physical memory) to results buffer (in global memory, as secondary storage). (10 points)
 Use provided user program to test memory management. (5 points)
 When swapping memory, you need to implement with LRU paging
algorithm. (40 points)
 Print out correct page fault number. (5 points)
 Correctly dump the contents to “snapshot.bin”. (10 points)

Bonus (15 points)
Background:
 We used only one page-table in basic task, if we want to launch multiple threads and each thread use the mechanism of paging, we should design a new page table for managing multiple threads.
 Usually, each thread has an associated page-table, but we don’t have enough memory size (shared memory) to setup.
 To solve this problem, we can use an inverted page table (refers to Chapter 8).
Requirement:
 Basing on Assignment 3, launch 4 threads in kernel function, all threads concurrently execute it. (2 point)
 To avoid the race condition, threads execute vm_read() / vm_write() should be a non-preemptive priority scheduling, the priority of threads should be as:
Thread 0 > Thread 1 > Thread 2 > Thread 3.
Maintain the scheduling when operating on vm_read() / vm_write() /
vm_snapshot(). (5 points)
 Modify your paging mechanism to manage multiple threads. (5 points)
 Print the times of page fault of whole system before the program end. (2
point)
 Correctly dump the contents to “snapshot.bin”. (1 point)
Report (10 points)
Write a report for your assignment, which should include main information as below:
 Environment of running your program. (E.g., OS, VS version, CUDA version, GPU information etc.)
 Execution steps of running your program.
 How did you design your program?
 What’s the page fault number of your output? Explain how does it come
out.
 What problems you met in this assignment and what are your solution?
 Screenshot of your program output.
 What did you learn from this assignment?

Submission
 The report should be submitted in the format of pdf, together with your s ource code. Please compress all files into a single zip file and name it ac cording to your student id. The project structure is illustrated as below:  Assignment_3_Student ID.zip
– Source
o Your project folder
o Within the folder ‘CSC3150_A3’, it should include files below:
 main.cu
 virtual_memory.cu
 virtual_memory.h
 user_program.cu
 data.bin
 snapshot.bin (auto generated after running your
program) – Bonus
o Your project folder
o Within the folder ‘CSC3150_A3’, it should include files below:
 main.cu
 virtual_memory.cu
 virtual_memory.h
 user_program.cu
 data.bin
 snapshot.bin (auto generated after running your
program)
– Report (in pdf format)
 Due date: End (23:59) of 08 Nov, 2020

Grading rules
Completion
Marks
Bonus
15 points
Report
10 points
Completed with good quality
80 ~ 90
Completed accurately
80 +
Fully Submitted (compile successfully)
60 +
Partial submitted
0 ~ 60
No submission
0
Late submission
Not allowed