COMP5822M – Exercise 1.4 Resources
1 Project Updates
2 Vertex Buffers
3 Uniform Buffer & Descriptors
4 Textures
5 Second object
6 Depth buffer & testing
7 2nd pipeline & Blending
Exercise 4 introduces Vulkan resources, specifically buffers and textures. While textures are mainly used for texturing, as their name implies, buffers serve dual purposes in Exercise 4: they are used both to store vertex data (i.e., as vertex buffers) and to store uniform data (i.e., as uniform buffers). Textures and uniform buffers form uniform inputs to the graphics pipeline. Uniform inputs are declared via descriptors, which are further grouped into one or more descriptor sets. Refer to Figure 1 for an overview.
Exercise 4 first describes vertex buffers to store the vertex data, which was previously hardcoded in the shaders. Next, Exercise 4 introduces uniform buffers (Vulkan) and the corresponding uniform interface blocks (GLSL) to pass a projection matrix to the shader. With this, Exercise 4 finally starts drawing in 3D. Next, Exercise 4 introduces textures to make the renderings more detailed and interesting. When adding a second object, the need for hidden surface removal becomes apparent, which is done by introducing a depth buffer and enabling depth testing. Finally, Exercise 4 creates a second pipeline that enables blending. See teaser image.
1 Project Updates
As always, grab the exercise code and browse through it. You will be working with the code in the exercise4 directory and with some of the code in the labutils project. Transfer your labutils solutions from Exercise 3 to Exercise 4. Exercise 4 adds a few new files to labutils, but the code covered in previous exercises is otherwise unchanged for the moment.
Third Party Software Exercise 4 introduces two additional third-party libraries:
GLM, a header-only library defining common matrix and vector types, as well as associated operations/functions. While originally designed with OpenGL in mind, the library has been updated with functionality related to Vulkan and other APIs. (A short usage sketch follows after this list.)
Vulkan Memory Allocator (VMA), a single-source library originally developed by AMD that helps with Vulkan memory management. Aside from providing convenience functions to allocate buffers and images, the library allocates Vulkan memory in large chunks and suballocates from these for individual resources. This is the recommended/best practice for memory allocations.
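As a quick illustration of what GLM provides, the following sketch builds a combined projection-camera matrix, similar in spirit to the projCam matrix shown in Figure 1. The field-of-view value and camera placement are arbitrary examples and are not taken from the exercise code:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Build a combined projection * camera matrix.
glm::mat4 example_proj_cam( float aAspectRatio )
{
	// glm::perspective expects the vertical field of view in radians;
	// angle.hpp in labutils offers similar degree/radian conversions.
	glm::mat4 projection = glm::perspective( glm::radians( 60.f ), aAspectRatio, 0.1f, 100.f );

	// Place an example camera at (0, 2, 5), looking at the origin.
	glm::mat4 camera = glm::lookAt(
		glm::vec3( 0.f, 2.f, 5.f ),   // eye position
		glm::vec3( 0.f, 0.f, 0.f ),   // look-at target
		glm::vec3( 0.f, 1.f, 0.f )    // up direction
	);

	return projection * camera;
}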
[Figure 1 (diagram): a vertex shader with inputs layout( location = 0 ) in vec3 iPosition; and layout( location = 1 ) in vec3 iColor;, plus a uniform block layout( set = 0, binding = 0 ) uniform UScene { mat4 camera; mat4 projection; mat4 projCam; };. A descriptor set layout (binding 0: uniform buffer) feeds into the pipeline layout and the graphics pipeline. At draw time (vkCmdDraw…()), descriptor set 0 references a VkBuffer created with USAGE_UNIFORM_BUFFER holding the matrix data (camera[0][0] … projCam[3][3]), while vertex buffers 0 and 1 are bound directly as vertex inputs.]
Figure 1: Overview of how data is fed into a draw call. The layouts (orange outline) declare what type of uniform data is fed into a graphics pipeline. Multiple graphics pipelines may utilize the same layouts. The actual data is specified when drawing. Uniform data is bound via descriptor sets; vertex buffers are bound as-is. (The overview corresponds to the state after Section 3. Later sections add additional data elements, such as textures.)
Labutils You will again find a few new files in labutils:
allocator.{hpp,cpp} C++ wrapper around the VMA allocator object (VmaAllocator). Described in Section 2.
angle.hpp A few convenience functions related to angles (e.g. conversion between degrees and radians).
vkbuffer.{hpp,cpp} Wrapper for Vulkan buffers. This is similar to the buffer object introduced in Exercise 2, but uses a VmaAllocation instead of the raw VkDeviceMemory allocation. Introduced in detail in Section 2.
vkimage.{hpp,cpp} Wrapper for Vulkan images, similar to the buffer object described above. See Section 4 for details.
2 Vertex Buffers
(Vulkan Memory Allocator (VMA) — Buffer utilities — Buffer barrier — Vertex Data — Graphics Pipeline — Drawing)
So far, the exercises have generated the geometry directly in the vertex shader. While this can be very efficient when it is possible, we frequently need to render models and meshes modelled in dedicated software. To do so, the vertex data making up the meshes needs to be transferred into VkBuffers. The graphics pipeline reads the per-vertex data from the buffers and feeds this into the vertex shader as inputs.
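To preview how the pipeline later consumes such buffers (the pipeline changes themselves are covered further down in this section), a vertex input configuration with two separate per-vertex streams might look roughly like the sketch below. This is for orientation only; the exercise's own pipeline-creation code is the authoritative version:

// Two bindings: binding 0 holds vec3 positions, binding 1 holds vec3 colors.
VkVertexInputBindingDescription vertexInputs[2]{};
vertexInputs[0].binding   = 0;
vertexInputs[0].stride    = sizeof(float) * 3;
vertexInputs[0].inputRate = VK_VERTEX_INPUT_RATE_VERTEX;
vertexInputs[1].binding   = 1;
vertexInputs[1].stride    = sizeof(float) * 3;
vertexInputs[1].inputRate = VK_VERTEX_INPUT_RATE_VERTEX;

// Two attributes, matching "location = 0" and "location = 1" in the vertex shader.
VkVertexInputAttributeDescription vertexAttributes[2]{};
vertexAttributes[0].binding  = 0;
vertexAttributes[0].location = 0;
vertexAttributes[0].format   = VK_FORMAT_R32G32B32_SFLOAT;
vertexAttributes[0].offset   = 0;
vertexAttributes[1].binding  = 1;
vertexAttributes[1].location = 1;
vertexAttributes[1].format   = VK_FORMAT_R32G32B32_SFLOAT;
vertexAttributes[1].offset   = 0;

VkPipelineVertexInputStateCreateInfo inputInfo{};
inputInfo.sType                           = VK_STRUCTURE_TYPE_PIPELINE_VERTEX_INPUT_STATE_CREATE_INFO;
inputInfo.vertexBindingDescriptionCount   = 2;
inputInfo.pVertexBindingDescriptions      = vertexInputs;
inputInfo.vertexAttributeDescriptionCount = 2;
inputInfo.pVertexAttributeDescriptions    = vertexAttributes;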
Exercise 1.2 already used a VkBuffer to download image data from the GPU to the CPU. Exercise 1.4 does the reverse. We specify data in a host-visible staging buffer, and copy this to the final GPU-only buffer.
[Diagram: two buffers created with USAGE_VERTEX_BUFFER: one holding positions (-1.f, 0.f, -2.f for v0; -1.f, 0.f, +2.f for v1; +1.f, 0.f, +2.f; …) and one holding colors (0.4f, 0.4f, 1.0f for c0; 1.0f, 0.4f, 0.4f for c1; 0.4f, 1.0f, 0.4f; …).]
The above process assumes a system with dedicated VRAM, of which the majority is not host-visible. A system where all memory is host-visible (e.g., many integrated GPUs) could skip the staging buffer. Nevertheless, the process outlined here works for all kinds of systems, even if it performs unnecessary work in some cases.
Vulkan Memory Allocator Exercise 1.2 allocated the buffer with “raw” Vulkan calls. In Exercise 1.4, we will instead use the Vulkan Memory Allocator (VMA). VMA simplifies the process of allocating both buffers and images by reducing the necessary boilerplate code in many common cases. Additionally, instead of creating one VkDeviceMemory allocation per resource (as we did in Exercise 1.2), VMA maintains a small number of “large” VkDeviceMemory allocations and sub-allocates memory from these to the individual Vulkan resources. This is especially useful on e.g. Windows, where the number of VkDeviceMemory allocations is limited.
VMA is documented quite exhaustively. Study the documentation briefly – there are several short examples in the Quick Start section. In order to use VMA, one first must create a VmaAllocator object. VMA follows the Vulkan API conventions, so the code’s general structure should look somewhat familiar.
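For orientation only, creating an allocator follows the usual create-info pattern. The rough sketch below is not the labutils implementation: the exact set of fields depends on the VMA version, and the handles instance, physicalDevice and device are assumed to come from the existing Vulkan setup.

VmaAllocatorCreateInfo allocatorInfo{};
allocatorInfo.instance       = instance;        // existing VkInstance
allocatorInfo.physicalDevice = physicalDevice;  // existing VkPhysicalDevice
allocatorInfo.device         = device;          // existing VkDevice

VmaAllocator allocator = VK_NULL_HANDLE;
if( auto const res = vmaCreateAllocator( &allocatorInfo, &allocator ); VK_SUCCESS != res )
{
	// Handle the error, e.g. by throwing lut::Error as elsewhere in labutils.
}

// ... use the allocator for buffer/image allocations ...

vmaDestroyAllocator( allocator ); // once no resources allocated from it remain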
To simplify the process, labutils provides the Allocator class (labutils/allocator.{hpp,cpp}). It is fully implemented and ready to go. Study the code briefly. An instance of the Allocator is already created in main() in Exercise 1.4.
Buffer utilities In Exercise 1.2, we defined a temporary Buffer class. It holds two Vulkan resources: the main VkBuffer handle, and a VkDeviceMemory handle referring to the memory allocation that backs the buffer. With VMA, the VkDeviceMemory handle is replaced by a VmaAllocation handle instead. Exercise 1.4 introduces a wrapper class similar to the one defined in Exercise 1.2, except that it is defined in the labutils project this time. Find it in vkbuffer.{hpp,cpp}.
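Conceptually, such a wrapper pairs the two handles with the owning allocator so that the destructor can release both with a single vmaDestroyBuffer call. The following is a rough, illustrative sketch only; the actual class in vkbuffer.hpp may differ in detail:

#include <utility> // std::exchange

class Buffer final
{
public:
	Buffer() noexcept = default;
	Buffer( VmaAllocator aAllocator, VkBuffer aBuffer, VmaAllocation aAllocation ) noexcept
		: buffer( aBuffer ), allocation( aAllocation ), mAllocator( aAllocator )
	{}

	~Buffer()
	{
		// Destroy the VkBuffer and free its backing allocation in one call.
		if( VK_NULL_HANDLE != buffer )
			vmaDestroyBuffer( mAllocator, buffer, allocation );
	}

	// Move-only: copying would result in the buffer being destroyed twice.
	Buffer( Buffer const& ) = delete;
	Buffer& operator=( Buffer const& ) = delete;

	Buffer( Buffer&& aOther ) noexcept
		: buffer( std::exchange( aOther.buffer, VK_NULL_HANDLE ) )
		, allocation( std::exchange( aOther.allocation, VK_NULL_HANDLE ) )
		, mAllocator( std::exchange( aOther.mAllocator, VK_NULL_HANDLE ) )
	{}

public:
	VkBuffer buffer          = VK_NULL_HANDLE;
	VmaAllocation allocation = VK_NULL_HANDLE;

private:
	VmaAllocator mAllocator  = VK_NULL_HANDLE;
};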
The header additionally declares a create_buffer helper function. Before proceeding, the function needs to be implemented (vkbuffer.cpp). It is just a short wrapper around the vmaCreateBuffer function defined by VMA:
In some cases, VMA might decide to create a dedicated allocation for a resource regardless. A dedicated allocation means that it is backed by a custom VkDeviceMemory allocation that is not shared with other resources. VMA lists when this occurs in its documentation. One specific case is that Vulkan can internally indicate when a dedicated allocation should be used for a certain resource (see VK_KHR_dedicated_allocation, which was promoted to core Vulkan in version 1.1).
VkBufferCreateInfo bufferInfo{};
bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
bufferInfo.size  = aSize;
bufferInfo.usage = aBufferUsage;

VmaAllocationCreateInfo allocInfo{};
allocInfo.usage = aMemoryUsage;

VkBuffer buffer = VK_NULL_HANDLE;
VmaAllocation allocation = VK_NULL_HANDLE;

if( auto const res = vmaCreateBuffer( aAllocator.allocator, &bufferInfo, &allocInfo, &buffer, &allocation, nullptr ); VK_SUCCESS != res )
{
	throw Error( "Unable to allocate buffer.\n"
		"vmaCreateBuffer() returned %s", to_string(res).c_str()
	);
}

return Buffer( aAllocator.allocator, buffer, allocation );
You are already familiar with the VkBufferCreateInfo structure and its parameters (Exercise 1.2). VMA accepts a few additional parameters relating to the underlying allocation via the VmaAllocationCreateInfo structure. Study the documentation. For now, the usage field of type VmaMemoryUsage is the most important one. With it, we specify how we intend to use the memory. The relevant options are:
VMA_MEMORY_USAGE_GPU_ONLY Memory (buffer) will be accessed from GPU only. Prefers a memory type that has the VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT flag set.
VMA_MEMORY_USAGE_CPU_TO_GPU Memory (buffer) will be used to transfer data from the CPU/host to the GPU. Requires the underlying memory type to support VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT.
VMA_MEMORY_USAGE_GPU_TO_CPU Memory (buffer) will be used to transfer data from the GPU to the CPU/host. Requires the underlying memory type to support VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT.
The documentation lists the above options as deprecated. It turns out that the Vulkan Memory Allocator was updated in the period between development of Exercise 1.4’s code and the writing of this document. The version of the Vulkan Memory Allocator that is included in Exercise 1.4 does not support the new style (e.g. VMA_MEMORY_USAGE_AUTO).
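For reference, with a newer VMA release the equivalent requests would look roughly like the sketch below. This does not apply to the VMA version bundled with Exercise 1.4; it is shown only to make the current VMA documentation easier to relate to the exercise code:

// New-style VMA (newer releases only -- not the version used in this exercise).
VmaAllocationCreateInfo allocInfo{};

// Roughly equivalent to VMA_MEMORY_USAGE_GPU_ONLY:
allocInfo.usage = VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE;

// Roughly equivalent to VMA_MEMORY_USAGE_CPU_TO_GPU (CPU writes the data sequentially):
allocInfo.usage = VMA_MEMORY_USAGE_AUTO;
allocInfo.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT;

// Roughly equivalent to VMA_MEMORY_USAGE_GPU_TO_CPU (CPU reads the data back):
allocInfo.usage = VMA_MEMORY_USAGE_AUTO;
allocInfo.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_RANDOM_BIT;

// A dedicated allocation (see the note above) can also be requested explicitly
// via VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT.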
Buffer barriers Previously, in Exercise 1.2, we benefited from synchronization provided by the subpass dependencies and a fence. The subpass dependencies in particular introduced a barrier that ensured that the rendered image was available to the subsequent copy of the image into the buffer.
The subpass dependencies define synchronization relating to the image attachments that are being rendered to in the render pass. To ensure that the copy from the host-visible staging buffer to the final GPU-only buffer finishes before we start reading data from the buffers during rendering, we must use a buffer barrier.
Buffer barriers, like all kinds of pipeline barriers, are issued with the vkCmdPipelineBarrier function. The function can issue multiple kinds of barriers in a single call. For now, we will focus on the buffer barriers, defined via the VkBufferMemoryBarrier structure.
Additional buffer barriers are required later, when dealing with uniform buffers. Exercise 1.4 therefore introduces a helper function, buffer_barrier. Declare the buffer_barrier function in labutils/vkutil.hpp:
A good overview of Vulkan synchronization and the different types of barriers can be found in this blog post.
void buffer_barrier(
	VkCommandBuffer,
	VkBuffer,
	VkAccessFlags aSrcAccessMask,
	VkAccessFlags aDstAccessMask,
	VkPipelineStageFlags aSrcStageMask,
	VkPipelineStageFlags aDstStageMask,
	VkDeviceSize aSize = VK_WHOLE_SIZE,
	VkDeviceSize aOffset = 0,
	uint32_t aSrcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
	uint32_t aDstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED
);
Buffer barriers apply to a specific buffer, identified by the VkBuffer argument. By default, the barrier will apply to the whole buffer. The optional arguments aSize and aOffset can be used to narrow down the barrier to only a subset of the buffer’s contents. The access flags describe how the buffer was accessed before the barrier and how it will be accessed after the barrier. The pipeline stages describe (as with the semaphore synchronization in Exercise 1.3) which pipeline stages of commands recorded before the barrier must have completed (aSrcStageMask) before the pipeline stages of subsequent commands can commence (aDstStageMask). The final optional arguments (aSrcQueueFamilyIndex and aDstQueueFamilyIndex) can be used to transfer ownership of a buffer from one queue family to another.
The implementation of the buffer_barrier helper looks as follows (labutils/vkutil.cpp):
void buffer_barrier( VkCommandBuffer aCmdBuff, VkBuffer aBuffer,
	VkAccessFlags aSrcAccessMask, VkAccessFlags aDstAccessMask,
	VkPipelineStageFlags aSrcStageMask, VkPipelineStageFlags aDstStageMask,
	VkDeviceSize aSize, VkDeviceSize aOffset,
	uint32_t aSrcQueueFamilyIndex, uint32_t aDstQueueFamilyIndex )
{
	VkBufferMemoryBarrier bbarrier{};
	bbarrier.sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
	bbarrier.srcAccessMask       = aSrcAccessMask;
	bbarrier.dstAccessMask       = aDstAccessMask;
	bbarrier.buffer              = aBuffer;
	bbarrier.size                = aSize;
	bbarrier.offset              = aOffset;
	bbarrier.srcQueueFamilyIndex = aSrcQueueFamilyIndex;
	bbarrier.dstQueueFamilyIndex = aDstQueueFamilyIndex;

	vkCmdPipelineBarrier(
		aCmdBuff,
		aSrcStageMask, aDstStageMask,
		0,
		0, nullptr,
		1, &bbarrier,
		0, nullptr
	);
}
As with previous helper functions, buffer_barrier focuses on simplicity. The underlying vkCmdPipelineBarrier can be used to define multiple barriers in a single call. Our simplified helper function does not take advantage of this functionality.
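If batching were ever needed, the barriers could be passed to vkCmdPipelineBarrier as an array. In the sketch below, bbarrier0 and bbarrier1 are hypothetical VkBufferMemoryBarrier structures filled in exactly as in buffer_barrier above:

// Issue two buffer memory barriers with a single vkCmdPipelineBarrier call.
VkBufferMemoryBarrier const barriers[] = { bbarrier0, bbarrier1 };

vkCmdPipelineBarrier(
	aCmdBuff,
	VK_PIPELINE_STAGE_TRANSFER_BIT,      // wait for transfer writes...
	VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,  // ...before vertex input reads
	0,                                   // no dependency flags
	0, nullptr,                          // no global memory barriers
	2, barriers,                         // two buffer memory barriers at once
	0, nullptr                           // no image memory barriers
);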
Vertex buffer creation With this, we have the tools in place to create the vertex buffers. This exercise uses two separate buffers, one for vertex positions and one for per-vertex colors. The positions are identical to the ones used so far. As indicated earlier, Exercise 1.4 assumes a system with dedicated VRAM which is not fully host-visible. To create an on-GPU vertex buffer, the following steps are required:
1. Create on-GPU buffer
2. Create CPU/host-visible staging buffer
3. Place data into the staging buffer (std::memcpy)
4. Record commands to copy/transfer data from the staging buffer to the final on-GPU buffer
5. Record an appropriate buffer barrier for the final on-GPU buffer
6. Submit commands for execution
For simplicity, Exercise 1.4 will use a separate function that performs the aforementioned steps. This function will create and destroy the necessary temporary resources. We must not destroy any resources while they are currently being used by pending/executing commands. A fence enables us to wait for the commands to finish before we destroy the resources.
Look at the vertex_data.{hpp,cpp} sources. The header defines a ColorizedMesh structure that holds the relevant buffers. Additionally, it defines a create_triangle_mesh function, which we will need to implement.
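The header in the exercise code is authoritative; as a rough idea of its shape, the structure presumably looks something like the sketch below. The member names and the vertex count are guesses for illustration only:

#include <cstdint>

// Illustrative sketch only -- see vertex_data.hpp for the real declarations.
struct ColorizedMesh
{
	labutils::Buffer positions;   // one vec3 position per vertex
	labutils::Buffer colors;      // one vec3 color per vertex

	std::uint32_t vertexCount = 0;
};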
Find the function definition in vertex_data.cpp. The function already defines the relevant vertex data in a set of C/C++ arrays. Following the data definition, create the on-GPU buffers:
// Create final position and color buffers
lut::Buffer vertexPosGPU = lut::create_buffer(
	aAllocator,
	sizeof(positions),
	VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
	VMA_MEMORY_USAGE_GPU_ONLY
);
lut::Buffer vertexColGPU = lut::create_buffer(
	aAllocator,
	sizeof(colors),
	VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
	VMA_MEMORY_USAGE_GPU_ONLY
);
Note the use of VMA_MEMORY_USAGE_GPU_ONLY. This indicates that VMA should try to use device local memory for the on-GPU buffer whenever possible.
After this, create the staging buffers, declaring the VMA_MEMORY_USAGE_CPU_TO_GPU usage instead:
lut::Buffer posStaging = lut::create_buffer(
	aAllocator,
	sizeof(positions),
	VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
	VMA_MEMORY_USAGE_CPU_TO_GPU
);
lut::Buffer colStaging = lut::create_buffer(
	aAllocator,
	sizeof(colors),
	VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
	VMA_MEMORY_USAGE_CPU_TO_GPU
);
The staging buffers are CPU/host-visible. We can fill them by mapping the buffers to retrieve a normal C/C++ pointer to their contents, copying the data to this pointer and then unmapping the buffers again. We previously used vkMapMemory/vkUnmapMemory for this; with VMA, we must instead use VMA’s vmaMapMemory and vmaUnmapMemory:
void* posPtr = nullptr;
if( auto const res = vmaMapMemory( aAllocator.allocator, posStaging.allocation, &posPtr ); VK_SUCCESS != res )
{
	throw lut::Error( "Mapping memory for writing\n"
		"vmaMapMemory() returned %s", lut::to_string(res).c_str()
	);
}
std::memcpy( posPtr, positions, sizeof(positions) );
vmaUnmapMemory( aAllocator.allocator, posStaging.allocation );

void* colPtr = nullptr;
if( auto const res = vmaMapMemory( aAllocator.allocator, colStaging.allocation, &colPtr ); VK_SUCCESS != res )
{
	throw lut::Error( "Mapping memory for writing\n"
		"vmaMapMemory() returned %s", lut::to_string(res).c_str()
	);
}
std::memcpy( colPtr, colors, sizeof(colors) );
vmaUnmapMemory( aAllocator.allocator, colStaging.allocation );
The next step is preparation for issuing the transfer commands that copy data from the staging buffers to the final on-GPU buffers. For this, we create a fence, a temporary command pool, and allocate a command buffer from the command pool with the utilities we have introduced in previous exercises:
// We need to ensure that the Vulkan resources are alive until all the
// transfers have completed. For simplicity, we will just wait for the
// operations to complete with a fence. A more complex solution might want
// to queue transfers, let these take place in the background while
// performing other tasks.
lut::Fence uploadComplete = create_fence( aContext );

// Queue data uploads from staging buffers to the final buffers.
// This uses a separate command pool for simplicity.
lut::CommandPool uploadPool = create_command_pool( aContext );
VkCommandBuffer uploadCmd = alloc_command_buffer( aContext, uploadPool.handle );
We then record the copy commands into the command buffer:
VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = 0;
beginInfo.pInheritanceInfo = nullptr;

if( auto const res = vkBeginCommandBuffer( uploadCmd, &beginInfo ); VK_SUCCESS != res )
{
	throw lut::Error( "Beginning command buffer recording\n"
		"vkBeginCommandBuffer() returned %s", lut::to_string(res).c_str()
	);
}

VkBufferCopy pcopy{};
pcopy.size = sizeof(positions);

vkCmdCopyBuffer( uploadCmd, posStaging.buffer, vertexPosGPU.buffer, 1, &pcopy );

lut::buffer_barrier( uploadCmd,
	vertexPosGPU.buffer,
	VK_ACCESS_TRANSFER_WRITE_BIT,
	VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,
	VK_PIPELINE_STAGE_TRANSFER_BIT,
	VK_PIPELINE_STAGE_VERTEX_INPUT_BIT
);

VkBufferCopy ccopy{};
ccopy.size = sizeof(colors);

vkCmdCopyBuffer( uploadCmd, colStaging.buffer, vertexColGPU.buffer, 1, &ccopy );

lut::buffer_barrier( uploadCmd,
	vertexColGPU.buffer,
	VK_ACCESS_TRANSFER_WRITE_BIT,
	VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT,
	VK_PIPELINE_STAGE_TRANSFER_BIT,
	VK_PIPELINE_STAGE_VERTEX_INPUT_BIT
);

if( auto const res = vkEndCommandBuffer( uploadCmd ); VK_SUCCESS != res )
{
	throw lut::Error( "Ending command buffer recording\n"
		"vkEndCommandBuffer() returned %s", lut::to_string(res).c_str()
	);
}
The code uses two barriers, one for each of the final on-GPU buffers. The barriers declare that the results of the transfer operations (VK_ACCESS_TRANSFER_WRITE_BIT and VK_PIPELINE_STAGE_TRANSFER_BIT) must be visible to subsequent uses of the buffer when reading vertex attribute data from it (as indicated by the VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT access mask) ahead of the vertex input stage (VK_PIPELINE_STAGE_VERTEX_INPUT_BIT).
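For completeness, the remaining step from the list above (submitting the commands and waiting on the fence before the temporary resources are destroyed) might look roughly as follows. This is a sketch only: the exact queue and device accessors (written here as aContext.graphicsQueue and aContext.device), as well as the fence’s handle member, are assumptions based on the previous exercises.

VkSubmitInfo submitInfo{};
submitInfo.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers    = &uploadCmd;

if( auto const res = vkQueueSubmit( aContext.graphicsQueue, 1, &submitInfo, uploadComplete.handle ); VK_SUCCESS != res )
{
	throw lut::Error( "Submitting commands\n"
		"vkQueueSubmit() returned %s", lut::to_string(res).c_str()
	);
}

// Wait for the copies to complete before the staging buffers, the command
// pool and the fence go out of scope and are destroyed.
// (Requires <limits> and <cstdint> for the timeout value below.)
if( auto const res = vkWaitForFences( aContext.device, 1, &uploadComplete.handle, VK_TRUE, std::numeric_limits<std::uint64_t>::max() ); VK_SUCCESS != res )
{
	throw lut::Error( "Waiting for upload to complete\n"
		"vkWaitForFences() returned %s", lut::to_string(res).c_str()
	);
}

// Finally, return the two on-GPU buffers wrapped in a ColorizedMesh instance.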