Lecture 2:
Vulkan, Part 2 (Memory and resources)
COMP5822M – High Perf. Graphics
– What is Vulkan
– Structure of Vulkan API
– Volk
– Memory and data
– Next: Commands and pipelines
First – from last time
– sType and pNext.
– Why and what?
– sType: structure type
VkPhysicalDeviceFeatures2 feats{};
feats.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
Why do we need this?
– Compiler knows the type of the struct
– Function requires that type
– Both caller and callee agree on this
– Two (or rather 1.5) reasons:
– Extensions and pNext
– (alternative way to define extensions)
Why do we need this?
– pNext: void const* pointer
– Points to “next” structure
– Pointer is void – we don’t know the type
– Hence: sType
Extensions / sType, .pNext
VkPhysicalDeviceVulkan12Features featsV12{};
featsV12.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VULKAN_1_2_FEATURES;
…
VkPhysicalDeviceFeatures2 feats{};
feats.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
feats.pNext = &featsV12;
vkGetPhysicalDeviceFeatures2( physicalDevice, &feats );
Forms a singly linked list.
pNext is a void* pointer. Vulkan needs the sType field to know which type of additional structure is attached!
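To illustrate why sType is needed, here is a minimal sketch of how a callee might walk a pNext chain. All types below (StructType, BaseHeader, Features2, Vulkan12Features, exampleFeature) are hypothetical stand-ins invented for this sketch so that it compiles without the Vulkan headers; real code would use the structures from <vulkan/vulkan.h>, and a real driver may do this differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for VkStructureType (hypothetical values, illustration only).
enum class StructType : std::uint32_t { Features2 = 1, Vulkan12Features = 2 };

// Every chainable struct starts with these two members, in this order.
struct BaseHeader {
    StructType sType;
    void const* pNext;
};

// Stand-in for an extension struct that can be attached via pNext.
struct Vulkan12Features {
    StructType sType = StructType::Vulkan12Features;
    void const* pNext = nullptr;
    bool exampleFeature = false; // hypothetical payload
};

// Stand-in for the "main" struct passed to the function.
struct Features2 {
    StructType sType = StructType::Features2;
    void const* pNext = nullptr;
};

// Walk the singly linked list. The pointers are untyped, so the only way
// to recognise a node is to read the sType stored in its header.
void const* find_in_chain(void const* head, StructType wanted) {
    for (void const* p = head; p != nullptr;) {
        BaseHeader hdr;
        std::memcpy(&hdr, p, sizeof hdr); // read the common header
        if (hdr.sType == wanted) return p;
        p = hdr.pNext;
    }
    return nullptr;
}

// Usage: build the chain from the slide and look up the attached struct.
bool example_chain_lookup() {
    Vulkan12Features v12{};
    Features2 feats{};
    feats.pNext = &v12;
    void const* found = find_in_chain(&feats, StructType::Vulkan12Features);
    assert(found == &v12);
    return found == &v12;
}
```

Without the sType field, the loop would have no way to tell which structure a given void pointer refers to.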
Today: doing stuff with Vulkan
– Commands sent to device
– Commands take input and generate output
Today: doing stuff with Vulkan
– Example: vkCmdCopyBuffer()
– Copy from one buffer to another
– Input: Vulkan buffer (VkBuffer)
– Output: Vulkan buffer
Today: doing stuff with Vulkan
– Example: vkCmdDraw()
– Draw primitives
– Input: Vertices, uniforms, textures, …
– Output: Framebuffer
– How are the primitives drawn?
– What primitives do we use?
– Defined by a (graphics) pipeline object.
Today: doing stuff with Vulkan
– Compute is similar
– Compute pipeline object instead
– Simpler…
Running a Vulkan computation
– Computation: Command (& Pipeline)
– Input data
– Output data
– Submit to Queue
– Record in command buffer
– Synchronization…
– Set up data
– Allocate inputs & initialize
– Allocate space for outputs
– Set up pipeline
– Shader modules + config
– Describe inputs & outputs
– Generate command buffer
– Submit
“Once” (Cold path)
Many times (Hot path)
– Sounds easy enough…
Instance and Devices
– Vulkan instance
– Per-application state and data
– Similar to OpenGL Context
– Not a global, though
– VkInstance object
– First Vulkan object created
– Last Vulkan object alive
Instance and Devices
– Per-instance options
– E.g. layers and extensions
Instance and Devices
– Physical device vs logical device
– Physical device (VkPhysicalDevice)
– Corresponds to a Vulkan implementation (“device”)
– E.g. a GPU will show up as a physical device
– List physical devices and pick one
– Logical device: instance of a physical device
Instance and Devices
– Logical device (VkDevice)
– Instance of a Vulkan implementation
– Owns state and resources for that device
– Independent of other logical devices
– Can have multiple logical devices
– Multiple instances of same physical device
Instance and Devices
– Topic of Exercise 1.1
– Central object: VkDevice
– Need the others to get there
– Then mostly forget about them
– Most objects will belong to a logical device
– “Created on that device”
Surfaces and Swapchains
– Window system integration
– VkSurfaceKHR: ~ OS window
– VkSwapchainKHR: on-device framebuffer resources (images)
– KHR suffix: defined in an extension
– VK_KHR_surface: instance extension
– VK_KHR_swapchain: device extension
Surfaces and Swapchains
– Need this if we want to render to a window
– Don’t need it if we’re happy to render headless
– Exercise 1.2: Headless rendering
– Focus on the graphics pipeline
– Exercise 1.3: Windowed rendering
– Focus on WSI + introduce GLFW.
VK_EXT_debug_utils
– API to receive validation messages
– And related: name objects, …
– Used with standard validation
– VK_LAYER_KHRONOS_validation
– Exercise 1.1
– Always enable it when developing Vulkan
– Will let you know if you do something illegal (as defined by the spec)
– Catches errors early.
Memory and data
– Today’s main topic.
Memory and data
– Where to put our data?
– How do we get it there?
– E.g. VRAM?
Memory Heaps and Types
– System memory vs VRAM?
– Vulkan: per device
– Memory Heaps
– Memory Types
Memory Heap
– Heap: “source of memory”
– System RAM
– Special memory regions
– Simple case: single heap (=RAM)
– dGPU: likely multiple heaps
Memory Type
– Type: “usage scenario”
– Vulkan resources require certain type(s)
– Different properties
– Memory comes from one of the heaps
– VkMemoryPropertyFlags:
– VK_MEMORY_PROPERTY_…_BIT (bitfield)
– DEVICE_LOCAL: “Most efficient for device”
– HOST_VISIBLE: Can be accessed by CPU
– HOST_CACHED: Cached on CPU
Example: RTX 2070, Windows
GeForce RTX 2070: 3 Heaps
– heap 0: 8031 MBytes, DEVICE_LOCAL: VRAM (~8GB)
– heap 1: 16328 MBytes, (no flags): System Memory (Half of it anyway; I get 75% on Linux!)
– heap 2: 214 MBytes, DEVICE_LOCAL: Special region of VRAM (Host accessible)
GeForce RTX 2070: 5 memory types
– type 0: from heap 1, (no flags)
– type 1: from heap 0, DEVICE_LOCAL
– type 2: from heap 1, HOST_VISIBLE HOST_COHERENT
– type 3: from heap 1, HOST_CACHED HOST_VISIBLE HOST_COHERENT
– type 4: from heap 2, DEVICE_LOCAL HOST_VISIBLE HOST_COHERENT
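Picking a memory type usually means searching this table for the first type that is both allowed for the resource and has the properties we want. The sketch below runs that search over the RTX 2070 table above. MemoryType, rtx2070_memory_types() and find_memory_type() are names made up for this example; the flag values mirror the VK_MEMORY_PROPERTY_*_BIT constants, and in real code the table would come from vkGetPhysicalDeviceMemoryProperties() and the allowed-type mask from the resource's memory requirements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same bit values as the corresponding VK_MEMORY_PROPERTY_*_BIT flags.
enum : std::uint32_t {
    DEVICE_LOCAL  = 0x1,
    HOST_VISIBLE  = 0x2,
    HOST_COHERENT = 0x4,
    HOST_CACHED   = 0x8
};

// Simplified stand-in for VkMemoryType.
struct MemoryType {
    std::uint32_t heapIndex;
    std::uint32_t propertyFlags;
};

// The five memory types from the RTX 2070 example.
std::vector<MemoryType> rtx2070_memory_types() {
    return {
        {1, 0},
        {0, DEVICE_LOCAL},
        {1, HOST_VISIBLE | HOST_COHERENT},
        {1, HOST_CACHED | HOST_VISIBLE | HOST_COHERENT},
        {2, DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT},
    };
}

// Returns the index of the first type that is allowed by allowedTypeBits
// (bit i set => type i is acceptable for the resource) and has all of the
// required property flags. Returns -1 if there is no such type.
int find_memory_type(std::vector<MemoryType> const& types,
                     std::uint32_t allowedTypeBits,
                     std::uint32_t requiredFlags) {
    assert(types.size() <= 32); // Vulkan exposes at most 32 memory types
    for (std::uint32_t i = 0; i < types.size(); ++i) {
        bool const allowed = (allowedTypeBits & (1u << i)) != 0;
        bool const hasFlags =
            (types[i].propertyFlags & requiredFlags) == requiredFlags;
        if (allowed && hasFlags) return static_cast<int>(i);
    }
    return -1;
}
```

With this table, asking for DEVICE_LOCAL yields type 1 (plain VRAM), asking for HOST_VISIBLE | HOST_COHERENT yields type 2 (system memory), and asking for DEVICE_LOCAL | HOST_VISIBLE yields type 4 (the small host-accessible VRAM region).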
Tangent: PCIe BAR
– Heap 2: VRAM accessible through PCIe
– 256 MB (modulo rounding errors)
– Fixed size PCIe BAR (Base Address Register)
– AMD announced “Smart Access Memory” support
– Actually a standard feature (resizable BAR, >= PCIe 3.0)
– With resizable BAR => Full VRAM available to CPU
– I.e., as DEVICE_LOCAL | HOST_VISIBLE
– Recent motherboard+BIOS+GPU (only RTX30xx)? Might have it.
Allocating memory
– vkAllocateMemory() / vkFreeMemory()
– VkDeviceMemory
– Decide which memory type!
– Limited by the Vulkan resource the memory is for
– Usage scenario…
Allocation limits
– Maximum number of vkAllocateMemory() allocations
– See maxMemoryAllocationCount in VkPhysicalDeviceLimits (member of VkPhysicalDeviceProperties)
– Windows: only 4096! (Linux: ~4G)
Allocation limits
– Suballocate
– Get large VkDeviceMemory chunk
– Subdivide it for various resources
– VulkanMemoryAllocator (AMD, GitHub)
– We will use this from Exercise 1.4 and
Source: 2020 Vulkan Ecosystem Survey, https://www.lunarg.com/wp-content/uploads/2021/01/2020-Vulkan-Ecosystem-SDK-Survey-Results.pdf
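To make the idea of suballocation concrete, here is a toy bump ("linear") suballocator that hands out aligned offsets inside one large chunk. It only does the offset arithmetic; LinearSuballocator and Suballocation are hypothetical names for this sketch, and this is not how VulkanMemoryAllocator is implemented. In real code the capacity would be the size of a single VkDeviceMemory allocation and the returned offset would be passed to vkBindBufferMemory().

```cpp
#include <cassert>
#include <cstdint>

// Result of one suballocation request.
struct Suballocation {
    std::uint64_t offset; // offset into the large chunk
    std::uint64_t size;
    bool ok;              // false if the request did not fit
};

// Hands out consecutive, aligned ranges from a chunk of fixed capacity.
// Individual ranges cannot be freed; reset() releases everything at once.
class LinearSuballocator {
public:
    explicit LinearSuballocator(std::uint64_t capacity)
        : capacity_(capacity) {}

    Suballocation allocate(std::uint64_t size, std::uint64_t alignment) {
        // Alignments reported by Vulkan are powers of two.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Round the current head up to the next multiple of alignment.
        std::uint64_t const start =
            (head_ + alignment - 1) & ~(alignment - 1);
        if (size == 0 || start > capacity_ || size > capacity_ - start)
            return {0, 0, false};
        head_ = start + size;
        return {start, size, true};
    }

    void reset() { head_ = 0; }
    std::uint64_t used() const { return head_; }

private:
    std::uint64_t capacity_;
    std::uint64_t head_ = 0;
};
```

Many resources then share one vkAllocateMemory() call, which keeps the allocation count far below maxMemoryAllocationCount.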
Using VkDeviceMemory memory
– vkMapMemory() / vkUnmapMemory()
– Makes memory visible on CPU
– Only for HOST_VISIBLE!
– Get “raw” C pointer
– Storage space (“backing”) for resources
– Buffers, Images, …
– vkBindBufferMemory(), …
– DEVICE_LOCAL memory is (typically) faster
– Not always HOST_VISIBLE
– How do we get data in there?
– Staging buffer + copy
– Create buffer in HOST_VISIBLE memory
– Copy to resource with DEVICE_LOCAL mem
– vkCmdCopyBuffer()
– vkCmdCopyBufferToImage()
Not staging
– Easier option for small buffers
– E.g. uniform data
– vkCmdUpdateBuffer()
– Up to 64 KiB (65536 bytes) max
– Uses command buffer memory
– Use sparingly?
– (Don’t loop it to copy more data.)
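As a rough illustration of when the non-staging path applies, the helper below decides between an inline update and a staging copy. The constraints encoded here (size at most 65536 bytes, offset and size multiples of 4) are the vkCmdUpdateBuffer() valid-usage rules as I understand them from the Vulkan specification; UploadPath and choose_upload_path() are hypothetical names for this sketch. It only encodes the hard limits, not the "use sparingly" advice.

```cpp
#include <cassert>
#include <cstdint>

// Maximum dataSize accepted by vkCmdUpdateBuffer(), in bytes.
constexpr std::uint64_t kMaxUpdateBufferSize = 65536;

enum class UploadPath { InlineUpdate, StagingCopy };

// Decide how to get dataSize bytes into a buffer at dstOffset.
// InlineUpdate corresponds to vkCmdUpdateBuffer(), StagingCopy to a
// HOST_VISIBLE staging buffer followed by vkCmdCopyBuffer().
UploadPath choose_upload_path(std::uint64_t dstOffset,
                              std::uint64_t dataSize) {
    assert(dataSize > 0);
    bool const aligned = (dstOffset % 4 == 0) && (dataSize % 4 == 0);
    bool const small = dataSize <= kMaxUpdateBufferSize;
    return (aligned && small) ? UploadPath::InlineUpdate
                              : UploadPath::StagingCopy;
}
```

A 64-byte uniform update at offset 0 would take the inline path, while anything larger than 64 KiB falls back to staging rather than looping over several inline updates.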
Resources: Buffers and Images
– VkDeviceMemory: “raw” allocation
– Can’t really use directly
– VkBuffer: untyped buffer / array
– VkImage: “Image”
– Formatted and typed pixel array
– Commonly 2D
– 1D, 3D, cube, arrays, …
– Per-vertex data (positions, normals, …)
– Indices (indexed meshes)
– Uniform data
– Storage buffers (read-write in shaders)
– …
Buffer creation
– Create buffer object
vkCreateBuffer() & VkBufferCreateInfo
– Find memory type
vkGetBufferMemoryRequirements[2]()
– Allocate memory
vkAllocateMemory() & VkMemoryAllocateInfo
– Bind memory to buffer
vkBindBufferMemory[2]() & VkBindBufferMemoryInfo
Buffer Creation
– Must specify how the buffer will be used
– Bitfield of VkBufferUsageFlags, VK_BUFFER_USAGE_*_BIT
– TRANSFER_SRC, TRANSFER_DST: source/destination of copy
– UNIFORM_BUFFER: used as uniform buffer
– VERTEX_BUFFER: used to source per-vertex data
– INDEX_BUFFER: used to source indices
– STORAGE_BUFFER: similar to UNIFORM_BUFFER, larger, read-write
– …
Buffer Creation
– Sharing mode
– Important if buffer is accessed from multiple queues
– EXCLUSIVE: no concurrent access (must manually transfer ownership)
– CONCURRENT: multiple queues can access concurrently
– Only one queue => EXCLUSIVE
“VK_SHARING_MODE_CONCURRENT may result in lower performance access to the buffer or image than VK_SHARING_MODE_EXCLUSIVE.”
Why all these separate steps?
– Flexibility
– Example: Can alias memory
– I.e., use the same memory for two different buffers/images
– Must not be used at the same time!
– Requires extra synchronization.
– Images used for
– Textures
– Render targets (color buffer, depth buffer, …)
– …
Image Creation
– Similar to buffer creation
– Need to specify
– Image type: TYPE_1D, TYPE_2D, TYPE_3D
– Size (“extent”)
– Sampling options (mipmap, multisampling)
– Initial Layout
– Usage, sharing mode
Image Format
– VK_FORMAT_{COMPONENTS}_{SUFFIX}
– Example: VK_FORMAT_R8G8B8A8_SRGB
– Very long list of different formats
– More examples:
– R8G8B8_*: tempting, typically not an option (3 byte alignment!)
– R8G8B8A8_*, B8G8R8A8_*: (what you use instead)
– D32_SFLOAT: depth buffer
Image Format
– Suffixes?
– Interpretation of data (e.g. R8 = 8 bit value)
– _SINT: signed integer, R8: [-128, 127]
– _UINT: unsigned integer, R8: [0, 255]
– _UNORM: value mapped to [0, 1] (R8: 255 steps between 0 and 1)
– _SNORM: value mapped to [-1, 1] (R8: 254 steps between -1 and 1; -128 also maps to -1)
– _SRGB: value mapped to [0, 1], but with sRGB transformation
– _SFLOAT: data is interpreted as IEEE floating point value
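The suffixes can be read as decoding rules applied when a shader reads a texel. The sketch below shows CPU-side equivalents for a single 8-bit channel, assuming the usual definitions (UNORM divides by 255, SNORM divides by 127 and clamps to -1, sRGB uses the standard piecewise transfer function). The function names are made up for this example; on the GPU this conversion happens in hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// _UNORM: 0..255 maps linearly onto [0, 1].
float decode_unorm8(std::uint8_t v) {
    return static_cast<float>(v) / 255.0f;
}

// _SNORM: -127..127 maps linearly onto [-1, 1]; -128 is clamped to -1.
float decode_snorm8(std::int8_t v) {
    return std::max(static_cast<float>(v) / 127.0f, -1.0f);
}

// Standard sRGB-to-linear transfer function for a value in [0, 1].
float srgb_to_linear(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// _SRGB: stored like UNORM, then converted to linear when read.
float decode_srgb8(std::uint8_t v) {
    return srgb_to_linear(decode_unorm8(v));
}
```

The same byte value 128 therefore decodes to roughly 0.5 as UNORM but to a noticeably darker linear value as SRGB, which is why the suffix matters for color textures.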
Image Format
– RGBA formats: general purpose
– I.e., not just color data
– Examples: normal maps
– Compressed formats
– Planar formats
– …
Image Tiling
– Memory layout
– VK_IMAGE_TILING_OPTIMAL vs VK_IMAGE_TILING_LINEAR
(Figure: memory layout, optimal vs. linear. Possible memory layout, example only! Actual layout unknown, determined by GPU/driver!)
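To give a feel for the difference, the sketch below compares row-major (linear) addressing with Morton / Z-order addressing. Morton order is only one well-known locality-preserving layout, picked here as an assumption for illustration; the layout actually used for TILING_OPTIMAL is implementation-defined and not exposed by Vulkan. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Linear tiling: texels are stored row by row.
std::uint32_t linear_index(std::uint32_t x, std::uint32_t y,
                           std::uint32_t width) {
    assert(x < width);
    return y * width + x;
}

// Spread the lower 16 bits of v so that a zero bit sits between each
// pair of original bits (bit i moves to bit 2*i).
std::uint32_t spread_bits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton / Z-order: interleave the bits of x and y, so texels that are
// close in 2D tend to be close in memory as well.
std::uint32_t morton_index(std::uint32_t x, std::uint32_t y) {
    assert(x < 65536u && y < 65536u);
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

In a linear image of width 1024, the texel directly below (0, 0) is 1024 texels away in memory; in Morton order it is only 2 away, which is friendlier to caches when sampling 2D neighbourhoods.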
Image Tiling
– Use TILING_OPTIMAL whenever possible
Image Usage
– VkImageUsageFlags, VK_IMAGE_USAGE_*_BIT
– TRANSFER_SRC, TRANSFER_DST: source/destination of copy
– SAMPLED: used as a texture
– COLOR_ATTACHMENT: color render target
– DEPTH_STENCIL_ATTACHMENT: depth buffer (depth+stencil buffer)
Hardware support
– Only certain format/usage/tiling combinations are valid
– This depends on hardware
– Texture sampling may use special hardware units
– Same for writing to images during rendering (=> blending!)
Required Image support
– Certain combinations of format/usage/tiling are mandatory
– https://www.khronos.org/registry/vulkan/specs/1.0/html/chap34.html#features-required-format-support
– Highlights
– R8G8B8A8_{SRGB,UNORM}: use as texture, color attachment
– Note: not for compute pipeline use!
– D16_UNORM mandatory for depth, but 16-bit depth is a bit low
– At least one of X8_D24_UNORM_PACK32 or D32_SFLOAT must also be supported for depth
Image Layout
– Images have a “layout”
– Operations require correct layout
– Operations change the layout (“transition”)
– Can manually transition layout
– Image Barrier
– Initial layout: UNDEFINED (or PREINITIALIZED)
Image Layout
– Layouts:
– UNDEFINED: cannot be used, image contents are undefined
– TRANSFER_DST_OPTIMAL: can be copied to
– TRANSFER_SRC_OPTIMAL: can be copied from
– COLOR_ATTACHMENT_OPTIMAL: can be rendered to
– SHADER_READ_ONLY_OPTIMAL: can be read from in shader (texture)
– DEPTH_STENCIL_ATTACHMENT_OPTIMAL: use as depth/stencil buffer
– …
– PRESENT_SRC_KHR: can be used to present
Image Layout
– Layouts:
– GENERAL: all types of device access, may be suboptimal
Example: Texture
– Create Image
– Memory type with DEVICE_LOCAL
– Tiling: OPTIMAL
– Usage: TRANSFER_DST | SAMPLED
– Layout: UNDEFINED
– Transition image from UNDEFINED to TRANSFER_DST_OPTIMAL
– Staging buffer (TRANSFER_SRC)
– Fill with data, and copy to image (vkCmdCopyBufferToImage())
– Transition image from TRANSFER_DST_OPTIMAL to SHADER_READ_ONLY_OPTIMAL
– Why the extra bookkeeping?
– Optimization hint, may affect performance
– Could affect how images are stored in memory
– AMD: cares about image layouts
– Compress/decompress/rearrange during transitions
– Using wrong layouts/GENERAL listed under “Common Mistakes”
– NVIDIA: doesn’t care
– Lists using GENERAL under “Good Practices”
– I use the correct layouts and avoid GENERAL
Thank you for your attention.