Lecture 15:
Data, data wrangling & optimizations
COMP5822M – High Perf. Graphics
– 2nd to last lecture
– Thursday = last scheduled lecture
– Real-time ray tracing overview
– If time: Mesh Shaders
– Vulkan API, hardware & software concerns
– Commands, command execution, synch.
– The Graphics Pipeline
– Resources and passing them around
– Framebuffers/rasterization
– Presentation
– Anti-aliasing, blending, masking
– Texturing
– Attribute mapping
– Example: normal mapping
– Vulkan details
– Texture compression & virtual textures
– BRDFs, μ-facets
– Direct and indirect lighting techniques
– AO, IBL, Lightmaps, LPVs, …
Recap, III
– Render-to-texture
– Deferred shading
– Post-processing effects
– Shadow volumes
– Shadow maps
– Shadow mapping implementation
– Data wrangling
– GLSL layout details, uniforms etc.
– Optimizations (meshes, formats)
– Special case: “terrain”
– Optimization
– LOD, billboards & impostors
– Intro to acceleration structures
Uniform data
– Vulkan: uniform buffer objects (UBOs)
– Similar to modern OpenGL (>= 3.1)
– No option for legacy “slot”/register model
– Shader storage buffer objects (SSBOs)
– Also in OpenGL (>=4.3)
– Larger than UBOs, can be read-write
– Maybe slower than UBOs?
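Limits – Maximum Supported Sizes
– UBOs (VkPhysicalDeviceLimits::maxUniformBufferRange)
– Required: 16kB
– Typical: 64kB
– SSBOs (VkPhysicalDeviceLimits::maxStorageBufferRange)
– Required: 128MB
– Typical: ~4GB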
// Host, C++
struct UModel
{
    glm::vec3 ambient;
    glm::vec3 diffuse;
    float alpha;
};
// Fragment Shader, GLSL
layout( set = 1, binding = 0 ) uniform UModel {
    vec3 ambientColor;
    vec3 diffuseColor;
    float baseAlpha;
};
Problem: different memory layouts on the host and in the shader!
Vulkan: std140 is the default layout for UBOs
// Fragment Shader, GLSL
layout( set = 1, binding = 0, std140 ) uniform UModel {
    vec3 ambientColor;
    vec3 diffuseColor;
    float baseAlpha;
};
Memory Layouts for Interface Blocks
– “std140”
– Introduced in GLSL 1.40 (=OpenGL 3.1)
– Specifies memory layout for UBOs
– “std430”
– Introduced in GLSL 4.30 (=OpenGL 4.3)
– Mainly SSBOs
– “scalar” (Vulkan 1.2; GL_/VK_EXT_scalar_block_layout)
– “shared”/“packed” (OpenGL only)
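– std140: see docs for details
https://www.khronos.org/registry/OpenGL/specs/gl/glspec45.core.pdf#page=159
(OpenGL docs? Yes. Same rules, but more readable. Also applies to OpenGL UBOs)
– Members are aligned/padded to 4, 8 or 16 bytes: a float to 4, a vec2 to 8, a vec3 or vec4 to 16
– Array elements and structs are additionally rounded up to 16 bytes (std430 drops this rule)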
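To make the alignment rule concrete, here is a minimal sketch of an offset calculator for blocks that only contain float, vec2, vec3 and vec4 members. Arrays, matrices and nested structs are deliberately left out, and the names `Kind`, `Layout` and `member_offsets` are hypothetical. The scalar layout is included for comparison, under the assumption that every member is then aligned only to its 4-byte component size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical member kinds: only 32-bit float scalars and vectors are modelled.
enum class Kind { Float, Vec2, Vec3, Vec4 };
enum class Layout { Std140, Scalar };

// Number of bytes a member occupies.
std::size_t size_of(Kind k) {
    switch (k) {
        case Kind::Float: return 4;
        case Kind::Vec2:  return 8;
        case Kind::Vec3:  return 12;
        case Kind::Vec4:  return 16;
    }
    return 0;
}

// Base alignment of a member; std140 aligns a vec3 like a vec4 (16 bytes).
std::size_t align_of(Kind k, Layout layout) {
    if (layout == Layout::Scalar)
        return 4; // scalar layout: align to the component size only
    switch (k) {
        case Kind::Float: return 4;
        case Kind::Vec2:  return 8;
        case Kind::Vec3:
        case Kind::Vec4:  return 16;
    }
    return 4;
}

// Byte offset of each member, placing members in declaration order.
std::vector<std::size_t> member_offsets(const std::vector<Kind>& members, Layout layout) {
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (Kind k : members) {
        const std::size_t a = align_of(k, layout);
        assert(a != 0 && (a & (a - 1)) == 0); // alignments are powers of two
        offset = (offset + a - 1) / a * a;    // round up to the alignment
        offsets.push_back(offset);
        offset += size_of(k);
    }
    return offsets;
}
```

For the UModel block used in the following slides (vec3, vec3, float), std140 yields offsets 0, 16 and 28, while the scalar layout yields 0, 12 and 24.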
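// Fragment Shader, GLSL
layout( set = 1, binding = 0, std140 ) uniform UModel {
    vec3 ambientColor;
    vec3 diffuseColor;
    float baseAlpha;
};
// Host, C++
struct UModel
{
    glm::vec3 ambient;
    glm::vec3 diffuse;
    float alpha;
};
Problem: “diffuse” needs to start on a 16-byte boundary. In std140 a vec3 is aligned like a vec4, so diffuseColor sits at byte offset 16, but with GLM's default 12-byte glm::vec3 the C++ diffuse starts at offset 12.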
// Host, C++
struct UModel
{
    glm::vec3 ambient;
    float pad0_;
    glm::vec3 diffuse;
    float pad1_; // not needed!
    float alpha;
};
No padding is needed before alpha: a float only requires 4-byte alignment, so baseAlpha directly follows diffuseColor at byte offset 28.
// Host, C++
struct UModel
{
    glm::vec3 ambient;
    float pad0_;
    glm::vec3 diffuse;
    float alpha;
}; // ✓ matches std140
// Host, C++: alternatively, use explicit alignment instead of padding members
struct UModel
{
    alignas(16) glm::vec3 ambient;
    alignas(16) glm::vec3 diffuse;
    alignas(4) float alpha;
}; // ✓
// Host, C++: only diffuse actually needs the explicit alignment
// (ambient is at offset 0 anyway, and 4 bytes is a float's natural alignment)
struct UModel
{
    /*alignas(16)*/ glm::vec3 ambient;
    alignas(16) glm::vec3 diffuse;
    /*alignas(4)*/ float alpha;
}; // ✓
// Host, C++
// WARNING: UModel must match the memory layout of the
// UModel UBO defined in shader_inputs.glsl!
struct UModel
{
    alignas(16) glm::vec3 ambient;
    alignas(16) glm::vec3 diffuse;
    float alpha;
};
– This pitfall has led people to recommend against using vec3 (and other 3-component types) in interface blocks.
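Since a layout mismatch fails silently at run time, one option is to let the compiler check the host struct against the expected std140 offsets (ambientColor at 0, diffuseColor at 16, baseAlpha at 28, 32 bytes in total). A minimal sketch using `offsetof` and `static_assert`; `Vec3` is a hypothetical stand-in for `glm::vec3` (assuming GLM's default 12-byte, 4-byte-aligned vec3) so that the snippet compiles without GLM.

```cpp
#include <cstddef>

// Hypothetical stand-in for glm::vec3: three tightly packed floats.
struct Vec3 { float x, y, z; };

// Host-side mirror of the std140 UModel block.
struct UModel
{
    alignas(16) Vec3 ambient;
    alignas(16) Vec3 diffuse;
    float alpha;
};

// Compile-time checks against the std140 offsets of the GLSL block.
static_assert(offsetof(UModel, ambient) == 0, "ambientColor offset");
static_assert(offsetof(UModel, diffuse) == 16, "diffuseColor offset");
static_assert(offsetof(UModel, alpha) == 28, "baseAlpha offset");
static_assert(sizeof(UModel) == 32, "UModel size");
```

If someone later edits the struct (or the shader) without updating the other side, the build breaks instead of the rendering.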
Scalar block layout
– See docs for details
https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#interfaces-alignment-requirements
– Mostly matches C++ struct packing: members are aligned to their component size (e.g., 4 bytes for a vec3)
// Fragment Shader, GLSL
#extension GL_EXT_scalar_block_layout : require
layout( set = 1, binding = 0, scalar ) uniform UModel {
    vec3 ambientColor;
    vec3 diffuseColor;
    float baseAlpha;
};
// Host, C++
struct UModel
{
    glm::vec3 ambient;
    glm::vec3 diffuse;
    float alpha;
}; // ✓ no padding needed
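The same kind of compile-time check can confirm that a plain struct already matches the scalar layout, where diffuseColor sits at offset 12 and baseAlpha at offset 24. Again a minimal sketch, with `Vec3` as a hypothetical stand-in for `glm::vec3` (assuming GLM's default packing).

```cpp
#include <cstddef>

// Hypothetical stand-in for glm::vec3: three tightly packed floats.
struct Vec3 { float x, y, z; };

// Plain struct: no padding members and no alignas needed under "scalar".
struct UModel
{
    Vec3 ambient;
    Vec3 diffuse;
    float alpha;
};

static_assert(offsetof(UModel, ambient) == 0, "ambientColor offset");
static_assert(offsetof(UModel, diffuse) == 12, "diffuseColor offset");
static_assert(offsetof(UModel, alpha) == 24, "baseAlpha offset");
static_assert(sizeof(UModel) == 28, "UModel size");
```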
Note: scalar block layouts must be enabled when creating the device, either via Vulkan 1.2's VkPhysicalDeviceVulkan12Features::scalarBlockLayout
or via VK_EXT_scalar_block_layout +
VkPhysicalDeviceScalarBlockLayoutFeatures::scalarBlockLayout
(97+% of Vulkan 1.2 devices support scalar block layouts)
Data format
– General considerations
– Smaller is better (typically)
– But respect alignment
(32-bit, 64-bit, 128-bit)
– E.g. R8G8B8 (24 bits) vs R8G8B8A8 (32 bits): the padded 4-byte format is often the better choice
Data format
– Vertex example: normals
– Naïve: 3x float (x, y, z) = 96 bits
– Lower res: 3x byte (x, y, z) = 24 bits (effectively 32 bits once padded for alignment)
– Spherical coordinates: 2x float16 = 32 bits
– Octahedron-normal: 2x float16 = 32 bits
(But better precision than spherical coords.)
– See “On Floating-Point Normal Vectors”
Q. Meyer, J. Sussmuth, G. Sussner, M. Stamminger (2010)
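The octahedron mapping can be sketched as follows: the normal is projected onto the octahedron |x| + |y| + |z| = 1, and the lower half (z < 0) is folded outward over the diagonals so that the result fills the square [-1, 1]^2. A minimal sketch in 32-bit floats with hypothetical helper names; converting the two components to float16 (or a 16-bit normalized format) for storage is left out.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// +1 for non-negative values, -1 otherwise (avoids sign(0) == 0).
static float sign_not_zero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Unit vector -> point in [-1, 1]^2.
Vec2 oct_encode(Vec3 n) {
    const float l1 = std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z);
    assert(l1 > 0.0f); // input must be non-zero
    Vec2 p{ n.x / l1, n.y / l1 };
    if (n.z < 0.0f) {
        // Fold the lower hemisphere over the diagonals of the square.
        const Vec2 q = p;
        p.x = (1.0f - std::fabs(q.y)) * sign_not_zero(q.x);
        p.y = (1.0f - std::fabs(q.x)) * sign_not_zero(q.y);
    }
    return p;
}

// Point in [-1, 1]^2 -> unit vector (unfold, then renormalize).
Vec3 oct_decode(Vec2 e) {
    Vec3 v{ e.x, e.y, 1.0f - std::fabs(e.x) - std::fabs(e.y) };
    if (v.z < 0.0f) {
        const float x = v.x, y = v.y;
        v.x = (1.0f - std::fabs(y)) * sign_not_zero(x);
        v.y = (1.0f - std::fabs(x)) * sign_not_zero(y);
    }
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}
```

In full precision the round trip is exact up to floating-point error; the precision loss comes from quantizing the two encoded components for storage.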
Data format
– Vertex example: tangent frame
– Naïve: 3x vec3 = 3*96 bits = 288 bits
– Orthonormal: vec3 + vec4 = 96 + 128 bits = 224 bits
– Quaternion: vec4 = 128 bits
– Quaternion: R10G10B10_A2 = 32bits
– See “The BitSquid low-level animation system”
https://bitsquid.blogspot.com/2009/11/bitsquid-low-level-animation-system.htm
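One common way to fit a quaternion into R10G10B10_A2 is the “smallest three” scheme, shown here as an assumption rather than as the exact encoding of the BitSquid post. The largest-magnitude component is dropped (it can be recovered from unit length), the other three are stored in 10 bits each, and the 2-bit field holds the index of the dropped component. Dropping the largest component bounds the other three to [-1/sqrt(2), 1/sqrt(2)]. Because q and -q describe the same rotation, the sign is flipped so that the dropped component is non-negative. Helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Quat { float x, y, z, w; };

// Largest possible magnitude of the three kept components: 1/sqrt(2).
constexpr float kRange = 0.70710678f;

// Unit quaternion -> 10:10:10:2 bits.
std::uint32_t pack_quat_1010102(Quat q) {
    assert(std::fabs(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w - 1.0f) < 1e-3f);
    const float c[4] = { q.x, q.y, q.z, q.w };
    int largest = 0;
    for (int i = 1; i < 4; ++i)
        if (std::fabs(c[i]) > std::fabs(c[largest])) largest = i;
    // q and -q are the same rotation: flip so the dropped component is >= 0.
    const float s = c[largest] < 0.0f ? -1.0f : 1.0f;
    std::uint32_t bits = static_cast<std::uint32_t>(largest) << 30; // 2-bit index
    int shift = 20;
    for (int i = 0; i < 4; ++i) {
        if (i == largest) continue;
        // Map [-kRange, kRange] to [0, 1], then to a 10-bit integer.
        const float t = std::fmin(std::fmax(s * c[i] / kRange * 0.5f + 0.5f, 0.0f), 1.0f);
        bits |= static_cast<std::uint32_t>(std::lround(t * 1023.0f)) << shift;
        shift -= 10;
    }
    return bits;
}

// 10:10:10:2 bits -> unit quaternion (equal to q or -q, up to quantization).
Quat unpack_quat_1010102(std::uint32_t bits) {
    const int largest = static_cast<int>(bits >> 30);
    float c[4] = {};
    float sumSq = 0.0f;
    int shift = 20;
    for (int i = 0; i < 4; ++i) {
        if (i == largest) continue;
        const float t = static_cast<float>((bits >> shift) & 0x3FFu) / 1023.0f;
        c[i] = (t * 2.0f - 1.0f) * kRange;
        sumSq += c[i] * c[i];
        shift -= 10;
    }
    // Reconstruct the dropped component from unit length.
    c[largest] = std::sqrt(std::fmax(0.0f, 1.0f - sumSq));
    return { c[0], c[1], c[2], c[3] };
}
```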
Data format
– Lower precision position?
– Lower precision texture coords?
– …
– Example: “Vertex Formats Part 1: Compression”
https://www.yosoygames.com.ar/wp/2018/03/vertex-formats-part-1-compression/
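Lower-precision positions are often obtained by quantizing each coordinate relative to the mesh's bounding box. A minimal sketch, assuming 16-bit unsigned normalized values per axis (48 instead of 96 bits) and hypothetical helper names; this is one common option, not necessarily the scheme of the linked article. The resolution per axis is the bounding-box extent divided by 65535.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct QuantizedPos { std::uint16_t x, y, z; }; // 48 bits instead of 96

// Quantize one coordinate to 16-bit unorm relative to [lo, hi].
static std::uint16_t quantize(float v, float lo, float hi) {
    assert(hi >= lo);       // bounding box must be valid
    if (hi == lo) return 0; // flat axis: every vertex has the same coordinate
    const float t = std::fmin(std::fmax((v - lo) / (hi - lo), 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(t * 65535.0f));
}

static float dequantize(std::uint16_t q, float lo, float hi) {
    return lo + (hi - lo) * (static_cast<float>(q) / 65535.0f);
}

QuantizedPos quantize_position(Vec3 p, Vec3 bbMin, Vec3 bbMax) {
    return { quantize(p.x, bbMin.x, bbMax.x),
             quantize(p.y, bbMin.y, bbMax.y),
             quantize(p.z, bbMin.z, bbMax.z) };
}

// In a renderer this step would typically happen in the vertex shader,
// e.g. reading a 16-bit UNORM attribute and applying a scale and offset.
Vec3 dequantize_position(QuantizedPos q, Vec3 bbMin, Vec3 bbMax) {
    return { dequantize(q.x, bbMin.x, bbMax.x),
             dequantize(q.y, bbMin.y, bbMax.y),
             dequantize(q.z, bbMin.z, bbMax.z) };
}
```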
– Use indexed meshes
“Triangle soup”:
[ v0, v1, v2, v0, v2, v3 ] 6 vertices
Indexed mesh:
[ v0, v1, v2, v3 ] + [0, 1, 2, 0, 2, 3 ] 4 vertices + 6 indices
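Converting a triangle soup into an indexed mesh amounts to merging repeated vertices. A minimal sketch, assuming a hypothetical position-only `Vertex` compared for exact equality; a real vertex would compare all of its attributes, and a hash map is a common alternative to `std::map`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical minimal vertex: position only.
struct Vertex {
    float x, y, z;
    bool operator<(const Vertex& o) const {
        return std::tie(x, y, z) < std::tie(o.x, o.y, o.z);
    }
};

struct IndexedMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Triangle soup -> unique vertices + one index per original vertex.
IndexedMesh make_indexed(const std::vector<Vertex>& soup) {
    assert(soup.size() % 3 == 0); // whole triangles only
    IndexedMesh mesh;
    std::map<Vertex, std::uint32_t> lookup; // vertex -> position in mesh.vertices
    mesh.indices.reserve(soup.size());
    for (const Vertex& v : soup) {
        auto [it, inserted] = lookup.try_emplace(v, static_cast<std::uint32_t>(mesh.vertices.size()));
        if (inserted)
            mesh.vertices.push_back(v); // first occurrence: new unique vertex
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Feeding it the six-vertex soup [v0, v1, v2, v0, v2, v3] yields four vertices and the indices [0, 1, 2, 0, 2, 3].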
– Indexed mesh example:
– Naïve vertex format: 48 bytes/vert
– Triangle soup: 316350 verts
=> 316350 × 48 B = 14.48 MB (1 MB = 2^20 bytes)
– Indexed: 75778 unique verts + 316350 indices (32-bit)
=> 75778 × 48 B + 316350 × 4 B = 3.47 MB + 1.21 MB = 4.68 MB
– Post-transform cache
– Caches results of vertex shader
– Reuses them if the same vertex is repeated
– Vertices: identified by index + instance ID
– Post-transform cache only works for indexed rendering!
– Triangle soup: each vertex has a unique index, so nothing is reused
– Post-transform cache is limited
– E.g. ~20 or so vertices
– Optimizations:
– Reorder mesh such that repeated uses of a vertex are “close by”
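The effect of vertex order on the post-transform cache can be estimated with a small simulation. A minimal sketch, assuming a FIFO replacement policy and a fixed cache size (real GPUs differ and may not behave like this); `simulate_acmr` is a hypothetical helper returning the average cache miss ratio (ACMR), i.e. vertex shader invocations per triangle. A triangle soup always scores 3.0; the two-triangle quad from the indexed-mesh example scores 2.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified post-transform cache: a FIFO holding the last `cacheSize` indices.
double simulate_acmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    assert(!indices.empty() && indices.size() % 3 == 0 && cacheSize > 0);
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end())
            continue;          // hit: reuse the transformed vertex
        ++misses;              // miss: the vertex shader runs again
        fifo.push_back(idx);
        if (fifo.size() > cacheSize)
            fifo.pop_front();  // evict the oldest entry
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

Mesh reordering tools such as meshoptimizer aim to lower this number for a given cache size.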
– Look at e.g.
https://github.com/zeux/meshoptimizer
– Other optimizations:
– Reorder mesh to minimize overdraw
– E.g., want front-to-back for opaque
– Outside-to-inside?
– Also: maximize cache utilization for vertex fetch.
Special meshes
– Generate meshes on the fly
– Minimal example: triangle in Ex 1.2 etc.
– More useful:
– Terrain / height maps / height fields / …
– Compute XY position in shader
– Based on vertex index
– Get height from height map
– E.g. vertex texture
– Or compute procedurally?
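Computing the grid position from the vertex index can be sketched on the CPU as follows, mirroring what a vertex shader could do with gl_VertexIndex. Assumptions: non-indexed drawing with six vertices (two triangles) per grid cell, so an N x N grid of cells is drawn with vkCmdDraw(cmd, 6 * N * N, 1, 0, 0) and no vertex buffer; `height_at` is a hypothetical procedural function standing in for a height-map (vertex texture) lookup.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Hypothetical procedural height, standing in for a height-map fetch.
float height_at(float x, float z) {
    return 0.5f * std::sin(0.3f * x) * std::cos(0.2f * z);
}

// Vertex index -> grid position (x, height, z) for a gridSize x gridSize grid.
Vec3 grid_vertex(std::uint32_t vertexIndex, std::uint32_t gridSize) {
    assert(gridSize > 0);
    // Corner offsets of the two consistently wound triangles of one cell.
    static const int kCorner[6][2] = { {0, 0}, {1, 0}, {0, 1}, {0, 1}, {1, 0}, {1, 1} };
    const std::uint32_t cell = vertexIndex / 6;   // which grid cell
    const std::uint32_t corner = vertexIndex % 6; // which corner of its two triangles
    const float x = static_cast<float>(cell % gridSize + kCorner[corner][0]);
    const float z = static_cast<float>(cell / gridSize + kCorner[corner][1]);
    return { x, height_at(x, z), z };
}
```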
– Grid: ~N^2 triangles for an N x N height field (2(N-1)^2 exactly)
– Gets fairly significant quickly
– Don't need full res everywhere
=> LOD (but careful: cracks)
– Various algorithms
– E.g., clipmaps
Src: https://developer.nvidia.com/gpugems/gpugems2/part-i-geometric-complexity/chapter-2-terrain-rendering-using-gpu-based-geometry
Level-of-Detail (LOD)
– General technique
– Use fewer triangles when less detail is needed
– E.g. far away
– Discrete LOD: fixed number of detail levels
– Continuous LOD: continuously reduce detail
– cLOD seems to be more popular
Src: https://docs.unity3d.com/Manual/LevelOfDetail.html
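Discrete LOD selection can be as simple as comparing the camera distance against per-level thresholds. A minimal sketch with a hypothetical `select_lod` helper; engines frequently use projected screen-space size instead of raw distance (the linked Unity page, for example, switches on the object's relative screen height) and add hysteresis to avoid flickering between levels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// switchDistances[i] is the distance beyond which level i + 1 is used.
// Level 0 is the most detailed; the result is in [0, switchDistances.size()].
std::size_t select_lod(float distance, const std::vector<float>& switchDistances) {
    assert(std::is_sorted(switchDistances.begin(), switchDistances.end()));
    std::size_t level = 0;
    while (level < switchDistances.size() && distance > switchDistances[level])
        ++level;
    return level;
}
```

With made-up thresholds {10, 30, 80}, an object 20 units away uses level 1 and one 100 units away uses the coarsest level 3.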
Billboards / Impostors
– 2D image to represent complex geometry
– Traditionally: always faces camera
– Less common these days?
Src: http://www.lighthouse3d.com/opengl/billboarding/index.php?billCheat
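The “always faces camera” behaviour can be sketched by building the quad from the camera position. A minimal sketch of a cylindrical billboard, which rotates only about the world +Y axis and therefore stays upright (a spherical billboard would also tilt towards the camera); the function name and corner order are hypothetical. In practice this is usually done in the vertex shader.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }

// Writes corners bottom-left, bottom-right, top-right, top-left of a
// width x height quad standing on `base` and turned towards `camera`.
void billboard_corners(Vec3 base, Vec3 camera, float width, float height, Vec3 out[4]) {
    // Direction to the camera, projected into the XZ plane.
    float dx = camera.x - base.x, dz = camera.z - base.z;
    const float len = std::sqrt(dx * dx + dz * dz);
    assert(len > 0.0f); // camera directly above/below: orientation undefined
    dx /= len;
    dz /= len;
    // right = up x toCamera, with up = (0, 1, 0).
    const Vec3 right{ dz, 0.0f, -dx };
    const Vec3 up{ 0.0f, height, 0.0f };
    const Vec3 half = scale(right, 0.5f * width);
    out[0] = sub(base, half);
    out[1] = add(base, half);
    out[2] = add(out[1], up);
    out[3] = add(out[0], up);
}
```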
Billboards / Impostors
– Impostors: render on the fly
– Or update regularly
https://developer.nvidia.com/gpugems/gpugems3/part-iv-image-effects/chapter-21-true-impostors
– Less geometry is good, no geometry is better
– Culling: don’t draw unnecessary geometry
– Back-face culling:
– Already seen
– Discard triangles that are back-facing
– Happens in/before rasterization
– Typically enabled.
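The back-face decision comes down to the sign of the projected triangle's area. A minimal sketch, assuming a y-up 2D coordinate system and counter-clockwise front faces (the sign flips if y points down, as in Vulkan's framebuffer coordinates, or if clockwise triangles are front-facing); in Vulkan the actual convention is configured through VkPipelineRasterizationStateCreateInfo::frontFace and cullMode. The function names are hypothetical.

```cpp
struct Vec2 { float x, y; };

// Twice the signed area; positive for counter-clockwise order (y up).
constexpr float signed_area2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Counter-clockwise = front face. Zero-area triangles are treated as
// back-facing here; they cover no pixels anyway.
constexpr bool is_back_facing(Vec2 a, Vec2 b, Vec2 c) {
    return signed_area2(a, b, c) <= 0.0f;
}
```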
https://bugs.mojang.com/browse/MCPE-63268?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&showAll=true
– Frustum culling:
– Only draw objects inside of the view frustum
– Skip objects outside
– Typically done on a per-object level
– Don’t try to cull individual triangles
– Too much overhead
– HW deals with that efficiently
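A per-object frustum test is commonly done with a bounding sphere against the six frustum planes. A minimal sketch, assuming planes stored as n·p + d = 0 with unit normals pointing into the frustum; the struct and function names are hypothetical. The test is conservative: a sphere near a frustum corner may pass although it is outside, but a sphere that touches the frustum is never culled.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };          // n·p + d >= 0 means "inside"
struct Sphere { Vec3 center; float radius; };

bool sphere_in_frustum(const Sphere& s, const Plane (&planes)[6]) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : planes) {
        // Signed distance of the sphere center to the plane.
        const float dist = p.n.x * s.center.x + p.n.y * s.center.y + p.n.z * s.center.z + p.d;
        if (dist < -s.radius)
            return false; // completely outside this plane: cull
    }
    return true;          // inside or intersecting: draw
}
```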
– Frustum culling traditionally on CPU
– But has started to move to GPU
– E.g., vkCmdDrawIndirect et al.
– Use compute shader to cull objects / meshes
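What a culling compute shader produces can be sketched as a serial loop: every visible object appends one indexed draw command, and the resulting buffer (plus a draw count) is consumed by vkCmdDrawIndexedIndirect or vkCmdDrawIndexedIndirectCount. A minimal CPU-side sketch; `DrawIndexedIndirectCommand` mirrors the field layout of VkDrawIndexedIndirectCommand so that the code compiles without the Vulkan headers, and `Object` with its precomputed `visible` flag is hypothetical. On the GPU, the append would typically use an atomic counter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same fields, in the same order, as VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirectCommand {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t firstInstance;
};

// Hypothetical per-object record; `visible` would come from a culling test.
struct Object {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    bool visible;
};

std::vector<DrawIndexedIndirectCommand> build_draws(const std::vector<Object>& objects) {
    std::vector<DrawIndexedIndirectCommand> draws;
    for (std::uint32_t i = 0; i < objects.size(); ++i) {
        const Object& o = objects[i];
        if (!o.visible)
            continue; // culled: no draw command at all
        assert(o.indexCount % 3 == 0);
        // firstInstance = object ID so shaders can fetch per-object data
        // (non-zero values need the drawIndirectFirstInstance feature).
        draws.push_back({ o.indexCount, 1, o.firstIndex, 0, i });
    }
    return draws;
}
```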
– Occlusion culling
– Idea: don't draw objects that are fully obscured by other objects
– Difficult in the general case
– Specialized systems
– Potentially visible sets (PVS)
– Portals
Acceleration structures
– Accelerate spatial queries
– E.g., for frustum culling?
– Key component for modern ray tracers
(both offline and online)
– Collision detection
Quadtree / Octree
– Quadtree = 2D
– Octree = 3D (but same idea)
– Place objects in leaves?
– Overlapping objects
– Place in smallest node that fits it?
– Place in multiple nodes?
– Extend nodes? (Loose quad-/octree)
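The “place in smallest node that fits it” option can be sketched without building explicit nodes: descend from the root while one child quadrant fully contains the object's box. A minimal sketch, assuming a quadtree over the unit square and hypothetical type and function names; an object straddling a split line stays in a higher node, which is the drawback loose quad-/octrees address.

```cpp
#include <cassert>

struct AABB2 { float minX, minY, maxX, maxY; };
struct NodeId { int depth, cellX, cellY; }; // cell coordinates at that depth

NodeId smallest_fitting_node(const AABB2& b, int maxDepth) {
    assert(b.minX >= 0.0f && b.minY >= 0.0f && b.maxX <= 1.0f && b.maxY <= 1.0f);
    NodeId node{ 0, 0, 0 };
    float x0 = 0.0f, y0 = 0.0f, size = 1.0f; // current node's square
    while (node.depth < maxDepth) {
        const float half = 0.5f * size;
        const float midX = x0 + half, midY = y0 + half;
        const bool left = b.maxX <= midX, right = b.minX >= midX;
        const bool bottom = b.maxY <= midY, top = b.minY >= midY;
        if (!(left || right) || !(bottom || top))
            break; // box straddles a split line: it stays in this node
        const int qx = right ? 1 : 0, qy = top ? 1 : 0;
        x0 += qx * half;
        y0 += qy * half;
        size = half;
        node = { node.depth + 1, node.cellX * 2 + qx, node.cellY * 2 + qy };
    }
    return node;
}
```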
– Binary tree
– Quadtree had fan-out = 4
– Octree: fan-out = 8
– Split along axes in fixed order
– E.g. X->Y->X->Y…
– Similar issues
– Split not always possible
(if objects have non-zero extent, they can straddle the split plane)
– Need to figure out where to split
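Choosing where to split can be as simple as taking the median along the current axis, cycling X, Y, X, Y, … with depth. A minimal sketch of such a kD-tree build over 2D points (points have no extent, which sidesteps the straddling problem); the types and function names are hypothetical, and `std::nth_element` finds the median in linear average time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point2 { float x, y; };

struct KdNode {
    Point2 point;                      // median point splitting this node
    int axis;                          // 0 = split on x, 1 = split on y
    std::unique_ptr<KdNode> left, right;
};

std::unique_ptr<KdNode> build_kd(std::vector<Point2>::iterator first,
                                 std::vector<Point2>::iterator last, int depth = 0) {
    assert(first <= last);
    if (first == last) return nullptr;
    const int axis = depth % 2; // fixed order: X, Y, X, Y, ...
    auto mid = first + (last - first) / 2;
    std::nth_element(first, mid, last, [axis](const Point2& a, const Point2& b) {
        return axis == 0 ? a.x < b.x : a.y < b.y;
    });
    auto node = std::make_unique<KdNode>();
    node->point = *mid;
    node->axis = axis;
    node->left = build_kd(first, mid, depth + 1);    // points before the median
    node->right = build_kd(mid + 1, last, depth + 1); // points after the median
    return node;
}

std::size_t tree_depth(const KdNode* n) {
    return n ? 1 + std::max(tree_depth(n->left.get()), tree_depth(n->right.get())) : 0;
}
```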
– Binary tree (like kD-tree)
– Arbitrary split planes
– Construction difficult: finding an optimal tree is NP-hard
– Used in Quake to sort triangles
– And several other games that followed
Bounding Volume Hierarchy (BVH)
– Cluster objects into groups
– E.g., by placing them in a box/sphere/…
Bounding Volume Hierarchy (BVH)
– Different volumes
– AABB
– …? (kDOP?)
– AABB-BVH: Most common structure?
– Extensive use in ray tracing
– More next time.
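A basic AABB-BVH can be sketched with a top-down build: compute the bounds of a set of objects, split the set at the median centroid along the longest axis, and recurse until each leaf holds one object. A query then skips whole subtrees whose bounds miss the query box. A minimal sketch with hypothetical names; production builders (e.g., for ray tracing) typically use better split heuristics such as the surface area heuristic and several objects per leaf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct AABB { float min[3], max[3]; };

static AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

static bool overlaps(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i)
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    return true;
}

struct BvhNode {
    AABB bounds;                          // encloses everything below this node
    std::unique_ptr<BvhNode> left, right; // both null for a leaf
    std::size_t object = 0;               // object index (leaves only)
};

// Builds over ids[first, last); `ids` is reordered in place.
std::unique_ptr<BvhNode> build_bvh(const std::vector<AABB>& boxes,
                                   std::vector<std::size_t>& ids,
                                   std::size_t first, std::size_t last) {
    assert(first < last);
    auto node = std::make_unique<BvhNode>();
    node->bounds = boxes[ids[first]];
    for (std::size_t i = first + 1; i < last; ++i)
        node->bounds = merge(node->bounds, boxes[ids[i]]);
    if (last - first == 1) { node->object = ids[first]; return node; }
    int axis = 0; // longest axis of the node's bounds
    for (int i = 1; i < 3; ++i)
        if (node->bounds.max[i] - node->bounds.min[i] > node->bounds.max[axis] - node->bounds.min[axis])
            axis = i;
    const std::size_t mid = first + (last - first) / 2;
    std::nth_element(ids.begin() + first, ids.begin() + mid, ids.begin() + last,
        [&](std::size_t a, std::size_t b) { // compare centroids (times two)
            return boxes[a].min[axis] + boxes[a].max[axis] < boxes[b].min[axis] + boxes[b].max[axis];
        });
    node->left = build_bvh(boxes, ids, first, mid);
    node->right = build_bvh(boxes, ids, mid, last);
    return node;
}

// Collects all objects whose box overlaps q, pruning non-overlapping subtrees.
void query(const BvhNode* n, const AABB& q, std::vector<std::size_t>& hits) {
    if (!n || !overlaps(n->bounds, q)) return;
    if (!n->left) { hits.push_back(n->object); return; }
    query(n->left.get(), q, hits);
    query(n->right.get(), q, hits);
}
```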