10: The OpenGL Pipeline

18: Pipeline Optimisation

COMP5822M: High Performance Graphics
Pipeline Optimisation
First rule of optimisation:
Don’t optimise
Second rule of optimisation:
Find the bottleneck first
Pipeline optimisation:
Find the slowest stage in the pipeline
Look for stalling threads

Profiling Tools
PIX for Windows (DirectX)
gDEBugger (OpenGL)
NVPerfKit (NVIDIA)
GPUPerfStudio (ATI/AMD), supports Vulkan
OpenGL Profiler (Apple)

GPUPerfStudio
Supports high-level CPU/GPU analysis
API Trace
GPU Trace
Linked Trace

API Trace
Allows you to examine API calls
Tool tips show detailed timing
Useful to identify stalling

GPU Trace
Black: command buffer durations
Purple/pink: per-command buffer
Look for long pink sections (wasted time)

Linked Trace
Links between API calls & GPU commands
Track back to find calling routine

Locating Bottlenecks
Two major strategies:
Reduce the workload of one stage
– if the frame rate goes up, that stage is a bottleneck
Reduce the workload of all other stages
– if the frame rate stays the same, the bottleneck is in the unchanged stage
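A toy model of the two strategies, as a minimal sketch in Python. It assumes the pipeline is fully overlapped, so frame time equals the time of the slowest stage; the stage costs and helper names are hypothetical, not measurements.

```python
# Toy model (assumption): with every stage overlapped, frame time is set by
# the slowest stage.
def frame_ms(stage_ms):
    return max(stage_ms.values())

def faster_when_reduced(stage_ms, stage, factor=0.5):
    """Strategy 1: shrink one stage; a speed-up means it was a bottleneck."""
    reduced = {**stage_ms, stage: stage_ms[stage] * factor}
    return frame_ms(reduced) < frame_ms(stage_ms)

def unchanged_when_others_reduced(stage_ms, stage, factor=0.5):
    """Strategy 2: shrink every other stage; no change means the untouched
    stage is the bottleneck."""
    reduced = {s: t if s == stage else t * factor for s, t in stage_ms.items()}
    return frame_ms(reduced) == frame_ms(stage_ms)

# Hypothetical per-stage costs in milliseconds.
costs = {"application": 4.0, "geometry": 6.0, "raster": 11.0}
for name in costs:
    print(name, faster_when_reduced(costs, name),
          unchanged_when_others_reduced(costs, name))
```

With these costs only the raster stage passes both tests. Real pipelines overlap imperfectly, which is why measured results are rarely this clean.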

Application Stage Tests
Measure overall CPU workload
Use platform-specific tools to measure CPU load
If CPU load is high, the application stage is a likely bottleneck
Test by sending minimal data downstream
i.e. data that causes little GPU work
e.g. create pass-through shader stages
Underclock the CPU: if performance drops, the application stage is the bottleneck
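A minimal sketch of timing the application stage's work against the frame budget. `hypothetical_update` stands in for real game logic, and the 90% threshold in `likely_cpu_bound` is an assumed rule of thumb, not an established figure; the platform tools above give far more detail. Note that `time.perf_counter` measures wall-clock time, not CPU time.

```python
import time

def time_ms(fn, *args):
    """Wall-clock time of one call to fn, in milliseconds (perf_counter is monotonic)."""
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000.0

def likely_cpu_bound(cpu_ms, frame_ms, threshold=0.9):
    """Heuristic (assumption): if the application's work fills most of the
    frame, the application stage is the probable bottleneck."""
    return cpu_ms >= threshold * frame_ms

def hypothetical_update(n):
    # Stand-in for game logic, culling and command recording.
    return sum(i * i for i in range(n))

budget_ms = 1000 / 60   # frame budget at 60 Hz
cpu = time_ms(hypothetical_update, 200_000)
print(f"application work: {cpu:.2f} ms of a {budget_ms:.2f} ms budget")
print("likely CPU-bound:", likely_cpu_bound(cpu, budget_ms))
```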

Geometry Stage Testing
Usually the hardest stage to test (many interdependencies)
Bottlenecks in vertex fetch or processing
Fetch test: add extra texture coords per vertex
If performance drops, vertex fetch is the cause
Processing test: make the vertex shader longer
Useful to have flags for conditional code
E.g. turn specular shading on/off
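One way to get such flags, sketched under assumptions: prepend `#define` lines after the GLSL `#version` directive (which must come first in the source) and guard optional code with `#if`. The flag name `ENABLE_SPECULAR` and the shader body are illustrative only; `diffuse()`, `specular()` and `outColour` would have to be declared in a real shader.

```python
def build_shader_source(body, flags, version="450"):
    """Prepend a #version line and one #define per flag to a GLSL body, so a
    single shader file can be compiled with features toggled on or off."""
    lines = [f"#version {version}"]
    for name, enabled in sorted(flags.items()):   # sorted for a stable order
        lines.append(f"#define {name} {1 if enabled else 0}")
    return "\n".join(lines) + "\n" + body

# Hypothetical shader body; ENABLE_SPECULAR is an illustrative flag name.
SHADER_BODY = """\
void main() {
    vec3 colour = diffuse();
#if ENABLE_SPECULAR
    colour += specular();
#endif
    outColour = vec4(colour, 1.0);
}
"""

print(build_shader_source(SHADER_BODY, {"ENABLE_SPECULAR": False}))
```

Compiling both variants and comparing frame times shows how much the optional code costs.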

Raster Stage Testing
Triangle Setup:
Not usually a problem
Raster Operations:
Reduce the bit depth of the colour output
Fragment Shader Operations:
Increase the screen resolution (more fragments)
Make the fragment shader more complex
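Rough arithmetic for why lowering the colour bit depth relieves raster operations: colour-write traffic scales with bytes per pixel. This sketch ignores depth/stencil traffic, blending reads and framebuffer compression, so treat it as an order-of-magnitude estimate only.

```python
def colour_write_bytes_per_second(width, height, bytes_per_pixel, fps, overdraw=1.0):
    """Bytes written to the colour buffer per second: every pixel is written
    once per layer of overdraw, every frame."""
    return width * height * bytes_per_pixel * fps * overdraw

rgba8 = colour_write_bytes_per_second(1920, 1080, 4, 60)    # 32-bit colour
rgb565 = colour_write_bytes_per_second(1920, 1080, 2, 60)   # 16-bit colour
print(f"{rgba8 / 1e6:.1f} MB/s vs {rgb565 / 1e6:.1f} MB/s")
```

Halving the bytes per pixel halves this traffic, so if the frame rate rises noticeably, raster operations were the bottleneck.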

Performance Measurements
Typically, we test:
Vertices per second
Pixels per second
Texels per second
Raw frame rate
But manufacturers quote peak figures rarely reached in practice
So people build realistic test scenes (benchmarks) instead
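When measuring the raw frame rate yourself, averaging over a window of frames smooths out jitter. A minimal sketch with a hypothetical `FrameTimer` helper; in a real application each sample would be the difference of two `time.perf_counter()` readings.

```python
from collections import deque

class FrameTimer:
    """Rolling average over the last `window` frame times (hypothetical helper)."""

    def __init__(self, window=120):
        self.samples = deque(maxlen=window)   # old samples drop out automatically

    def add_frame(self, frame_ms):
        self.samples.append(frame_ms)

    def average_ms(self):
        return sum(self.samples) / len(self.samples)

    def fps(self):
        return 1000.0 / self.average_ms()

timer = FrameTimer(window=4)
for ms in (16.0, 17.0, 16.0, 15.0, 20.0):   # the first sample falls out of the window
    timer.add_frame(ms)
print(f"{timer.average_ms():.2f} ms/frame, {timer.fps():.1f} FPS")
```

Reporting milliseconds per frame alongside FPS helps, because per-stage costs add linearly in milliseconds but not in FPS.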

Manhattan Benchmark
Designed to test OpenGL ES performance

https://kishonti.net/graphics-benchmarking.jsp

Optimisation Tricks
Optimisation depends on:
the task
the assets
the CPU
the GPU
the API

Asset Optimisation
Reduce / simplify models
Set polygon budgets
Set texture budgets
Apply level-of-detail tricks
Clip & cull anything not visible
Use acceleration structures
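A sketch of two of the tricks above: conservative bounding-sphere culling against frustum planes, and distance-based level-of-detail selection. The plane convention (unit normals pointing inwards) and the box-shaped stand-in frustum are assumptions; a real frustum's planes would be extracted from the view-projection matrix, and the LOD thresholds are arbitrary examples.

```python
import math

# A plane is (nx, ny, nz, d) with a unit normal pointing into the frustum,
# so a point p is inside when dot(n, p) + d >= 0.
def sphere_visible(center, radius, planes):
    """Conservative test: cull only if the sphere lies entirely outside some plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False
    return True

def select_lod(distance, thresholds):
    """Index of the first LOD whose distance limit is not exceeded;
    beyond the last limit, use the coarsest level."""
    for level, limit in enumerate(thresholds):
        if distance <= limit:
            return level
    return len(thresholds)

# Hypothetical box "frustum" spanning x, y, z in [-10, 10].
box = [(1, 0, 0, 10), (-1, 0, 0, 10), (0, 1, 0, 10),
       (0, -1, 0, 10), (0, 0, 1, 10), (0, 0, -1, 10)]
print(sphere_visible((0, 0, 0), 1, box), sphere_visible((12, 0, 0), 1, box))
print(select_lod(math.dist((0, 0, 0), (30, 40, 0)), [10, 25, 60]))
```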

Application Stage Optimisation
All the usual tricks:
Shift operations to GPU
Reduce costly operations
Enable compiler optimisations
Profile to track down the busiest functions
Optimise inner loops
Substitute cheaper approaches
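An example of substituting a cheaper approach: a range check can compare squared distances and skip the square root, because squaring preserves the ordering of non-negative values. In Python the interpreter overhead hides most of the gain; the saving matters in compiled or shader code.

```python
import math

def within_range_sqrt(p, q, r):
    # Straightforward version: one square root per test.
    return math.dist(p, q) <= r

def within_range_squared(p, q, r):
    # Equivalent for r >= 0: both sides are non-negative, so squaring
    # preserves the comparison and the square root can be dropped.
    return sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r

points = [(0, 0, 0), (3, 4, 0), (6, 8, 0)]
print([within_range_squared(p, (0, 0, 0), 5.0) for p in points])
```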

Memory Issues
Heap allocation (malloc()) is among the most expensive common calls
So pre-allocate and manage your memory yourself
Align data to cache-line boundaries
Sequential calls should access sequential data
Avoid pointer chasing
Use arrays, not linked lists
Preprocess everything, load it as data
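A minimal sketch of managing memory yourself: a bump (arena) allocator that reserves one buffer up front, hands out cache-line-aligned slices, and frees everything at once per frame. The 64-byte line size is an assumption (common, not universal), and offsets are aligned relative to the buffer start; in C or C++ you would also align the base address, e.g. with `std::aligned_alloc`.

```python
class FrameArena:
    """Bump allocator over one buffer reserved up front (a sketch, not a
    production allocator). Each allocation just advances an offset."""

    def __init__(self, capacity, alignment=64):
        # 64 bytes is a common cache-line size (assumption; check your CPU).
        assert alignment & (alignment - 1) == 0, "alignment must be a power of two"
        self.buffer = bytearray(capacity)
        self.alignment = alignment
        self.offset = 0

    def allocate(self, size):
        # Round the current offset up to the next multiple of the alignment.
        start = (self.offset + self.alignment - 1) & ~(self.alignment - 1)
        if start + size > len(self.buffer):
            raise MemoryError("arena exhausted")
        self.offset = start + size
        return memoryview(self.buffer)[start:start + size]

    def reset(self):
        # Release everything at once, e.g. at the end of a frame; views handed
        # out before reset() must not be used afterwards.
        self.offset = 0

arena = FrameArena(4096)
first = arena.allocate(10)    # occupies bytes 0..9
second = arena.allocate(10)   # starts at 64, the next aligned offset
print(arena.offset)           # 74
```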

Writing Fast Code
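Avoid division (especially on GPU): multiply by a precomputed reciprocal
Avoid expensive math calls (use lookup tables)
Avoid if statements (mispredicted branches stall CPU pipelines; divergent branches serialise GPU threads)
Unroll small loops
Use inline code heavily
Reduce floating-point precision where accuracy allows
Avoid OO overheads: dynamic casts, virtual calls through inheritance
Don’t pass copies of structs/classes; pass by reference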
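Two of these tricks, sketched in Python for readability: replacing repeated division with multiplication by a precomputed reciprocal, and replacing `sin` with a nearest-entry lookup table. The table size is an arbitrary choice, accuracy is limited to about half a table step, and a reciprocal multiply can differ from true division in the last bit. Any speed-up shows up in compiled or shader code, not in interpreted Python.

```python
import math

def normalise_all(values, total):
    # One division up front, then only multiplications inside the loop.
    inv_total = 1.0 / total
    return [v * inv_total for v in values]

class SineTable:
    """Lookup-table sine (nearest entry, no interpolation): trades accuracy
    and memory for avoiding a math call per evaluation."""

    def __init__(self, size=1024):
        self.size = size
        self.scale = size / (2.0 * math.pi)   # table entries per radian
        self.table = [math.sin(i / self.scale) for i in range(size)]

    def sin(self, x):
        # Map x to the nearest table index, wrapping around one full period.
        return self.table[int(round(x * self.scale)) % self.size]

lut = SineTable()
print(lut.sin(math.pi / 2), normalise_all([1.0, 2.0, 5.0], 8.0))
```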

Optimising API Calls
Every function call has overhead
Pass one buffer with 1000 triangles
Instead of calling one function 1000 times
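A minimal sketch of batching, assuming the objects share the same shader and render state so they can be drawn together: concatenate all triangles into one flat vertex array. `batch_triangles` is a hypothetical helper; the OpenGL calls appear only in a comment.

```python
def batch_triangles(meshes):
    """Concatenate per-object triangle lists into one flat vertex array so the
    whole batch can be uploaded and drawn with a single call."""
    vertices = []
    for triangles in meshes:
        for tri in triangles:
            for x, y, z in tri:
                vertices.extend((x, y, z))
    vertex_count = len(vertices) // 3
    # In OpenGL this array would be uploaded once with glBufferData and drawn
    # with a single glDrawArrays(GL_TRIANGLES, 0, vertex_count).
    return vertices, vertex_count

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
meshes = [[tri] * 500, [tri] * 500]   # 1000 triangles in two hypothetical objects
verts, count = batch_triangles(meshes)
print(count)   # 3000
```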

Geometry Stage Optimisation
Store vertex data in compressed form
Write vertex shader for decompression
http://graphics.stanford.edu/~sliang/CS448B_win00/sliang-cs448b-contrib.html
Don’t use geometry lookaside buffer
Instead, extra pass for geometry
Then reuse 6 times with null shader
Exploit polygon connectivity
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.88&rep=rep1&type=pdf
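One common compression scheme, sketched under assumptions (it is not necessarily the method on the linked page): quantise each position to 16-bit integers relative to the mesh's bounding box, which halves position storage from 12 to 6 bytes, and let the vertex shader undo the mapping. `dequantise` shows the per-vertex arithmetic such a shader would perform.

```python
def quantise_positions(positions, bits=16):
    """Compress float positions to unsigned integers relative to the mesh's
    axis-aligned bounding box. Returns the codes plus (lo, hi) for decoding."""
    levels = (1 << bits) - 1
    lo = [min(p[i] for p in positions) for i in range(3)]
    hi = [max(p[i] for p in positions) for i in range(3)]
    # Clamp the extent so a flat axis does not divide by zero.
    extent = [max(hi_i - lo_i, 1e-12) for lo_i, hi_i in zip(lo, hi)]
    codes = [tuple(round((p[i] - lo[i]) / extent[i] * levels) for i in range(3))
             for p in positions]
    return codes, lo, hi

def dequantise(code, lo, hi, bits=16):
    # Per-vertex decompression: position = lo + code / levels * (hi - lo)
    levels = (1 << bits) - 1
    return tuple(lo[i] + code[i] / levels * (hi[i] - lo[i]) for i in range(3))

mesh = [(-1.0, 0.0, 2.0), (3.0, 0.5, 2.0), (0.25, -2.0, 2.0)]
codes, lo, hi = quantise_positions(mesh)
print(codes[1], dequantise(codes[1], lo, hi))
```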

Optimising Lighting
Reduce the number and type of light sources
Background light can be baked in
e.g. a light map stored as a texture
Cull light sources based on distance
Use deferred shading to optimise calculations
Reduce complex lighting calculations
Do you really need caustics?
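A sketch of distance-based light culling, assuming each point light has a finite range beyond which its contribution is treated as zero (physical 1/d² falloff never reaches zero, so engines pick a cutoff). The `max_lights` budget is an arbitrary example value.

```python
import math

def relevant_lights(lights, obj_center, obj_radius, max_lights=4):
    """Keep point lights whose range sphere overlaps the object's bounding
    sphere, nearest first, capped at max_lights."""
    hits = []
    for position, light_range in lights:
        d = math.dist(position, obj_center)
        if d <= light_range + obj_radius:   # the two spheres overlap
            hits.append((d, position, light_range))
    hits.sort(key=lambda h: h[0])
    return [(p, r) for _, p, r in hits[:max_lights]]

# Hypothetical lights as (position, range).
lights = [((0, 0, 0), 5.0), ((20, 0, 0), 5.0), ((8, 0, 0), 2.0)]
print(relevant_lights(lights, (6, 0, 0), 1.0))
```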

Rasteriser Optimisation
Enable backface culling
Disable Z-buffering where it is not needed, e.g. for a skybox
Avoid clearing if every pixel will be overwritten
Use simple writes rather than blending operations
Always use the GPU’s native texture formats
Use simpler fragment shaders for distant objects
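Backface culling is done by the hardware once enabled (in OpenGL, `glEnable(GL_CULL_FACE)` with `glCullFace(GL_BACK)`); this sketch only shows the underlying test, where the sign of a screen-space triangle's area gives its winding order. It assumes y points up and counter-clockwise front faces, OpenGL's default.

```python
def signed_area_2x(a, b, c):
    """Twice the signed area of a screen-space triangle (x right, y up):
    positive when the vertices are in counter-clockwise order."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def is_back_facing(a, b, c):
    # With counter-clockwise front faces, a clockwise triangle faces away;
    # a zero-area triangle covers no pixels, so it can be skipped too.
    return signed_area_2x(a, b, c) <= 0

print(is_back_facing((0, 0), (1, 0), (0, 1)), is_back_facing((0, 0), (0, 1), (1, 0)))
```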

Vulkan Doom Benchmark

Remaining Lectures
We have 4 more lectures scheduled
We have covered most of the material
And we have an opportunity
NVIDIA GTC conference is in March
This year, it’s virtual, starting tomorrow
And the sessions are free online
So go watch those instead
