Advanced VR Rendering
Alex Vlachos, Valve
Alex@ValveSoftware.com
Outline
● VR at Valve
● Methods for Stereo Rendering
● Timing: Scheduling, Prediction, VSync, GPU Bubbles
● Specular Aliasing & Anisotropic Lighting
● Miscellaneous VR Rendering Topics
VR at Valve
● Began VR research 3+ years ago
● Both hardware and software engineers
● Custom optics designed for VR
● Display technology – low persistence, global display
● Tracking systems
  ● Fiducial-based positional tracking
  ● Desktop dot-based tracking and controllers
  ● Laser-tracked headset and controllers
● SteamVR API – cross-platform, OpenVR
HTC Vive Developer Edition Specs
● Refresh rate: 90 Hz (11.11 ms per frame)
● Low persistence, global display
● Framebuffer: 2160x1200 (1080x1200 per-eye)
● Off-screen rendering ~1.4x in each dimension:
  ● 1512x1680 per-eye = 2,540,160 shaded pixels per-eye (brute-force)
● FOV is about 110 degrees
● 360° room-scale tracking
● Multiple tracked controllers and other input devices
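The render-target figures above follow directly from the panel resolution and the ~1.4x scalar; a quick sketch of that arithmetic (the helper name is ours, not part of any API):

```python
# Sketch: derive the per-eye render-target size quoted above from the
# per-eye panel resolution and the ~1.4x distortion-compensation scalar.
def render_target_size(panel_w, panel_h, scalar):
    # round() guards against float error (1080 * 1.4 is not exactly 1512.0)
    return round(panel_w * scalar), round(panel_h * scalar)

w, h = render_target_size(1080, 1200, 1.4)
print(w, h, w * h)  # 1512 1680 2540160 shaded pixels per eye
```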
Room-Scale Tracking
Optics & Distortion (Pre-Warp)
Warp pass uses 3 sets of UVs for RGB separately to account for spatial and chromatic distortion
(Visualizing 1.4x render target scalar)
Optics & Distortion (Post-Warp)
Warp pass uses 3 sets of UVs for RGB separately to account for spatial and chromatic distortion
(Visualizing 1.4x render target scalar)
Shaded Visible Pixels per Second
● 720p @ 30 Hz: 27 million pixels/sec
● 1080p @ 60 Hz: 124 million pixels/sec
● 30" Monitor 2560x1600 @ 60 Hz: 245 million pixels/sec
● 4k Monitor 4096x2160 @ 30 Hz: 265 million pixels/sec
● VR 1512x1680x2 @ 90 Hz: 457 million pixels/sec
  ● We can reduce this to 378 million pixels/sec (later in the talk)
  ● Equivalent to a 30" monitor @ 100 Hz for a non-VR renderer
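The figures above can be reproduced with width x height x refresh (x2 for stereo), truncated to millions; a minimal sketch:

```python
# Sketch: reproduce the shaded-pixels-per-second table above.
def mpix_per_sec(w, h, hz, eyes=1):
    # truncate to whole millions, matching how the slide quotes the numbers
    return int(w * h * hz * eyes / 1e6)

print(mpix_per_sec(1280, 720, 30))            # 27   (720p @ 30 Hz)
print(mpix_per_sec(1920, 1080, 60))           # 124  (1080p @ 60 Hz)
print(mpix_per_sec(2560, 1600, 60))           # 245  (30" monitor)
print(mpix_per_sec(4096, 2160, 30))           # 265  (4k monitor)
print(mpix_per_sec(1512, 1680, 90, eyes=2))   # 457  (VR, brute force)
```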
There Are No "Small" Effects
● Tracking allows users to get up close to anything in the tracked volume
● Can’t implement a super expensive effect and claim "it’s just this small little thing in the corner"
● Even your floors need to be higher fidelity than we have traditionally authored
● If it’s in your tracked volume, it must be high fidelity
VR Rendering Goals
● Lowest GPU min spec possible
  ● We want VR to succeed, but we need customers
  ● The lower the min spec, the more customers we have
● Aliasing should not be noticeable to customers
  ● Customers refer to aliasing as "sparkling"
● Algorithms should scale up to multi-GPU installations
  ● Ask yourself, "Will ‘X’ scale efficiently to a 4-GPU machine?"
Outline
● VR at Valve
● Methods for Stereo Rendering
● Timing: Scheduling, Prediction, VSync, GPU Bubbles
● Specular Aliasing & Anisotropic Lighting
● Miscellaneous VR Rendering Topics
Stereo Rendering (Single-GPU)
● Brute-force run your CPU code twice (BAD)
● Use a geometry shader to amplify geometry (BAD)
● Resubmit command buffers (GOOD, our current solution)
● Use instancing to double geo (BETTER: half the API calls, improved cache coherency for VB/IB/texture reads)
  ● "High Performance Stereo Rendering For VR", Timothy Wilson, San Diego Virtual Reality Meetup
Stereo Rendering (Multi-GPU)
● AMD and NVIDIA both provide DX11 extensions to accelerate stereo rendering across multiple GPUs
● We have already tested the AMD implementation and it nearly doubles our framerate; we have yet to test the NVIDIA implementation but will soon
● Great for developers
  ● Everyone on your team can have a multi-GPU solution in their dev box
  ● This allows you to break framerate without uncomfortable low-framerate VR
  ● But lie to your team about framerate and report single-GPU fps :)
Outline
● VR at Valve
● Methods for Stereo Rendering
● Timing: Scheduling, Prediction, VSync, GPU Bubbles
● Specular Aliasing & Anisotropic Lighting
● Miscellaneous VR Rendering Topics
Prediction
● We aim to keep prediction times (render to photons) for the HMD and controller transforms as short as possible (accuracy is more important than total time)
● Low persistence global displays: the panel is lit for only ~2 ms of the 11.11 ms frame
NOTE: Image above is not optimal VR rendering, but helps describe prediction (see later slides)
Pipelined Architectures
● Simulating the next frame while rendering the current frame
● We re-predict transforms and update our global cbuffer right before submit
● VR practically requires this due to prediction constraints
● You must conservatively cull on the CPU by about 5 degrees
Waiting for VSync
● Simplest VR implementation: predict right after VSync
  ● Pattern #1: Present(), clear back buffer, read a pixel
  ● Pattern #2: Present(), clear back buffer, spin on a query
● Great for an initial implementation, but please DO NOT DO THIS. GPUs are not designed for this.
● See John McDonald’s talk:
  ● "Avoiding Catastrophic Performance Loss: Detecting CPU-GPU Sync Points", John McDonald, NVIDIA, GDC 2014
GPU Bubbles
● If you start submitting draw calls after VSync:
● Ideally, your capture should look like this:
(Images are screen captures of NVIDIA Nsight)
"Running Start"
● If you start to submit D3D calls after VSync:
● Instead, start submitting D3D calls 2 ms before VSync (2 ms is a magic number based on the 1.5-2.0 ms GPU bubbles we measured on current GPUs):
● But you end up predicting another 2 ms (24.22 ms total)
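The 24.22 ms total can be reconstructed arithmetically, assuming the pipelined architecture above already predicts two full refresh intervals from submit to photons:

```python
# Sketch: where the 24.22 ms total prediction figure comes from,
# assuming a two-frame pipelined baseline plus the running start.
frame_ms = 1000.0 / 90.0      # 11.11 ms per 90 Hz refresh
running_start_ms = 2.0        # D3D submission begins 2 ms before VSync
total_prediction_ms = 2 * frame_ms + running_start_ms
print(round(total_prediction_ms, 2))  # 24.22
```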
"Running Start" VSync
● Question: How do you know how far you are from VSync?
● Answer: It’s tricky. Rendering APIs don’t directly provide this.
  ● On Windows, the SteamVR/OpenVR API runs a separate process that spins on IDXGIOutput::WaitForVBlank(), noting the time and incrementing a frame counter. The application can then call GetTimeSinceLastVSync(), which also returns a frame ID.
● GPU vendors, HMD devices, and rendering APIs should provide this
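Given the last recorded vblank timestamp, the distance to the next VSync is simple modular arithmetic; a sketch of that bookkeeping (the function name and numbers are ours, for illustration only):

```python
# Sketch: computing time remaining until the next VSync from the last
# vblank timestamp a helper process recorded (as described above).
def time_until_next_vsync(now_ms, last_vsync_ms, frame_ms=1000.0 / 90.0):
    elapsed = (now_ms - last_vsync_ms) % frame_ms  # handles missed vblanks
    return frame_ms - elapsed

# 5 ms after the last recorded vblank, ~6.11 ms remain in the 11.11 ms frame
print(round(time_until_next_vsync(105.0, 100.0), 2))  # 6.11
```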
"Running Start" Details
● To deal with a bad frame, you need to partially synchronize with the GPU
● We inject a query after clearing the back buffer, submit our entire frame, spin on that query, then call Present()
● This ensures we are on the correct side of VSync for the current frame, and we can now spin until our running start time
Why the Query Is Critical
● If a frame is late, the query will keep you on the right side of VSync for the following frame, ensuring your prediction remains accurate
Running Start Summary
● This is a solid 1.5-2.0 ms GPU perf gain!
● You want to see this in NVIDIA Nsight:
● You want to see this in Microsoft’s GPUView:
Outline
● VR at Valve
● Methods for Stereo Rendering
● Timing: Scheduling, Prediction, VSync, GPU Bubbles
● Specular Aliasing & Anisotropic Lighting
● Miscellaneous VR Rendering Topics
Aliasing Is Your Enemy
● The camera (your head) never stops moving, which amplifies aliasing
● While there are more pixels to render, each pixel fills a larger angle than anything we’ve done before. Here are some averages:
  ● 2560x1600 30" monitor: ~50 pixels/degree (50 degree horizontal FOV)
  ● 720p 30" monitor: ~25 pixels/degree (50 degree horizontal FOV)
  ● VR: ~15.3 pixels/degree (110 degree FOV w/ 1.4x render target scalar)
● We must increase the quality of our pixels
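These averages are just resolution divided by field of view; a sketch that reproduces them (for the VR case we assume the taller 1680-pixel dimension of the 1.4x render target, which matches the quoted ~15.3):

```python
# Sketch: the pixels-per-degree averages quoted above.
def pixels_per_degree(pixels, fov_degrees):
    return pixels / fov_degrees

print(pixels_per_degree(2560, 50))             # 51.2, quoted as ~50
print(pixels_per_degree(1280, 50))             # 25.6, quoted as ~25
print(round(pixels_per_degree(1680, 110), 1))  # 15.3 (VR)
```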
4xMSAA Minimum Quality
● Forward renderers win for antialiasing because MSAA just works
● We use 8xMSAA if perf allows
● Image-space antialiasing algorithms must be compared side-by-side with 4xMSAA and 8xMSAA to know how your renderer will compare to others in the industry
● Jittered SSAA using the HLSL ‘sample’ modifier is obviously the best, but only if you can spare the perf
Normal Maps Are Not Dead
● Most normal maps work great in VR...mostly
● What doesn’t work:
  ● Feature detail larger than a few cm inside the tracked volume is bad
  ● Surface shape inside the tracked volume can’t be in a normal map
● What does work:
  ● Distant objects outside the tracked volume you can’t inspect up close
  ● Surface "texture" and fine details
Normal Map Mipping Error
(Figure: Blinn-Phong specular. A zoomed-out normal map with box-filtered mips shows incorrect glossiness; a zoomed-out 36-sample super-sampled reference shows the expected glossiness.)
Normal Map Mipping Problems
● Any mip filter that just generates an averaged normal loses important roughness information
Normal Map Visualization
(Figure: 4096x4096 Fire Alarm normal map with 4x4, 2x2, and 1x1 mip visualizations)
Normal Map Visualization
(Figure: 4096x4096 Fire Alarm normal map with 16x16 and 8x8 mip visualizations)
Normal Map Visualization
(Figure: 4096x4096 Dota 2 Mirana Body normal map with 4x4, 2x2, and 1x1 mip visualizations)
Normal Map Visualization
(Figure: 4096x4096 Dota 2 Juggernaut Sword Handle normal map with 4x4, 2x2, and 1x1 mip visualizations)
Normal Map Visualization
(Figure: 4096x4096 Shoulder Armor normal map with 4x4, 2x2, and 1x1 mip visualizations)
Normal Map Visualization
(Figure: 4096x4096 Metal Siding normal map with 4x4, 2x2, and 1x1 mip visualizations)
Roughness Encoded in Mips
● We can store a single isotropic value (visualized as the radius of a circle) that is the standard deviation of all 2D tangent normals from the highest mip that contributed to this texel
● We can also store a 2D anisotropic value (visualized as the dimensions of an ellipse) for the standard deviation in X and Y separately, which can be used to compute tangent-space axis-aligned anisotropic lighting!
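A minimal offline sketch of the idea (an illustration of the standard-deviation computation, not Valve's actual mip tool): for each mip texel, take the 2D tangent normals from the top mip that filter down to it and compute their spread.

```python
# Sketch: isotropic (circle radius) and anisotropic (ellipse radii)
# roughness for one mip texel, as the standard deviation of the
# contributing top-mip 2D tangent normals.
import math

def mip_roughness(tangent_normals_xy):
    n = len(tangent_normals_xy)
    mx = sum(x for x, _ in tangent_normals_xy) / n
    my = sum(y for _, y in tangent_normals_xy) / n
    var_x = sum((x - mx) ** 2 for x, _ in tangent_normals_xy) / n
    var_y = sum((y - my) ** 2 for _, y in tangent_normals_xy) / n
    aniso = (math.sqrt(var_x), math.sqrt(var_y))  # ellipse radii in X and Y
    iso = math.sqrt((var_x + var_y) * 0.5)        # single circle radius
    return iso, aniso

# A flat texel (all tangent normals at the origin) carries zero roughness...
print(mip_roughness([(0.0, 0.0)] * 4))
# ...while normals fanned out along X are rough in X only.
iso, (rx, ry) = mip_roughness([(-0.5, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.5, 0.0)])
print(rx, ry)  # 0.5 0.0
```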
Final Mip Chain
Add Artist-Authored Roughness
● We author 2D gloss = 1.0 – roughness
● Mip with a simple box filter
● Add/sum it with the normal map roughness at each mip level
● Because we have anisotropic gloss maps anyway, storing the generated normal map roughness is FREE
(Figure: Isotropic Gloss vs Anisotropic Gloss)
Tangent-Space Axis-Aligned Anisotropic Lighting
● Standard isotropic lighting is represented along the diagonal
● Anisotropy is aligned with either of the tangent-space axes
● Requires only 2 additional values paired with a 2D tangent normal = fits into an RGBA texture (DXT5 >95% of the time)
Roughness to Exponent Conversion

    void RoughnessEllipseToScaleAndExp( float2 vRoughness,
        out float o_flDiffuseExponentOut, out float2 o_vSpecularExponentOut, out float2 o_vSpecularScaleOut )
    {
        o_flDiffuseExponentOut = ( ( 1.0 - ( vRoughness.x + vRoughness.y ) * 0.5 ) * 0.8 ) + 0.6; // Outputs 0.6-1.4
        o_vSpecularExponentOut.xy = exp2( pow( 1.0 - vRoughness.xy, 1.5 ) * 14.0 ); // Outputs 1-16384
        o_vSpecularScaleOut.xy = 1.0 - saturate( vRoughness.xy * 0.5 ); // This is a pseudo energy conserving scalar for the roughness exponent
    }

● Diffuse lighting is Lambert raised to an exponent, (N.L)^k, where k is in the range 0.6-1.4
● Experimented with anisotropic diffuse lighting, but not worth the instructions
● Specular exponent range is 1-16,384 and is a modified Blinn-Phong with anisotropy (more on this later)
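A straight Python port of the conversion above makes it easy to sanity-check the quoted output ranges at the roughness extremes:

```python
# Sketch: Python port of RoughnessEllipseToScaleAndExp to verify the
# commented output ranges (diffuse 0.6-1.4, specular exponent 1-16384).
def roughness_to_scale_and_exp(rx, ry):
    diffuse_exp = ((1.0 - (rx + ry) * 0.5) * 0.8) + 0.6
    spec_exp = tuple(2.0 ** ((1.0 - r) ** 1.5 * 14.0) for r in (rx, ry))
    spec_scale = tuple(1.0 - min(1.0, max(0.0, r * 0.5)) for r in (rx, ry))  # saturate()
    return diffuse_exp, spec_exp, spec_scale

print(roughness_to_scale_and_exp(0.0, 0.0))  # smoothest: diffuse 1.4, specular 16384
print(roughness_to_scale_and_exp(1.0, 1.0))  # roughest:  diffuse 0.6, specular 1
```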
How Anisotropy Is Computed
(Figure: Tangent U Lighting x Tangent V Lighting = Final Lighting)
Shader Code

Anisotropic Specular Lighting:

    float3 vHalfAngleDirWs = normalize( vPositionToLightDirWs.xyz + vPositionToCameraDirWs.xyz );
    float3 vSpecularNormalX = vHalfAngleDirWs.xyz - ( vTangentUWs.xyz * dot( vHalfAngleDirWs.xyz, vTangentUWs.xyz ) );
    float3 vSpecularNormalY = vHalfAngleDirWs.xyz - ( vTangentVWs.xyz * dot( vHalfAngleDirWs.xyz, vTangentVWs.xyz ) );

    float flNDotHX = max( 0.0, dot( vSpecularNormalX.xyz, vHalfAngleDirWs.xyz ) );
    float flNDotHkX = pow( flNDotHX, vSpecularExponent.x * 0.5 );
    flNDotHkX *= vSpecularScale.x;

    float flNDotHY = max( 0.0, dot( vSpecularNormalY.xyz, vHalfAngleDirWs.xyz ) );
    float flNDotHkY = pow( flNDotHY, vSpecularExponent.y * 0.5 );
    flNDotHkY *= vSpecularScale.y;

    float flSpecularTerm = flNDotHkX * flNDotHkY;

Isotropic Diffuse Lighting:

    float flDiffuseTerm = pow( flNDotL, flDiffuseExponent ) * ( ( flDiffuseExponent + 1.0 ) * 0.5 );

Isotropic Specular Lighting:

    float flNDotH = saturate( dot( vNormalWs.xyz, vHalfAngleDirWs.xyz ) );
    float flNDotHk = pow( flNDotH, dot( vSpecularExponent.xy, float2( 0.5, 0.5 ) ) );
    flNDotHk *= dot( vSpecularScale.xy, float2( 0.33333, 0.33333 ) ); // 0.33333 is to match the spec intensity of the aniso algorithm above
    float flSpecularTerm = flNDotHk;
Geometric Specular Aliasing
● Dense meshes without normal maps also alias, and roughness mips can’t help you!
● We use partial derivatives of interpolated vertex normals to generate a geometric roughness term that approximates curvature. Here is the hacky math:

    float3 vNormalWsDdx = ddx( vGeometricNormalWs.xyz );
    float3 vNormalWsDdy = ddy( vGeometricNormalWs.xyz );
    float flGeometricRoughnessFactor = pow( saturate( max( dot( vNormalWsDdx.xyz, vNormalWsDdx.xyz ), dot( vNormalWsDdy.xyz, vNormalWsDdy.xyz ) ) ), 0.333 );
    vRoughness.xy = max( vRoughness.xy, flGeometricRoughnessFactor.xx ); // Ensure we don’t double-count roughness if normal map encodes geometric roughness

(Figure: visualization of flGeometricRoughnessFactor)
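To see what the hack does, the same math can be evaluated offline with ddx/ddy emulated as finite differences between neighboring pixels' normals (a sketch, not shader code):

```python
# Sketch: the geometric roughness factor above, with ddx/ddy emulated
# as finite differences of the interpolated normal across a pixel quad.
def geometric_roughness_factor(n, n_right, n_down):
    ddx = tuple(b - a for a, b in zip(n, n_right))   # horizontal derivative
    ddy = tuple(b - a for a, b in zip(n, n_down))    # vertical derivative
    m = max(sum(c * c for c in ddx), sum(c * c for c in ddy))
    return min(1.0, max(0.0, m)) ** 0.333            # saturate(), then pow 0.333

# Flat shading across the quad injects no extra roughness...
print(geometric_roughness_factor((0, 0, 1), (0, 0, 1), (0, 0, 1)))  # 0.0
# ...while rapidly curving normals push the factor toward 1.
print(geometric_roughness_factor((0, 0, 1), (1, 0, 0), (0, 0, 1)))  # 1.0
```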
Geometric Specular Aliasing Part 2
● MSAA center vs. centroid interpolation: it’s not perfect
  ● Normal interpolation can cause specular sparkling at silhouettes due to over-interpolated vertex normals
● Here’s a trick we are using:
  ● Interpolate the normal twice: once with centroid, once without

    float3 vNormalWs : TEXCOORD0;
    centroid float3 vCentroidNormalWs : TEXCOORD1;

  ● In the pixel shader, choose the centroid normal if the normal’s squared length is greater than 1.01

    if ( dot( i.vNormalWs.xyz, i.vNormalWs.xyz ) >= 1.01 )
    {
        i.vNormalWs.xyz = i.vCentroidNormalWs.xyz;
    }
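The selection logic is tiny; a sketch with made-up sample values (a unit-length interpolated normal keeps the center sample, an over-interpolated silhouette sample falls back to centroid):

```python
# Sketch: the center-vs-centroid normal selection from the trick above.
def choose_normal(n_center, n_centroid):
    # over-interpolated normals have squared length > 1 at silhouettes
    if sum(c * c for c in n_center) >= 1.01:
        return n_centroid
    return n_center

interior = (0.0, 0.0, 1.0)      # well-behaved sample, length^2 == 1.0
silhouette = (0.9, 0.0, 0.9)    # over-interpolated sample, length^2 == 1.62
print(choose_normal(interior, (0.0, 0.1, 0.99)))    # keeps the center normal
print(choose_normal(silhouette, (0.0, 0.0, 1.0)))   # falls back to centroid
```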
Outline
● VR at Valve
● Methods for Stereo Rendering
● Timing: Scheduling, Prediction, VSync, GPU Bubbles
● Specular Aliasing & Anisotropic Lighting
● Miscellaneous VR Rendering Topics
Normal Map Encoding
● Projecting tangent normals onto the Z plane only uses 78.5% of the range of a 2D texel
● Hemi-octahedron encoding uses the full range of a 2D texel
● "A Survey of Efficient Representations for Independent Unit Vectors", Cigolle et al., Journal of Computer Graphics Techniques, Vol. 3, No. 2, 2014 (image modified from the above paper)
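A sketch of the hemi-octahedral mapping from the Cigolle et al. survey, showing the encode/decode round trip (float precision here; a real normal map would additionally quantize the two encoded channels):

```python
# Sketch: hemi-octahedral encode/decode for hemisphere normals (z >= 0),
# which uses the full [-1,1]^2 range of a 2D texel.
import math

def hemioct_encode(n):
    x, y, z = n
    s = abs(x) + abs(y) + z          # project onto the hemi-octahedron
    px, py = x / s, y / s
    return (px + py, px - py)        # rotate the diamond to fill the square

def hemioct_decode(e):
    tx = (e[0] + e[1]) * 0.5
    ty = (e[0] - e[1]) * 0.5
    v = (tx, ty, 1.0 - abs(tx) - abs(ty))
    l = math.sqrt(sum(c * c for c in v))
    return tuple(c / l for c in v)

n = tuple(c / math.sqrt(3.0) for c in (1.0, 1.0, 1.0))
r = hemioct_decode(hemioct_encode(n))
print(max(abs(a - b) for a, b in zip(n, r)) < 1e-6)  # True: round trip is lossless
```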
Scale Render Target Resolution
● Turns out, 1.4x is just a recommendation for the HTC Vive (each HMD design has a different recommended scalar based on its optics and panels)
● On slower GPUs, scale the recommended render target scalar down
● On faster GPUs, scale the recommended render target scalar up
● If you’ve got GPU cycles to burn, BURN THEM
Anisotropic Texture Filtering
● Increases the perceived resolution of the panels (don’t forget, we have fewer pixels per degree)
● Force this on for color and normal maps
  ● We use 8x by default
● Disable for everything else. Trilinear only, but measure perf. Anisotropic filtering may be "free" if you are bottlenecked elsewhere.
Noise Is Your Friend
● Gradients are horrible in VR. Banding is more obvious than on LCD TVs.
● We add noise on the way into the framebuffer while we still have floating-point precision in the pixel shader

    float3 ScreenSpaceDither( float2 vScreenPos )
    {
        // Iestyn's RGB dither (7 asm instructions) from Portal 2 X360, slightly modified for VR
        float3 vDither = dot( float2( 171.0, 231.0 ), vScreenPos.xy + g_flTime ).xxx;
        vDither.rgb = frac( vDither.rgb / float3( 103.0, 71.0, 97.0 ) ) - float3( 0.5, 0.5, 0.5 );
        return ( vDither.rgb / 255.0 ) * 0.375;
    }
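A Python port of the snippet (with g_flTime fixed at 0 for illustration) confirms the dither stays within a sub-LSB range, i.e. at most 0.375 of one 8-bit step on either side of zero:

```python
# Sketch: Python port of ScreenSpaceDither to check the output range.
def screen_space_dither(x, y, t=0.0):
    d = 171.0 * (x + t) + 231.0 * (y + t)         # dot(float2(171,231), pos + time)
    rgb = ((d / 103.0) % 1.0 - 0.5,               # frac(...) - 0.5 per channel
           (d / 71.0) % 1.0 - 0.5,
           (d / 97.0) % 1.0 - 0.5)
    return tuple(c / 255.0 * 0.375 for c in rgb)

vals = [c for x in range(64) for y in range(64) for c in screen_space_dither(x, y)]
print(max(abs(v) for v in vals) <= 0.5 / 255.0 * 0.375)  # True: under half an LSB
```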
Environment Maps
● Standard implementation at infinity = only works for sky
● Need to use some type of distance remapping for environment maps
  ● A sphere is cheap
  ● A box is more expensive
  ● Both are useful in different situations
● Read this online article:
  ● "Image-based Lighting approaches and parallax-corrected cubemaps", Sébastien Lagarde, 2012
Stencil Mesh (Hidden Area Mesh)
● Stencil out the pixels you can’t actually see through the lenses. GPUs are fast at early stencil-rejection.
● Alternatively, you can render to the depth buffer at near z so everything early z-rejects instead
● Lenses produce radially symmetric distortion, which means you effectively see a circular area projected on the panels
Stencil Mesh (Warped View)
Stencil Mesh (Ideal Warped View)
Stencil Mesh (Wasted Pixels)
Stencil Mesh (Unwarped View)
Stencil Mesh (Unwarped View)
Stencil Mesh (Final Unwarped View)
Stencil Mesh (Final Warped View)
Stencil Mesh (Hidden Area Mesh)
● SteamVR/OpenVR API will provide this mesh to you
● Results in a 17% fill rate reduction!
● No stencil mesh: VR 1512x1680x2 @ 90 Hz: 457 million pixels/sec
  ● 2,540,160 pixels per eye (5,080,320 pixels total)
● With stencil mesh: VR 1512x1680x2 @ 90 Hz: 378 million pixels/sec
  ● About 2,100,000 pixels per eye (4,200,000 pixels total)
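The savings above check out arithmetically; a quick sketch:

```python
# Sketch: the fill-rate reduction quoted for the hidden area mesh.
full = 1512 * 1680        # brute-force pixels per eye
masked = 2_100_000        # approx. pixels per eye with the stencil mesh
print(full)                              # 2540160
print(round((1 - masked / full) * 100))  # 17 (% fill rate saved)
print(masked * 2 * 90 // 10**6)          # 378 million pixels/sec
```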
Warp Mesh (Lens Distortion Mesh)
Warp Mesh (Brute-Force)
Warp Mesh (Cull UVs Outside 0-1)
Warp Mesh (Cull Stencil Mesh)
Warp Mesh (Shrink Wrap)
(15% of pixels culled from the warp mesh)
Performance Queries Required!
● You are always VSync’d
  ● Disabling VSync to see framerate will make you dizzy
● Need to use performance queries to report GPU workload
● Simplest implementation is to measure first to last draw call
● Ideally, measure these things:
  ● Idle time from Present() to first draw call
  ● First draw call to last draw call
  ● Idle time from last draw call to Present()
Summary
● Stereo Rendering
● Prediction
● "Running Start" (saves 1.5-2.0 ms/frame)
● Anisotropic Lighting & Mipping Normal Maps
● Geometric Specular Antialiasing
● Stencil Mesh (saves 17% of pixels rendered)
● Optimized Warp Mesh (reduces cost by 15%)
● Etc.
Thank You!
Alex Vlachos, Valve
Alex@ValveSoftware.com