I am running my application under gDebugger on an NVidia 6800 card. I am curious why I don’t get any pixels on the fast_z_count counter. Is it only active on Z-buffer-only passes?
Is the set of conditions required for fast_z processing listed somewhere?
I guess so
From http://developer.download.nvidia.com/GPU_Programming_Guide/GPU_Programming_Guide_G80.pdf :
3.6.1. Double-Speed Z-Only and Stencil Rendering
All GeForce Series GPUs (FX and later) render at double speed when rendering
only depth or stencil values. To enable this special rendering mode, you must
follow the following rules:
- Color writes are disabled
- Texkill has not been applied to any fragments (clip, discard)
- Depth replace (oDepth, texm3x2depth, texdepth) has not been applied to any fragments
- Alpha test is disabled
- No color key is used in any of the active textures
See section 6.4.1 for information on NULL render targets with double speed Z.
“Fast Z” isn't a term I'm familiar with, but there are two “fast Z” facilities:
1. Coarse-grained Z (aka Coarse Z, Hierarchical Z, Hi-Z, or ZCULL), and
2. Fine-grained Z (aka Fine Z, Early Z, Early Z Checking, or Early Z Out).
I think I've read that there are also analogs of both of these for stencil.
AFAIK, Coarse Z (which happens first) kills whole blocks of fragments before any fragment shader threads are even created. Fine Z (which happens right before the fragment shader) selectively kills individual fragment shader invocations before they launch.
Regarding Coarse Z (ZCULL), the NVidia GPU Programming Guide (GeForce 8 version) has this to say:
The following must be true for ZCULL to be used:
1. Clear the depth+stencil buffer
2. Don't change the depth value in your fragment shader
3. Don't change the direction of the depth test while writing depth
4. Don't enable stencil writes when doing stencil testing
5. Don't write to a 2D texture array
And for maximum ZCULL efficiency:
1. Write depth buffer with same test direction as is used for testing
2. Don't render a lot of little features
3. Don't allocate too many depth buffers
4. Don't use 32F depth buffers
And for Fine Z (Early Z), it says the following must be true for Early Z to be used:
1. Don't change the depth value in your fragment shader
2. Don't reference gl_FragCoord.z in your fragment shader
3. Don't enable depth or stencil writes or enable occlusion queries AND:
   a. Use alpha test, or
   b. Call discard, or
   c. Use alpha-to-coverage, or
   d. Use multisample alpha
@Dark Photon: Please use the term “Coarse Z” instead of “Course Z” …
Nuts. Can’t believe I did that. Thanks for the correction!
Also of note: early Z functionality has been explicitly exposed in EXT_shader_image_load_store, which says:
- When early per-fragment operations are enabled, the depth bounds test, stencil test, depth buffer test, and occlusion query sample counting operations are performed prior to fragment shader execution, and the stencil buffer, depth buffer, and occlusion query sample counts will be updated accordingly. When there is no active program, the active program has no fragment shader, or the active program was linked with early fragment tests disabled, these operations are performed only after fragment program execution, in the order described in chapter 4. Because fragment shaders may write to buffer objects or textures, as described in Section 2.14.X, the results of fragment shader execution can have side effects. … With side effects due to memory stores, no such optimizations [skipping fragment shader execution] are allowed.
- If the fragment shader specifies a layout of “early_fragment_tests”, then the fragment tests will be performed before fragment shading even if there are side effects.
So if you can't figure your counter out by any other means, you might try forcing the early_fragment_tests layout to see whether that affects the counter.