Early z-test

I do something like this:
enable writes to the depth buffer
enable alpha test
glColorMask(false, false, false, true)
draw scene

disable alpha test
enable blending
glColorMask(true, true, true, true)
draw scene

It is drawn correctly, but I don’t see any speed improvement (even when every fragment fails the alpha test, the speed is the same), and I know my app is limited by the fragment program.
My fragment program doesn’t use discard or write gl_FragDepth.

I have read in nvidia’s “gpu programming guide” that using alpha test could(?) disable early z-test until the z-buffer is cleared again.
I don’t understand why early z-test needs to be disabled until next clear. And it’s not a “will be disabled” but a “can be disabled”.
What about early stencil test? Should I expect the same thing?

I have tested my code on a GeForce FX.
Any idea of how I could manage to do this early z-test?

Thank you. :wink:

Early Z test is disabled on NVIDIA GeForce FX with alpha test enabled, because it isn’t really an “early Z test” but “early Z culling”. It isn’t only a test; it also writes the resulting Z value to the depth buffer.

this is standard procedure:

fragment processing
depth test
alpha test
output to framebuffer(color and z + stencil)

and with early z culling:

early z culling(z test + write output)
fragment processing
output to framebuffer(color)

If you add alpha test after fragment processing, you must hold off writing to the Z buffer until the alpha test passes, so you get no speedup.
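To make the difference between the two orderings above concrete, here is a toy software model of a single pixel. It is only a sketch: `count_shaded` and its parameters are made-up names, the depth test is assumed to be GL_LEQUAL-style, and nothing here describes how any real GPU is built. It just counts how many times the fragment program would run.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one pixel (hypothetical helper, not a real GL call).
   z[]           depths of the fragments that hit this pixel, in draw order
   initial_depth value already in the depth buffer (1.0 after a clear)
   early_z       nonzero: depth test + write happen before shading
                 zero:    shading happens first, depth test afterwards
   Returns how many fragments run the fragment program. */
static int count_shaded(const float *z, size_t n, float initial_depth, int early_z)
{
    assert(z != NULL || n == 0);
    float depth = initial_depth;
    int shaded = 0;
    for (size_t i = 0; i < n; ++i) {
        int passes = (z[i] <= depth);   /* LEQUAL-style depth test */
        if (early_z && !passes)
            continue;                   /* culled before the fragment program */
        ++shaded;                       /* fragment program runs */
        if (passes)
            depth = z[i];               /* depth write */
    }
    return shaded;
}
```

With three fragments drawn back to front at depths 0.9, 0.5 and 0.2 into a cleared buffer, all three are shaded with or without early Z. If a depth-only pass has already filled the buffer with 0.2, the early-Z path shades only one of them, which is the gain a depth pre-pass is after.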

Early stencil test is supported only on Radeon > 9500 and GeForce 6 series

Hi Matt,
I understand I cannot do early z-culling when alpha test is enabled. But in my second pass alpha test is disabled, so I don’t see what the problem is.

The hardware is the problem… I’m not quite sure where in the pipeline this restriction originates, but it simply does not work that way on NV3x hardware: it is not possible to use alpha test (or any other test) to discard fragments in the first pass and then see the early z test performance gain in a subsequent pass.

Speculating, it’s possible they have some sort of on-chip coarse z in an early pipeline stage that effectively acts as an early z (from a developer’s perspective it’s impossible to tell, and one has about as much performance impact as the other except in corner cases). If you alpha test, then you can’t update coarse z on the chip and may have to tag coarse z as invalid for those tiles. That implies they don’t send fragment passes back to the early/coarse z testing; or maybe they do that for z but don’t buffer it all the way past their shaders to get a final result. Without knowing the internals, a surefire strategy is difficult to come up with, but try to at least fill depth with alpha test off for as much of the scene as possible. Make alpha test a state that you selectively enable or disable during scene rendering when you anticipate it is needed.
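The speculation above can be sketched as a tiny model of one coarse-Z tile. Everything here is hypothetical (the `CoarseTile` type, the function names, the convention that 1.0 is the cleared far depth); it mirrors the guess in this post, not documented NVIDIA behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical coarse-Z tile, following the speculation in the thread. */
typedef struct {
    float max_z;   /* conservative farthest depth stored for the tile */
    bool  valid;   /* false once the tile can no longer be trusted */
} CoarseTile;

/* A depth clear resets the tile and makes it usable again. */
static void coarse_clear(CoarseTile *t)
{
    assert(t);
    t->max_z = 1.0f;
    t->valid = true;
}

/* True when a block of fragments can be skipped before shading. */
static bool coarse_reject(const CoarseTile *t, float block_min_z)
{
    assert(t);
    return t->valid && block_min_z > t->max_z;
}

/* Called when a primitive covering the whole tile writes depth. */
static void coarse_update(CoarseTile *t, float block_max_z, bool alpha_test)
{
    assert(t);
    if (alpha_test) {
        /* Any fragment may still be killed after shading, so the final
           depth is unknown here; this model gives up on the tile. */
        t->valid = false;
        return;
    }
    if (block_max_z < t->max_z)
        t->max_z = block_max_z;
}
```

In this model a tile that has seen alpha-tested depth writes stays useless until `coarse_clear` runs again, which would match the "until the z-buffer is cleared again" wording from the programming guide, assuming the hardware behaves anything like this.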

Yes, alpha test + depth writes is one of the bad cases for GeForce FX (and GeForce 4 Ti) architectures. It not only hurts during alpha test, but during subsequent rendering as well.

The GeForce 6 series doesn’t suffer from this limitation, however.

I have tested my code on nv40 but results are the same :frowning: .

ATI has or had this same problem, right? I have a document about it at home.

You would need to benchmark whether it is faster to render with blending and no alpha test, or with alpha test and no blending.

However, I remember that a long time ago it was recommended to use alpha test as much as possible.

Originally posted by edf:
I have tested my code on nv40 but results are the same :frowning: .

Can you send me (or point me to) a repro case?
You should be seeing a speedup on NV4x.

Originally posted by V-man:
However, I remember that a long time ago it was recommended to use alpha test as much as possible.

In the Good Old Days before early z, alpha test could kill expensive memory transactions without having any real downside. Now that we have early z testing to accelerate shading, operations like shader kill, shader depth replace, alpha test, and alpha to coverage all complicate things.

Those features can still be “goodness” but you can’t say that without qualification like you used to be able to…

Thanks -

On ATI, the early z test and “mass” z culling can occur after certain stages since the 8500.
Does this not happen on the NV3x, or does it happen but still lead to performance degradation?

Reader Question: Why is early Z test not supported when using Alpha testing or the fragment kill instruction? - Imanewbie Londey

Rick Bergman: First we should make a distinction between Early Z processing and Z Culling. Early Z is a technique developed by ATI that does the complete Depth & Stencil read-modify-write operation prior to pixel shading. Z Culling (which occurs inside our HiZ unit) rapidly discards groups of occluded pixels prior to pixel shading. Both acceleration techniques minimize pixel shading processing on portions of primitives that won’t be visible, and are used simultaneously on products such as the X800.

In situations where Alpha Test and Fragment Kill can change the visibility of portions of a polygon, our chips do the Depth & Stencil read-modify-write after these operations. Early Z culling is not affected by these operations, and continues to operate. There are, however, some modes when neither acceleration technique can be used, such as when Depth or Z is generated in the pixel shader.
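The rules in the quoted answer can be written down as a small decision function. This is only a reading of the text above: the struct and function names are invented, "Early Z culling is not affected" is taken to mean the HiZ Z Culling keeps running, and only the cases the answer names are covered ("some modes" suggests there are others).

```c
#include <stdbool.h>

/* Render state that matters for the quoted rules (hypothetical struct). */
typedef struct {
    bool alpha_test;       /* alpha test enabled */
    bool shader_kill;      /* pixel shader may kill fragments */
    bool shader_writes_z;  /* pixel shader generates depth */
} PipelineState;

/* Which of the two acceleration techniques stay active. */
typedef struct {
    bool early_z;   /* depth/stencil read-modify-write before shading */
    bool hiz_cull;  /* HiZ culling of occluded pixel groups before shading */
} ZAcceleration;

static ZAcceleration z_acceleration(PipelineState s)
{
    ZAcceleration a;
    /* Z Culling keeps working unless the shader generates depth. */
    a.hiz_cull = !s.shader_writes_z;
    /* Alpha test or kill pushes the read-modify-write after shading. */
    a.early_z = !s.alpha_test && !s.shader_kill && !s.shader_writes_z;
    return a;
}
```

So, by this reading, a pass with alpha test enabled loses Early Z but keeps Z Culling, while a shader that writes depth loses both.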

Originally posted by cass:
Can you send me (or point me to) a repro case?
You should be seeing a speedup on NV4x.

Sorry cass, I can’t send you the code. The context is isosurface rendering with slices, so there are a lot of transparent/hidden pixels.
Unfortunately, I don’t have full access to the NV40 and can’t run all the tests I would like to…
But thank you all for your answers.

I’m currently battling with early z on 6800GT too, and symptoms so far are the same.

I have blending disabled. But I haven’t changed GL_LEQUAL to GL_EQUAL (it shouldn’t matter), and I am only pre-rendering the stuff that is most expensive.


clear depth, color intact


glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
render tough stuff from VBOs via FFP, no lighting, nothing else.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
render tough stuff from VBOs via HEAVY shader using ftransform in the VP.

Nothing seems to work. I normally get 10 fps with the current shader, but with 2x overdraw that drops to 5. For the demo I’m intending to do, I’d like to stay above 8…
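As a rough sanity check on those numbers, here is a back-of-the-envelope helper. It assumes frame time scales linearly with the number of shaded layers per pixel (purely fragment-bound) and ignores the cost of the depth-only pass; both function names are made up for the illustration.

```c
#include <assert.h>

/* Frame rate at a given average overdraw, assuming frame time grows
   linearly with the number of shaded layers per pixel. */
static double fps_at_overdraw(double fps_at_one_layer, double overdraw)
{
    assert(fps_at_one_layer > 0.0 && overdraw > 0.0);
    return fps_at_one_layer / overdraw;
}

/* Largest average overdraw that still reaches target_fps under the
   same linear assumption. */
static double max_overdraw(double fps_at_one_layer, double target_fps)
{
    assert(fps_at_one_layer > 0.0 && target_fps > 0.0);
    return fps_at_one_layer / target_fps;
}
```

`fps_at_overdraw(10.0, 2.0)` gives 5, matching the drop described above, and `max_overdraw(10.0, 8.0)` gives 1.25: under these assumptions the pre-pass has to cull all but about a quarter of the extra layer to hold 8 fps.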

Maybe the problem is that I’m rendering the scene to an FP16 texture, or am I forgetting something else?