alpha test & early-z confusion

I would appreciate it if someone could provide me some clarity on the following:

I’ve read in various places that ‘using alpha testing hurts early-z performance’. I roughly understand how early-z works, and why the use of alpha testing may affect performance.

However, does the use of alpha testing merely disable the performance gain that early-z offers for the alpha-tested geometry – or – does it really adversely affect performance of subsequently rendered (non alpha-tested) geometry?

For example, if I render a wire fence that covers nearly the entire screen, and then render the scene behind it, would that be slower than rendering the scene first and then the fence?

In other words, would it make sense to sort objects by depth and render alpha tested geometry back to front?

However, does the use of alpha testing merely disable the performance gain that early-z offers for the alpha tested geometry
I’m 99% certain it’ll only be the fence that’ll be affected.
I believe that with alpha testing enabled on NVIDIA hardware, fill performance is halved, but if you’ve got to do it, you’ve got to do it.

If possible, render all geometry front to back, i.e. the fence first, then the background.

It disables the early-z performance win for the alpha-tested pixels.

The problem is that early z testing puts the zbuffer before shading, whereas in the logical pipeline it comes after shading.

If you place z before shading and before the alpha test, then your zbuffer unit generally needs to write z to the depthbuffer at that stage. With the alpha test happening later, that depth write may subsequently be invalidated, so unless your hardware design can defer the depth write until shading and the alpha test are complete, you must perform depth testing later.
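To make the ordering problem concrete, here is a toy one-pixel simulation (illustrative only, not real hardware) of why a depth write performed before the alpha test can leave stale data in the depth buffer:

```c
#include <assert.h>
#include <float.h>

typedef struct {
    float z;            /* fragment depth, smaller = closer */
    int   passes_alpha; /* result of the alpha test */
} Frag;

/* Logical pipeline: alpha test first, then depth test + write. */
static float final_depth_late_z(const Frag *f, int n) {
    float depth = FLT_MAX;
    for (int i = 0; i < n; i++)
        if (f[i].passes_alpha && f[i].z < depth)
            depth = f[i].z; /* depth written only for surviving fragments */
    return depth;
}

/* Naive early z that also writes depth before the alpha test. */
static float final_depth_naive_early_z(const Frag *f, int n) {
    float depth = FLT_MAX;
    for (int i = 0; i < n; i++)
        if (f[i].z < depth)
            depth = f[i].z; /* WRONG: alpha-killed fragments pollute depth */
    return depth;
}
```

If the closest fragment fails the alpha test, the naive early-z variant records its depth anyway, which is exactly the invalid write the hardware must either defer or avoid by testing later.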

Now, if you defer the write for the alpha test, you have to do things like watch for pixel collisions between deferred writes and new pixels arriving from other primitives. It is non-trivial.

If you can’t defer the write then you have to be able to move your zbuffer hardware logically to a later stage in the pipeline.

If you can do neither you may need to fall back on software (unlikely on modern hardware).

So pure early z affects only the alpha-tested fragments, not subsequent ones. However…

Early z and other z optimizations can be difficult to differentiate, and you may have some coarse z / hyper z configuration where alpha-tested fragments defeat the cached, tiled zbuffer writes. It is a complex question.

It is very architecture dependent. If it’s pure early z, I’d say it is just the alpha-tested pixels that are affected; just remember that only the hidden ones cost you more, since the visible ones would have passed z anyway and been shaded. For a tiled z or hyper z situation, I’d say alpha testing may defeat the hardware optimization and/or increase zbuffer bandwidth and stalls, but could possibly also incur shading overhead.

It’s best to be flexible and measure the impact on a per platform basis if you’re really concerned. Or you could just write reasonable code and let the most innovative hardware win and encourage others to improve the alpha test case.

I think this falls squarely in the realm of “try it and see”, though perhaps not as squarely as it would first seem.

If your rendering is slower as a result of the alpha test, then it’s likely due to the additional cost of the alpha test itself, plus the fact that early z, and any benefits derived therefrom, are consequently foregone.

Performance gained from zculling is really a function of depth complexity, so any real qualification has to take the actual scene into consideration. If there’s little depth complexity behind the fence, there’s correspondingly little performance to be gained from zculling, and therefore little added penalty overall from the alpha test.
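The depth-complexity point can be illustrated with a toy one-pixel counter (a sketch, not real hardware): the number of shader invocations early z saves depends entirely on how many occluded fragments arrive after the occluder.

```c
#include <assert.h>
#include <float.h>

/* Count shader invocations for one pixel under early z, given opaque
   fragment depths in submission order (smaller = closer). */
static int shaded_with_early_z(const float *z, int n) {
    float depth = FLT_MAX;
    int shaded = 0;
    for (int i = 0; i < n; i++) {
        if (z[i] < depth) { /* early depth test before shading */
            shaded++;       /* only surviving fragments get shaded */
            depth = z[i];
        }
    }
    return shaded;
}
```

With three overlapping layers, front-to-back submission shades one fragment while back-to-front shades all three; with nothing behind the fence, the order makes no difference, which is why the penalty is scene-dependent.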

P.S. Double post, what Dorbie said ^^

Yup, leghorn is spot on; even the published claims of the card designers cannot be relied on in some situations, with some drivers, etc.

Okay, makes a lot of sense.

Thanks for your thoughts. :slight_smile:


As my tests showed, on GeForce 6 and later, enabling alpha test doesn’t affect early-z culling.

On ATI hardware there are very few cases where you’d permanently disable HyperZ optimizations. It’s almost always only on the pass in question. Also note that there are two different kinds of culling, HierarchicalZ (HiZ) and EarlyZ. HiZ works on tiles and will remain functional even with alpha test enabled. EarlyZ, which operates on a per pixel level, on the other hand will be disabled if the depth or stencil buffer could potentially be updated in the pass. You can have EarlyZ enabled as well if you set depth writes to GL_FALSE and stencil mask to zero (or disable stencil test).
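A minimal sketch of the state setup described above, assuming an alpha-tested pass where you don’t need depth or stencil updates (standard GL calls; no draw code shown):

```c
/* Keep per-pixel EarlyZ alive on an alpha-tested pass by making
   sure neither the depth buffer nor the stencil buffer can be
   written during this pass. */
glDepthMask(GL_FALSE);       /* no depth writes this pass       */
glStencilMask(0);            /* no stencil writes...            */
glDisable(GL_STENCIL_TEST);  /* ...or skip stencil entirely     */
```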

If you have an advanced shader that you use with alpha test it may be worth it to do a separate PreZ pass with the alpha test just laying out the depth, then doing the full shader with alpha test still enabled but with depth writes disabled.
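A rough sketch of that two-pass approach, using standard GL state calls; the `draw_alpha_tested_geometry()` helper and the shader binding are hypothetical placeholders:

```c
/* Pass 1: lay out depth only, with a cheap shader and alpha test on. */
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); /* no color writes */
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
glEnable(GL_ALPHA_TEST);
glAlphaFunc(GL_GREATER, 0.5f);   /* example threshold, an assumption */
draw_alpha_tested_geometry();    /* hypothetical: bind cheap depth-only shader */

/* Pass 2: full shading, alpha test still on, but no depth writes,
   so early rejection can cull what the pre-Z pass already resolved. */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);
glDepthFunc(GL_EQUAL);           /* or GL_LEQUAL to tolerate precision */
draw_alpha_tested_geometry();    /* hypothetical: bind full shader */
```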

Yep, with coarse z reject you can always win on the early reject for the pass in question. However, the coarse z for alpha-tested primitives cannot always be updated, depending on hardware details. So although the alpha-tested rendering itself isn’t affected, if the alpha-tested fragments pass z, subsequent coarse z may be defeated, because the path to update the coarse z tile after the alpha test is non-trivial.

Conversely, per-pixel early z is defeated by alpha test in the pass in question, but subsequent rendering is probably OK.

You’ve really got to watch the generalizations here though and measurement is the best option. Just remember that content matters a lot too when it comes to the final numbers, and rendering order & passes vs fails can have a significant effect on your measured results.

Right - test the performance if you want to know for sure whether something’s “fast”.

The two things I’d encourage are a) avoid shader depth replace and b) mask depth/stencil writes as much as possible (as Humus describes above).