I wrote a simple application to test the effect of early-Z rejection on NVIDIA hardware.
It simply draws 20 textured quads (one behind
another) with an orthographic projection.
The application is fill limited, hence
drawing those quads in front-to-back versus
back-to-front order should make a difference.
And it does. The difference is pretty small, though.
Test details:
Quadro FX 500 card, 44.96 driver, SuSE Linux 9.1.
512x512 window size:
Back to front: ~193 fps
Front to back: ~210 fps
Back to front: 66 fps
Front to back: 73 fps
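To put those figures in perspective, here is a small sketch that turns the four frame rates above into a relative gain and a per-frame time saving. The function names are made up for illustration, and the second pair of numbers is labelled "second run" only because the post does not say which window size it belongs to.

```python
def speedup_percent(baseline_fps, improved_fps):
    """Relative frame-rate gain of improved over baseline, in percent."""
    return (improved_fps / baseline_fps - 1.0) * 100.0


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


# (label, back-to-front fps, front-to-back fps) as reported in the post
runs = [
    ("512x512 window", 193.0, 210.0),
    ("second run", 66.0, 73.0),
]

for label, btof, ftob in runs:
    gain = speedup_percent(btof, ftob)
    saved = frame_time_ms(btof) - frame_time_ms(ftob)
    print(f"{label}: {gain:.1f}% faster, {saved:.2f} ms saved per frame")
```

Both runs come out at roughly a 9 to 11 percent gain, which matches the "pretty small" impression.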
I’d like to hear the experiences of others
before I go on exploring early-Z rejection
optimisations. The above results didn’t seem so
impressive to me, and they aren’t worth
the amount of work I need to implement this
in our application. Is this what we can
expect from early-Z rejection or do I miss
From what I’ve seen a while ago on my GeForce FX 5200, doing a first pass laying down Z values and then drawing the scene normally gave me only a ~10% speed increase (this with reasonably expensive shaders).
- Now, this is with reasonably expensive shaders; with simpler stuff the benefit is going to be less.
- Drivers/hardware change, thus this 10% value is probably more now.
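The idea behind the Z-first pass can be illustrated without any GL calls. The sketch below is a software model of a single pixel covered by several opaque layers, assuming an ideal early-Z unit and a "less than" depth test; it only counts how many fragments would end up being shaded. The function name and the `prepass` flag are hypothetical, and real hardware (coarse Z granularity, bandwidth, driver behaviour) will not be this clean.

```python
def shaded_fragments(depths, prepass=False):
    """Count fragments shaded at one pixel.

    depths:  layer depths in draw order (smaller value = nearer).
    prepass: if True, assume a depth-only first pass has already filled
             the Z buffer, so the colour pass shades only the fragment
             that matches the nearest depth.
    """
    if prepass:
        nearest = min(depths)  # what the depth-only pass leaves behind
        return sum(1 for d in depths if d == nearest)

    zbuf = float("inf")  # cleared depth buffer
    shaded = 0
    for d in depths:
        if d < zbuf:  # passes the depth test, so it gets shaded
            zbuf = d
            shaded += 1
    return shaded


front_to_back = list(range(1, 21))        # 20 layers, nearest first
back_to_front = list(reversed(front_to_back))

print(shaded_fragments(front_to_back))              # nearest layer wins at once
print(shaded_fragments(back_to_front))              # every layer gets shaded
print(shaded_fragments(back_to_front, prepass=True))
```

In this model the pre-pass makes the draw order irrelevant for shading, at the price of rasterizing the geometry twice, which is one reason the measured gain can be modest.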
200 fps does NOT qualify as severely fill limited. Also, fill limited can mean many things. There is a per-fragment overhead even with early Z reject. If your texturing operation is very simple, then the savings aren’t that much.
I would test this by rendering an object sampling from 8 textures, each with ANISOTROPIC filtering at level 8 or more, and making sure the texture coordinates are such that minification actually happens.
Also, make sure you clear the color and Z buffers together, and don’t enable blend or alpha test. If you get that wrong by mistake, it’ll turn off early Z.
Get your back-to-front case down to like 20 fps while bound on fragment processing, then the front-to-back case might speed up more.
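A toy cost model makes the per-fragment overhead argument concrete. It assumes every rasterized fragment pays a fixed `overhead` even when early Z rejects it, and only surviving fragments pay `shade_cost`. Both parameters and the function names are invented for this sketch, the units are arbitrary, and it ignores bandwidth and everything else a real GPU does.

```python
def frame_cost(layers, overhead, shade_cost, front_to_back):
    """Per-pixel cost of drawing `layers` opaque, fully overlapping quads."""
    # With ideal early Z, front-to-back shades one fragment per pixel;
    # back-to-front shades all of them.
    shaded = 1 if front_to_back else layers
    return layers * overhead + shaded * shade_cost


def ordering_speedup(layers, overhead, shade_cost):
    """How many times faster front-to-back is than back-to-front."""
    slow = frame_cost(layers, overhead, shade_cost, front_to_back=False)
    fast = frame_cost(layers, overhead, shade_cost, front_to_back=True)
    return slow / fast


# Cheap fragments (single simple texture fetch) versus expensive ones.
for shade_cost in (0.1, 1.0, 10.0):
    ratio = ordering_speedup(layers=20, overhead=1.0, shade_cost=shade_cost)
    print(f"shade_cost={shade_cost}: {ratio:.2f}x")
```

With a shading cost far below the fixed overhead, the model predicts only a few percent gain from sorting; once shading dominates, the gain grows to several times.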
Your test is not good enough: add some complex per-pixel calculations, like normal+gloss mapping, and you’ll see a much higher difference.
I was myself quite amazed when I saw that early Z rejection made a lot of difference in my app some time ago. I don’t remember the exact number but it was like a 30-50% performance improvement.
My purpose with the test is to see
if I can optimise an existing application
with depth sorting.
- 1024x768 resolution
- ~60 Hz refresh
- single-textured geometry
- A lot of overdraw
- Anisotropic filtering
characterize my case quite well.
My initial post did not include results
with Aniso enabled but they do not change
much with it.
Maybe if I had more complex fragments
I could observe a better gain, though this wouldn’t
characterize my application.
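For reference, the depth sorting mentioned in this post could look roughly like the sketch below: opaque objects are ordered by their distance along the view direction before being drawn. The `Drawable` type, its `center` field and the helper names are all hypothetical, and sorting by object centre is only an approximation for large or interpenetrating geometry.

```python
from dataclasses import dataclass


@dataclass
class Drawable:
    name: str
    center: tuple  # world-space (x, y, z) of the object, assumed known


def view_depth(center, eye, forward):
    """Signed distance of `center` from `eye` along the unit vector `forward`."""
    return sum((c - e) * f for c, e, f in zip(center, eye, forward))


def sort_front_to_back(drawables, eye, forward):
    """Return the drawables ordered nearest first for early-Z friendly drawing."""
    return sorted(drawables, key=lambda d: view_depth(d.center, eye, forward))


eye = (0.0, 0.0, 0.0)
forward = (0.0, 0.0, -1.0)  # looking down -Z, as in the default GL camera

scene = [
    Drawable("far", (0.0, 0.0, -5.0)),
    Drawable("near", (0.0, 0.0, -1.0)),
    Drawable("mid", (0.0, 0.0, -3.0)),
]

for d in sort_front_to_back(scene, eye, forward):
    print(d.name)
```

Sorting per object rather than per triangle keeps the CPU cost low, which matters if the measured gain is only around 10%.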