Heavy slowdown when rendering front-to-back

Hi there

Since the Advanced Forum still seems to be down, I'll post my question here. It isn't that advanced anyway.

So, I started a new engine. I began by rendering my sectors as simply as possible: sorted by texture and then brute force, letting the GPU do the rest (backface culling, depth sorting, etc.).

Speed was as expected. For 6500 textured triangles (no shaders), rendered in 8 batches (8 texture switches), I got 190 FPS.

After I added shaders, a z-only pass sped things up.
Now I thought I could make that z-only pass more efficient by rendering it front-to-back. No problem: no textures, no color writes, only a few big batches, just the order of the indices changed.

However, instead of speeding up a bit, it slowed down from 125 FPS to 35!

This is all on a Radeon 9600XT.

I read ATI's SDK, and there I found a passage which says that random vertex accesses are worse than sequential ones because of the pre-T&L cache.
Still, a slowdown of 90 FPS? Is this expected behaviour?
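
To get a rough feeling for how much worse a reordered index list might be, here is a minimal sketch that counts misses in a toy FIFO cache of vertex-memory blocks. It is not a model of the real pre-T&L cache on the Radeon 9600XT: the function name countBlockMisses and the parameters verticesPerBlock and cachedBlocks are assumptions made up for this illustration, and the real cache sizes and replacement policy are not given anywhere in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many vertex fetches miss a toy FIFO cache of memory blocks.
// verticesPerBlock: how many consecutive vertices share one fetched block
//                   (hypothetical, not a documented hardware figure).
// cachedBlocks:     how many blocks the toy cache holds (also hypothetical).
std::size_t countBlockMisses(const std::vector<std::uint32_t>& indices,
                             std::uint32_t verticesPerBlock,
                             std::size_t cachedBlocks)
{
    assert(verticesPerBlock > 0 && cachedBlocks > 0);
    std::deque<std::uint32_t> cache;  // oldest block at the front
    std::size_t misses = 0;
    for (std::uint32_t index : indices) {
        const std::uint32_t block = index / verticesPerBlock;
        if (std::find(cache.begin(), cache.end(), block) != cache.end())
            continue;                 // hit: block is already cached
        ++misses;
        if (cache.size() == cachedBlocks)
            cache.pop_front();        // evict the oldest block
        cache.push_back(block);
    }
    return misses;
}
```

With 64 vertices, 8 vertices per block and 4 cached blocks, visiting the vertices in order 0, 1, 2, ... misses 8 times, while visiting them with a stride of 8 (0, 8, 16, ..., then 1, 9, 17, ...) misses on all 64 fetches. Running the original and the front-to-back index lists through something like this could at least show whether the reordering destroyed locality.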

The SDK also says that aligning data to 32 bytes will increase random-access speed. My vertex data is 64 bytes per vertex. I use VBOs, so the driver should be able to align it well, no?
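
The alignment arithmetic can be sketched as follows. The Vertex layout is purely hypothetical (only its 64-byte size comes from the post above), and alignUp and startsOn32ByteBoundary are helper names invented for this example. Where the driver actually places a VBO in memory is not visible to the application, so this only shows that a 64-byte stride keeps every vertex on a 32-byte boundary if the first one is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 64-byte vertex layout; the post only states the size.
struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
    float tangent[3];
    float padding[5];  // pads the struct to 16 floats = 64 bytes
};
static_assert(sizeof(Vertex) == 64, "Vertex must be exactly 64 bytes");

// Rounds offset up to the next multiple of alignment (a power of two).
std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// True if vertex number `index` starts on a 32-byte boundary, given the
// byte address (or buffer offset) of vertex 0.
bool startsOn32ByteBoundary(std::uintptr_t baseAddress, std::size_t index)
{
    return (baseAddress + index * sizeof(Vertex)) % 32 == 0;
}
```

Since 64 is a multiple of 32, an aligned base means every vertex is aligned. A base that is off by 16 bytes would leave every vertex misaligned.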

I don't understand this heavy slowdown. Anyway, BSP trees seem to lose their advantage in 3D rendering because of the heavy cache misses they cause.


Remember that you're introducing other overheads, for example the texture state changes that you can no longer sort for. It's not entirely clear at what level you depth-sorted. It seems it may have been at the primitive level at best, and you know that's going to be expensive.

No, in a z-only pass I don't use any textures. I still use the same number of glDrawRangeElements calls, etc. Everything is the same, except that the order of the elements has changed. So there is no other overhead, apart from the random vertex access.
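
For comparison, a coarser front-to-back sort can be sketched like this: instead of reordering the indices themselves, only the order of whole draw ranges changes, so the vertex access inside each range stays as sequential as it was. This is an illustration under assumptions, not what the engine above does. DrawRange, its fields and sortFrontToBack are hypothetical names, and whether the sectors can be split into ranges with a usable center is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One draw call's worth of geometry (hypothetical structure).
struct DrawRange {
    std::size_t firstIndex;  // offset into the shared index array
    std::size_t indexCount;  // number of indices in this range
    float center[3];         // center of the range's bounding volume
};

// Squared Euclidean distance between two 3D points.
float squaredDistance(const float a[3], const float b[3])
{
    float sum = 0.0f;
    for (int i = 0; i < 3; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Reorders the ranges nearest-first relative to the eye position.
// The indices inside each range are left untouched.
void sortFrontToBack(std::vector<DrawRange>& ranges, const float eye[3])
{
    assert(eye != nullptr);
    std::sort(ranges.begin(), ranges.end(),
              [eye](const DrawRange& a, const DrawRange& b) {
                  return squaredDistance(a.center, eye) <
                         squaredDistance(b.center, eye);
              });
}
```

Each sorted range would then be submitted with its own glDrawRangeElements call using firstIndex and indexCount. The depth ordering is only approximate, but the index stream within a range keeps its original locality.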