Sorting & Transformation

SNX · May 14, 2005, 6:49pm

Hi

I’m currently not using sorting at all in my engine, and I’m really trying to implment this functionality. The idea is to transform all the vertices first, then get the distance of the vertex from the camera position. Then use the closest position in each face to sort the index array.

This would allow me to leave the depth buffer off completely, which is said to be faster as well. However the only realy problem that I’m having is transforming the vertices. Is there method in which I can do the transformation of the vertices in GL, such that the array storing the vertices get changed? (i.e. I want open GL to take a pointer to an array of vertices, then transform it on the video card, like Gl does best, and then either, write those transformed vertices in the same array, or return an pointer to an array which contains the sotred vertices.)

Is that possible? I can currently transform the vertices in my own matrix class which already does all my camera transformations, but it would obviously be faster if GL does it.

cheers

Relic · May 17, 2005, 1:43am

You should not do that!
Current hardware transforms 200 million vertices per second and the depth buffer can reject multiple fragments per clock.
What you should do is coarsly sort your whole objects, not individual triangles, from front to back to get more depth test fail cases and leave the rest to the hardware.

T101 · May 17, 2005, 5:01am

What Relic said.

Presumably you think that it’s a good idea to render your triangles back to front. But that will only be faster if there is not a lot of overdrawing going on.

Apart from that, you need to clip any triangles that intersect other triangles.
BSP algorithms do this, but it’s a bit problematic (expensive to do dynamically) when objects move about.

And if you don’t clip the triangles, the depth buffer is the only way to get depth-correct results.

The alternative is course front to back rendering of objects - the hardware is good at that stuff, and may some coarse visibility culling as well.
If you don’t send an object to the card, the card does not have to cull all of its triangles.
And culling a sphere against a view angle is cheap to do.

BTW you may want to do back to front sorting for transparent triangles (depending on the type of blending).

SNX · May 18, 2005, 7:24am

Yeh, thats currently what I’m doing, but I’m still looking for an alternative. I’m currently using quad tree and oct trees depending on how big my maps are. However I’ve been moving to AR lately so there isn’t much use for maps, which means more than 70% of all my meshes are dynamic. Hence any and every tree is out of the equation .

I allready sort the objects according to their position and material, and is rendering them in the reverse order, but I have an algorithms (radix sort) which can sort 10 000 000 floating points in about 3 seconds, and since I’m only ever going to have about 20 000 verts, it would seem allright to sort them using this algorithm. (and since the algorithm, at the worst case scenario, have a linear complexity.)

Since the depth buffer is peforming a comparision for pixel it writes to the back buffer, and has a limited accuracy at which it can measure depths, this algorithm would most certainly start to seem better. (Since the visible distance of objects in AR is quite far…) Using this algorithm, in theory you should not see any flickering which happen to polygons that is rendered far from the position of the “camera”. The obvious drawback is the fact that the vid card does all the transformations a hell of a lot faster than any code I can write.

But as I said, I’m exploring alternatives here. Hence this is the inspiration behind my question which might seem like a pretty dumb question on every other day

Overmind · May 18, 2005, 10:07am

First off, radix sort is not really linear, it only seems so, but in fact it is worse that N log N if you have much less than 2^32 objects. For “small” datasets (like your proposed 20 000) quicksort is almost certanly faster.

Using your performance numbers, you can do approximately 166 sorts per second if you don’t do anything else than sorting. With a framerate of about 80 FPS this is already nearly 50% CPU utilisation only for the sort :eek:

On the other hand, the depth buffer is hardware accelerated, it costs absolutely nothing on todays hardware. On the contrary, every fragment that is culled with early z test gives you additional performance.

About the problem with limited depth buffer accuracy: There was a thread about this not long ago. It seems a good solution is to “slice” your world and render the slices back to front while using the depth buffer inside the slices. This method only needs the data sorted on the object level, not at the polygon level, and there only an approximation (e.g. a bucket sort with depth ranges, which is truly linear btw )

SNX · May 19, 2005, 3:01am

Well because I’m a tight arss I will profile the two sorting algorithms with a multimedia timer over the weekend if I have some time :>