State change cost?

I’ve searched the archives on the topic of redundant state changes and found a very cool thread on guarding against redundant state changes.

So, I’ve started writing a state-sorting system. I query my octree for all ‘in view’ leaves and produce a render list; the idea was to sort that list, but I’ve found no information on which render states should get priority in my sorter.

I have no idea how I might do this dynamically (though it does sound like a forward-thinking idea: change a state and time it?), so I guess I’ll just hard-code the order.

Does anyone have a guesstimate of the preferred order for state changes? I include the active lights as states as well; if two entities use the same lights, switching between them should in theory need no extra changes.

So my list will start like this sorting states in this order:

  1. Enabled lights
  2. Texture binding
  3. Material
  4. Err… this is where I run out of ideas.

My engine is designed so that each ‘RenderState’ is represented by a small class, like ‘Texture’, that can be sent to the renderer to set the texture. Some of the classes will have multiple grouped properties; for example, sending a material change does this:

```cpp
// Send every material property of a_Object to GL in one go.
const GLenum face = g_Sides[a_Object.GetSides()];  // GL_FRONT, GL_BACK or GL_FRONT_AND_BACK
glMaterialfv(face, GL_EMISSION, a_Object.GetEmissive());
glMaterialfv(face, GL_AMBIENT, a_Object.GetAmbient());
glMaterialfv(face, GL_DIFFUSE, a_Object.GetDiffuse());
glMaterialfv(face, GL_SPECULAR, a_Object.GetSpecular());
glMaterialf(face, GL_SHININESS, a_Object.GetShininess());
```

Not sure how evil that is. I guess I could separate the states out into individual classes.

What are your thoughts so far?

It’s probably a good idea to sort your objects in roughly front-to-back order. That way the z-buffer rejects occluded pixels, which ought to save precious memory bandwidth. Which states matter most depends on both the hardware and how your app behaves. If, for example, you send very high-poly meshes with a single texture each, sorting by texture probably isn’t a big win, and if you have lots of overdraw, front-to-back order is likely the better investment. Back when people did their own clipping and occlusion and sent the resulting list of triangles to the graphics card, sorting by texture was very important, since those renderers more or less worked on one triangle at a time, and switching texture per triangle is obviously bad. However, if you have high-poly meshes with complex shaders, you probably won’t share many textures anyway. You’ll just have to benchmark it.

I can easily sort aggregate objects in front-to-back order with the system I have now. I’m interested, however, in what it’ll buy me in terms of performance. When a triangle draws to a part of the screen that is partly occluded (the way a beam tree works), does the z-buffer stop the card from wasting fill rate?

Right now I’m probably bus-limited (no LOD, high-poly models). I’m currently drawing around 4 million tris (GF2) without using VAR (NV_vertex_array_range), as I haven’t written a good memory allocator yet. In the future, however, I plan on having masses of objects, with stripped static LOD meshes sitting in AGP RAM, so I was betting that eventually I’d be clobbered by the state changes.

I don’t know if it’s relevant, but I was hoping to implement fully unique texturing on the terrain. That will probably throw in a whole bunch of texture changes, not that I will bother sorting the tiles, as they’re all unique textures anyway…

Still keen to hear any opinions or any other information anyone is willing to share.

Many thanks…


If I’m not mistaken, the idea is this: each fragment is first tested against the z-buffer, and if it passes, it’s written to the framebuffer. So if you order back to front, all the (eventually) occluded geometry is written to the framebuffer and later overwritten by other fragments. Front-to-back writes the occluding fragment before the occluded one, so the latter doesn’t have to be written, which saves time. Please correct me if I’m wrong.

Modern cards (i.e. GF3 and up, I believe) have an early Z test feature: the texture fetching and coloring that happen before the Z test in the GL pipeline can actually be skipped if the card can prove early that the fragment will be rejected.

Also, most modern cards employ some sort of hierarchical Z. I bet they can actually discard whole slabs of fragments (8x8 or so?) in a single cycle if the situation is right.

Thus, modern cards (which need it the least) get the equivalent of a fill-rate boost when you draw front-to-back.

There are some PDFs at NVIDIA (they might be D3D ones) that tell you in what order you should sort things, e.g. binding textures first (then maybe materials?).
Anyway, you won’t be able to get a perfect sort (today’s apps usually have a heap of different states).
Also, you shouldn’t need to sort the scene in-app (do it in the scene compilation phase instead),
though if you use a front-to-back sort, you will need to sort in-app.