I was wondering if it makes sense to add a layer to my renderer that processes all state changes and tries to reduce them.

I mean, my app cannot know what state OpenGL is in when a particular piece of code is executed, so it sets all necessary states, even though many will be redundant.

Now I could cache all state changes and find out which states have actually changed. And immediately before rendering some geometry, I could then send only the necessary state changes to the driver.

The question is whether drivers already do this themselves. It’s a lot of work for me to implement, so I’d like to know if it’s worth doing.

When I talk about states, I mean especially things like the current color, blending modes, polygon fill mode, face culling mode, write masks, depth-test mode, and so on.


Over the years I have seen a fair bit of literature about this, might be worth you doing some googling.

What I’ve gathered is that state changes with a high cost tend to be tested for redundancy by the driver*. For less performance-critical states, the driver may avoid the overhead of testing for redundancy, since the application developer is likely to be able to optimise for them more efficiently.

Whether you or the driver tests for redundant state changes, the most important thing is presenting data in an order that requires fewer state changes. OK, that’s obvious, but once you’ve done that, and if your data structures are suitable, tracking the states that you think could have the largest hit, or are most frequently redundant, should be fairly quick and easy.

It’s quite a broad topic, sorry if these basics aren’t exactly news. I don’t think I can provide a useful answer to whether it’s worth it; it really depends on many factors and the specifics of your app. You might want to prototype something on your current app and a typical platform and see if you get any benefits. You could also write an independent profiler for state change cost. You could then use its results to weight states when optimising data to reduce state changes. You could even ship it with your app or integrate it.

I wouldn’t bother testing for something like redundant colour changes: it’s not uncommon for colours to change per-vertex, so they’re unlikely to be expensive. This sort of reasoning can help.

  • I’ve even seen an ATI driver apparently ignoring state changes to the specular exponent (renowned to be expensive in certain implementations that calculate tables) unless a large quantity of geometry (or perhaps estimated fragments if it was to make any sense) was drawn with that exponent set. This was terribly annoying and it took me a while to figure out what was going on (and there was nothing I could do about it).

Well, of course I batch my geometry as well as possible. However, the order in which the batches appear is fairly random (because of the many optimizations that cull batches, you can’t predict what comes next or what came before).

I do try to only change states that are likely to have changed, but then again, I have to be very conservative, because ignoring certain states too aggressively will lead to errors in some circumstances.

I always change a whole bunch of states between batches. That means it’s often the case that I change the shader, bind 3 different textures, set a different color (I don’t use vertex colors at the moment; I only need one color per object), etc.
So my geometry batches are as large as possible.

Anyway, thanks for your insights; maybe I will just need to test it. However, that could be quite difficult.


I was referring to sorting your batches, not the geometry. i.e. sorting shaders to minimise state changes between them. (by shader I mean a vector of material-related states for a surface rendering pass).

You could start with one shader, estimate which remaining shader would give the smallest state change cost, and put it next; repeat until none are left. It’s not perfect, but it’s a whole lot better than nothing. Even if entire batches are culled, you will still frequently have shaders appearing in the sequence you set, and you will end up with state coherence across more adjacent shaders anyway.

You would never ignore a state change unless you explicitly know it is already set to the state you want. This usually means abstracting the state changes; in many cases it can be worth doing, and the overhead is minimal. You can often batch states into a single abstracted state if your shaders behave accordingly, or at least predictably.

Minimising bound texture changes and generally trying to keep textures hot can be very important if you can’t fit them all in gfx ram.

Madoc, do you have any clue whether the following method is good when not all textures fit in gfx RAM:

frame 0:
texA (VRAM), texB (VRAM) (no more VRAM left), texC (AGP->VRAM, replaces texA)
frame 1:
texC (VRAM), texB (VRAM) (no more VRAM left), texA (AGP->VRAM, replaces texC)

By using reverse order, it should be more efficient (assuming a basic LRU cache)? Can anybody confirm?

Well, it makes perfect sense, so I would say yes. Quite clever actually; I’ve never heard it suggested before. I guess that even if you can’t re-sort everything, it would still increase the chance of textures staying hot. Not sure if LRU is always used though; there are probably more sophisticated approaches involved.