texture & shader binding

Hello, quick question for you all:

Should I bother to guard against redundantly binding textures and shaders? I realize that it makes sense to group batches by texture and shader objects, but is there any great harm in binding the same texture or shader multiple times consecutively? The only reason I ask is that with multitexturing, guarding against redundant binds becomes a hassle.

Any advice on this or optimizing glstate like this in general?

I would suppose that doing that is just as bad as changing to a different configuration. It really depends on how the API works: if it checks whether you are changing to the same texture/shader, then there should be no problem, but since such checks hinder performance, I doubt they are done… On the other hand, I can’t see it being much of a pain to put a “loaded” boolean into your shader/texture object and check it.

Re-binding the same shader MAY be significantly costly. Less so for texture.

However, guarding against redundant binds is very easy:

MyTexture * currentlyBoundTexture[NumTextureUnits];

void BindMyTexture( MyTexture * t, int u ) {
  if( currentlyBoundTexture[u] != t ) {
    currentlyBoundTexture[u] = t;
    glActiveTexture( GL_TEXTURE0 + u );
    glBindTexture( t->glFormat(), t->glId() );
  }
}

MyTexture::~MyTexture() {
  for( int i = 0; i < NumTextureUnits; ++i ) {
    if( currentlyBoundTexture[i] == this ) {
      currentlyBoundTexture[i] = 0;
    }
  }
}

Same thing for fragment programs. Just call your own texture bind function instead of the gl one. Avoiding the extra call is really quite simple.

However, what you really want to do is queue up all your drawing in a big list/vector and then sort by state: draw everything using a particular fragment program first; within that, sort on vertex program; within that, sort on vertex buffer object; and within that, sort on main diffuse texture (for example). This is a little more work, but it can have a nice pay-off.
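As a rough sketch of that multi-level sort (not from the post above; every name here is hypothetical), one common trick is to pack the state handles into a single key, with the most expensive state in the most significant bits, so a single ordinary sort groups draws by fragment program first, then vertex program, and so on:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw item: small integer handles for each piece of state.
struct DrawItem {
    uint16_t fragProg;   // most expensive to change -> most significant
    uint16_t vertProg;
    uint16_t vbo;
    uint16_t diffuseTex; // cheapest to change -> least significant
};

// Pack the handles into one 64-bit key. Because the costly state lives in
// the high bits, sorting on the key yields exactly the nesting described
// above: by fragment program, then vertex program, then VBO, then texture.
static uint64_t SortKey(const DrawItem &d) {
    return (uint64_t(d.fragProg)   << 48) |
           (uint64_t(d.vertProg)   << 32) |
           (uint64_t(d.vbo)        << 16) |
            uint64_t(d.diffuseTex);
}

void SortDrawList(std::vector<DrawItem> &items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem &a, const DrawItem &b) {
                  return SortKey(a) < SortKey(b);
              });
}
```

After sorting, you walk the list in order and only issue a bind when the relevant handle differs from the previous item’s.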

Binding shaders can be more expensive. As I recall, there’s a penalty for the instruction cache: most GPUs (Shader Model 2.0 or older) have relatively small instruction caches, so if you’re constantly swapping shaders you’re probably incurring major cache thrashing. This may have changed in newer GPUs; I can’t say for sure. There’s also potentially complicated pipeline setup whenever you change shaders. So you may want to consider batching your surfaces sorted by shader first, then texture, then individual state changes…

Also, I’d suggest encapsulating all of that stuff into a backend. It’s simpler than it sounds actually. For starters you can simply have it keep track of what objects are bound and encapsulate the bind/unbind operations. When you make the backend call to bind something and it’s already bound, you can simply return.

The concept really covers a much broader basis, though. For instance, in my engine the backend encapsulates the matrix/attrib stacks, texture states (per texture unit), the number of requested state changes (grouped by state type: raster, frame buffer, polygon, texture, etc…) and the number of actual state changes (that is, the requested state change wasn’t redundant and something really did change), plus primitive and vertex counts. In special build configurations you can also selectively time individual API calls or types of API calls, etc…

Since it encapsulates most of the OpenGL API, it’s really easy to track down batch generation that’s not very pipeline friendly (e.g. in your case, binding the same shader constantly). And redundant pipeline changes are virtually cost free, since they don’t make it farther than the backend, which determines there’s no need to notify the API. A good driver should be smart enough to do this on its own, but I find it’s safer to assume the driver is brain dead :slight_smile:

If you start out simple and grow, eventually the concept of encapsulating the API becomes first nature. It opens up a whole world of debugging and optimization possibilities once you get into the groove :slight_smile:

Here’s a typical backend interface for binding something… L3D_RequestStateChange_Texture and L3D_FinishStateChange_Texture are macros, they count the number of requested state changes and the number of actual state changes respectively. In certain build configs they may also perform performance timings. You can see how if the texture’s already bound, the backend bails out before any API calls are made… Which makes redundant state changes fairly inexpensive (all that happens is the request count increases and a couple of conditionals are evaluated - all inline).

void l3dRenderBackend_GL::L3D_BindTextureCube (int tex_id)
{
  L3D_RequestStateChange_Texture ();

  // If no texture unit is selected, default to 0...
  if (gl_state.current_tex_unit == -1)
    L3D_SelectTextureUnit (GL_TEXTURE0_ARB);

  // Already bound? Bail out before any API calls are made.
  if (gl_state.current_textures [gl_state.current_tex_unit].handle == tex_id)
    return;

  gl_state.current_textures [gl_state.current_tex_unit].handle = tex_id;
  gl_state.current_textures [gl_state.current_tex_unit].type   = GL_TEXTURE_CUBE_MAP_ARB;

  glBindTexture (GL_TEXTURE_CUBE_MAP_ARB, tex_id);

  L3D_FinishStateChange_Texture ();
}

It works great if all of your code uses the backend instead of directly making OpenGL API calls. If you have code that can’t be changed in your project, make sure to clear the backend states before using the offending code :slight_smile:

Very useful advice, thanks everybody.

I’ll be sorting batches by shader anyway due to the nature of my engine, and I guess I’ll just go ahead and keep an array of texture ids for all the channels I’m using. Sorting “by texture” is pretty impossible for me since each material has at least 4-5 textures in it; some will vary from material to material, some will not. So I’ll just group by material for now.

Does anyone know of an algorithm for simple grouping of like entities, disregarding any ordering between groups? Sorts will do this, but they are generally O(n log n) and they solve a more specific problem than I need. Maybe a modified radix sort on material pointers would be the quickest?

I realize this operation isn’t a huge performance-limiting factor, but I’m interested nonetheless.
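For what it’s worth, one way to get plain grouping in O(n) expected time, with no ordering guaranteed between the groups, is a single hash-bucket pass over the material pointers. A minimal sketch, with all names invented for illustration:

```cpp
#include <unordered_map>
#include <vector>

struct Material { int id; };                      // placeholder material
struct Batch    { const Material *material; };    // grouped by pointer identity

// Group batches so that all batches sharing a material end up contiguous.
// One pass to bucket by pointer, one pass to concatenate the buckets:
// O(n) expected time, versus O(n log n) for a comparison sort.
std::vector<Batch> GroupByMaterial(const std::vector<Batch> &in) {
    std::unordered_map<const Material *, std::vector<Batch>> buckets;
    for (const Batch &b : in)
        buckets[b.material].push_back(b);

    std::vector<Batch> out;
    out.reserve(in.size());
    for (auto &kv : buckets)                      // bucket order is arbitrary
        for (const Batch &b : kv.second)
            out.push_back(b);
    return out;
}
```

Since the groups come out in arbitrary order, this is only appropriate when you truly don’t care about inter-group ordering, which is exactly the question asked.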

The solution I myself use is actually to sort the materials at load time… My real motivation for doing it this way is to avoid expensive depth sorting as much as possible. Initially it simply began as sorting opaque materials first, then blended, additive and finally anything that should always be in the front of the scene (this includes things like text overlays, your GUI, screen flashes, etc…).

But later on it turned out I could also sort the individual opaque/translucent materials at load-time to minimize pipeline state changes. If you sort properly at load-time you can take advantage of opportunities to disable depth writes and other render operations that aren’t absolutely essential. I try to group all depth write disabled passes together rather than constantly turning depth writes on and off, for example :slight_smile:

It also helps a LOT with early Z rejection if you maintain a list of opaque (as defined by the material) batches separately from translucent and nearest (again, stuff like the GUI and what not).

If you give each material a “sort priority”, the system comes very cheap. I use a hash of materials that’s sorted at load-time. When I fill the render batch list, I run through each of the materials in the order they appear in that hash and add any batches using that material. And they’re automatically sorted that way.

    L3D_SortShaders ( opaque,       Opaque     );
    L3D_SortShaders ( sky,          Sky        );
    L3D_SortShaders ( foliage,      Foliage    );
    L3D_SortShaders ( under_water,  Underwater );
    L3D_SortShaders ( additive,     Additive   );
    L3D_SortShaders ( nearest,      Nearest    );

You can see how any sky materials (i.e. skydome, sun, moon, etc…) are sorted (inserted into the material hash) AFTER the opaque materials. This lets anything opaque occlude the sky so that it doesn’t eat massive fillrate :slight_smile:
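The fill step described a couple of posts up (run through the load-time sorted materials and append any batches using each one) might be sketched like this; every name below is hypothetical:

```cpp
#include <vector>

struct Material { int sortPriority; };         // placeholder material
struct Batch    { const Material *material; }; // placeholder batch

// Walk the materials in their load-time sorted order and append every
// batch that uses each one; the batch list comes out sorted for free.
// (A real implementation would index batches per material instead of
// scanning the whole batch list for every material.)
std::vector<const Batch *> FillRenderList(
        const std::vector<const Material *> &sortedMaterials,
        const std::vector<Batch> &batches) {
    std::vector<const Batch *> out;
    out.reserve(batches.size());
    for (const Material *m : sortedMaterials)
        for (const Batch &b : batches)
            if (b.material == m)
                out.push_back(&b);
    return out;
}
```

The per-frame cost is just the fill loop; all the actual sorting work was paid once at load time.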