GLSL recompile on uniform call

What are my best options for avoiding GLSL shader recompilation when setting a uniform, i.e. calling glUniformNx?

With the NVIDIA instrumented driver, the shader code appears to be recompiled every time glUniformNx is called.

Should I cache the values and only set a uniform when its value actually changes? Or are there better options?

With core 3.1 and 3.2, and using glUniformMatrix4fv to batch-upload, I no longer see any recompilations on a GF8600 or a GTX275.

uniform mat4 vvv[7];
#define u_MVP vvv[0]
#define u_someVec1 vvv[1][0]
#define u_someFloat1 vvv[1][1].x
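
On the application side, the packing above might be mirrored roughly as follows. This is only a sketch: `PackedUniforms` and its setters are hypothetical names, and it assumes column-major storage, so that `vvv[i][c]` starts at float `16*i + 4*c`. The whole array would then be uploaded in one call such as `glUniformMatrix4fv(location, 7, GL_FALSE, u.data.data())`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side mirror of "uniform mat4 vvv[7]".
// GLSL matrices are column-major: vvv[i][c] is column c of matrix i.
struct PackedUniforms {
    static constexpr std::size_t kMatrixCount = 7;
    std::array<float, kMatrixCount * 16> data{};

    // Index of the first float of column `col` in matrix `mat`.
    static std::size_t offset(std::size_t mat, std::size_t col) {
        assert(mat < kMatrixCount && col < 4);
        return mat * 16 + col * 4;
    }

    // u_MVP -> vvv[0]: copy all 16 floats of the matrix.
    void setMVP(const float (&m)[16]) {
        for (std::size_t i = 0; i < 16; ++i) data[offset(0, 0) + i] = m[i];
    }

    // u_someVec1 -> vvv[1][0]: first column of the second matrix.
    void setSomeVec1(float x, float y, float z, float w) {
        float* p = &data[offset(1, 0)];
        p[0] = x; p[1] = y; p[2] = z; p[3] = w;
    }

    // u_someFloat1 -> vvv[1][1].x: first component of the next column.
    void setSomeFloat1(float v) { data[offset(1, 1)] = v; }
};
```

The unused slots are simply wasted space; the point is that a single upload replaces many small glUniform calls.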

Values of uniform variables are already cached in the program object as long as the program is not relinked. So, no need to set a uniform as long as its value is unchanged (and program not relinked).
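
If you still want to filter redundant calls on the application side, a small shadow cache is one way to do it. The sketch below is an assumption about how that could look, not something taken from any driver or library: `FloatUniformCache` is a made-up name, and the `upload` callback stands in for a real call such as `glUniform1f(location, value)` so that the logic can be tried without a GL context.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical redundant-state filter for float uniforms.
// `upload` stands in for a call such as glUniform1f(location, value).
class FloatUniformCache {
public:
    using Upload = std::function<void(int location, float value)>;

    explicit FloatUniformCache(Upload upload) : upload_(std::move(upload)) {
        assert(upload_ && "an upload callback is required");
    }

    // Uploads only when the value differs from the last one sent.
    // Returns true if an upload actually happened.
    bool set(int location, float value) {
        auto it = last_.find(location);
        if (it != last_.end() && it->second == value) return false;
        last_[location] = value;
        upload_(location, value);
        return true;
    }

    // Call after relinking the program: a relink resets the program's
    // uniforms, so the remembered values no longer describe its state.
    void invalidate() { last_.clear(); }

private:
    Upload upload_;
    std::unordered_map<int, float> last_;
};
```

One cache instance per program object would be needed, since uniform values are stored per program.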

Uniform Buffer Objects may help.

Here is some pseudo code showing how I use uniform buffer objects to make dealing with Uniforms nicer:


//shader Uniform Block Object declaration
layout(std140) uniform matrix1
{
  uniform mat4 glm_ProjectionMatrix;
  uniform mat4 glm_ModelViewMatrix;
  uniform mat4 glm_NormalMatrix;
};

//C++ side "map" into shader UBO
GLfloat uniformBlock_matrix1[] =
{                  //layout(std140) uniform matrix1
  1.0,0.0,0.0,0.0, //mat4  glm_ProjectionMatrix
  0.0,1.0,0.0,0.0,
  0.0,0.0,1.0,0.0,
  0.0,0.0,0.0,1.0,
  1.0,0.0,0.0,0.0, //mat4  glm_ModelViewMatrix
  0.0,1.0,0.0,0.0,
  0.0,0.0,1.0,0.0,
  0.0,0.0,0.0,1.0,
  1.0,0.0,0.0,0.0, //mat4  glm_NormalMatrix
  0.0,1.0,0.0,0.0,
  0.0,0.0,1.0,0.0,
  0.0,0.0,0.0,1.0,
};
GLuint sizeof_uniformBlock_matrix1 = sizeof(uniformBlock_matrix1);

//convenience map into uniformBlock_matrix1
mat4 &glm_ProjectionMatrix = (mat4&)uniformBlock_matrix1[0]; 
mat4 &glm_ModelViewMatrix = (mat4&)uniformBlock_matrix1[16]; 
mat4 &glm_NormalMatrix = (mat4&)uniformBlock_matrix1[32];

GLuint uniformBlock_matrix1_id; // uniform block matrix1 ID

GLuint shader_id;

void defineUniformBlockObject(GLuint binding_point, const char *GLSL_block_string, GLuint &uniformBlock_id)
{
 // externally used ID returned //////////////////////////////////
 glGenBuffers(1, &uniformBlock_id);
 ////////////////////////////////////////////////////////////////

 //"layout(std140) uniform GLSL_block_string"
 GLuint  uniformBlockIndex = glGetUniformBlockIndex(shader_id, GLSL_block_string);

 //And associate the uniform block to binding point
 glUniformBlockBinding(shader_id, uniformBlockIndex, binding_point);

 //Now we attach the buffer to UBO binding_point...
 glBindBufferBase(GL_UNIFORM_BUFFER, binding_point, uniformBlock_id);

 //We need to get the uniform block's size in order to back it with the
 //appropriate buffer
 GLint uniformBlockSize; // glGetActiveUniformBlockiv writes a GLint
 glGetActiveUniformBlockiv(shader_id, uniformBlockIndex,
  GL_UNIFORM_BLOCK_DATA_SIZE,
  &uniformBlockSize);

 //Create UBO.
 glBindBuffer(GL_UNIFORM_BUFFER, uniformBlock_id);
 glBufferData(GL_UNIFORM_BUFFER, uniformBlockSize, NULL, GL_DYNAMIC_DRAW);
}

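// Just after the program has been created, compiled, and linked
// (glGetUniformBlockIndex needs a linked program), e.g.:
shader_id = glCreateProgram();
// ... attach shaders, glLinkProgram(shader_id), glUseProgram(shader_id) ...
defineUniformBlockObject(1,"matrix1",uniformBlock_matrix1_id);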


void Set_values_each_frame() {
  // Now you can just set each uniform on the C++ side with convenient looking code.
  // Assign (rather than *=) so the values do not compound from frame to frame.
  glm_ModelViewMatrix  = lookAt( vec3(eye_pos),
                                 vec3(0.0, 0.0, 0.0),
                                 vec3(0.0, 1.0, 0.0)
                               );
  glm_ProjectionMatrix = perspective(20.0f,w/h,1.0f,21.0f);
  glm_NormalMatrix = transpose(inverse(glm_ModelViewMatrix));
  // etc ...
}

void Draw() {
  // then in your render function the part that updates the uniform values
  // never has to change, just call the following two lines
  // set your UBO. It sets all at once glm_ModelViewMatrix, 
  // glm_ProjectionMatrix, glm_NormalMatrix ... and whatever you have in your UBO
  glBindBuffer(GL_UNIFORM_BUFFER, uniformBlock_matrix1_id);
  glBufferData(GL_UNIFORM_BUFFER, sizeof_uniformBlock_matrix1, uniformBlock_matrix1, GL_DYNAMIC_DRAW);

  // then draw your VAOs
  glBindVertexArray(vao_id);
  glDrawArrays(GL_TRIANGLES , 0, vao_elementcount);
}

Notice that I have not set any individual uniform values anywhere with glUniformNx. It might seem that sending the whole block of uniforms at once would be slower, but in my case it is actually faster to send an entire block at once than to use separate glUniformNx calls. So UBOs have made my code easier to deal with and have improved performance.
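
One thing the raw float array relies on is that the C++ layout matches the std140 layout of the block. For three mat4 members this holds with tightly packed floats (each mat4 is four 16-byte columns), but adding, say, a vec3 or a lone float would introduce padding. Below is a sketch of a struct-based mirror with compile-time checks. `Mat4`, `Matrix1Block`, and `matchesDriverSize` are made-up names for illustration; GLM's `mat4` could take the place of `Mat4`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4x4 float matrix, column-major.
struct Mat4 {
    float m[16];
};

// CPU-side mirror of: layout(std140) uniform matrix1 { mat4 ...; mat4 ...; mat4 ...; }
struct Matrix1Block {
    Mat4 glm_ProjectionMatrix;  // expected std140 offset 0
    Mat4 glm_ModelViewMatrix;   // expected std140 offset 64
    Mat4 glm_NormalMatrix;      // expected std140 offset 128
};

// Fail the build if the compiler lays the struct out differently.
static_assert(sizeof(Mat4) == 64, "mat4 occupies 64 bytes in std140");
static_assert(offsetof(Matrix1Block, glm_ModelViewMatrix) == 64, "offset mismatch");
static_assert(offsetof(Matrix1Block, glm_NormalMatrix) == 128, "offset mismatch");
static_assert(sizeof(Matrix1Block) == 192, "block size mismatch");

// Runtime cross-check against the size the driver reports, e.g. the value
// queried with GL_UNIFORM_BLOCK_DATA_SIZE.
inline bool matchesDriverSize(int reportedBlockSize) {
    assert(reportedBlockSize >= 0);
    return static_cast<std::size_t>(reportedBlockSize) == sizeof(Matrix1Block);
}
```

Calling `matchesDriverSize` once after querying the block size is a cheap way to catch a mismatch before uploading garbage.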

Note that the OpenGL Mathematics (GLM) library is used for the matrix math functionality.

Are you by chance using a pre-GeForce 8 card? If so, try GeForce 8+.

Before doing anything drastic, try upgrading your drivers. This issue should be solved in the recent stable releases.

I am largely working under Linux with the 190.18 driver; hardware-wise I am on an NVIDIA 8800 GT.

Nevertheless, I am only able to see the actual internal GLSL recompile with the NVIDIA instrumented drivers. The regular driver doesn't spit out any information on any of the tasks it's performing.
There's a ton of extensions related to sending uniforms, but I haven't yet found anything solid on what the optimal approach is.

Also, I would like to be able to have the driver do some kind of über shader solution: sending a few flags which should not result in any branches, as these are uniforms and constant as long as no new values are set. This might have been debated previously, but I couldn't find a fitting thread.

There's a ton of extensions related to sending uniforms, but I haven't yet found anything solid on what the optimal approach is.

No, there are two: the old EXT_bindable_uniform, and the new ARB_uniform_buffer_object.

The only reliable method for preventing an overzealous driver from doing recompiles is to put every uniform you can into a buffer object.

Also, I would like to be able to have the driver do some kind of über shader solution: sending a few flags which should not result in any branches, as these are uniforms and constant as long as no new values are set.

There is no way to guarantee this. ATI, for example, doesn’t have the recompiling problem, so this solution would fail for them.

So the only good über shader solution is essentially to have multiple programs from the same shader source and do the preprocessing by hand?

I don’t see why a clever driver couldn’t cache off ubershader state permutations as they’re encountered. With a warm up preprocess (and/or parallel compilation of some sort) you could try feeding every conceivable state configuration (enumeration of permutation in recapitulation) to the driver in the hope that it’ll be stowed for later use - to avoid the first encounter hiccup and such.

Dx11 introduced dynamic linking (seems similar in spirit to NV’s Cg interfaces) to aid in the permutation battle… pretty sassy.

Well, if you are talking about a uniform that determines branch execution, then yes, NVIDIA will recompile. If you are talking about some value such as a diffuse color or a model matrix, then it need not recompile.

So the only good über shader solution is essentially to have multiple programs from the same shader source and do the preprocessing by hand?

Well, I would say that using an uber shader is bad to begin with, but yes. If you’ve decided that this is the way you want to go, the most effective way to make sure that you’re getting the performance you want is to build multiple programs.
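
For what it's worth, managing multiple programs does not have to be much work on the application side. Here is a rough sketch of a cache keyed by a feature bitmask, including the warm-up idea mentioned earlier in the thread. Everything in it is hypothetical: the flag names, `ProgramCache`, and the `build` callback, which stands in for whatever code compiles and links one variant and returns its program ID.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical feature bits selecting one variant of the shader source.
enum : std::uint32_t { kFog = 1u << 0, kSpecular = 1u << 1, kShadows = 1u << 2 };

// Caches one linked program per feature mask. `build` stands in for the
// code that compiles and links the variant and returns its program ID.
class ProgramCache {
public:
    using Build = std::function<unsigned(std::uint32_t mask)>;

    explicit ProgramCache(Build build) : build_(std::move(build)) {
        assert(build_ && "a build callback is required");
    }

    // Returns the program for `mask`, building it on first use.
    unsigned get(std::uint32_t mask) {
        auto it = programs_.find(mask);
        if (it != programs_.end()) return it->second;
        unsigned id = build_(mask);
        programs_.emplace(mask, id);
        return id;
    }

    // Optional warm-up: build every combination of the lowest `bitCount`
    // flags up front, to avoid a hitch on first use.
    void warmUp(unsigned bitCount) {
        assert(bitCount < 32);
        for (std::uint32_t mask = 0; mask < (1u << bitCount); ++mask) get(mask);
    }

private:
    Build build_;
    std::unordered_map<std::uint32_t, unsigned> programs_;
};
```

The number of variants doubles with every flag, so the warm-up is only practical for a handful of switches.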

I don’t see why a clever driver couldn’t cache off ubershader state permutations as they’re encountered.

Why should they? It seems to me that it causes more problems (like shader recompiling when it isn’t necessary) than it solves. NVIDIA’s recompilation fetish may be great for ubershaders, but it is terrible for anything else.

I am largely working under Linux with the 190.18 driver; hardware-wise I am on an NVIDIA 8800 GT.

Nevertheless, I am only able to see the actual internal GLSL recompile with the NVIDIA instrumented drivers. The regular driver doesn't spit out any information on any of the tasks it's performing.
There's a ton of extensions related to sending uniforms, but I haven't yet found anything solid on what the optimal approach is.

Hmm… recompiling should trigger a massive performance hit then. I have not observed this at all on non-instrumented drivers, i.e. I can safely call:

glUniformMatrix3fv

for a fixed shader LOTS of times, and I do not see stuttering or such.

Out of morbid curiosity, what version do the instrumented drivers report?

P.S. There are several ways to send lots of uniform data backed by a buffer object to a shader:

  1. Uniform buffer objects (as mentioned before): EXT_bindable_uniform/ARB_uniform_buffer_object

  2. Texture buffer objects: EXT_texture_buffer_object/ARB_texture_buffer_object

  3. NVIDIA only: NV_shader_buffer_load (pointers in shaders that point to data in a buffer object).

From what I remember:
uniform buffer object: good for cache-friendly access, i.e. sequential and such, data set cannot be too big. The core idea of this was data shared across many shaders.

texture buffer objects: good for random access over a large data set.

NV_shader_buffer_load: no limit on data set size, cache friendly access preferred, supposed to be faster than texture buffer object though.

This works fine, so long as you use constants, not uniforms, for the “branch expression” identifiers that you would like the GLSL compiler to compile out of your shader.

Essentially this means providing multiple strings to glShaderSource (it takes an array after all), one of which you build yourself dynamically, containing the “const” declarations which drive the ubershader decision points. E.g.:

const int  LIGHTING_EQUATION   = LIGHTEQ_FULL;
const int  FOG_RANGE_MODE      = FOGRANGE_MODE_RADIAL;
const int  FOG_MODE            = FOGMODE_VTX_EXP2;

For some situations, this is the best option.
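
Generating that string can be as simple as the sketch below. `buildConstPreamble` is a made-up helper, and the numeric values are assumed to be ones the shader body understands (for instance, matching constants such as LIGHTEQ_FULL that it declares itself).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Builds "const int NAME = VALUE;" lines from (name, value) pairs.
// The result is meant to be passed to glShaderSource as one string of the
// array, placed before the main shader body.
inline std::string buildConstPreamble(
    const std::vector<std::pair<std::string, int>>& constants) {
    std::string out;
    for (const auto& c : constants) {
        assert(!c.first.empty() && "constant needs a name");
        out += "const int " + c.first + " = " + std::to_string(c.second) + ";\n";
    }
    return out;
}
```

Keep in mind that a #version directive has to come before anything else in the shader, so if the body uses one, it needs to be moved into its own string ahead of the generated preamble.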

Though like you, I wish for something more generic from GLSL like “constant uniforms” where you set certain parameters once and explicitly tell the compiler that it should compile these out of the shader if possible (i.e. treat them as constants, not frequently-changeable uniforms). Something like Cg’s LITERAL parameters. That would save this “dynamically generate GLSL source code” silliness that is necessary with GLSL (AFAIK) to effectively implement ubershaders.

Though it’s possible this is a GLSL feature already implemented that I don’t know about… Anyone?

über shaders or not;

What I want is for a small/simple pixel shader to have a few deviating paths which I can control without hitting a penalty due to internal recompilation of the source. Having multiple shaders doing almost the same thing is fine, I don't have a problem with that; it's the C++ side of interfacing with GL/GLSL that's making me wish to pack as much into the shader as possible.

At this point I am kind of split: it would be super easy to do the string magic by hand; on the other hand, I am close to dropping the über shader idea and just adding the extra shaders.

Both NV_shader_buffer_load and ARB_uniform_buffer_object look good from my end of the line. I don't think the NVIDIA instrumented driver supports those right now?