Constant register limit exceeded

Hey, I have this vertex shader on a Quadro 5600 (Max uniforms 4096).


#version 120
// gl_InstanceID requires GL_EXT_gpu_shader4; extension directives belong before any declarations
#extension GL_EXT_gpu_shader4 : enable
uniform float uInstanceTx[1200];
uniform float uInstanceTy[1200];
vec4 instanceXForm(vec4 pos) {
   pos.x += uInstanceTx[gl_InstanceID];
   pos.y += uInstanceTy[gl_InstanceID];
   return pos;
}

vec3 instanceXForm(vec3 vec) { return vec; }


vec4 deform(vec4 pos) {
   pos = instanceXForm(pos);
   return pos;
}

vec3 deform(vec3 vec) {
   vec = instanceXForm(vec);
   return vec;
}

/*********TOUCHDEFORMPOSTFIX*********/

uniform vec4 uAmbientColor;
uniform vec4 uDiffuseColor;
uniform vec3 uSpecularColor;
uniform float uShininess;

varying vec4 vLightVec[1];
varying vec4 vCameraVector;
varying vec4 vNorm;

void main()
{
   // First deform the vertex and normal
   // If we aren't using deforms, the deform() function won't do anything.
   vec4 objSpaceVert = deform(gl_Vertex);
   vec3 objSpaceNorm = deform(gl_Normal);
   vec4 camSpaceVert = gl_ModelViewMatrix * objSpaceVert;
   gl_FrontColor = gl_Color;
   gl_BackColor = gl_Color;
   gl_FrontSecondaryColor.rgb = gl_SecondaryColor.rgb;
   gl_BackSecondaryColor.rgb = gl_SecondaryColor.rgb;

   // This is a trick to avoid using gl_FrontFacing, which isn't supported on some cards.
   // In the pixel shader, gl_SecondaryColor.a will be 1.0 if we are on a front face
   // and 0.0 if we are on a back face.
   gl_FrontSecondaryColor.a = 1.0;
   gl_BackSecondaryColor.a = 0.0;

   vec3 camSpaceNorm = normalize(gl_NormalMatrix * objSpaceNorm);
   vNorm.stp = camSpaceNorm.stp;
   vec3 cameraVec = -camSpaceVert.xyz;
   vCameraVector.stp = cameraVec.stp;
   vec3 lightVec;
   lightVec = vec3(gl_LightSource[0].position - camSpaceVert);
   vLightVec[0].xyz = lightVec;

   gl_Position = gl_ModelViewProjectionMatrix * objSpaceVert;
}

I get the compile error:
(0) : error C6007: Constant register limit exceeded; more than 1024 constant registers needed to compiled program

It seems like 507 is the longest array length I can use for my 2 uniform float arrays for it to compile correctly.
Am I running into a uniform limit, or some other hardware limit here? It seems like I should have enough uniforms, no?

Thanks for any insight

To ask this slightly differently: I’m trying to figure out how big I can make those uniform arrays (programmatically, using queries) without getting a compile error. I thought I could count the uniform usage and subtract that from the MAX_UNIFORMS value for the card, but that doesn’t seem to work.

uniform float uInstanceTx[1200];
uniform float uInstanceTy[1200];

Ouch! :) When you reach such an amount of uniforms, I think you need to rethink your implementation. Even if this compiled, it would run very slowly. Store your uniform data in a texture if possible and pass it to your shader as a simple sampler.

Is there a max uniform array size I should stay below then?

In the OpenGL spec, page 276, see Table 6.38, “Implementation Dependent Values”.
An array of n uniforms is no more than n uniform variables stored consecutively. You won’t get an infinite uniform namespace by creating just one array with as many uniforms as you want.

Even if this monster shader would run in a couple of years, you could use some fallback shaders with smaller array sizes if your first shader fails to compile on current hardware.

Texture buffer is your friend here!

Sorry, which spec is this? I searched for this text in the specs for 2.1, 3.0 and 3.1 along with all of the GLSL specs, and can’t find it. Whatever queriable constant that is sounds like what I want. Can you tell me where you saw this?

Even if this monster shader would run in a couple of years, you could use some fallback shaders with smaller array sizes if your first shader fails to compile on current hardware.

Yup, this is exactly what I want to do. I’m just trying to figure out how to find the max array size without looping, compiling a shader over and over again until the array size is small enough to work.

Texture buffer is your friend here!

I’m open to this, but in the case of passing matrices, wouldn’t the texture buffers be slower than uniforms?

Thanks for all the responses

Sorry, which spec is this? I searched for this text in the specs for 2.1, 3.0 and 3.1 along with all of the GLSL specs, and can’t find it. Whatever queriable constant that is sounds like what I want. Can you tell me where you saw this?

Get MAX_VERTEX_UNIFORM_COMPONENTS. See page 289 of the OpenGL 3.1 PDF (276 according to the document’s own page numbering); at the end of the table there is a link to section 2.11.4 for more information.

I’m open to this, but in the case of passing matrices, wouldn’t the texture buffers be slower than uniforms?

I would say: try it and see. But I don’t know your hardware’s capabilities, and I am not sure that Vertex Texture Fetch is supported on a Quadro 5600.

If it is supported, using a texture would clearly be faster.

Sorry, I hope I’m not being dense here…
MAX_VERTEX_UNIFORM_COMPONENTS is the constant I’m already using. It’s 4096 on this card (the Quadro 5600 is like a GeForce 8000 series card, by the way). Judging by the shader code alone, I’m not above this limit, so I’m hitting some other hardware limit here. I just don’t understand how to determine that limit in a way that will work on any card (by querying the card at runtime).

Ok, I see. The OpenGL spec says that this is an implementation-dependent constant. Perhaps it does exactly match the hardware capabilities.

The program object has to store many other constants too; I do not know whether that could affect the number of uniforms available…

How do you compile your shader? Have you tried to compile with the Cg compiler with high-end profiles like vp40 for vertex shader?

Anyway, one way or another your vertex shader is exceeding the hardware limits, and since the card seems to support SM4, you should be able to do VTF, which will be a lot more flexible than uploading thousands of uniforms.
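For illustration, here is a minimal sketch of what the texture-fetch route could look like for the instanceXForm() function from the original shader. It is not the original poster's code. It assumes the driver exposes GL_EXT_texture_buffer_object alongside GL_EXT_gpu_shader4, that the application has bound a float buffer texture holding one texel per instance, and that the sampler name uInstanceTBuf (hypothetical) is set to that texture unit. As far as I know, texelFetchBuffer is the fetch function GL_EXT_gpu_shader4 provides for samplerBuffer under #version 120.

```glsl
#version 120
#extension GL_EXT_gpu_shader4 : enable

// Hypothetical sampler: one texel per instance, with .xy = (tx, ty).
// Assumes the application attached a float-format buffer texture to it.
uniform samplerBuffer uInstanceTBuf;

vec4 instanceXForm(vec4 pos) {
   // Fetch this instance's translation; no uniform registers are used for the data.
   vec4 t = texelFetchBuffer(uInstanceTBuf, gl_InstanceID);
   pos.xy += t.xy;
   return pos;
}

void main() {
   gl_Position = gl_ModelViewProjectionMatrix * instanceXForm(gl_Vertex);
}
```

With this layout the instance count is bounded by the maximum buffer texture size rather than by the constant register count, so the array length no longer has to be tuned per card.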

My guess would be that the float uniforms use up more space than you expect.
Suppose the driver only uses vec4 uniforms so each float uniform uses actually 4 uniform components:

The math:

(507 + 507) * 4 = 4056, and 4056 + 16 = 4072 uniform components.

By this count, float arrays of length 508 should also fit, but there are some other constants the driver needs to store as well.

Try packing your floats into vec4 uniforms.
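The same guess can be written in terms of registers, which lines up with the "1024 constant registers" in the error message. This is only a sketch under the assumption stated above (each float array element occupies a whole vec4 register); here n is the length of each of the two arrays and r is the unknown number of registers taken by everything else in the shader.

```latex
\begin{align*}
\text{registers available} &= 4096 / 4 = 1024 \\
\text{registers needed}    &= 2n + r \le 1024 \\
n = 507 &: \quad 2 \cdot 507 = 1014, \text{ leaving } 10 \text{ registers for } r \\
n = 508 &: \quad 2 \cdot 508 = 1016, \text{ leaving } 8 \text{ registers for } r
\end{align*}
```

If 507 really is the largest length that compiles, then under this assumption r would be 9 or 10 registers. That value depends on the shader and the driver, so it is not something a single queried constant can tell you.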

“Suppose the driver only uses vec4 uniforms so each float uniform uses actually 4 uniform components”
actually seems to make sense. Recall that in Direct3D constants are always Vec4 vectors, even when you only need a Vec2 or Vec3; in those cases you simply ignore .zw or .w, but the register is still a Vec4. I don’t know about the case of floats, though.

Yes, I remember now that when I used PIX to debug a Direct3D program, I noticed that the shader processors only have constant registers four floats wide.

And unless you pack your float uniform data into a vec4 to save space, each uniform takes one register, since the D3D implementation doesn’t seem to let you address a location inside a register, only a register index.

I don’t know if all this is handled the same way in GLSL, but your issue makes me think that it is.
In addition, def’s calculation seems to confirm that.

One thing you could try, given this observation, is to create a vec4 array instead of a float one. That way the array only needs a quarter as many elements.
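A minimal sketch of that packing, applied to the instanceXForm() function from the original shader. The array name uInstanceT and the layout (two instances per vec4, .xy for the even instance and .zw for the odd one) are assumptions made for illustration; the application would have to upload the data in that same order. If each vec4 does take exactly one register, 1200 instances would then need 600 registers instead of 2400.

```glsl
#version 120
#extension GL_EXT_gpu_shader4 : enable

// Hypothetical packed layout: element k holds
// (tx, ty) of instance 2k in .xy and (tx, ty) of instance 2k+1 in .zw.
uniform vec4 uInstanceT[600];

vec4 instanceXForm(vec4 pos) {
   int slot = gl_InstanceID / 2;                  // which vec4 holds this instance
   vec4 pairT = uInstanceT[slot];
   bool isOdd = (gl_InstanceID - 2 * slot) == 1;  // remainder of the division by 2
   vec2 offset = isOdd ? pairT.zw : pairT.xy;
   pos.xy += offset;
   return pos;
}

void main() {
   gl_Position = gl_ModelViewProjectionMatrix * instanceXForm(gl_Vertex);
}
```

The remainder is computed with a multiply and subtract rather than the % operator, so the lookup relies on GL_EXT_gpu_shader4 only for gl_InstanceID.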

Ok this makes sense. Thanks for all of the help