Available number of texture instructions exceeded

I am having a problem with a machine that has an ATI Radeon X300/550 series graphics card.

I am using a noise function that makes 4 texture2D() calls to fetch the 4 gradients around the pixel. My procedural texture needs 3 calls to that noise function, since I need a different noise frequency, offset and octave stored in 3 different variables, but when I do that I get this error:

The GLSL vertex shader will run in software due to the GLSL fragment shader running in software. The GLSL fragment shader will run in software - available number of texture instructions exceeded.

Falling back to software mode slows the whole thing down to 1 fps, but if I remove one noise call it works fine:

The GLSL vertex shader will run in hardware. The GLSL fragment shader will run in hardware.

Is there any way of knowing in advance how many texture instructions a graphics card can take? That would be useful in the future.
At the moment I am just going to have to figure out a way to get the same look without using that third noise call.

Thanks

Look at the older ARB_fragment_program extension.

Thanks, I found this:

The limit on fragment program texture instructions can be queried with a <pname> of MAX_PROGRAM_TEX_INSTRUCTIONS_ARB, and must be at least 24. Each texture instruction in the program (matches of the <TexInstruction> grammar rule) counts against this limit.

Since I am using JOGL I ran this line to find out how many the computer had.

System.out.println("Tex instruc "+gl.glGetString(GL.GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB));

Sadly that just returned null.
Out of interest what are texture units?
Since I was able to find out that it has:
8 Texture units
16 Texture units FS
0 Texture units VS
0 Texture units GS

Has that anything to do with my problem?
Thanks again for your help so far.

You need to use the glGetProgramivARB function to query that value.

Out of interest what are texture units?
Since I was able to find out that it has:
8 Texture units
16 Texture units FS

That term is a little overloaded in OpenGL, as there are basically three types of texture unit in the API.

- Texture image units: the points to which you attach textures for use in shaders. When you set the value of a GLSL sampler uniform, that value is the index of one of these units (the 16 from your list). A shader cannot simultaneously access more than this number of different textures.
- Texture coordinate sets: used for sending texture coordinates to the vertex shader without using generic attributes (they correspond to the gl_MultiTexCoord0 style attributes in GLSL).
- Conventional texture units (the 8 from your list): these have both a texture image and texture coordinates, plus additional functionality such as texgen, texture matrices and the texturing environment.

From the API point of view they partially overlap, so in your case the first 8 units accessible through glActiveTexture are fully featured, while the additional 8 texture units have only the texture image.

Has that anything to do with my problem?

For the purpose of texturing from a GLSL shader you are interested in the texture image units. They tell you how many different textures you can simultaneously access from the shader using the texture2D() call. You can sample each of those textures multiple times at multiple different coordinates.

The MAX_PROGRAM_TEX_INSTRUCTIONS_ARB value tells you (simplified) how many texture2D() calls the shader can make, no matter which texture they sample. The limit was defined for a different kind of shader, so the GLSL compiler might generate more or fewer real sampling instructions. For the X300 hardware the value is 32, so your shader cannot read more than that number of samples from the textures. Because your hardware does not support loops, they will be unrolled, so a texture2D() call inside a loop is repeated once per iteration.
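As a rough sketch of that arithmetic (the class and method names are hypothetical, the 4-iteration loop is a made-up example, and a real GLSL compiler may emit more or fewer instructions than this simple count), a budget check against the X300 limit of 32 could look like this:

```java
final class TexInstructionBudget {
    /**
     * Estimated texture instructions after unrolling: every texture2D()
     * inside a loop counts once per iteration. This is only an estimate,
     * not what a specific driver will actually generate.
     */
    static int texInstructions(int lookupsPerNoise, int noiseCalls, int loopIterations) {
        return lookupsPerNoise * noiseCalls * loopIterations;
    }

    static boolean fits(int texInstructions, int limit) {
        return texInstructions <= limit;
    }

    public static void main(String[] args) {
        int limit = 32; // MAX_PROGRAM_TEX_INSTRUCTIONS_ARB value quoted for the X300

        // Three noise() calls with 4 lookups each, no loop.
        int straight = texInstructions(4, 3, 1);
        // The same three calls inside a hypothetical loop of 4 iterations.
        int looped = texInstructions(4, 3, 4);

        System.out.println("straight: " + straight + " fits=" + fits(straight, limit));
        System.out.println("looped:   " + looped + " fits=" + fits(looped, limit));
    }
}
```

By this count, three straight-line noise() calls need only 12 texture instructions, well under 32, while the unrolled loop would not fit.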

Thank you very much for that detailed response. I wanted to reply earlier but I had to suddenly focus my time on something else.

I can confirm that GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB does indeed return 32, but what I have trouble understanding is why it starts to complain and fall back to software mode once I reach 12 calls to texture2D().
For example, here I effectively use two calls to noise() (only lol and lol2 feed into the result) and it has no problem with it.

varying vec2  MCposition;
float noise(vec2 pos);
void main() {
    float lol  = noise(MCposition);
    float lol2 = noise(MCposition+0.2);
    float lol3 = noise(MCposition+0.3);
    float lol4 = noise(MCposition+0.4);
    
    float foo  = (lol + lol2 + lol + lol) / 4.0;
    
    gl_FragColor = vec4(vec3( foo ), 1.0);
}

If I then use a third noise variable (lol3) in the result, it just falls over and goes into software mode.

varying vec2  MCposition;
float noise(vec2 pos);
void main() {
    float lol  = noise(MCposition);
    float lol2 = noise(MCposition+0.2);
    float lol3 = noise(MCposition+0.3);
    float lol4 = noise(MCposition+0.4);
    
    float foo  = (lol + lol2 + lol3 + lol) / 4.0;
    
    gl_FragColor = vec4(vec3( foo ), 1.0);
}

As you can see, the noise function simply makes 4 texture2D() calls, with no for loops involved.

float noise(vec2 P)
{
  vec2 Pi = ONE*floor(P)+ONEHALF; // Integer part, scaled and offset for texture lookup
  vec2 Pf = fract(P);             // Fractional part for interpolation P-floor(P) (0 - 1)

  // Noise contribution from lower left corner
  vec2 grad00 = texture2D(permTexture, Pi).rg * 4.0 - 1.0; //(-1, 255, 511)
  float n00 = dot(grad00, Pf);//()

  // Noise contribution from lower right corner
  vec2 grad10 = texture2D(permTexture, Pi + vec2(ONE, 0.0)).rg * 4.0 - 1.0;
  // Compute the dot-product between the vectors and the gradients
  //DotProduct = (x1*x2 + y1*y2)
  float n10 = dot(grad10, Pf - vec2(1.0, 0.0));

  // Noise contribution from upper left corner
  vec2 grad01 = texture2D(permTexture, Pi + vec2(0.0, ONE)).rg * 4.0 - 1.0;
  float n01 = dot(grad01, Pf - vec2(0.0, 1.0));

  // Noise contribution from upper right corner
  vec2 grad11 = texture2D(permTexture, Pi + vec2(ONE, ONE)).rg * 4.0 - 1.0;
  float n11 = dot(grad11, Pf - vec2(1.0, 1.0));

  // Blend contributions along x
  vec2 n_x = mix(vec2(n00, n01), vec2(n10, n11), fade(Pf.x));

  // Blend contributions along y
  float n_xy = mix(n_x.x, n_x.y, fade(Pf.y));

  // We're done, return the final noise value.
  return n_xy ;
}

Check your MAX_PROGRAM_TEX_INDIRECTIONS_ARB. Then understand what an indirection is.

That card only supports 64 ALU instructions. It is possible that you are running out of those instead of the texture ones, and the error report is just misleading. When I tested your code with the GPU Shader Analyzer for the R9700 hardware, which has the same shader capabilities as the X300, and with the fade function set to return its parameter, the shader had 57 ALU instructions when the two noise samplings were used.

Do they simulate the noise instruction with many ALU instructions? I guess that is what is happening and he is actually maxing out the ALU instruction limit.

V-man, look at his source: he isn’t using GLSL’s noise function (which ATI does not implement). He is implementing his own noise function using texture lookups.

Exceeding ALU instructions is a possibility, but it is far more likely that he is exceeding texture indirections. ATI only supports 4.

His code does not have such a deep dependency chain. The only thing adding an additional level of indirection (for a total of 2) is the calculation of the texture coordinates.

Yes, but indirections are counted in stages. If the driver is not able to flatten and reorder the calculations for texcoord temps, then each call to his noise() will be a new indirection stage, and he’ll fall back after 4 stages.

That is possible; however, the shader had 57 ALU instructions and 2 indirections with two uses of the noise() function and a simple fade function body. The number of indirections was 2 even with only one use of the noise function (I do not remember the exact number of ALU instructions in that case, but it was something around 40). While the driver might be unable to keep the number of indirections at 2 for a third use of noise(), it seems unlikely to me that it would add more than one indirection, so it would still be within the limit.
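The analyzer numbers above also allow a rough ALU estimate. This is only a sketch: it assumes each extra noise() call costs the same number of instructions, it uses the approximate figure of 40 for one call, and the class and method names are hypothetical. The real fade() would add more on top.

```java
final class AluEstimate {
    /**
     * Linear extrapolation of the ALU instruction count from two measurements:
     * the cost of one noise() call and the cost of two noise() calls.
     */
    static int estimate(int oneCall, int twoCalls, int noiseCalls) {
        int perExtraCall = twoCalls - oneCall; // cost of each additional call
        return oneCall + perExtraCall * (noiseCalls - 1);
    }

    public static void main(String[] args) {
        int aluLimit = 64;  // ALU limit stated for this class of hardware
        int oneCall = 40;   // approximate figure, quoted from memory
        int twoCalls = 57;  // figure reported by the shader analyzer

        for (int n = 1; n <= 4; n++) {
            int alu = estimate(oneCall, twoCalls, n);
            System.out.println(n + " noise() calls: ~" + alu + " ALU instructions, "
                    + (alu <= aluLimit ? "within" : "over") + " the limit of " + aluLimit);
        }
    }
}
```

Under these assumptions a third noise() call lands at roughly 74 ALU instructions, which is over 64 even with the trivial fade function.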

Using this:
gl.glGetProgramivARB(GL.GL_FRAGMENT_PROGRAM_ARB, GL.GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB, value, 0);
returns 31, one less than TEX_INSTRUCTIONS. Here is what I have tested so far:

GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB 128
GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB 31
GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB 32

I thought I would do some more testing to see if it really was because of the texture2D() calls, so I commented out the calls and in their place put the same calculations that were done inside the texture2D() call arguments. So when I used this noise function:

float noise(vec2 P)
{
  vec2 Pi = ONE*floor(P)+ONEHALF; // Integer part, scaled and offset for texture lookup
  vec2 Pf = fract(P);             // Fractional part for interpolation P-floor(P) (0 - 1)

  // Noise contribution from lower left corner
  vec2 grad00 = vec2(0.5)/*texture2D(permTexture, Pi).rg*/ * 4.0 - 1.0; //(-1, 255, 511)
  float n00 = dot(grad00, Pf);//()

  // Noise contribution from lower right corner
  vec2 grad10 = (Pi + vec2(ONE, 0.0))/*texture2D(permTexture, Pi + vec2(ONE, 0.0)).rg*/ * 4.0 - 1.0;  // Compute the dot-product between the vectors and the gradients
  //DotProduct = (x1*x2 + y1*y2)
  float n10 = dot(grad10, Pf - vec2(1.0, 0.0));

  // Noise contribution from upper left corner
  vec2 grad01 = (Pi + vec2(0.0, ONE))/*texture2D(permTexture, Pi + vec2(0.0, ONE)).rg*/ * 4.0 - 1.0;
  float n01 = dot(grad01, Pf - vec2(0.0, 1.0));

  // Noise contribution from upper right corner
  vec2 grad11 = (Pi + vec2(ONE, ONE))/*texture2D(permTexture, Pi + vec2(ONE, ONE)).rg*/ * 4.0 - 1.0;
  float n11 = dot(grad11, Pf - vec2(1.0, 1.0));

  // Blend contributions along x
  vec2 n_x = mix(vec2(n00, n01), vec2(n10, n11), fade(Pf.x));

  // Blend contributions along y
  float n_xy = mix(n_x.x, n_x.y, fade(Pf.y));

  // We're done, return the final noise value.
  return n_xy ;
}

It confirmed to me that the error message had nothing to do with the texture2D() calls. Despite having none, the shader gave the same error message:

Link successful. The GLSL vertex shader will run in software due to the GLSL fragment shader running in software. The GLSL fragment shader will run in software - available number of texture instructions exceeded. Validation successful.

However, if I simply comment out the texture2D() calls without replacing them with the same amount of calculation, I am able to do 3 noise calls without a problem, but it fails on 4.
I thought I would also paste in my fade() method as that too is rather mathematically expensive.

/*
 * The interpolation function. This could be a 1D texture lookup
 * to get some more speed, but it's not the main part of the algorithm.
 */
float fade(float t) {
  // return t*t*(3.0-2.0*t); // Old fade, yields discontinuous second derivative
  return t*t*t*(t*(t*6.0-15.0)+10.0); // Improved fade, yields C2-continuous noise
}

Would this be a driver problem then, or am I simply doing more calculations than the card can handle?
Thanks again

The more important values are the NATIVE ones (for example MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB and MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB), which describe the real limits of the hardware more precisely.

Would this be a driver problem then, or am I simply doing more calculations than the card can handle?

The most likely reason is that the shader really does too many calculations for that type of card. You can try to move some of the calculation into the vertex shader, although I am not sure that will free up enough instructions for all four random values. You can also try rewriting parts of the code as vector operations, so the driver has a better chance of mapping them to the vector instructions provided by the hardware. Or you can encode some functions into additional textures.
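As one illustration of the last suggestion, fade() could be baked into a small table on the CPU and uploaded as a 1D texture, trading its ALU instructions for one texture lookup per fade() call. The sketch below only builds and samples the table in plain Java. The table size of 256 and all names are assumptions, and the texture upload itself is not shown.

```java
final class FadeTable {
    /** The improved fade curve from the shader: 6t^5 - 15t^4 + 10t^3. */
    static float fade(float t) {
        return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
    }

    /** Precompute fade() at evenly spaced points covering [0, 1]. */
    static float[] build(int size) {
        float[] table = new float[size];
        for (int i = 0; i < size; i++) {
            table[i] = fade(i / (float) (size - 1));
        }
        return table;
    }

    /** Sample the table with linear interpolation between neighbouring entries. */
    static float lookup(float[] table, float t) {
        float x = Math.max(0.0f, Math.min(1.0f, t)) * (table.length - 1);
        int i0 = (int) x;
        int i1 = Math.min(i0 + 1, table.length - 1);
        float f = x - i0;
        return table[i0] * (1.0f - f) + table[i1] * f;
    }

    public static void main(String[] args) {
        float[] table = build(256); // assumed size, would become a 256-texel 1D texture
        float maxError = 0.0f;
        for (int i = 0; i <= 1000; i++) {
            float t = i / 1000.0f;
            maxError = Math.max(maxError, Math.abs(lookup(table, t) - fade(t)));
        }
        System.out.println("max table error: " + maxError);
    }
}
```

Whether this helps depends on the remaining texture instruction and indirection budget, since each table lookup is itself a texture read.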
