Using for loop in shaders

Are there any known issues with using a for loop in a shader?

I am using a for loop in my fragment shader, and the shader takes a lot more time with the loop than without it.

For example, with noOfLightSources = 2.
Using the for loop:

for(int j = 0; j < noOfLightSources; j++)
	if(lSource[j].data.y > 0.5) //light is ON
		color = color + lSource[j].Intensity;

Without the for loop:
if(lSource[0].data.y > 0.5) //light is ON
	color = color + lSource[0].Intensity;
if(lSource[1].data.y > 0.5) //light is ON
	color = color + lSource[1].Intensity;

The version with the for loop takes about twice as long as the version without it.
Any clues why this is happening?

Is it a vertex or fragment shader?

What card is it?

Is “noOfLightSources” a constant in the shader or a uniform variable?

If your card is too old, it has limited support for loops in fragment shaders (support is better in vertex shaders), and the loop count has to be a compile-time constant.
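
A minimal sketch of the compile-time-constant form, in case it helps. Only lSource, data.y, Intensity and noOfLightSources come from the question; MAX_LIGHTS, the LightSource struct layout and the vec4 types are assumptions made for illustration.

```glsl
// Sketch only: MAX_LIGHTS, the struct layout and the vec4 types are assumed,
// they are not taken from the original post.
const int MAX_LIGHTS = 2;                 // compile-time loop bound

struct LightSource {
	vec4 data;                            // data.y > 0.5 means the light is ON
	vec4 Intensity;                       // contribution of this light
};

uniform LightSource lSource[MAX_LIGHTS];
uniform int noOfLightSources;             // runtime count, expected <= MAX_LIGHTS

void main() {
	vec4 color = vec4(0.0);
	// The bound is a constant, so the compiler is free to unroll the loop.
	for (int j = 0; j < MAX_LIGHTS; j++) {
		// Skip slots beyond the active count and lights that are switched off.
		if (j < noOfLightSources && lSource[j].data.y > 0.5)
			color += lSource[j].Intensity;
	}
	gl_FragColor = color;
}
```

The runtime count is still honoured through the comparison inside the body, while the loop itself always runs a fixed number of iterations. Whether this is actually faster on a given driver is something to measure rather than assume.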

It's a fragment shader, running on an NVIDIA GeForce 8800 card.
I am setting noOfLightSources, which is an int, through a uniform variable that I pass to the shader from the application code.

I also tried using a constant instead of noOfLightSources, but rendering is still slow.

When I do not use the for loop, I get a frame time of 60-70 milliseconds, which shoots up to 200-250 milliseconds when I use the for loop. Also, in both cases, I did not get any errors in shader compilation.

Use nvEmulate to see the asm code that the driver produces. Notice all the loop and loop-preparation (read-index computation) instructions.
Use cgc.exe for direct feedback on how complex your shader is, and to find the optimal GLSL/Cg code.

On a GF8600GT, when I depth-pass my scene and then render every pixel with:

//@ILX_COMPILER_ARGS -fastmath -fastprecision

uniform vec4 tintColor : C0;
uniform vec4 soundpos[60] : C1;
uniform vec4 soundVol[60]: C61;

float clamp(float x){
	return max(0.0,min(1.0,x));
}

void main(){
	float result=0.0;
	for(int i=0;i<60;i++){
		vec4 sndpos = soundpos[i]; 
		vec4 sndvol = soundVol[i];
		float dist = distance(,;
		result += clamp(1.0-abs(dist-sndpos.w)) * sndvol.y;
	}
	gl_FragColor = tintColor*result;
}

I get 60fps at 1280x720. If I don’t do the depth pass, it’s 20fps.

Is there a reason you use your own clamp() function instead of the built-in GLSL one?

I use cgc, and when I was typing that shader, the clamp func was missing this time. I had used it before, so I guess it’s a different version of cgc, or bad args.
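
For reference, the per-sound term written with the built-in three-argument clamp would look roughly like the sketch below. The helper name ring and its dist parameter are hypothetical, introduced only for illustration; how dist is computed in the shader above is not reproduced here.

```glsl
// Sketch only: "ring" is a hypothetical helper, not part of the original shader.
// The built-in clamp(x, minVal, maxVal) replaces the hand-written one-argument clamp.
float ring(float dist, vec4 sndpos, vec4 sndvol) {
	// 1.0 at the ring radius (sndpos.w), falling linearly to 0.0 one unit away,
	// then scaled by the volume stored in sndvol.y.
	return clamp(1.0 - abs(dist - sndpos.w), 0.0, 1.0) * sndvol.y;
}
```

Inside the loop this would be used as result += ring(dist, sndpos, sndvol); with the rest of the shader unchanged.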

Unlike on the CPU, where unrolling small loops should reduce execution time, on older GPUs it's better to use loops for more compact code instead of unrolling them, thus saving instructions for later calculations. Current shader models support many more GPU instructions, though, so loops are neither good nor bad in themselves; what matters is how and in what context they are used.
