Weird slowdown with array index

This makes the program extremely slow!


SOME WHILE LOOP
{
	thisNodeIndex = v.r & mask_nodeindex;
	if( thisNodeIndex != prevNodeIndex )
	{
		subdivision = 0U;

		nodeListQueue[queueHead] = subdivision | nodeIndex;

		prevNodeIndex = thisNodeIndex;

		queueHead = queueHead + 1;
		queueHead = ( queueHead == 8 ) ? 0 : queueHead;
	}
}

fragOutput1 = uvec4( nodeListQueue[0], nodeListQueue[1], nodeListQueue[2], nodeListQueue[3] );
fragOutput2 = uvec4( nodeListQueue[4], nodeListQueue[5], nodeListQueue[6], nodeListQueue[7] );

When I remove that if statement, or even just the two lines that increment the variable ‘queueHead’ so that its initial value is used for the entire loop, I get an increase of 10+ fps. I can’t figure out what is wrong. Can anyone help?

Thanks.

And you are doing this in a fragment shader?!

I am not surprised at all that this is slow.

Jan.

Some points:

  • when working on performance problems, never measure in fps! It is non-linear: +10 fps means twice as fast when going from 10 to 20, and almost no gain when going from 500 to 510. Use milliseconds instead, and provide the total frame time for comparison.
  • GLSL is implemented differently on different hardware, so it is useful to state what hardware/OS/driver version you are using.
    For example, some implementations unroll loops, some evaluate both branches of conditionals, some do both.
    In this case you may have to rewrite the algorithm differently, avoiding conditionals, to optimize for your hardware.
  • Can you give details on “SOME WHILE LOOP”: how many iterations does it run? Is the count fixed or dynamic?
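
To put numbers on the first point above, here is a minimal sketch in C of the fps to frame time conversion. The function names frame_time_ms and ms_saved are made up for this illustration; they do not come from any library or from the code in this thread.

```c
/* Time spent on a single frame, in milliseconds, at a given frame rate. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Milliseconds saved per frame when the rate rises from fps_before
   to fps_after (both names are hypothetical, for illustration only). */
double ms_saved(double fps_before, double fps_after)
{
    return frame_time_ms(fps_before) - frame_time_ms(fps_after);
}
```

With these, ms_saved(10.0, 20.0) is 50 ms per frame, while ms_saved(500.0, 510.0) is only about 0.04 ms, although both are "+10 fps".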

Thanks for the answers.

@ZBuffeR: People write fluid solvers on the GPU using shaders; this is only half as bad.

I’m not relying on frames per second; it’s just that I add this one integer counter, queueHead, and suddenly everything becomes terribly slow. There is obviously something more to it. It slows down ONLY if I use the variable to index the array. If I remove the part where I index the queue, it is back to normal, even though I’m still using the same variable. It is also fast if I index the array with constants only. I don’t know what’s going on. Here is the code; I’ve tried to remove as much clutter as possible. I’m using a GeForce 9800M GTS, Windows, driver 186.81, GLSL 1.30, GL 3.0.


int queueHead = 0;
		
while( samplesCollected < numSamples && cout.a <= 1.0 )
{
x = ( coordInCube( newVoxel ) + 0.000001 ) * 0.99999; // Bias for the voxels exactly on the edge
x.y = 1.0 - x.y;
			
distanceFromEye = distance( newVoxel, eyePosition );
			
if( distanceFromEye > 300.0 ) levelAllowedByDistance = 1;
else if( distanceFromEye <= 300.0 && distanceFromEye > 200.0 ) levelAllowedByDistance = 2;
else if( distanceFromEye <= 200.0 ) levelAllowedByDistance = levelOfDetail;

level = 1;
c = ivec3( rootNodePosition, 0, 0 );
v = texelFetch( nodeQueue, c + intPart( x * n ), 0 );
			
constantColor = v.r & mask_constcolor;
			
while( ( v.r & mask_maxsubdiv ) == 0U && constantColor == 0U && level < levelAllowedByDistance )
{
	level++;
				
	nodeIndex = v.r & mask_nodeindex;
				
	c = ivec3( nodeIndex * n, 0.0, 0.0 );
	x = x * n - intPartf( x * n );
	v = texelFetch( nodeQueue, c + intPart( x * n ), 0 );			
				
	constantColor = v.r & mask_constcolor;
}
			
// Compute the coordinates in the brick - only if the node doesn't hold a constantColor
if( constantColor == 0U )
{
	// Find the correct brick queue
	c = ivec3( v.a, 0, 0 );	 // Beginning origin of brick
	queue = int( floor( float(c.x) / 2048.f ) );	//queue tells you which queue the brick is in
	c = ivec3( c.x - 2048 * queue, 0, 0 );
				
	// Find the offset inside the brick
	x  = x * n - intPartf( x * n );
				
	// Get the color value
	cv = getColorWithTF( c + intPart( x * brickSize ), queue );
}
else	// If node is constant, then the alpha holds the constant value
{
	cv = getColorWithTFVal( v.a );
}
			
// If this node is not in the queue, add it
thisNodeIndex = v.r & mask_nodeindex;
/* if( thisNodeIndex != prevNodeIndex )
{
	subdivision = 0U;
				
	nodeListQueue[queueHead] = 0U; // subdivision | nodeIndex;	
				
	prevNodeIndex = thisNodeIndex;
				
	queueHead++;
	queueHead = ( queueHead == 12 ) ? 0 : queueHead;
} */
				
// Integration
cpv = vec4( cv.rgb * cv.a, cv.a );
cout = cout + cpv * ( 1.0 - cout.a );
			
// Ray step LOD
if( distanceFromEye > 400.0 )
{
	actualRayStep = rayStep * 4.0;
	samplesCollected = samplesCollected + 4U;
}
else if( distanceFromEye <= 400.0 && distanceFromEye > 300.0 )
{
	actualRayStep = rayStep * 3.0;
	samplesCollected = samplesCollected + 3U;
}
else if( distanceFromEye <= 300.0 && distanceFromEye > 200.0 )
{
	actualRayStep = rayStep * 2.0;
	samplesCollected = samplesCollected + 2U;
}
else if( distanceFromEye <= 200.0 )
{
	actualRayStep = rayStep * 1.0;
	samplesCollected = samplesCollected + 1U;
}
			
newVoxel = newVoxel + rayDir * actualRayStep;
}

fragOutput0 = cout / cout.a;
fragOutput1 = uvec4( nodeListQueue[0], nodeListQueue[1], nodeListQueue[2], nodeListQueue[3] );
fragOutput2 = uvec4( nodeListQueue[4], nodeListQueue[5], nodeListQueue[6], nodeListQueue[7] );
fragOutput3 = uvec4( nodeListQueue[8], nodeListQueue[9], nodeListQueue[10], nodeListQueue[11] );

I may have tracked it down a bit. I initialize this array at the beginning of the fragment shader, inside main():


uint nodeListQueue[12] = uint[12]( 2U, 0U, 0U, 0U, 0U, 0U, 0U, 0U, 2U, 0U, 0U, 0U );

And I write it to the outputs like this, as you can see from the code above:


fragOutput0 = cout / cout.a;
fragOutput1 = uvec4( nodeListQueue[0], nodeListQueue[1], nodeListQueue[2], nodeListQueue[3] );
fragOutput2 = uvec4( nodeListQueue[4], nodeListQueue[5], nodeListQueue[6], nodeListQueue[7] );
fragOutput3 = uvec4( nodeListQueue[8], nodeListQueue[9], nodeListQueue[10], nodeListQueue[11] );

However, the only thing being written to fragOutput1-3 is zeros, even though I don’t modify the array anywhere.

Oh, and it works if I don’t index it dynamically, i.e. if I do something like this:


if( queueHead == 0 )		nodeListQueue[0] = subdivision | nodeIndex;
else if( queueHead == 1 )	nodeListQueue[1] = subdivision | nodeIndex;
else if( queueHead == 2 )	nodeListQueue[2] = subdivision | nodeIndex;
else if( queueHead == 3 )	nodeListQueue[3] = subdivision | nodeIndex;
else if( queueHead == 4 )	nodeListQueue[4] = subdivision | nodeIndex;
else if( queueHead == 5 )	nodeListQueue[5] = subdivision | nodeIndex;
else if( queueHead == 6 )	nodeListQueue[6] = subdivision | nodeIndex;
else if( queueHead == 7 )	nodeListQueue[7] = subdivision | nodeIndex;
else if( queueHead == 8 )	nodeListQueue[8] = subdivision | nodeIndex;
else if( queueHead == 9 )	nodeListQueue[9] = subdivision | nodeIndex;
else if( queueHead == 10 )	nodeListQueue[10] = subdivision | nodeIndex;
else if( queueHead == 11 )	nodeListQueue[11] = subdivision | nodeIndex;
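
The chain of comparisons above can also be written as a fixed-count loop, so that the array is only ever indexed by the loop counter. Below is a minimal sketch in plain C (the syntax is close to GLSL for this part) to show that the result is the same as writing queue[head] directly. The names QUEUE_SIZE, queue_store and queue_advance are made up for this illustration, and whether a particular GLSL compiler unrolls such a loop into constant indices is an assumption, not something confirmed in this thread.

```c
#define QUEUE_SIZE 12   /* matches the 12-element nodeListQueue above */

/* Store value in slot head, indexing the array only by the loop counter.
   For 0 <= head < QUEUE_SIZE this has the same effect as
   queue[head] = value; every other slot is left untouched. */
void queue_store(unsigned int queue[QUEUE_SIZE], int head, unsigned int value)
{
    for (int i = 0; i < QUEUE_SIZE; ++i)
    {
        if (i == head)
        {
            queue[i] = value;
        }
    }
}

/* Advance the head by one slot, wrapping back to 0 after the last slot. */
int queue_advance(int head)
{
    head = head + 1;
    return (head == QUEUE_SIZE) ? 0 : head;
}
```

In the shader, the equivalent would be a loop over the 12 slots in place of the twelve if / else if lines, followed by the same wrap-around of queueHead as in the original code.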