Probable nVidia GLSL compiler bug


Does the discard keyword imply a return? If the answer is yes, I think I have found a bug in the nVidia GLSL compiler.

It’s very hard to package a small repro. This is the best I could do so far. My apologies.

The bug occurs when you replace //WORKING1 or //WORKING2 with //NOTWORKING.
It causes an application freeze, or sometimes a BSOD on XP, or a driver recovery in Windows 7.
I expect the NOTWORKING, WORKING1 and WORKING2 code to behave the same.
But somehow, ‘discard’ doesn’t seem to do its job: execution seems to continue.

I would appreciate it if somebody could take a look and check that I am not going nuts…

A few remarks:

  • please ignore the crap at the beginning of main(), above the first ‘z’ loop. It is not the culprit here.
  • also ignore the ‘problem’ flag and its management. I wrote this to try to detect odd code behavior.
  • you can tell something goes wrong from the shader performance: execution seems to continue past discard, and as a result the shader runs very slowly in my case.

flat in int segmentsPerScanLine;
flat in int primitiveNum; // always 4096

in vec3 normalDir;
in vec3 viewDir;
flat in vec4 finalColor;

void main(void)
{
	bool shouldDiscard = false;
	bool shouldExit = false;
	bool problem = false;
	int bufferSize = primitiveNum; // & 0xFFFF;

	int ttStartOffset = segmentsPerScanLine;
	float fbufferSize = float(bufferSize);
	float fbufferSizeM1 = fbufferSize - 1.0;
	float xx = gl_TexCoord[0].s * fbufferSize;
	float yy = gl_TexCoord[0].t * fbufferSize;
	float vx = clamp(xx, 0.0, fbufferSizeM1);
	float vy = clamp(yy, 0.0, fbufferSizeM1);
	float vxFloating = clamp(xx, 0.0, fbufferSize);
	int iVx = int(vx);
	int iVy = int(vy);
	int lineStartOffset = GET_TEXTURE_IVEC4(ttStartOffset + iVy).r;
	int currentOffset = lineStartOffset;
	int referencePixelIndex = 0;
	int nn=0;
	for (int z=0; z < (primitiveNum * 200); z++)
	{
		if (shouldExit)
			problem = true;
		ivec4 pixelGroups = GET_TEXTURE_IVEC4(ttStartOffset + currentOffset);
		for (int n=0; n < 4; n++)
		{
			int pixelGroup = pixelGroups[n];
			int lastPixel = (pixelGroup & 0xFFF);
			if (iVx <= lastPixel)
			{
				bool canDisplayPixel;
				vec4 theColor = finalColor;
				if ((pixelGroup & 0x40000000) != 0 || (pixelGroup & 0x20000000) != 0) // interior pixel
					canDisplayPixel = true;
				else // exterior pixel
					canDisplayPixel = false;
				if (canDisplayPixel)
				{
					shouldExit = true;
					shouldExit = true;
					shouldDiscard = true;
				}
			}
		}

		if (shouldExit)
			break;
	}
	if (problem)
		gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0);
	else if (shouldDiscard)
		gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}


NB: drivers tested: 285.58, 260.99. Windows XP 32-bit, GeForce GT 430 1024MB.

According to my copy of the Orange book:

An implementation might or might not continue executing the shader, but it is guaranteed that there is no effect on the framebuffer.

A quick read through the GLSL spec backs this up: all that it says is that the shader outputs won’t be written, but it doesn’t specify whether or not the shader continues executing.
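
If that reading is right, one way to avoid depending on discard for control flow is to leave the loop with an explicit break and follow discard with an explicit return. The following is only a minimal sketch of that pattern, not the original shader: isExteriorPixel() and the loop bound of 64 are hypothetical stand-ins for the real per-pixel test, and I have not verified that it avoids the freeze on the drivers mentioned above.

```glsl
#version 120

// Hypothetical stand-in for the per-pixel test in the original shader.
bool isExteriorPixel(vec2 uv)
{
	return uv.s > 0.5;
}

void main(void)
{
	bool shouldDiscard = false;

	// Arbitrary loop bound, chosen only for illustration.
	for (int z = 0; z < 64; z++)
	{
		if (isExteriorPixel(gl_TexCoord[0].st))
		{
			shouldDiscard = true;
			break; // leave the loop explicitly instead of relying on discard
		}
	}

	if (shouldDiscard)
	{
		discard;
		return; // explicit exit, in case the implementation keeps executing
	}

	gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}
```

The point of the flag is that the loop always terminates through ordinary control flow, so the result does not depend on whether the implementation stops executing at discard.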

