ATI problems with variable length for loops?

I’m sorry that you were not aware of my previous reply.

I think I sent you an email about it. Maybe you missed it.

I’d like to chime in with a similar problem which has been 100% reproducible on my laptop (Radeon 3450, Catalyst 10.7, Windows 7). It’s a ghetto multiple-point-light shader that uses neither textures nor uniform blocks. When I use a literal value for the loop count, the shader works correctly, though the lighting model is crap, as designed :) If I use the lightCount uniform instead (set to the same value), not only is the fragment shading incorrect, but the vertex shader seems to be affected as well: my geometry gets collapsed to the YZ plane.

The same shader works correctly in either scenario on my desktop (Radeon 5850, Windows 7, Catalyst 10.7).

Here’s the code:

#version 150 core

in vec3 outVS_WorldPos;
in vec3 outVS_WorldNormal;
out vec4 outFS_FragColor0;

#define MAX_NUM_LIGHTS 256
struct Light
{
	vec4 pos; // XYZ=pos, W=falloff distance
	vec4 color;
};
uniform Light lights[MAX_NUM_LIGHTS];
uniform int lightCount;

void main(void)
{
	vec4 outColor = vec4(0.1,0.1,0.1,1); // base ambient level
	// Broken on 3450; replace lightCount with literal 256 and it works...
	for(int iLight=0; iLight<lightCount; ++iLight)
	{
		vec3 lightPos = lights[iLight].pos.xyz;
		float falloffDistance = lights[iLight].pos.w;
		vec4 lightColor = lights[iLight].color;
		
		vec3 toLight = lightPos-outVS_WorldPos;
		float distToLight = length(toLight);
		float attenuation = 1.0 - smoothstep(0.0, falloffDistance, distToLight);
		// Reuse toLight/distToLight rather than recomputing the light direction
		outColor.xyz += clamp(dot(outVS_WorldNormal, toLight / distToLight), 0.0, 1.0)
			* attenuation * lightColor.xyz;
	}
	outFS_FragColor0 = outColor; // outColor.a is 1.0 from the ambient term
}

Is this the same bug, Frank?

I think I sent you an email about it. Maybe you missed it.
Frank, sorry about that. Somehow I did indeed miss that message. Thanks for the update as well as the suggested workaround. I’m looking forward to the fix.

It’s not the same bug. If you query the maximum uniform components on the Radeon 3450, you will find the limit is 1024, which means you can use at most 256 vec4 uniforms.

So look at the shader,
#define MAX_NUM_LIGHTS 256
struct Light
{
vec4 pos; // XYZ=pos, W=falloff distance
vec4 color;
};
uniform Light lights[MAX_NUM_LIGHTS];

The shader declares 512 vec4 uniforms (256 lights with two vec4 members each), which exceeds that limit, so the result is unpredictable. That’s the root cause.
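The arithmetic behind that limit can be sketched in plain C. This is only an illustration under stated assumptions: the helper names are hypothetical, each vec4 is assumed to cost exactly four components (a real driver may pack or pad differently), and the 1024 figure is the one quoted above. In a real program the limit would be read with glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, ...).

```c
#include <assert.h>

/* Components consumed by one Light: two vec4 members, four floats each.
 * Assumes tight vec4 packing; an actual driver may count differently. */
enum { COMPONENTS_PER_LIGHT = 2 * 4 };

/* Total uniform components used by `uniform Light lights[light_count]`. */
int light_array_components(int light_count)
{
    assert(light_count >= 0);
    return light_count * COMPONENTS_PER_LIGHT;
}

/* Largest light array that fits in `max_components` once `reserved`
 * components are set aside for other uniforms (e.g. 1 for lightCount). */
int max_lights_for_budget(int max_components, int reserved)
{
    int available = max_components - reserved;
    return available > 0 ? available / COMPONENTS_PER_LIGHT : 0;
}
```

With these assumptions, light_array_components(256) gives 2048 components against a limit of 1024, and max_lights_for_budget(1024, 1) gives 127, so MAX_NUM_LIGHTS would have to drop to roughly that value on this hardware.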

facepalm
Durp.
Thanks Frank :slight_smile:

Sorry to bring this thread up again, but would this bug also affect geometry shaders? I’ve been trying to run this, and having the same problem:



#version 400

layout(max_vertices = 96) out;

uniform int RES_R; // works fine with: const int RES_R = 32;
uniform float Rg, Rt;

flat out int layer;
flat out float r;
flat out vec4 dhdH;

void main() {
	
	for (int i = 0; i != RES_R; ++i) {
		float rl = float(i) / (float(RES_R) - 1.0);
		rl = rl * rl;
		rl = sqrt(Rg * Rg + rl * (Rt * Rt - Rg * Rg)) + (i == 0 ? 0.01 : (i == RES_R - 1 ? -0.001 : 0.0));
		
		float dmin = Rt - rl;
		float dmax = sqrt(rl * rl - Rg * Rg) + sqrt(Rt * Rt - Rg * Rg);
		float dminp = rl - Rg;
		float dmaxp = sqrt(rl * rl - Rg * Rg);
		
		gl_Position = gl_in[0].gl_Position;
		gl_Layer = i;
		EmitVertex();
		
		gl_Position = gl_in[1].gl_Position;
		gl_Layer = i;
		EmitVertex();
		
		layer = i;
		r = rl;
		dhdH = vec4(dmin, dmax, dminp, dmaxp);
		
		gl_Position = gl_in[2].gl_Position;
		gl_Layer = i;
		EmitVertex();
		
		EndPrimitive();
	}
}

Could you try declaring the input and output topology alongside max_vertices, like this:
layout(triangles) in;
layout(triangle_strip) out;
?

Yep. Just tried with:

layout(triangles) in;
layout(triangle_strip, max_vertices = 96) out;

and still getting the same problem.

The shader works for me. Do you mean the geometry shader doesn’t work in your program? Could you please reduce your application to a minimal test case and send it to me? Thanks.

I really wish companies would stop bragging about whose drivers are out first and instead brag about whose drivers have the fewest bugs! This bug still exists today.

uniform int samplesin = 4;

for (int i = 0; i < samplesin; ++i)
{

}

If I hard-code samplesin so that it is not a uniform, everything works fine. It’s December now! Please fix.

SDK INFO: GL_VERSION = 3.3.10317 Compatibility Profile/Debug Context
SDK INFO: GL_VENDOR = ATI Technologies Inc.
SDK INFO: GL_RENDERER = ATI Radeon HD 3400 Series
SDK INFO: GL_SHADING_LANGUAGE_VERSION = 3.30
SDK INFO: GLEW_VERSION = 1.5.7

I think the bugs reported above have been fixed. Could you please provide more details so we can reproduce your problem? We would like to fix it soon.
Sorry for the inconvenience.

Thanks! It’s as simple as what I posted… What I do now is, if I detect an ATI card, I use a #define in my shader for the loop count instead of uploading a uniform (uniformi). My coworker just got a higher-end card, so it will be interesting to see whether this issue exists only on the lower-end hardware. BTW, all my other uniforms get set properly. Also, we have multiple shaders and the problem shows up in all of them.
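The #define workaround described above could look roughly like the following on the application side. This is a sketch, not the poster's actual code: build_loop_prelude and the LOOP_COUNT macro are hypothetical names, and the version string is taken from the "#version 330 core" mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "#version 330 core\n#define LOOP_COUNT <n>\n" into `out`.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer is too small. LOOP_COUNT is a hypothetical name. */
int build_loop_prelude(char *out, size_t out_size, int loop_count)
{
    assert(out != NULL && loop_count >= 0);
    int n = snprintf(out, out_size,
                     "#version 330 core\n#define LOOP_COUNT %d\n",
                     loop_count);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

Since glShaderSource accepts an array of strings, the prelude can be passed as the first string and the shader body (without its own #version line) as the second. The loop in the shader then reads `for (int i = 0; i < LOOP_COUNT; ++i)`, which gives the compiler a compile-time constant bound. The cost is that the shader has to be recompiled whenever the count changes.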

I don’t have time to make a test app, but if you have a specific question I can answer that. BTW, we use #version 330 core in our shaders… not sure if that matters or helps.

Porting our stuff from the NVIDIA world to also work on ATI has been a frustrating process, but we are making headway. If I can just figure out why the ATI drivers deadlock on glMapBuffer sometime after a glDrawElements, life will get better. Another strange find… a shader that compiles with no errors can sometimes make glDrawElements throw an invalid operation! Go figure… BTW, I did find we had an ARB extension enabled in the shader, and once it was removed everything worked and glDrawElements no longer raised an OpenGL error! I should write a book, or at least a blog, to help others through this process once I’m done.

BTW, the deadlock would cause Windows 7 to report that it had recovered from the ATI driver…

Okay, I will ask some questions about the loop expression.

  1. Is the uniform “samplesin” used to indirectly index a sampler array? We have a limitation there: the feature is supported only on HD5xxx and above.
  2. Which shader do you use the loop expression on? Vertex? Fragment? Geometry? Do you use uniform block?
  3. Is there any error/warning message reported from the ATI’s compiler?

For your deadlock problem, you could use glGetError to locate where the error is.

Thanks
Frank
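As a rough illustration of the glGetError suggestion, the helper below turns an error code into a readable message. It is a sketch with hypothetical function names; the numeric values are the standard OpenGL error codes, repeated as literals so the snippet builds without GL headers. Note that a driver hang may not raise any GL error at all, so this only helps when an error is actually reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Maps a value returned by glGetError() to its symbolic name. */
const char *gl_error_name(unsigned int code)
{
    switch (code) {
    case 0x0000: return "GL_NO_ERROR";
    case 0x0500: return "GL_INVALID_ENUM";
    case 0x0501: return "GL_INVALID_VALUE";
    case 0x0502: return "GL_INVALID_OPERATION";
    case 0x0505: return "GL_OUT_OF_MEMORY";
    case 0x0506: return "GL_INVALID_FRAMEBUFFER_OPERATION";
    default:     return "unknown GL error";
    }
}

/* Formats "<call>: <name> (0x<code>)" into `out`.
 * Returns the length written, or -1 if the buffer is too small. */
int format_gl_error(char *out, size_t out_size,
                    const char *call, unsigned int code)
{
    assert(out != NULL && call != NULL);
    int n = snprintf(out, out_size, "%s: %s (0x%04X)",
                     call, gl_error_name(code), code);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

In an application, one would call glGetError() in a loop after each suspect call (for example glDrawElements or glMapBuffer) until it returns GL_NO_ERROR, and log each code with format_gl_error to narrow down which call fails first.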

The hardware might not be capable of loops. Loops with static lengths simply get unrolled, whilst those with dynamic lengths fail to compile.

I do not get any shader compile errors, but what you are saying seems to fit what I’m seeing. This really surprises the heck out of me, as I would have expected for loops to work on a card that supports OpenGL 3.2.

Since I do not get any OpenGL errors, I would still consider this a bug on the ATI/AMD side. I’m also very troubled… does any API call exist that lets me determine which AMD/ATI cards support a for loop and which do not? If this is an ATI/AMD hardware limitation, my NVIDIA card from 2006 can handle such loops… which is why I’m still having a hard time accepting that this is an AMD/ATI hardware limitation and not a driver bug.

No GL error is raised, and the deadlock is a hard-core deadlock where Windows says it recovered from the AMD/ATI driver crash. Even if I use the latest gDEBugger (version 5.8), it deadlocks.

I have found out how to get around one deadlock: if I use “0” for the multisampled FBOs, the deadlock does not occur and the geometry does indeed show up. I currently do not know how to get around this issue otherwise, BTW. I can probably tell your ATI/AMD customers that multisampling will not be available on certain AMD/ATI cards (until we figure this out).

As for the shaders… I have tried uniform int as well as to sneak it into a vec4 with some other data.

#version 330 core
precision highp float;
uniform int loop;
out vec4 outcolor;

void main()
{
	ivec2 pt = whereintexture; // placeholder: texel coordinate to fetch
	vec3 info;
	vec4 yadda;

	for (int i = 0; i < loop; ++i)
	{
		info = texelFetch(positionTex, pt, i).xyz;

		// info does stuff with yadda
	}

	outcolor = yadda / loop;

}

Usually the compiler will report an error when the hardware doesn’t support a feature.
There is a known issue with texelFetch on a multisampled depth texture on some specific hardware. Could you please send your program to me at frank.li@amd.com? That would help us resolve your problem.

Thanks for your feedback
Frank
