OK, I’ve been having a recurring problem with the X800 chip when running extremely simple GLSL vertex/fragment shader pairs.
Essentially, what I’ve got is a very simple multitexture blend demo that just passes through the position and a pair of tex coords, samples both textures, and multiplies the colors. Simple, right?
Well, apparently on the X800, the gist I’m getting is that some render state is dropping the device back to software, so we’re getting about 2 FPS on the X800 while all nVidia cards are running at 95+ FPS. The problem is that what I’m doing right now is so minimal that I can’t really pin down what’s causing this. We’ve forced off any hardware FSAA and aniso.
This is with the currently released Catalyst drivers. So is there something really simple that I’m just not getting about ATI/GLSL, or might it be a driver bug?
Thanks for any help.
Yeah, this is hardly enough information for anyone to be of any help.
We would need to see your code and your shader, preferably as a minimal test case that exhibits the problem.
Well, the code that manages the GL state is pretty deeply embedded in the render management code. The setup code is all very simple: there is one vertex buffer and one index buffer, both allocated with STATIC_DRAW. The shader program object is added at init time as well.
I’ll conjure up approximately what’s going on here:
I apologize if the code is very generic, but that’s about what’s going on there. Rest assured, the code works wonderfully on nVidia cards. If anything more specific is needed, I’d certainly be glad to post it.
I seem to recall ATI having some problems with POINT_SMOOTH/LINE_SMOOTH/POLYGON_SMOOTH and GLSL.
Try searching your code for any glHint calls or SMOOTH-type parameters.
I just checked the status of all, and they all come back false.
Have you read the compile and link info logs?
The info log comes back empty. There were no link errors.
What’s your texture internal format, size and filtering? Have you tried to run this program without shader?
Yeah, the internal format is RGB8, with linear filtering on a 512x512 base texture and a 128x128 lightmap. The program runs fine with no shaders bound.
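Try this code:
// Vertex shader
void main()
{
    gl_Position = ftransform();
    gl_TexCoord[0] = gl_MultiTexCoord0;  // base texture coordinates
    gl_TexCoord[1] = gl_MultiTexCoord1;  // lightmap coordinates
}
// Fragment shader
uniform sampler2D baseTex;
uniform sampler2D lmTex;
void main()
{
    vec4 baseCol = texture2DProj(baseTex, gl_TexCoord[0]);
    vec4 lmCol = texture2DProj(lmTex, gl_TexCoord[1]);
    gl_FragColor = baseCol * lmCol;  // modulate base color by lightmap
}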
This can produce slightly different results, but for your simple case this is a good point to start with.
The above code yields the same result.
Then try the following formula in your vertex shader:
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
instead of:
gl_Position = ftransform();
Actually, when I retrieve the link info log on the program object, the ATI card does report that the VP and FP would run in hardware. The nVidia card doesn’t return a link log at all.
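For what it’s worth, that log check can be automated. Below is a minimal sketch in C, assuming the log text has already been fetched (for example with glGetInfoLogARB) and that the driver uses the lowercase word "software" when a stage falls back. That wording is an assumption rather than documented behavior, and the helper name log_reports_software is made up for illustration.

```c
#include <assert.h>  /* only needed for the usage checks described below */
#include <stddef.h>
#include <string.h>

/* Returns 1 if the info log mentions software execution, 0 otherwise.
   Assumption: the driver writes the lowercase word "software" in its
   link log when a shader stage will not run in hardware. The match is
   case-sensitive. */
int log_reports_software(const char *info_log)
{
    if (info_log == NULL || info_log[0] == '\0')
        return 0;  /* empty or missing log: nothing reported either way */
    return strstr(info_log, "software") != NULL;
}
```

An empty log, like the one the nVidia driver returns here, is treated as "nothing reported", so a 0 result only means the log did not admit to a fallback. A state-triggered fallback at draw time would not show up in the link log at all.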
The slowdown you’ve noticed happens because the ATI renderer is falling back to a software rendering path that strictly follows the OpenGL specification, rather than rendering incorrect results in hardware.
The Win32 OpenGL implementation has no equivalent of a ‘pure’ mode (in DirectX language) or of ‘AGL_NO_RECOVERY’ on Mac OS X, either of which would forbid the software fallback.
The software fallback can happen for various reasons that are not (well) documented: certain render states can trigger it when enabled, even after the shader has been linked successfully for hardware.
Render states that I’ve noticed producing a software fallback: line or point smoothing; enabling fog (under some specific circumstances); rendering into a texture with an unsupported texture address mode; enabling transparency when rendering into a floating-point texture; and using polygon offset when the shader accesses gl_FragCoord.
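One way to hunt for such a state is to query the suspects right before the draw call. The sketch below is only an illustration: it takes the query as a callback so it builds without the GL headers, and in a real program that callback would wrap glIsEnabled. The callback type, the function names, and the choice of which capabilities to list are all assumptions; the numeric values are the standard OpenGL enum values.

```c
#include <assert.h>  /* only needed for the usage checks described below */
#include <stddef.h>

/* Values match the standard OpenGL enums; redefined here so the sketch
   builds without the GL headers. */
#define CAP_POINT_SMOOTH        0x0B10u  /* GL_POINT_SMOOTH */
#define CAP_LINE_SMOOTH         0x0B20u  /* GL_LINE_SMOOTH */
#define CAP_POLYGON_SMOOTH      0x0B41u  /* GL_POLYGON_SMOOTH */
#define CAP_FOG                 0x0B60u  /* GL_FOG */
#define CAP_POLYGON_OFFSET_FILL 0x8037u  /* GL_POLYGON_OFFSET_FILL */

/* Hypothetical callback type: returns nonzero if the capability is
   enabled. In a real program this would wrap glIsEnabled. */
typedef int (*cap_query_fn)(unsigned int cap);

static const unsigned int suspect_caps[] = {
    CAP_POINT_SMOOTH, CAP_LINE_SMOOTH, CAP_POLYGON_SMOOTH,
    CAP_FOG, CAP_POLYGON_OFFSET_FILL
};

/* Stores up to max_out enabled suspect capabilities in out and returns
   how many were stored. */
size_t find_enabled_suspects(cap_query_fn is_enabled,
                             unsigned int *out, size_t max_out)
{
    size_t stored = 0;
    size_t total = sizeof suspect_caps / sizeof suspect_caps[0];

    for (size_t i = 0; i < total && stored < max_out; ++i) {
        if (is_enabled(suspect_caps[i]))
            out[stored++] = suspect_caps[i];
    }
    return stored;
}

/* Stand-in query for trying the sketch without a GL context: pretends
   that only fog is enabled. */
int fake_only_fog_enabled(unsigned int cap)
{
    return cap == CAP_FOG;
}
```

With the stand-in query, find_enabled_suspects reports exactly one entry, the fog capability. Disabling the reported states one at a time and re-measuring the frame rate is a cheap way to narrow things down.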
In your code, try using gl_TexCoord instead of a varying vec2, or use a single varying vec4 and pack your two varying vec2 values into it.
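The packing suggestion might look like the sketch below, with the GLSL held in C string literals as it would be handed to glShaderSourceARB. The varying name packedUV and the C variable names are made up for illustration; baseTex and lmTex are the sampler names used earlier in the thread. It assumes both coordinate sets are plain 2D coordinates, so texture2D is used rather than the projective lookup.

```c
#include <assert.h>  /* only needed for the usage checks described below */
#include <string.h>  /* likewise */

/* Vertex shader: both texture coordinate sets travel in one varying vec4
   (xy = base texture, zw = lightmap). */
static const char *packed_vs =
    "varying vec4 packedUV;\n"
    "void main()\n"
    "{\n"
    "    gl_Position = ftransform();\n"
    "    packedUV = vec4(gl_MultiTexCoord0.xy, gl_MultiTexCoord1.xy);\n"
    "}\n";

/* Fragment shader: unpack the two coordinate pairs and modulate the
   base color by the lightmap. */
static const char *packed_fs =
    "uniform sampler2D baseTex;\n"
    "uniform sampler2D lmTex;\n"
    "varying vec4 packedUV;\n"
    "void main()\n"
    "{\n"
    "    vec4 baseCol = texture2D(baseTex, packedUV.xy);\n"
    "    vec4 lmCol = texture2D(lmTex, packedUV.zw);\n"
    "    gl_FragColor = baseCol * lmCol;\n"
    "}\n";
```

The varying declaration has to match by name and type in both stages, which is easy to sanity-check with strstr on the two strings before compiling them.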
Call glUseProgramObjectARB(currentProgramObject); just before glDrawElements, not before setting up your vertex data.
Finally, try a simpler shader (gl_Position = ftransform(); / gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);) and check whether the software fallback still occurs.
If none of this works, use the ARB_vertex_program and ARB_fragment_program extensions, which are less prone to software rendering fallback on ATI (and generally perform better when written correctly).
grxmx, could you send me the app with source to epersson ‘at’ ati.com so I can take a look at it?