Slow shaders on RADEON X300/X550 Series (128 MB)

Hi All,

We have texture-blurring code that runs very slowly, but only on this hardware:

Graphics card:  	RADEON X300/X550 Series (128 MB)
Graphics card:  	RADEON X300/X550 Series Secondary (128 MB)
Chiptype:        	RADEON X300/X550 Series (0x5B60) 
3D accelerator  	ATI Radeon X300 (RV370)
Installed driver:  	ati2dvag (6.14.10.6575) 

RAMDAC frequency:      400 MHz
Pixel pipelines  	4
TMU per pipeline  	1
Vertex shaders  	2 (v2.0)
Pixel shaders  	1 (v2.0)
DirectX support, hardware  	DirectX v9.0
Pixel Fillrate  	1296 MPixel/s
Texel Fillrate  	1296 MTexel/s 

ATI GPU Registers:
			ati-00F8  	08000000
			ati-0140  	00000070
			ati-0144  	1A289111
			ati-0148  	D7FFD000
			ati-0154  	F0000000
			ati-0158  	31320032
			ati-0178  	00001017
			ati-01C0  	01FF0000
			ati-4018  	00010011
			ati-CLKIND-0A  	03301D04
			ati-CLKIND-0B  	00001A00
			ati-CLKIND-0C  	0400BC00
			ati-CLKIND-0D  	00807FFA
			ati-CLKIND-0E  	04002400
			ati-CLKIND-0F  	00000000
			ati-CLKIND-12  	00031212
			ati-MCIND-6C  	00000000 




Chipset:  Intel Grantsdale-G i915G

GPU code:  	RV370 (PCI Express x16 1002 / 5B60, Rev 00)
GPU speed:  	324 MHz (original: 324 MHz)

CPU type  	Intel Pentium 4 520, 2800 MHz (14 x 200) 

Supported:  	x86, MMX, SSE, SSE2, SSE3 



OpenGL Extensions Viewer 3.0 says:

Renderer: ATI Radeon X300/X550/X1050 Series
Vendor: ATI Technologies Inc.
Memory: 128 MB
Version: 2.1.8543 Release
Shading language version: 1.20

To me, all this info says that the machine fully supports shader programs, yet its actual behavior makes us think we need to disable shader support on it.

Why? The shader code follows below.

Thanks,

Alberto

// size of kernel for this execution
const int KernelSize = %len%; 
                                      
// array of offsets for accessing the base image
uniform float Offset[KernelSize];

// value for each location in the convolution kernel
uniform float KernelValue[KernelSize];

// image to be convolved
uniform sampler2D BaseImage;

void main()
{

   int i;
   vec4 sum = vec4(0.0);

   for (i = 0; i < KernelSize; i++) 
   {
      vec4 tmp = texture2D(BaseImage, gl_TexCoord[0].st + vec2(Offset[i], 0));
      sum += tmp * KernelValue[i];
   }

   gl_FragColor = sum;
                                     
}

For what value of KernelSize is it slow?

To me, “supported” and “usable” are kind of orthogonal. A small micro-benchmark at runtime, during an “auto-detect settings” phase, allows you to make a better decision about whether to use a feature or not.

The user should always be able to force the use of any supported feature, even if it does not pass the “usable” framerate bar, but default settings should really take the real performance into account.
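For what it's worth, the auto-detect idea fits in a few lines. This is a Python-level sketch, with `render_frame` standing in for whatever draws one frame with the candidate feature enabled (both names are my own, not from any real API):

```python
import time

def feature_is_usable(render_frame, frames=30, budget_s=1.0 / 30.0):
    """Time a few frames of the candidate feature and decide whether
    it meets the target frame budget (default: 30 fps)."""
    samples = []
    for _ in range(frames):
        start = time.perf_counter()
        render_frame()          # draw one frame with the feature enabled
        samples.append(time.perf_counter() - start)
    samples.sort()
    median = samples[len(samples) // 2]
    return median <= budget_s

# During the "auto-detect settings" phase:
#   use_blur = feature_is_usable(lambda: draw_scene(blur=True))
# The user can still force use_blur = True from a settings dialog.
```

Using the median rather than the mean keeps one-off hiccups (shader compilation, first-use texture uploads) from skewing the decision.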

Hi ZbuffeR,

static int kernelSize = 19;

It is so small…

ZbuffeR,

I understand your point, but if we need to test everything for speed before using it, then what are all the version numbers there for?

Thanks,

Alberto

Maybe on this platform separable convolution would be a better fit for your algorithm. This would reduce the number of texture lookups at the expense of an intermediate texture write.

http://http.developer.nvidia.com/GPUGems/gpugems_ch21.html

You should find what the bottleneck is however before doing all this.
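To put rough numbers on the separable suggestion (my own back-of-the-envelope arithmetic, assuming the blur is currently a full 2D pass; the posted shader only shows the horizontal direction):

```python
def lookups_2d(k):
    """Texture fetches per fragment for a full k x k 2D convolution."""
    return k * k

def lookups_separable(k):
    """Fetches per fragment for the separable version: one k-tap
    horizontal pass plus one k-tap vertical pass (plus one
    intermediate render-to-texture step)."""
    return 2 * k

k = 19
print(lookups_2d(k), lookups_separable(k))  # 361 vs 38
```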

The X300/X550 cards are very, very slow. A 19-tap kernel will basically destroy them: on this SM2.0-class hardware there is no dynamic branching in the fragment shader, so the driver has to unroll the loop into 19 dependent texture fetches per fragment.

Unfortunately, version numbers don’t tell a whole lot about performance. You either have to measure at runtime, as suggested, or build a list of video cards beforehand.

OK, so the only viable solution is to test speed at runtime; then we can detect this case and disable blurring.

In general, what is the recovery approach when the time becomes acceptable again? (I know that in the shader case it’s impossible to get better results.) You set a flag to false and never do the computation again, but what if the model changes and blurring becomes feasible?

Thanks again,

Alberto

You mean, when the user upgrades their video card?
A big fat button labelled “re-detect graphic settings”.
Or you can do that silently at each startup (it should be fast).
Or check the GL_VENDOR, GL_RENDERER and GL_VERSION strings; if any of them changes, redo the auto-detect.
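That last option might look like this: a sketch in Python with a made-up cache file name, and a `run_benchmarks` callback standing in for the auto-detect pass:

```python
import json
import os

SETTINGS_FILE = "gl_autodetect.json"  # hypothetical cache location

def load_or_redetect(vendor, renderer, version, run_benchmarks):
    """Reuse cached auto-detected settings, but re-run the benchmarks
    whenever any of the GL identification strings has changed."""
    key = {"vendor": vendor, "renderer": renderer, "version": version}
    if os.path.exists(SETTINGS_FILE):
        with open(SETTINGS_FILE) as f:
            cached = json.load(f)
        if cached.get("key") == key:
            return cached["settings"]      # same card/driver: reuse
    settings = run_benchmarks()            # card or driver changed
    with open(SETTINGS_FILE, "w") as f:
        json.dump({"key": key, "settings": settings}, f)
    return settings
```

The three strings come straight from `glGetString(GL_VENDOR)`, `glGetString(GL_RENDERER)` and `glGetString(GL_VERSION)` once a context exists.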

The best solution will depend on your application.
I still have some trouble understanding what exactly you sell. Is it a low-level graphics engine, a scenegraph, …?

No ZbuffeR,

I mean in general: suppose you have a very complex scene and you decide to turn off some features to keep navigation fast enough, and then the scene becomes simpler. What is the best approach to re-activate the complex/slow features? If you continuously probe how fast the complex features are, you will end up with a slow fps anyway.

Do you remember those programs that draw boxes instead of objects to allow smooth navigation on slow machines? Now suppose the scene becomes simpler while navigating: how do you re-activate the accurate object representation?

Or maybe there is no way, and the user always needs to press a button to change the LOD and get different performance?
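One common answer (my own sketch, not something anyone in this thread proposed): drive the toggle from the frame times you are already measuring every frame, with two thresholds and a required streak of consecutive frames so the decision does not flap while navigating:

```python
class FeatureGovernor:
    """Toggle an expensive feature based on measured frame times,
    with hysteresis so the decision does not flap every frame."""

    def __init__(self, enable_below=1 / 60, disable_above=1 / 25, streak=60):
        self.enable_below = enable_below    # re-enable when this fast
        self.disable_above = disable_above  # disable when this slow
        self.streak = streak                # consecutive frames required
        self.enabled = True
        self._count = 0

    def update(self, frame_time):
        """Feed one measured frame time; returns the current decision."""
        if self.enabled:
            slow = frame_time > self.disable_above
            self._count = self._count + 1 if slow else 0
            if self._count >= self.streak:
                self.enabled, self._count = False, 0
        else:
            fast = frame_time < self.enable_below
            self._count = self._count + 1 if fast else 0
            if self._count >= self.streak:
                self.enabled, self._count = True, 0
        return self.enabled
```

The re-enable threshold has to be conservative, since while the feature is off you are measuring frame times without its cost; the wide gap between the two thresholds is what avoids the constant re-probing mentioned above.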

We develop a small software component that allows 3D model visualization.

Thanks again ZbuffeR,

Alberto