Hi,

Currently I'm trying to abuse the graphics hardware to do image convolution plus some additional calculations on the convolution result.

To be precise, the algorithm has three passes:

- calculate x and y derivatives, step 1
- calculate x and y derivatives, step 2
- for every pixel, sum the three possible products between dx and dy over the neighbourhood of each pixel (this builds the 2x2 structure tensor; see the formula below)
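
The smaller eigenvalue of that tensor is what the final line of the shader computes, using the closed form for a symmetric 2x2 matrix:

$$
M = \begin{pmatrix} \sum w\,g_x^2 & \sum w\,g_x g_y \\ \sum w\,g_x g_y & \sum w\,g_y^2 \end{pmatrix},
\qquad
\lambda_{\min} = \frac{(M_{11} + M_{22}) - \sqrt{(M_{11} - M_{22})^2 + 4\,M_{12}^2}}{2}
$$

(Here w are the convolution kernel weights, and gx, gy the derivatives from passes 1 and 2.)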

Passes 1 and 2 work like a charm; number 3 does not, because the compiler allocates an excessive number of temporaries without any need to do so.

Here is the GLslang code:

```
const int KERNEL_WIDTH = 5;
const int KERNEL_HEIGHT = 5;
uniform samplerRECT textures[1];
// one vec4 of weights per output component (k) and kernel column (j)
uniform vec4 kernel[4][KERNEL_WIDTH];

void main(void)
{
    int i, j, k;
    vec4 accum[3];
    accum[0] = vec4(0.0); // accumulators must start at zero
    accum[1] = vec4(0.0);
    accum[2] = vec4(0.0);
    vec2 addr = gl_TexCoord[0].xy;
    for (i = 0; i < KERNEL_HEIGHT; i++)
    {
        addr.x = gl_TexCoord[0].x;
        for (j = 0; j < KERNEL_WIDTH; j++)
        {
            // each texel packs four gx and four gy values, two halfs per channel
            vec4 tex = texRECT(textures[0], addr);
            half4 gx, gy;
            gx.xy = unpack_2half(tex.x);
            gx.zw = unpack_2half(tex.y);
            gy.xy = unpack_2half(tex.z);
            gy.zw = unpack_2half(tex.w);
            // the three possible products of the derivatives
            vec4 gxx = gx * gx;
            vec4 gxy = gx * gy;
            vec4 gyy = gy * gy;
            for (k = 0; k < 4; k++)
            {
                accum[0][k] += dot(gxx, kernel[k][j]);
                accum[1][k] += dot(gxy, kernel[k][j]);
                accum[2][k] += dot(gyy, kernel[k][j]);
            }
            addr.x += 1.0;
        }
        addr.y += 1.0;
    }
    vec4 gxx = vec4(accum[0][0], accum[0][1], accum[0][2], accum[0][3]);
    vec4 gxy = vec4(accum[1][0], accum[1][1], accum[1][2], accum[1][3]);
    vec4 gyy = vec4(accum[2][0], accum[2][1], accum[2][2], accum[2][3]);
    // smaller eigenvalue of the 2x2 structure tensor, for 4 packed pixels at once
    gl_FragColor = (gxx + gyy - sqrt((gxx - gyy) * (gxx - gyy) + 4.0 * gxy * gxy)) / 2.0;
}
```

Now, looking at the asm output (I can provide it, but it's rather lengthy), the compiler tries to make the texture fetches independent from each other by using different temporary variables for the accums and summing them up only at the very end. This way, every loop iteration burns another set of 3 temporaries, so I'm quickly hitting the 32-temporary limit of the GeForce FX.
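
To illustrate the pattern (a hand-written sketch of what I believe the generated code is doing, not the actual asm; `g0`/`g1` and `k0`/`k1` are stand-ins for the fetched products and kernel entries):

```
vec4 g0, g1, k0, k1; // stand-ins for fetched products and kernel rows

// What I wrote: one accumulator, reused serially across iterations.
float accum = 0.0;
accum += dot(g0, k0); // iteration 0
accum += dot(g1, k1); // iteration 1
// ...

// What the compiler emits after unrolling: a fresh temporary per
// iteration, so the texture fetches stay independent, summed only
// at the very end.
float t0 = dot(g0, k0);
float t1 = dot(g1, k1);
// ...
float total = t0 + t1; // + ... -- 3 new temporaries per iteration in my shader
```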

Has anyone experienced similar behaviour? Is there a way to tell the compiler not to apply this "optimization", or better: shouldn't the compiler be clever enough to disable it when the optimization makes it run out of temporaries? I'm using a GeForce FX 5600 with Forceware 66.32.

Best regards,

Martin Kraus