Convolution performance w/ non-separable filters

I have a fragment shader that uses 2 textures. One is the base image and the other is a convolution kernel. The filter that the kernel is generated from is non-separable, so I’m using a simple brute-force approach: a nested loop that iterates over the image/kernel and sums the results. The kernel can be up to 256x256. Rendering is quite slow on my 7800 GTX. Are there any shader ‘tricks’ I should try to improve performance?
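To make the cost concrete, here is a minimal CPU sketch of the brute-force nested loop described above, written in plain Python rather than GLSL. The function name `convolve_direct`, the zero-padded borders and the same-size output are illustrative assumptions, not details from the original shader. Each output pixel performs one multiply-add per kernel tap, so a 256x256 kernel costs 65,536 taps (two texture fetches each in the shader version) per pixel.

```python
def convolve_direct(image, kernel):
    """Brute-force 2D convolution (hypothetical helper, not the poster's code).

    Output has the same size as `image`; samples outside the image are
    treated as zero. Cost is H * W * KH * KW multiply-adds.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2  # kernel centre
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            # Inner 2D loop over every kernel tap: this is the expensive part.
            for j in range(kh):
                for i in range(kw):
                    # True convolution offsets the sample by -(tap - centre);
                    # a shader doing correlation would use +(tap - centre),
                    # which is identical for symmetric kernels.
                    sy = y - (j - cy)
                    sx = x - (i - cx)
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += image[sy][sx] * kernel[j][i]
            out[y][x] = acc
    return out
```

Convolving a single bright pixel with the kernel reproduces the kernel around that pixel, which is a quick sanity check that the loop is a convolution.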

If you don’t want to go the full FFT route (which I think can be, and has been, hardware accelerated, BTW), consider a brute-force Fourier transform. The 2D transform is separable, so by computing the FT of the image and of the kernel, multiplying them pointwise, and then taking the inverse FT, you avoid any 2Dx2D loops.
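The suggestion above can be sketched on the CPU as follows. This is an assumption-laden illustration, not the replier’s implementation: it uses a direct O(n²) 1D DFT (not an FFT), applies it to every row and then every column (the separability being relied on), multiplies the spectra pointwise, and inverts. One detail the reply leaves implicit is that the pointwise product of DFTs gives *circular* convolution; to get ordinary linear convolution, both image and kernel are zero-padded here to (H+KH-1) x (W+KW-1). All function names are hypothetical. In a shader, the row and column passes would presumably become separate render passes.

```python
import cmath


def dft_1d(xs, inverse=False):
    """Direct (brute-force) 1D DFT; the inverse includes the 1/n scale."""
    n = len(xs)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += xs[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def dft_2d(grid, inverse=False):
    """Separable 2D DFT: transform every row, then every column."""
    rows = [dft_1d(row, inverse) for row in grid]
    cols = [dft_1d(list(col), inverse) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to rows


def zero_pad(grid, h, w):
    """Copy `grid` into the top-left corner of an h x w zero grid."""
    out = [[0.0] * w for _ in range(h)]
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            out[y][x] = v
    return out


def convolve_fourier(image, kernel):
    """Full linear convolution via the convolution theorem.

    Padding to (H+KH-1) x (W+KW-1) prevents the circular wrap-around
    that an unpadded pointwise product would introduce.
    """
    h = len(image) + len(kernel) - 1
    w = len(image[0]) + len(kernel[0]) - 1
    fi = dft_2d(zero_pad(image, h, w))
    fk = dft_2d(zero_pad(kernel, h, w))
    # Pointwise product of the two spectra.
    prod = [[a * b for a, b in zip(ri, rk)] for ri, rk in zip(fi, fk)]
    res = dft_2d(prod, inverse=True)
    # Inputs are real, so the imaginary part is only rounding noise.
    return [[v.real for v in row] for row in res]
```

For example, `convolve_fourier([[1, 2], [3, 4]], [[1, 1], [1, 1]])` returns approximately `[[1, 3, 2], [4, 10, 6], [3, 7, 4]]`, matching the full linear convolution computed by hand. The design trade-off: each padded output element costs M terms per 1D pass (M being the padded row or column length) instead of KH x KW taps, and an FFT would cut each pass further to roughly log M work per element.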
