I am working on a project that renders DICOM files and regularly does GPU calculations and rendering such as cropping, rotations, etc. I am wondering whether I should implement FFT convolution for general filtering and deep learning model evaluation on the GPU or on the CPU, to avoid the cost of implementing two separate algorithms.

So the question is: which would be better in my case, implementing FFT convolution (the Cooley-Tukey radix-2 algorithm) on the CPU using multithreading and SIMD, or on the GPU?
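For reference, here is a minimal single-threaded sketch of what FFT convolution with a recursive radix-2 Cooley-Tukey transform could look like in 1D. It is only an illustration of the technique named in the question, not a tuned CPU or GPU implementation; the function names `fft`, `fft_convolve` and `direct_convolve` are made up for this example, and zero-padding to the next power of two is an assumption needed by the radix-2 scheme.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    len(x) must be a power of two. With inverse=True the twiddle sign is
    flipped; the caller still has to divide the result by len(x).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms.
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via FFT: pad, transform, multiply, invert."""
    if not signal or not kernel:
        raise ValueError("inputs must be non-empty")
    out_len = len(signal) + len(kernel) - 1
    n = 1
    while n < out_len:  # next power of two that holds the full result
        n *= 2
    a = fft(list(signal) + [0] * (n - len(signal)))
    b = fft(list(kernel) + [0] * (n - len(kernel)))
    y = fft([p * q for p, q in zip(a, b)], inverse=True)
    return [v.real / n for v in y[:out_len]]


def direct_convolve(signal, kernel):
    """Straightforward O(N*K) convolution, used here as a reference."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Comparing `fft_convolve` against `direct_convolve` on the same input is a cheap correctness check before timing either version. For 2D images the same transform would be applied along rows and then columns.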

I’m an OpenGL user and not a specialist. Most answers to questions like this point to the GPU being very fast because of its massive parallelism, and everyone gets the same answer: write both and time them.

On the other hand … I’m softening a font file with a 5×5 kernel. It can hardly be simpler than that, so it may not be applicable to your situation. I’ve put an image of the CPU-side calculation of each pixel’s transparency value here:

pixel calculation //correct or not, it works.

That is just to give a hint of how massive the calculation is. It would put a burden on any fragment shader, even if done in a more ingenious way. The image shows how I deal with the edges.

This blur filter works on black/white, and it got me thinking of an optimization: test whether the 5×5 area of the font image really needs a blur, or whether the pixel can skip the calculation. I suppose the same optimization could be done in the opposite direction (sharpening, when there is no color gradient across the kernel area) … this could be part of what you’re doing, but I’m only guessing.
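A rough CPU-side sketch of that skip test, in case it helps. This is not my actual code: the uniform 5×5 weights, the clamp-to-edge border handling and the name `blur5x5_skip_flat` are all assumptions made for the example.

```python
def blur5x5_skip_flat(img):
    """5x5 box blur that leaves pixels alone when their window is flat.

    img is a list of rows of numbers (e.g. 0 or 255 for a black/white
    font image). Returns (blurred_image, number_of_skipped_pixels).
    Assumptions: uniform weights and clamp-to-edge borders.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    skipped = 0
    for y in range(h):
        for x in range(w):
            # Gather the 5x5 neighbourhood, clamping coordinates at the edges.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-2, 3)
                for dx in range(-2, 3)
            ]
            # Flat window: the blur would return the same value, so skip it.
            if min(window) == max(window):
                skipped += 1
                continue
            out[y][x] = sum(window) / 25
    return out, skipped
```

Note that in this form the test itself reads all 25 samples, so it only saves the arithmetic. To save the reads as well, the flatness information would have to come from something cheaper, such as a precomputed min/max per tile.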

I hope I didn’t miss the point entirely.
