CPU vs. GPU on convolutions

tomodachi · March 27, 2005, 2:36pm

Hi,
I tried blurring an image using GLSL, but I find the result too slow: to average 8 texels, I have to access the texture 8 times. Since a fragment shader can’t pass information to its next execution on the next pixel, I suspect that reading the texture data back from OpenGL and doing the calculations on the CPU would be faster. Am I wrong?

carl_lewis · March 27, 2005, 2:50pm

Actually, I’ve found that the GPU outperforms the CPU on 2D convolution.
Simply averaging a 3x3 area should run much faster than whatever screen refresh rate you care to assign, but for more complex filters you may need to encode the kernel as a 2D texture (with multitexture support).

texture lookups are one of the fastest instructions on the GPU (assumed by most to be a single cycle)

Experimenting with a 9x9 area (with a texture-encoded kernel), I still get around 50 fps (compared to around 20 fps on the CPU).
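
To make the access pattern under discussion concrete, here is a minimal CPU-side sketch in Python of a gather-style 2D filter: every output pixel reads its whole neighborhood, with the weights taken from a small lookup table that plays the role of the kernel texture. The function names `convolve2d` and `box_kernel` and the clamp-at-the-border behavior are assumptions made for illustration, not anything posted in this thread. The kernel is not flipped, which makes no difference for symmetric kernels such as a box average.

```python
def convolve2d(image, kernel):
    """Gather-style 2D filter: each output pixel reads its own neighborhood.

    image  -- list of rows of floats
    kernel -- square list of rows of weights with an odd side length
    Coordinates are clamped at the border (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    # Clamp so lookups past the border reuse the edge pixel.
                    sy = min(max(y + ky - radius, 0), height - 1)
                    sx = min(max(x + kx - radius, 0), width - 1)
                    total += weight * image[sy][sx]
            output[y][x] = total
    return output


def box_kernel(size):
    """Uniform averaging kernel, e.g. size=3 for a 3x3 mean."""
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]
```

A 3x3 box costs 9 reads per output pixel and a 9x9 kernel costs 81, which is the per-pixel work a fragment shader would repeat independently for every fragment.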

Korval · March 27, 2005, 4:35pm

“texture lookups are one of the fastest instructions on the GPU (assumed by most to be a single cycle)”
Huh? If a texture lookup takes only 1 cycle, you’re lucky. A texture lookup is, effectively, a memory access. And, as you should already know, memory access == slow.

Indeed, texture lookups are one of the slowest operations you can do on a GPU. You can do a full 4-vector floating-point dot product in one cycle, but there’s no way you can expect a texture lookup to take only that long. The memory fetches required to satisfy the lookup request are what make it take time.

That being said, you’ll still easily outstrip your CPU by comparison. The GPU will, typically, do a 2x2 array of tiles all at once, thus mitigating the memory fetch and opcode costs. A CPU cannot. GPUs typically have much faster memory than CPUs. And GPUs have caches specially designed for dealing with texture images, and they store images in a format specifically designed to make texture fetching faster. CPUs do not.

tomodachi · March 28, 2005, 1:42am

Imagine I’m doing a 1D convolution averaging 8 pixels. I sum the first 8 pixels, save the sum, and divide it by 8 to get the first averaged pixel. Then, for the next pixel, I take the previous sum, subtract the first pixel of the old range, and add the first pixel just past it. This way I get the second averaged pixel with only two operations. This cannot be done in GLSL, and that’s what I was talking about.
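
The running-sum scheme described above can be sketched on the CPU as follows. This is a minimal Python illustration; the name `sliding_average` and the choice to return only the fully covered windows (no edge padding) are assumptions of the sketch.

```python
def sliding_average(samples, window=8):
    """Average each run of `window` consecutive samples using a running sum.

    Returns len(samples) - window + 1 averages; no edge padding is applied.
    """
    if window <= 0 or len(samples) < window:
        return []
    running = sum(samples[:window])  # full sum, computed only once
    averages = [running / window]
    for i in range(window, len(samples)):
        # Drop the sample leaving the window, add the one entering it.
        running += samples[i] - samples[i - window]
        averages.append(running / window)
    return averages


print(sliding_average(list(range(10)), window=8))  # [3.5, 4.5, 5.5]
```

Each output after the first costs one subtraction and one addition regardless of the window size, but every step depends on the previous one, which is the serial dependency the post refers to. With floating-point samples the running sum can also accumulate rounding error over long rows.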
