Recently, more and more computer-vision algorithms are being ported from the CPU to the GPU, with performance gains of up to an order of magnitude.
The only drawback I have encountered so far is that there is no support for inter-pixel dependencies, due to the parallel processing architecture of the graphics chip. This can easily be worked around by copying the framebuffer to a texture and sampling that texture at the desired positions. Since this can be done entirely on the graphics card, the performance drop is acceptable.
Right now, I am in need of some kind of storage register that accumulates values originating from a fragment shader, or even from the fixed-function pipeline: for example, the sum of all the color values that were drawn for a primitive, in 4xfp16 or 4xfp32 format (4xub8 would also be a great help).
I have been looking through the spec and virtually all extensions, but the only one that comes close is the occlusion query, and that only counts the fragments that pass rather than accumulating their values, so it is of little use to me…
Is there any way to accomplish this? Maybe there are hardware implementations that do not necessarily clear the temporary registers of the fragment shaders, or maybe there is a much simpler solution that I have overlooked…
For the moment, I am forced to use glReadPixels, which comes with a large performance drop.
Any help is appreciated, as this would retain the performance gained from using the GPU instead of the CPU.