glReadPixels() too slow

I have to do a readback like this:

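glReadPixels(x, y, w, h, GL_BGRA, GL_UNSIGNED_BYTE, buff);
// Scan the read-back pixels; byte 3 of each BGRA pixel is alpha.
for (int i = 0, j = w * h; i < j; i++)
{
    if (buff[i][3] > 0)
    {
        samples++;  // at least one pixel with non-zero alpha: collision
        break;
    }
}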
The loop then checks the alpha value of each pixel. If any is > 0, a collision was found.
This is done after a stencil test that I use for 2D collision detection. But glReadPixels() is slow and takes up most of my app's time.
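For reference, the scan described above can be written as a small standalone function. This is only a sketch: the name anyAlphaSet and the std::vector container are assumptions, not part of the original code, and it assumes a tightly packed buffer with 4 bytes per pixel. Alpha sits in byte 3 for both BGRA and RGBA layouts, so the check is the same for either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if any pixel in a tightly packed
// 4-byte-per-pixel buffer (BGRA or RGBA) has a non-zero alpha byte.
// `pixels` stands for the data filled in by glReadPixels(x, y, w, h, ...).
bool anyAlphaSet(const std::vector<std::uint8_t>& pixels, int w, int h)
{
    assert(w >= 0 && h >= 0);
    const std::size_t count =
        static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
    assert(pixels.size() >= count * 4);

    for (std::size_t i = 0; i < count; ++i) {
        if (pixels[i * 4 + 3] > 0) {
            return true;  // first hit is enough, stop scanning
        }
    }
    return false;
}
```

The scan itself is cheap compared with the readback, so any real speed-up has to come from reading fewer pixels or from not stalling on the read.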

Any ideas how I can speed this up? I really need to read video
memory. Otherwise my collision detection would be gone.

andreas

Note: This code should run on OpenGL and OpenGL ES.

How many pixels do you really need to retrieve?
Using a PBO makes the download asynchronous, so the CPU and GPU keep working in parallel, especially if you can check the collision one frame later.

To me it does not sound realistic to try to have both speed and the same codebase across very different platforms (desktop/embedded) …

What is your high-level plan?

Consider down-sampling your target buffer several times before reading. A special program may be required to choose the final color out of 4 neighbors based on their alpha values.

Smaller buffer = faster reading. Even if the speed change is non-linear, it can be a huge gain.

Another small thing: if you only need the alpha channel, then only read back the alpha channel.

With PBO readback, you might be able to do a sort of double-buffered readback, alternating between glReadPixels into one buffer and glMapBuffer on the other. The copy from the mapped PBO to system memory could even be performed in a parallel thread.
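The double-buffered scheme can be sketched as a small scheduler that alternates between two buffer slots. This is an illustration under assumptions: the class PingPongReadback and both callbacks are hypothetical names, and the GL calls themselves are only named in comments because they need a live context. As far as I know, pixel pack buffers are core in desktop GL 2.1 and OpenGL ES 3.0 but not in ES 2.0, so check what your target offers.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical ping-pong scheduler for two pixel-pack buffers (PBOs).
// issueRead(slot):   bind PBO `slot` to GL_PIXEL_PACK_BUFFER and call
//                    glReadPixels with a null offset (returns quickly).
// fetchResult(slot): bind PBO `slot`, map it for reading, scan the alpha
//                    values, unmap, and return whether a collision was seen.
class PingPongReadback {
public:
    PingPongReadback(std::function<void(int)> issueRead,
                     std::function<bool(int)> fetchResult)
        : issueRead_(std::move(issueRead)),
          fetchResult_(std::move(fetchResult))
    {
        assert(issueRead_ && fetchResult_);
    }

    // Call once per frame after drawing. Starts this frame's read and
    // returns the result of the previous frame's read. `valid` is false
    // on the very first frame, when no earlier result exists yet.
    bool endFrame(bool& valid)
    {
        const int writeSlot = next_;
        const int readSlot = 1 - next_;
        issueRead_(writeSlot);
        valid = primed_;
        const bool hit = primed_ ? fetchResult_(readSlot) : false;
        primed_ = true;
        next_ = readSlot;  // swap roles for the next frame
        return hit;
    }

private:
    std::function<void(int)> issueRead_;
    std::function<bool(int)> fetchResult_;
    int next_ = 0;
    bool primed_ = false;
};
```

The price of this design is one frame of latency: the collision reported in frame N belongs to frame N-1.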

Thanks for your fast reply!

The number of pixels depends on a Rect, which is the overlapping region of 2 sprites. No collision checks are done as long as there is no intersection Rect. A Rect has x, y, w and h.
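A minimal sketch of how such an intersection Rect could be computed, assuming each sprite is represented by an axis-aligned bounding box in screen space (for rotated or scaled sprites that would be the box around the transformed sprite). The Rect layout follows the x, y, w, h fields mentioned above; the function name intersect is an assumption.

```cpp
#include <algorithm>
#include <cassert>

struct Rect { int x, y, w, h; };

// Hypothetical helper: writes the overlap of `a` and `b` to `out` and
// returns true, or returns false (leaving `out` untouched) when the
// rectangles do not overlap. Rectangles that only touch at an edge
// count as not overlapping.
bool intersect(const Rect& a, const Rect& b, Rect& out)
{
    assert(a.w >= 0 && a.h >= 0 && b.w >= 0 && b.h >= 0);
    const int x0 = std::max(a.x, b.x);
    const int y0 = std::max(a.y, b.y);
    const int x1 = std::min(a.x + a.w, b.x + b.w);
    const int y1 = std::min(a.y + a.h, b.y + b.h);
    if (x1 <= x0 || y1 <= y0) {
        return false;  // empty overlap: no pixel test needed
    }
    out = Rect{x0, y0, x1 - x0, y1 - y0};
    return true;
}
```

Only when this returns true would the stencil pass and the readback be needed at all, and the resulting w * h bounds the number of pixels transferred.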

With the stencil buffer I do the following:
glEnable(GL_SCISSOR_TEST);
glScissor(x, y, w, h);               // restrict drawing to the intersection Rect
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 1, 1);      // pass 1: always pass and write 1 to the stencil
glStencilOp(GL_REPLACE, GL_REPLACE, GL_REPLACE);
Sprite1->Draw
glStencilFunc(GL_EQUAL, 0x1, 0x1);   // pass 2: draw only where Sprite1 wrote the stencil
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
Sprite2->Draw
glReadPixels(x, y, w, h, GL_BGRA, GL_UNSIGNED_BYTE, buff);
And then I check buff for any non-zero alpha, as described above.

It's a game engine, and this code solves one of the biggest problems: collision detection of scaling and rotating objects. These objects cannot have a precalculated collision mask. Everything needs to be done at runtime.

Btw I had an occlusion query. It works on my Mac but not on iPhone OpenGL ES (not available). Not a big problem, because it was also slow.
Regarding PBO: I am afraid it's not available on OpenGL ES?

Thanks for your reply! What does downsampling mean? Please advise.

Downsampling:
Draw the original texture into one with 2x lower resolution. Each output pixel is the result of a 2x2 block. You probably only need pixels with non-zero alpha values, so you can choose the one of the 2x2 pixels that has the highest alpha and use it as the result.

Generally, you don’t need to create N additional textures for N stages - you can use LODs of the existing texture. In order to draw from one LOD to another, you must disable filtering and set the target LOD in the glFramebufferTexture* call.

Each down-sample level makes your bandwidth load 4 times smaller.
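To make the 2x2 reduction concrete, here is a CPU-side sketch of a single pass over an alpha-only image. On the GPU the same logic would live in a fragment shader that samples four texels and outputs the largest alpha; this version only illustrates the arithmetic. The function name downsampleMaxAlpha and the even-dimension requirement are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference for one down-sampling pass: each output pixel
// is the maximum alpha of the matching 2x2 block in the input, so a
// single non-zero alpha survives every pass down to 1x1.
std::vector<std::uint8_t> downsampleMaxAlpha(
    const std::vector<std::uint8_t>& src, int w, int h)
{
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);
    assert(src.size() == static_cast<std::size_t>(w) * h);

    const int ow = w / 2;
    const int oh = h / 2;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(ow) * oh);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            // Index of the top-left pixel of the 2x2 source block.
            const std::size_t i = static_cast<std::size_t>(2 * y) * w + 2 * x;
            const std::uint8_t top = std::max(src[i], src[i + 1]);
            const std::uint8_t bottom = std::max(src[i + w], src[i + w + 1]);
            dst[static_cast<std::size_t>(y) * ow + x] = std::max(top, bottom);
        }
    }
    return dst;
}
```

Applied twice to a 64x64 region, this leaves a 16x16 image to read back, which is 16 times less data, and any pixel with alpha > 0 in the original is still visible in the result.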

If you don't need the actual coordinates or values of the pixels where alpha > 0, then just don't read this back.

You could instead render to an FBO, then use the color attachment as a texture, draw a quad with it, and discard fragments with alpha <= 0. If the occlusion query says something was drawn, you have a collision.

But then, you probably want to know where the collision happened :wink:

Edit:

"Btw I had a occlusion query. Works on my mac but not on iphone opengles (not available)."

Oh well, nevermind …

Thanks for all the great feedback. I tried/implemented the following:
-Occlusion queries (not available on OpenGL ES & slow)
-Read alpha only (Thx, that was great. I forgot)

Regarding downsampling, I still have no clue how to do this. My hope is to get a smaller number of pixels to transfer for the intersection rect.
Can you give an example?

Btw: http://developer.nvidia.com/object/fast_texture_transfers.html is an interesting article

Edit: It looks like it will be the following: occlusion query on Mac and
glReadPixels() on the mobile device. It's really fast on iPhone.
Very interesting.

Surprisingly, I haven’t found many articles about the down-sampling process… Here is where you can read about it (especially as applied to collision detection):

http://books.google.com/books?id=WGpL6Sk…ved=0CBsQ6AEwAA