Android glReadPixels performance issues

#1

I am creating a custom camera preview with GLSurfaceView, using OpenGL to render the frames delivered by the camera. The camera is fully implemented and works as I expect, with no fps loss and correct aspect ratios. The problem came when I needed to capture frames from the camera feed; my first thought was to use glReadPixels().

Using GLES20.glReadPixels(), I find that some devices lose fps, mainly the ones with higher screen resolutions. That makes sense, because glReadPixels has to read more pixels at a higher resolution.

I did some digging and found that others had similar issues with glReadPixels. Many suggested using a PBO, or rather two of them acting as a double buffer, which would let me read pixel data without blocking/stalling the current rendering process. I understand the concept of double buffering, but I’m fairly new to OpenGL and need some guidance on how to get a double-buffered PBO working.

I have found a few partial solutions for PBO double buffering, but never a complete one that would let me fully understand how it interacts with GLES.

My implementation of GLSurfaceView.Renderer.onDrawFrame():

// mBuffer and mBitmap are declared and allocated outside of the onDrawFrame Method

// Buffer is used to store pixel data from glReadPixels
mBuffer.rewind();


GLES20.glUseProgram(hProgram);
if (tex_matrix != null)
{
    GLES20.glUniformMatrix4fv(muTexMatrixLoc, 1, false, tex_matrix, 0);
}
GLES20.glUniformMatrix4fv(muMVPMatrixLoc, 1, false, mMvpMatrix, 0);

GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, tex_id);
GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, GLConstants.VERTEX_NUM);
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, 0);

// Read pixels back from the current framebuffer; this call blocks until rendering has finished
GLES20.glReadPixels(0, 0, width, height, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, mBuffer);

// Copy the Pixels from the buffer
mBitmap.copyPixelsFromBuffer(mBuffer);

GLES20.glUseProgram(0);

#2

No, it’s not just that. The whole design of mobile GPUs rendering to DRAM depends on having a very deep pipeline, with rasterization perhaps occurring a full frame later than CPU submit and vertex transforms. By doing a blocking glReadPixels(), you’re basically disallowing the GPU from doing this, mandating a full pipeline flush, and mandating that the CPU block until the GPU can finish up all that work. In many cases, this may cut your frame rate in half or worse.

If you absolutely must read the rendered image back to application memory, then transferring the data using a ring buffer of PBOs is a reasonable approach. Just keep in mind that double buffering may not be sufficient to avoid slowdowns due to synchronization. The driver might very well queue N frames of data ahead of the pipeline in a command buffer, and take 2 full pipelined frames to render the image. Each of these adds a frame of latency. The bottom line is that you may need a ring buffer containing more than 2 images for maximum performance.
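
To make the ring buffer idea concrete, here is a minimal sketch of the bookkeeping in Java. It is not taken from the code in the first post: PackBufferGl, FrameSink and PboRing are hypothetical names, and the GL calls are hidden behind an interface so the slot logic can be read on its own. On Android I would assume the interface is backed by the GLES30 calls named in the comments, which needs an OpenGL ES 3.0 context (PBOs are not part of core ES 2.0) and, for the offset overload of glReadPixels, API level 24 or later.

```java
import java.nio.ByteBuffer;

/** Hypothetical wrapper around the GL calls; the comments name the assumed GLES30 backing. */
interface PackBufferGl {
    /** glGenBuffers, then glBufferData(GL_PIXEL_PACK_BUFFER, sizeBytes, null, GL_STREAM_READ). */
    int createPackBuffer(int sizeBytes);

    /** Bind the PBO, then glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, 0).
     *  With a PBO bound the driver can queue the copy instead of blocking. */
    void readPixelsInto(int pbo, int width, int height);

    /** Bind the PBO, then glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, sizeBytes, GL_MAP_READ_BIT),
     *  casting the returned Buffer to ByteBuffer. May return null on failure. */
    ByteBuffer map(int pbo, int sizeBytes);

    /** glUnmapBuffer(GL_PIXEL_PACK_BUFFER), then bind buffer 0 again. */
    void unmap(int pbo);
}

/** Hypothetical callback that receives a finished frame. */
interface FrameSink {
    void onFrame(ByteBuffer rgba, int width, int height);
}

/** Ring of PBOs: each frame queues one read and maps the oldest pending one. */
final class PboRing {
    private final PackBufferGl gl;
    private final int[] pbos;
    private final int width;
    private final int height;
    private final int sizeBytes;
    private long frame = 0;

    PboRing(PackBufferGl gl, int count, int width, int height) {
        if (count < 2) {
            throw new IllegalArgumentException("need at least 2 PBOs");
        }
        this.gl = gl;
        this.width = width;
        this.height = height;
        this.sizeBytes = width * height * 4; // RGBA, one byte per channel
        this.pbos = new int[count];
        for (int i = 0; i < count; i++) {
            pbos[i] = gl.createPackBuffer(sizeBytes);
        }
    }

    /** Number of frames between issuing a read and receiving its pixels. */
    int latencyFrames() {
        return pbos.length - 1;
    }

    /** Call once per frame, after drawing. Returns true if an older frame was delivered. */
    boolean onFrameDrawn(FrameSink sink) {
        int n = pbos.length;

        // Queue the read of the frame that was just drawn.
        int writeSlot = (int) (frame % n);
        gl.readPixelsInto(pbos[writeSlot], width, height);

        // Map the slot written n - 1 frames ago; the GPU has had time to finish it.
        boolean delivered = false;
        if (frame >= n - 1) {
            int readSlot = (int) ((frame + 1) % n);
            ByteBuffer pixels = gl.map(pbos[readSlot], sizeBytes);
            if (pixels != null) {
                sink.onFrame(pixels, width, height);
                gl.unmap(pbos[readSlot]);
                delivered = true;
            }
        }
        frame++;
        return delivered;
    }
}
```

With count = 2 this is plain double buffering with one frame of latency; if the map call still stalls on your device, raise count. The mapped buffer is only valid until unmap, so the sink should copy the pixels out (for example with mBitmap.copyPixelsFromBuffer) before it returns.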

However, if you only need the rendered image as input to subsequent GPU rendering, you should consider rendering it to a GL texture using an FBO, and then feeding that texture to the tail end of your frame rendering pipeline. There are some details to work out, but once you have that working it will likely be more efficient.

Just search these forums for “glReadPixels” and “PBO”, and you’ll find plenty of hits describing how to use the PBO approach to read rendered images back to the CPU.