We are using OpenGL in our video processing project. We grab video frames from the Linux V4L2 interface (user pointer mode), and we hand pixel buffer objects (PBOs) directly to the V4L2 driver to use as its buffers. Dumping the data straight to the GPU (DMA) without copying it to RAM first is really fast. We then process the video data in a fragment shader and render directly to the framebuffer, and this works fine.
Now we want to read those video buffers back to RAM, then encode them and send them out over the network. So, in addition to rendering to the default framebuffer, we also create an FBO, read the processed data into PBOs (asynchronous readback), and then map the PBOs into user space. The problem is that reading video buffers from the GPU back to RAM is really slow: it takes around 18-20 ms to copy a 3840x2160 YUV420 frame, compared with 5-6 ms from RAM to a PBO. This increases our render time and lowers the frame rate from 30 fps to around 20-25 fps. I tried multithreading, copying out of the PBO without blocking the render thread, but the result was not what I expected: the copy time increased to 35-50 ms when done from a non-OpenGL thread.
So, are there any methods to speed up reading GPU data back to the CPU, or to improve multithreaded readback performance? Thank you.