In my applications I'm using shaders to perform calculations in parallel on the GPU. (I can't use compute shaders or image load/store, because this code path aims to stay compatible with older versions of OpenGL.)
The draw call that runs the shaders (glDrawArrays in the image) looks quite fast to me, but glReadPixels, which I use to read the image back and retrieve the results, is really slow.
As far as I understand, commands are buffered before being sent to the hardware, but certain commands (like glReadPixels from a texture) require all previously buffered commands to be executed first.
Is this the reason why glReadPixels takes so long to execute? (You can see from the profiler screenshot that the texture is quite small.)
If so, why does a CPU-side timer started before glReadPixels, and after calling glFinish(), still report times on the order of a millisecond rather than a tenth of a millisecond? (This is the same scenario used for the results shown in the image.)
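To make the measurement concrete, here is a minimal sketch of the timing pattern described above, assuming the timer is started only after glFinish() has returned, so that previously buffered work is not charged to glReadPixels. The names now_ms, time_after_drain and count_call are hypothetical and only for illustration; count_call is a stand-in, because the real glFinish and glReadPixels calls need a live GL context.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime under strict -std=c99 */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic CPU-side clock, in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Hypothetical callback type: one step of the measurement. */
typedef void (*step_fn)(void *user);

/*
 * Calls drain() first (glFinish in the real code), then times op() alone
 * (glReadPixels in the real code). Returns the elapsed milliseconds of op().
 */
static double time_after_drain(step_fn drain, step_fn op, void *user)
{
    assert(op != NULL);
    if (drain != NULL)
        drain(user);        /* wait for previously buffered work */
    double t0 = now_ms();   /* timer starts only after the drain */
    op(user);
    return now_ms() - t0;
}

/* Stand-in step so the sketch runs without a GL context (assumption). */
static void count_call(void *user)
{
    ++*(int *)user;
}
```

With a current GL context, drain would be a small wrapper around glFinish() and op a wrapper around the glReadPixels call; the returned value is the figure I am comparing against the profiler.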