I’m currently working on a shader that emulates what a compute shader does, but for older versions of OpenGL (with OpenGL 2.0 as the lowest target).
Currently I’m just issuing draw commands and calling glReadPixels immediately afterwards to retrieve the result. I know this approach forces synchronization between the CPU and the GPU, and I’m going to rewrite it using PBOs as soon as possible.
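For reference, the PBO-based rewrite I have in mind would look roughly like this (just a sketch, assuming a current GL 2.1+ context, or GL 2.0 with GL_ARB_pixel_buffer_object; `pbo`, `width` and `height` are placeholder names of mine):

```cpp
// Sketch of asynchronous readback through a pixel pack buffer (PBO).
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);

// With a buffer bound to GL_PIXEL_PACK_BUFFER, glReadPixels returns
// immediately: the transfer into the buffer happens asynchronously,
// and the last argument is an offset into the PBO instead of a pointer.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

// ... issue other work here, or come back next frame ...

// Mapping is the point that may block, but only until this transfer
// has finished -- not until the whole pipeline has drained.
if (const void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY)) {
    // ... use the data ...
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```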
But now I really want to know why I’m getting such degraded performance with this approach.
Unfortunately I’m not able to reproduce the timings I see CPU-side using NVIDIA Nsight, but I’ll attach a screenshot to show the order in which the commands are issued (glFinish is there just for debugging purposes):
These are the results I see from the thread I’m calling GL functions from:
For glFinish(), I restart a timer just before the call and stop it immediately after it returns.
So my questions are:
How can it be that the CPU has to wait so long before glFinish() returns, given that it’s called only a few commands after SwapBuffers? (I’m using SwapBuffers as a delimiter in the screenshot.)
Swapping the back and front buffers on the Default Framebuffer may cause some form of synchronization (though the actual moment of synchronization event may be delayed until later GL commands), if there are still commands affecting the default framebuffer that have not yet completed. Swapping buffers only technically needs to sync to the last command that affects the default framebuffer, but it may perform a full glFinish.
The meaning of this is not entirely clear to me… should I expect that all the commands concerning the default framebuffer have been executed (not just issued) after a swap, or not?
And what exactly does this sentence mean?
Swapping buffers only technically needs to sync to the last command that affects the default framebuffer
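One way I thought of to check this myself is with a fence object (a diagnostic sketch only; it needs GL 3.2 or GL_ARB_sync, so it’s above my GL 2.0 target and only for debugging on capable hardware — `hdc` is my WGL device context):

```cpp
// Place a fence right before the swap, then poll it right after, to see
// whether the swap itself waited for all previously issued GPU commands.
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
SwapBuffers(hdc); // platform swap call (WGL in my case)

// Poll instead of blocking: if the fence is already signaled immediately
// after the swap, the swap effectively behaved like a full glFinish.
GLint status = 0;
glGetSynciv(fence, GL_SYNC_STATUS, sizeof(status), nullptr, &status);
bool gpuDoneAtSwap = (status == GL_SIGNALED);
glDeleteSync(fence);
```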