I have an OpenGL application that relies on fast texture uploads (it’s a video processing app).
My current solution is to use the main thread with the OpenGL context: map the PBO array there, upload into the mapped PBO buffers from the other thread, then call glTexSubImage2D from the OpenGL thread to actually start using the textures.
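For clarity, here's a minimal sketch of that scheme as I understand it. Names like `pbo`, `tex`, `W`, `H`, and `frame_pixels` are placeholders, and it assumes a GL 3.x context with an RGBA8 frame; it won't run without a live GL context:

```c
/* --- GL thread: map the PBO so the worker thread can fill it --- */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
/* Orphan the old storage so we never stall on a copy still in flight. */
glBufferData(GL_PIXEL_UNPACK_BUFFER, W * H * 4, NULL, GL_STREAM_DRAW);
void *ptr = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, W * H * 4,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);

/* --- Worker thread: plain memcpy into the mapped pointer, no GL calls --- */
memcpy(ptr, frame_pixels, W * H * 4);

/* --- GL thread, once the worker signals it's done --- */
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
/* Last argument is a byte offset into the bound PBO, not a CPU pointer. */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
```

The key property is that only the `memcpy` happens off the GL thread; all GL calls stay on the thread that owns the context.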
I recently found this presentation:
There's one more approach described there: use two threads with two shared OpenGL contexts. Basically, we upload the texture in the "uploader" thread, set a fence, and check the fence in the other "consumer" thread to see if we can use the texture.
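Something like this, I think. Again a hedged sketch with placeholder names (`tex`, `W`, `H`, `frame_pixels`, `timeout_ns`), assuming the two contexts share texture objects and GL 3.2+ sync objects are available:

```c
/* --- Uploader thread (its own shared context) --- */
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                GL_RGBA, GL_UNSIGNED_BYTE, frame_pixels);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush(); /* ensure the fence is actually submitted to the GPU */

/* --- Consumer thread: poll or wait until the upload has completed --- */
GLenum r = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeout_ns);
if (r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED) {
    glDeleteSync(fence);
    /* tex is now safe to sample from in this context */
}
```

With `timeout_ns = 0` the consumer can poll the fence once per frame instead of blocking.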
My question is… what's the point of the PBO in the 2nd approach? We can upload directly from the CPU buffer, right? I mean, the only gain is that glTexSubImage2D returns immediately and the memory is uploaded asynchronously, but we still need to memcpy into the PBO on the same thread… so in my tests there's no real difference. Maybe I am missing something?
Good question. If your "upload thread" is just doing blocking GPU texture uploads, then there may not be any real advantage. Uploading textures through PBOs is simply a best practice: if you don't use one, the driver will stage the copy through its own internal buffer anyway.
Just trying to think whether there could ever be an advantage in this situation… PBOs (and buffer objects in general) are driver-controlled memory, whereas your app's CPU texel buffer is completely outside of driver control. There might be a minor transfer advantage with the former in some circumstances: if the driver imposes special alignment on the allocation or mapping of buffer object memory (at least on the CPU image of it that your app sees), that could give it access to things like faster memcpy implementations, or let it stream the data directly into the hardware copy queue – things like that. But I'm reaching… I don't know that this is the case for any GL driver out there.