The fastest way to upload large textures

Hi everyone,

I’m working on a video editing project which uses OpenGL to render the video to screen. As the videos that we need to process are often pretty large (HD 1080p or even larger), texture upload performance becomes critical.

Currently we are using PBOs to upload the decoded video (on every video frame: map a buffer, decode into the buffer, unmap it, then use glTexSubImage2D to complete the upload), but it’s much slower and consumes many more CPU cycles than its Direct3D counterpart (which basically does the same thing, only in D3D: lock a texture surface, decode into the surface, then unlock it). While the video is running, the GL code uses 20-25% CPU time while the D3D code uses only around 15%.

So I would like to know: is there any way to improve the OpenGL implementation to something at least on par with D3D, or can OpenGL simply not beat D3D on this particular task? Thanks.

It depends on the PBO usage pattern. To get maximum performance you must avoid CPU/GPU stalls. My suggestion is to create a PBO pool (a pool with several PBO buffers). Keep in mind that PBO memory is not cacheable, so it is a bad idea to decode a frame directly into PBO memory. Let the decoder decode the frame (random access) into system memory, then copy the image data into PBO memory (a sequential access operation).
The next problem: do not call glMapBuffer right after the glTexSubImage2D call. When using a PBO, all pixel transfer functions become non-blocking, but there is a catch if you try to map the PBO too early. If the pending operation is not finished, the app will stall until the GPU releases the PBO. The best solution is to map the buffer much later, or on the next frame. A typical PBO usage pattern can be:

  1. Create the PBO
  2. Map the buffer
  3. Give the PBO pointer to the decoder thread (or put it in a pool of free PBOs)
  4. The decoder copies the frame into PBO memory and notifies the render thread, or the decoder asks the pool for a free PBO pointer, copies the image data, and notifies the render thread
  5. The render thread unmaps the pointer and calls glTexSubImage2D
  6. The render thread marks that PBO to be mapped again on the next frame (or two frames later)
  7. On the next frame (or two frames later), map the PBO pointer and give it to the decoder thread (or pool)

Using a pool you can handle multiple video stream transfers.
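
The bookkeeping behind the steps above can be sketched roughly as follows. This is only a minimal, single-threaded model of the pool rotation: the class name `PboPool`, its methods, and the `remapDelay` parameter are hypothetical, no GL calls are actually made, and the comments only mark where the real calls from the list would go. With a separate decoder thread, the queues would additionally need a mutex.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical bookkeeping for a pool of PBOs. PBOs are identified by
// their index in the pool; no real GL objects are created here.
class PboPool {
public:
    PboPool(std::size_t count, unsigned long remapDelay)
        : remapDelay_(remapDelay) {
        // Steps 1-2: each PBO would be created and mapped here.
        for (std::size_t i = 0; i < count; ++i) free_.push_back(static_cast<int>(i));
    }

    // Steps 3-4: the decoder asks for a mapped PBO to fill.
    // Returns -1 when every PBO is still in flight.
    int acquire() {
        if (free_.empty()) return -1;
        int id = free_.front();
        free_.pop_front();
        return id;
    }

    // Steps 5-6: the render thread would call glUnmapBuffer and
    // glTexSubImage2D here, then park the PBO for remapDelay_ frames.
    void submit(int id) {
        Pending p = { id, frame_ + remapDelay_ };
        pending_.push_back(p);
    }

    // Step 7: call once per frame. PBOs whose delay has expired would be
    // mapped again (glMapBuffer) and are handed back to the free list.
    void beginFrame() {
        ++frame_;
        while (!pending_.empty() && pending_.front().readyFrame <= frame_) {
            free_.push_back(pending_.front().id);
            pending_.pop_front();
        }
    }

    std::size_t freeCount() const { return free_.size(); }

private:
    struct Pending { int id; unsigned long readyFrame; };
    unsigned long remapDelay_;
    unsigned long frame_ = 0;
    std::deque<int> free_;
    std::deque<Pending> pending_;
};
```

A `remapDelay` of 1 corresponds to "map again on the next frame" and 2 to "two frames later". The pool needs at least `remapDelay + 1` buffers per stream so the decoder always has one to write into.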

I’m not a D3D guy, so can you do a little test for me… Does the locked pointer (from the texture) change between two consecutive locks, or is it always the same?

The decoder’s access to the target buffer is write-only; it never reads from the buffer, so I think it’s fine to decode directly into the PBO, as this saves an extra copy from the decoding buffer to the PBO.

Just playing the video is not a big issue, but we are also doing a lot of other processing at the same time, so we do need to squeeze out every last CPU cycle possible.

And yes, it looks like D3D always returns the same memory address.


I can tell you that when I use PBOs on our system (Nvidia Quadro), I found that map/unmap are slow. I just use glBufferData, no sub or anything, and this is faster in my case. I assume map/unmap are slower because they use the same memory, whereas glBufferData can allocate a new buffer if the old one is in use.

Hope it helps.

You can speed up glMapBuffer for VBOs and PBOs considerably if you call glBufferData and pass in null for the data before calling glMapBuffer. Nulling the data essentially flags the driver that the data in the buffer is invalid and it doesn’t have to stall attempting to preserve it. This is of course only useful when you don’t need the old data in the buffer.

This is essentially the same thing as passing the D3DLOCK_DISCARD flag when you lock a vertex buffer or texture resource in Direct3D.
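
As a rough illustration of the call order, here is a sketch of one upload with the buffer nulled out first. So that it compiles without a GL context, the GL entry points are replaced by hypothetical stand-ins in a `fake` namespace that only log what was called; the comment on each line gives the real call it stands for, and the function name `uploadFrame` is likewise made up for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Stand-ins for the real GL entry points (assumption: a PBO is already
// bound to GL_PIXEL_UNPACK_BUFFER). They only record the call order.
namespace fake {
std::vector<std::string> calls;
std::vector<unsigned char> storage;

void BufferData(std::size_t size, const void* data) {
    calls.push_back(data ? "BufferData(data)" : "BufferData(NULL)");
    storage.assign(size, 0);  // fresh storage, old contents discarded
    if (data) std::memcpy(storage.data(), data, size);
}
void* MapBuffer() { calls.push_back("MapBuffer"); return storage.data(); }
void UnmapBuffer() { calls.push_back("UnmapBuffer"); }
void TexSubImage2D() { calls.push_back("TexSubImage2D"); }
}  // namespace fake

// Upload one decoded frame, discarding the previous buffer contents first.
void uploadFrame(const unsigned char* frame, std::size_t bytes) {
    // glBufferData(GL_PIXEL_UNPACK_BUFFER, bytes, NULL, GL_STREAM_DRAW);
    fake::BufferData(bytes, nullptr);
    // glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    void* dst = fake::MapBuffer();
    std::memcpy(dst, frame, bytes);
    // glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    fake::UnmapBuffer();
    // glTexSubImage2D(..., 0); the last argument is an offset into the PBO
    fake::TexSubImage2D();
}
```

The important part is only that the glBufferData call with a null pointer comes before glMapBuffer; whether the driver then actually hands back fresh memory instead of waiting is up to the implementation.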


I have pretty much the same question, but I’ll try to be a bit more specific. How can I obtain a pointer into video memory? This is what you get when you LockRect a surface in D3D8, if the texture was created in the default pool. I wouldn’t like there to be a buffer in system memory, and also no copying.
So, I think using PBOs is not the solution I’m looking for.

There is no way to get a pointer into video memory.

Mapping a buffer object may give you a pointer to video memory, but it depends on the driver, so there is no way to be sure whether the buffer object is really in VRAM.

I don’t see how you can get a pointer to video memory. Your own process exists in its own memory space, and that’s all RAM.
VRAM is only accessed by kernel-mode code.

I thought the driver was able to map parts of VRAM into virtual memory…
