PBO questions

My questions are split into three parts:

  • First, if I want to upload a texture to a PBO using glMapBufferARB / glUnmapBufferARB, then, if I'm right, this is raw data, so the format isn't important. Let's say that I have a system-memory image in RGB5_A1 format; I'll be able to transfer this surface properly to the PBO. Is there any way for me to take this PBO in 555_1 format and transfer it to the texture object? Currently I'm using glTexImage2D, through which I can choose the internal format correctly, but the data format is not that flexible. Am I missing something?

  • Second, where (CPU or GPU) is the format conversion done if a format isn't supported natively by the GPU? Let's say I use an RGB10 format, but the GPU doesn't support it directly, so it will use RGBA8 instead. Is the conversion done by the driver, or by the GPU through a conversion shader? Because if the conversion is done in the driver... during the PBO->texture blit there is no CPU involved, and we don't specify any format during the image->PBO transfer, which is the only part done by the CPU!?

  • Finally, when we use a PBO, what I understand is that we have a CPU copy (from RAM to video memory, i.e. the PBO) and then a GPU blit from video memory (PBO) to video memory (texture object)... Is that correct? Because I cannot see why we couldn't just copy directly to the texture object with the CPU without going through the PBO, or why we don't just use the PBO as a texture...

  1. Use GL_UNSIGNED_SHORT_1_5_5_5_REV as the type parameter (with a matching format such as GL_BGRA).

  2. Most probably on the CPU, so you will take a big hit if the format is not natively supported. No idea whether modern cards can do 16-bit textures in hardware. Probably not.

  3. Well, you don't know that. It is possible that the driver will use the PBO data directly, without the second copy (though it is not very likely). The main idea of PBOs is to allow asynchronous transfers. If you want to load the texture directly, well, you have glTexImage for that :) Of course, it would be nice to have a simple extension that allows direct asynchronous loading of textures (something like glTexImageAsync).
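To make answers 1 and 2 concrete: with format GL_BGRA, the packed type GL_UNSIGNED_SHORT_1_5_5_5_REV stores B in bits 0-4, G in bits 5-9, R in bits 10-14 and A in bit 15 (the common A1R5G5B5 arrangement; the question's image is assumed to use it). With a PBO bound to GL_PIXEL_UNPACK_BUFFER_ARB, the last argument of glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB5_A1, w, h, 0, GL_BGRA, GL_UNSIGNED_SHORT_1_5_5_5_REV, (void*)0) is an offset into the PBO rather than a client pointer. If the hardware has no matching native layout, something has to expand those pixels. The sketch below is only an assumption about what such a CPU-side conversion to RGBA8 could look like; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand a 5-bit channel to 8 bits by replicating the top bits,
 * so 0x1F maps to 0xFF and 0 maps to 0. */
static uint8_t expand5(uint16_t v5) {
    return (uint8_t)((v5 << 3) | (v5 >> 2));
}

/* Convert pixels laid out as GL_BGRA + GL_UNSIGNED_SHORT_1_5_5_5_REV
 * (B = bits 0..4, G = bits 5..9, R = bits 10..14, A = bit 15)
 * into RGBA8, 4 bytes per pixel. Hypothetical helper for illustration. */
static void bgra1555rev_to_rgba8(const uint16_t *src, uint8_t *dst, size_t count) {
    assert(count == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < count; ++i) {
        uint16_t p = src[i];
        dst[4 * i + 0] = expand5((p >> 10) & 0x1F); /* R */
        dst[4 * i + 1] = expand5((p >> 5) & 0x1F);  /* G */
        dst[4 * i + 2] = expand5(p & 0x1F);         /* B */
        dst[4 * i + 3] = (p & 0x8000) ? 255 : 0;    /* A: 1 bit -> 0 or 255 */
    }
}
```

For example, the packed value 0xFC00 (opaque pure red) becomes the bytes 255, 0, 0, 255. A per-pixel loop like this touching every texel is exactly why answer 2 warns about a big hit for formats that are not natively supported.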

  1. Let's suppose it is the CPU. That means we need to read back the local video memory (PBO), convert the data to the desired format, and send the converted surface back to resident memory (texture object)? This looks horrible... I cannot believe it's done that way.

  2. The asynchronous transfer is done between the PBO and the texture object, which are both resident on the hardware. The PCIe transfer between RAM and local memory is done when we call glMapBufferARB / glUnmapBufferARB, if I'm right. The part that is really async is between two hardware-resident surfaces (PBO -> texture object), so I don't get the point of this "async" transfer, since "no transfer at all" would do the job if the PBO could be used directly as a texture. But I'm certainly wrong somewhere...

  1. Well, make a better suggestion. While it is theoretically possible to do the conversion on the latest Nvidia and ATI hardware (DX10-class GPUs), I don't think they will waste their time implementing it.

  2. You can load data to a buffer object in a different thread or while other GL operations are still pending. So you get async transfer on both levels.
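A minimal sketch of "load data in a different thread", assuming the GL thread maps the PBO with glMapBufferARB, hands the returned pointer to a worker thread for the memcpy, and calls glUnmapBufferARB only after the worker has finished (map and unmap are GL calls and stay on the thread that owns the context). Plain memory stands in for the mapped pointer here, and upload_job, start_upload and finish_upload are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Work handed to the upload thread. In a real program `dst` would be the
 * pointer returned by glMapBufferARB on the GL thread (assumption). */
typedef struct {
    void *dst;
    const void *src;
    size_t size;
} upload_job;

/* Runs on the worker thread: the CPU-side copy into the mapped buffer. */
static void *upload_worker(void *arg) {
    upload_job *job = (upload_job *)arg;
    memcpy(job->dst, job->src, job->size);
    return NULL;
}

/* Starts the copy on another thread; returns 0 on success. */
static int start_upload(pthread_t *thread, upload_job *job) {
    assert(job != NULL && job->dst != NULL && job->src != NULL);
    return pthread_create(thread, NULL, upload_worker, job);
}

/* Waits for the copy to finish; only after this should the GL thread
 * unmap the buffer and issue the PBO -> texture transfer. */
static int finish_upload(pthread_t thread) {
    return pthread_join(thread, NULL);
}
```

Between start_upload and finish_upload the GL thread is free to keep issuing rendering commands, which is the first of the two levels of asynchrony mentioned above.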

  1. Maybe the driver keeps an additional system-memory copy, a mirror of the PBO, in every case. That way it can resend it in a different format without reading it back. Or maybe the PBO is not located in local memory but only in the aperture...?

  2. I agree with the multithreading point, that's fine... but my initial question remains: why do we need this extra GPU copy after the (multithreaded) CPU one? Blitting from the PBO to the texture seems useless to me.

  1. Depending on the driver architecture, the backing store can be accessed by the CPU (for read-only operations) without any penalty, as long as the VRAM copy is not more up to date AND there are no pending operations that will modify the PBO itself (ReadPixels, BufferSubData, etc.). The CPU will then read the data and convert it to a format the GPU can understand.

Then one of two things will probably happen:
i) The new data will be placed in a completely different buffer and a blit will be enqueued from there to the texture.
ii) The CPU will write the new data directly into the texture.

  1. A buffer object is a linear array of memory. A texture is usually no longer laid out linearly: it is broken up into tiles of MxN pixels, and sometimes an additional tiling of those tiles is used. Anyway, the GPU blit from the PBO to the texture is needed to rearrange the data from the linear layout used by the PBO into the tiled layout used by the texture.
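A toy model of the tiling point, assuming a made-up layout: tiles stored row by row, and pixels stored row by row inside each tile. Real GPU layouts are vendor-specific and not exposed by GL, so tiled_index and linear_to_tiled are hypothetical helpers; they only show why a linear PBO cannot simply be reinterpreted as a tiled texture and why a rearranging copy is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of pixel (x, y) in a hypothetical tiled layout: tile_w x tile_h
 * tiles stored row by row, pixels row by row inside each tile.
 * width is assumed to be a multiple of tile_w. */
static size_t tiled_index(size_t x, size_t y, size_t width,
                          size_t tile_w, size_t tile_h) {
    size_t tiles_per_row = width / tile_w;
    size_t tile_x = x / tile_w, tile_y = y / tile_h;
    size_t in_x = x % tile_w, in_y = y % tile_h;
    size_t tile_index = tile_y * tiles_per_row + tile_x;
    return tile_index * (tile_w * tile_h) + in_y * tile_w + in_x;
}

/* The "blit": read a linear (PBO-style) image and scatter it into tiled order. */
static void linear_to_tiled(const uint32_t *linear, uint32_t *tiled,
                            size_t width, size_t height,
                            size_t tile_w, size_t tile_h) {
    assert(width % tile_w == 0 && height % tile_h == 0);
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            tiled[tiled_index(x, y, width, tile_w, tile_h)] = linear[y * width + x];
}
```

For a 4x4 image with 2x2 tiles, linear pixel 2 (x = 2, y = 0) lands at tiled index 4, the first pixel of the second tile, so neighbouring bytes in the PBO end up far apart in the texture.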

The blit from PBO to texture is needed because GL does not expose the real internal format of a texture.

This is, e.g., because some graphics hardware retiles the texture internally for faster memory access along arbitrary directions (i.e. a higher texture cache hit rate in the common cases).
Many old software renderers from the Pentium 60 MHz era did that too, by the way.

I think the Nvidia G80 specs have a new glGetTex… enum with which you can query the internal format of a texture.

Well, anyway, I didn't use glMapBuffer that much (I had a lot of problems with stalling the pipeline), and glTexSubImage2D was always faster (even while using double the memory bandwidth).

But as I understood the PBO concept for CPU->GPU transfers:

the PBO should NOT be located in video memory;
the PBO should be located in system memory
(maybe it's AGP/PCIe external video memory with CPU write-back caching disabled),
so the PBO->texture transfer is actually the PCIe DMA transfer,
and glMapBuffer only gives you a pointer to the PBO (located in CPU memory) mapped into the virtual address space.

  1. Thanks for those answers, I hadn't thought about the tiling/untiling; that was the point I was missing...

I agree with your conclusion that the PBO should be located in system memory (well, the aperture... AGP/PCIe memory) and NOT in video memory. But in fact, a simple test app on NVidia (G92) shows that if you create a PBO and check the remaining video memory, you'll see that the PBO uses video memory and no VRAM at all... Looks like the DMA is not a PCIe transfer then; this is strange...

On that same topic, …

Let’s say I want to update a texture every frame (from CPU to GPU) while, at the same time, having as little impact as possible on the main thread which is concurrently rendering the scene (i.e. no slow-down).

For that purpose, I guess it would be best to have my PBO allocated in VRAM and to dedicate a CPU thread to the buffer update (i.e. the memcpy). Note that in such a situation (assuming I have an extra CPU core available), I don't mind my thread blocking while it waits for the transfer to complete. I would guess that PIO writes over PCI Express (from a different thread) should have a low impact on the main rendering task.

The alternative would be to have the PBO allocated in system memory, but in that case the long transfer (over PCIe) would take place during the glTexSubImage2D call, which is most probably a GPU DMA transfer. Because this long transfer involves both the OpenGL driver and the GPU, it is more likely to impact the main rendering task.

Are my assumptions right?


If I am reading this correctly, then why not do a ping-pong FBO?

I can’t talk for the original poster, but for my application I don’t see how the ping-pong algorithm would help.

My application renders a "regular" OpenGL scene but must then overlay, every frame, an image that comes from another application (e.g. a frame grabber). So the scene is first rendered into an FBO, and the final image in the back buffer is produced by a custom pixel shader that combines the FBO with the texture updated from the frame grabber.

The part I am still trying to solve is how to seamlessly (i.e. without impacting the main OpenGL rendering thread) update an OpenGL texture from the frame grabber's data. Note that the frame grabber can only output to system memory, so the image data needs to be uploaded from CPU to GPU (over PCIe) at some point.

That's why I was thinking of using PBOs and dedicating an additional CPU thread to the PBO update (over PCIe). I don't need the PCIe transfer to be blazingly fast, since the texture update is done in parallel with the scene rendering. I do, however, need the texture update not to slow the scene rendering down.

Any other suggestion is welcome.
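One possible way to hand frames from a grabber thread to the render thread without either one waiting on the other is a triple buffer (a technique swapped in here, not something proposed in this thread). The grabber always writes into a private back slot and publishes it with one atomic exchange; the render thread picks up the newest published frame, if any, and then uploads it (for example through a PBO and glTexSubImage2D). This does not remove the PCIe copy, it only decouples the two threads. The slot size and all names are hypothetical, and older frames are simply dropped if the renderer falls behind.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FRAME_BYTES 16   /* hypothetical frame size for the sketch */
#define FRESH_BIT   4u   /* set while the shared slot holds an unread frame */
#define INDEX_MASK  3u   /* low bits hold the slot index (0..2) */

typedef struct {
    unsigned char slots[3][FRAME_BYTES];
    atomic_uint shared; /* index of the middle slot, plus FRESH_BIT */
    unsigned back;      /* owned by the grabber thread */
    unsigned front;     /* owned by the render thread */
} triple_buffer;

static void tb_init(triple_buffer *tb) {
    memset(tb->slots, 0, sizeof tb->slots);
    tb->back = 0;
    atomic_init(&tb->shared, 1u);
    tb->front = 2;
}

/* Grabber thread: the slot it may fill with the next frame. */
static unsigned char *tb_back(triple_buffer *tb) { return tb->slots[tb->back]; }

/* Grabber thread: publish the filled slot and take back a free one. */
static void tb_publish(triple_buffer *tb) {
    unsigned old = atomic_exchange(&tb->shared, tb->back | FRESH_BIT);
    assert((old & INDEX_MASK) < 3);
    tb->back = old & INDEX_MASK;
}

/* Render thread: returns true if `front` now holds a newer frame. */
static bool tb_acquire(triple_buffer *tb) {
    if (!(atomic_load(&tb->shared) & FRESH_BIT))
        return false; /* nothing new since the last acquire */
    unsigned old = atomic_exchange(&tb->shared, tb->front);
    tb->front = old & INDEX_MASK;
    return true;
}

/* Render thread: the frame to upload into the texture. */
static const unsigned char *tb_front(const triple_buffer *tb) {
    return tb->slots[tb->front];
}
```

On each rendered frame, the render thread would call tb_acquire and upload tb_front only when it returns true, so a slow grabber never stalls rendering and a fast grabber never blocks on the renderer.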


Oops, I made a mistake in this statement. It should read: "[…] you'll see that the PBO uses video memory and no system memory at all…"

Well, I guess for this particular application you might be right, but keep in mind that PIO writes are generally slower than DMA over PCIe because of additional overhead (I'm not sure of the technical details, though, or maybe I'm just wrong?!). That's why I'm wondering why PBOs aren't in system memory. But for your application (uploading a texture, where you don't mind wasting time on one CPU core), having the CPU do the CPU -> GPU surface transfer -may- be faster.