So tell me, why should I use PBOs?

I read the specs about PBOs, and while they seem to be quite useful for procedural textures and streaming video, I wonder if they can be used as an overall replacement for the usual texture handling mechanism. In other words: no more glTexImage2D, but uploading to a PBO, and binding the PBO as one binds glTexImage2D-created textures. However, where are the advantages for normal textures (e.g. copying once to graphics card memory and using it many times afterwards)? Are there any? If not, does this mean that PBOs are only really useful for dynamic textures?

For static textures, PBO doesn’t really make sense. Its main advantage is that the up/download is asynchronous. You start a transfer, do something else, and then you come back to it and use the result. This works in both directions, so PBO is not only useful for dynamic textures, but also for async readback…

Right. PBO is really about making Pixel Transfer asynchronous. It also has the advantage of allowing a “short circuit” feedback path in the graphics pipeline where the CPU need never actually touch the data.

Without these buffers with protected access, the driver can never know from one call to the next
whether the CPU has modified the data. So it must assume the worst.

It seems to me that it’s also useful for streaming from disk
(uploading textures in the background, before you actually use them)

Originally posted by LogicalError:
It seems to me that it’s also useful for streaming from disk
(uploading textures in the background, before you actually use them)

Yes, that’s the async part.

Originally posted by cass:
[quote]Originally posted by LogicalError:
It seems to me that it’s also useful for streaming from disk
(uploading textures in the background, before you actually use them)

Yes, that’s the async part.
[/QUOTE]I’m just looking at integrating PBO into the OSG, in particular for the purpose of doing video streaming. The video reading is being done in a background thread. It would be great if I could do the uploading of the imagery from this background thread. However, this raises the issue of the OpenGL graphics context.

The OSG scene graph is written so that there is a single thread doing all OpenGL calls. Doing a pixel buffer upload in the background would mean a second thread needs the graphics context as well.

This will require the background thread to negotiate with the main OpenGL thread for the OpenGL graphics context. The negotiation kinda breaks the async nature of what we are trying to achieve - yes the download could be async, but the graphics context can’t be.

Now there are various ways I can think of for managing which thread has the current graphics context, or for passing an upload callback to the main OpenGL thread to avoid the context switching, but either way I can’t see a way round the need for sync’ing the main OpenGL thread with the background one.

Suggestions?

I have tried to use PBOs on the PACK_BUFFER binding and map functionality to do upside-down flipped readbacks without an extra copy.
I have also tried to reduce total readback volume, hoping that the hardware could eventually produce rgb565 data directly from an RGBA8888 framebuffer.

The experiment was a total failure in the performance department. Maps never deliver their theoretical advantages as far as I’m concerned.
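For reference, this is roughly the work I was hoping the readback path would do for me, written out on the CPU side. It is only a sketch: `packRgb565` and `flipAndPack565` are names made up for this post, the input is assumed to be tightly packed RGBA8888 rows, and the 5-6-5 layout assumed is the one with red in the high bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack one 8-bit-per-channel pixel into 5-6-5 by dropping the low bits.
// Red ends up in the top 5 bits, green in the middle 6, blue in the low 5.
std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Convert a tightly packed RGBA8888 image to RGB565 and flip it vertically
// in the same pass, so no separate flip copy is needed afterwards.
std::vector<std::uint16_t> flipAndPack565(const std::vector<std::uint8_t>& rgba,
                                          std::size_t width, std::size_t height) {
    std::vector<std::uint16_t> out(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Source row counted from the bottom, destination row from the top.
        const std::uint8_t* src = rgba.data() + (height - 1 - y) * width * 4;
        std::uint16_t* dst = out.data() + y * width;
        for (std::size_t x = 0; x < width; ++x) {
            dst[x] = packRgb565(src[4 * x + 0], src[4 * x + 1], src[4 * x + 2]);
        }
    }
    return out;
}
```

The output is 2 bytes per pixel instead of 4, which is where the hoped-for reduction in readback volume would have come from.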

Cass,
in case you’re returning to this thread, the Forceware 66.93 ICD throws INVALID_OPERATION on ReadPixels to a PACK_BUFFER if the format is GL_BGR and the type is UNSIGNED_SHORT_5_6_5. This doesn’t happen if you don’t use PBOs. It also does not happen with format:=GL_RGB (which btw verified to me that the buffer was large enough, if that matters).
I don’t think this is conformant behaviour. Might want to look into it.

Originally posted by Robert Osfield:
Suggestions?
I wouldn’t do the upload in a separate thread. glBufferData is supposed to be VERY lightweight, it just starts the transfer. So it doesn’t really matter if you call it in the thread that produces the data or in the render-thread.

A possible scenario would be a thread that generates the data of a frame (reading from disc, decompressing, …), then it signals the render thread to upload and continues generating the next frame. The render thread checks for this signal at a convenient point in the render loop (as far away as possible from the point where the data is actually needed, ideally in the previous frame), calls glBufferData and does whatever else it wants while the transfer is in progress.

This way you are not syncing any thread to another one beyond the usual producer consumer handling. That is, a thread never needs to wait for another one except when it runs out of data.
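To make the hand-off concrete, here is a rough C++ sketch of that producer/consumer arrangement. Everything named in it (`Frame`, `FrameQueue`, `runStreamingDemo`, the `upload` lambda) is made up for illustration. The lambda only records the frame index at the point where a real render thread would issue its glBufferData call, and the queue is unbounded just to keep the sketch short.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// One decoded video frame (hypothetical layout).
struct Frame {
    int index = 0;
    std::vector<std::uint8_t> pixels;
};

// Mutex-protected queue between the decoder thread and the render thread.
class FrameQueue {
public:
    void push(Frame f) {
        std::lock_guard<std::mutex> lock(mutex_);
        frames_.push_back(std::move(f));
    }
    // Non-blocking: returns false immediately when no frame is ready.
    bool tryPop(Frame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) return false;
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<Frame> frames_;
};

// Runs a decoder thread and a render loop; returns the upload order.
std::vector<int> runStreamingDemo(int frameCount, std::size_t bytesPerFrame) {
    FrameQueue queue;
    std::vector<int> uploaded;  // touched only by the render loop below

    // Stand-in for the GL upload; it runs only on the thread that owns the context.
    auto upload = [&uploaded](const Frame& f) { uploaded.push_back(f.index); };

    // Producer: "decodes" frames and signals the render thread via the queue.
    std::thread producer([&queue, frameCount, bytesPerFrame] {
        for (int i = 0; i < frameCount; ++i) {
            Frame f;
            f.index = i;
            f.pixels.assign(bytesPerFrame, static_cast<std::uint8_t>(i));
            queue.push(std::move(f));
        }
    });

    // Render loop: polls for a finished frame once per iteration, never blocks on it.
    while (static_cast<int>(uploaded.size()) < frameCount) {
        Frame f;
        if (queue.tryPop(f)) {
            upload(f);
        }
        // ... the rest of the frame would be drawn here ...
        std::this_thread::yield();
    }
    producer.join();
    return uploaded;
}
```

The only shared state is the queue, so no context ever changes hands; the render thread just finds a frame waiting or it doesn’t.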

Originally posted by zeckensack:

Cass,
in case you’re returning to this thread, the Forceware 66.93 ICD throws INVALID_OPERATION on ReadPixels to a PACK_BUFFER if the format is GL_BGR and the type is UNSIGNED_SHORT_5_6_5. This doesn’t happen if you don’t use PBOs. It also does not happen with format:=GL_RGB (which btw verified to me that the buffer was large enough, if that matters).
I don’t think this is conformant behaviour. Might want to look into it.

Thanks, Zeckensack, we have a bug filed on this problem now. I’ll contact you offline when there’s some resolution on it.

Hi Zeckensack, I’m posting the response on this thread for simplicity:


Our behavior is compliant with the spec. I’m looking at version 1.4, table 3.8 – the table that gives the special interpretations of packed pixel formats. The only compatible pixel format for UNSIGNED_SHORT_5_6_5 and UNSIGNED_SHORT_5_6_5_REV is RGB.

Thanks -
Cass
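If it helps anyone validating arguments before calling ReadPixels, the rule can be written as a small check. This is a sketch with local stand-in enums rather than the real GL constants, and the names (`PixelFormat`, `PackedType`, `isPackedCombinationValid`) are invented here. The three-component rule is the one quoted above; the four-component rule (RGBA or BGRA only) is my reading of the same table and should be checked against the spec version you target.

```cpp
// Local stand-ins for the GL format and packed-type enums (not the real constants).
enum class PixelFormat { RGB, BGR, RGBA, BGRA };

enum class PackedType {
    UByte_3_3_2, UByte_2_3_3_Rev,
    UShort_5_6_5, UShort_5_6_5_Rev,
    UShort_4_4_4_4, UShort_4_4_4_4_Rev,
    UShort_5_5_5_1, UShort_1_5_5_5_Rev,
    UInt_8_8_8_8, UInt_8_8_8_8_Rev,
    UInt_10_10_10_2, UInt_2_10_10_10_Rev
};

// Number of components a packed type carries per pixel.
int componentCount(PackedType t) {
    switch (t) {
        case PackedType::UByte_3_3_2:
        case PackedType::UByte_2_3_3_Rev:
        case PackedType::UShort_5_6_5:
        case PackedType::UShort_5_6_5_Rev:
            return 3;
        default:
            return 4;
    }
}

// True if the format may be combined with the packed type.
bool isPackedCombinationValid(PixelFormat f, PackedType t) {
    if (componentCount(t) == 3) {
        return f == PixelFormat::RGB;  // BGR is not accepted here
    }
    return f == PixelFormat::RGBA || f == PixelFormat::BGRA;
}
```

With this, BGR plus UShort_5_6_5 is rejected and RGB plus UShort_5_6_5 is accepted, matching the INVALID_OPERATION behaviour reported earlier in the thread.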

Originally posted by Overmind:
[quote]Originally posted by Robert Osfield:
Suggestions?
I wouldn’t do the upload in a separate thread. glBufferData is supposed to be VERY lightweight, it just starts the transfer. So it doesn’t really matter if you call it in the thread that produces the data or in the render-thread.
[/QUOTE]The call itself might be lightweight, but switching graphics context isn’t. Which is why I raised the point: since cass intimated that one could do the upload from the background thread, I was surprised, as it’d require two makeCurrent calls and sync’ing between the threads to do it.

Originally posted by Overmind:
A possible scenario would be a thread that generates the data of a frame (reading from disc, decompressing, …), then it signals the render thread to upload and continues generating the next frame. The render thread checks for this signal at a convenient point in the render loop (as far away as possible from the point where the data is actually needed, ideally in the previous frame), calls glBufferData and does whatever else it wants while the transfer is in progress.

This is the approach I’ve been taking.

I’m also curious about having the movie reading thread write directly to a mapped PBO, rather than having the movie thread write to its own image buffer and then copying this into the mapped PBO as above. Again this would require syncing of the two threads, which is not so good, but it’d be the movie thread running at 25Hz waiting for the rendering thread, which runs at 60+Hz, so it wouldn’t have to wait too long relative to its own update rate. The aim here would be to avoid another copy operation.

Now if you could have two threads holding the same graphics context at once, then doing an async upload via PBO in the movie thread would be cool and avoid many of these issues. However, I’m not aware of an OpenGL graphics driver playing in this way. I stand to be enlightened though :slight_smile:

Robert.

:wink:

Robert Osfield:
(Pardon me, I may be a bit new at this:)

Could you not just have two threads with two contexts (sharing textures etc.)? Then you could upload the texture data in one thread while still rendering things in the other thread.

Originally posted by Robert Osfield:
[QB]The call itself might be lightweight, but switching graphics context isn’t. Which is why I raised the point: since cass intimated that one could do the upload from the background thread, I was surprised, as it’d require two makeCurrent calls and sync’ing between the threads to do it.[qb]
If a non-GL thread prepares data while the GL thread simply issues the Buffer{Sub}Data() call, you don’t need to do context switching.

In this particular case, I don’t know how much is really being saved over just calling Tex{Sub}Image() directly though. Both calls involve a data copy.