VBOs strangely slow?

Imagine you have more data than will fit in GPU memory, and you can’t display a “Loading” screen as the character moves very quickly.

So you are streaming. Then it isn’t a static buffer, is it :wink:

In order to do a streaming world, you have to set aside some memory to stream into. And since you’re streaming to the GPU, that includes buffer objects.

These buffer objects, just like the streaming space in main memory, are not currently in use. They’re not currently being rendered from. So there’s no need to orphan them. Just upload data to them, and when you need them, display them. If you need more time, then extend the boundaries of the streaming blocks.
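
For concreteness, here is a minimal sketch of that kind of pool (the buffer count, segment size, and helper names are placeholders I made up; it assumes a live GL context with the buffer-object entry points loaded):

```c
#include <GL/glew.h>   /* or whatever loader you use; assumes a live GL context */

/* Hypothetical streaming pool: a few fixed-size buffers that cycle
 * between "being filled" and "being rendered from". The counts and
 * sizes here are placeholders, not recommendations. */
#define POOL_SIZE     4
#define SEGMENT_BYTES (8 * 1024 * 1024)

static GLuint pool[POOL_SIZE];

void init_pool(void)
{
    glGenBuffers(POOL_SIZE, pool);
    for (int i = 0; i < POOL_SIZE; ++i) {
        glBindBuffer(GL_ARRAY_BUFFER, pool[i]);
        /* Allocate storage once up front; contents get streamed in later. */
        glBufferData(GL_ARRAY_BUFFER, SEGMENT_BYTES, NULL, GL_STREAM_DRAW);
    }
}

/* Fill a buffer that nothing is currently rendering from. Because the
 * buffer is idle, no orphaning or synchronization is needed here. */
void fill_segment(int slot, const void *data, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, pool[slot]);
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, data);
}
```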

Even across a PCIe bus, you can expect 1 GB/sec transfer speeds. So in approximately 1 second, you can replace the entire contents of your GPU’s memory.

So just make sure that you pad your streaming time by, say, 0.5 seconds. If you are streaming X segments from disk, and it takes on average 1.5 seconds to get that data from disk, make sure that your application has a 2-second window between the time it initiates the streaming and the time it starts using the data.
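
A rough sketch of that bookkeeping (the numbers and names are invented for the example):

```c
#include <stdbool.h>

/* Illustrative padding scheme: don't render from a streamed segment
 * until its average load time plus a safety margin has elapsed. */
#define AVG_DISK_SECONDS 1.5
#define PAD_SECONDS      0.5

typedef struct {
    double request_time;   /* when the disk read was kicked off */
    bool   upload_issued;  /* data has been handed to GL */
} Segment;

bool segment_safe_to_use(const Segment *s, double now)
{
    return s->upload_issued &&
           now >= s->request_time + AVG_DISK_SECONDS + PAD_SECONDS;
}
```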

I think I can figure out when nothing is ready.

How? If you think stalls are being caused by a buffer object that hasn’t finished uploading, how do you know that a smaller buffer has finished uploading?
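
For what it’s worth, if the implementation has sync objects (core in GL 3.2, or via the ARB_sync extension), you can get an explicit completion signal instead of guessing. A rough sketch:

```c
/* Issue a fence immediately after the upload commands... */
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

/* ...then poll it later with a zero timeout, which asks "done yet?"
 * without blocking the CPU. */
GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
    glDeleteSync(fence);
    /* The transfer has completed; rendering from the buffer won't stall. */
}
```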

Thanks Alfonse. That’s a good way of looking at it. I’ll have to figure out how to page ahead for my application.

Client arrays for VBOs. Slick. Thanks for the detailed write-up!

hi,
just so I understand correctly:

does orphaning a buffer mean glBufferData(…, NULL), or mapping with invalidation?

Yes. Per Rob in a previous post:

There are two kinds of invalidation that MapBufferRange can do, and they have very different purposes.

One is tied to MAP_INVALIDATE_BUFFER_BIT in the access parameter to MapBufferRange. This essentially means “orphan”. So in the usage I described, you could set this bit when you go back to offset 0 and get the same effect as BufferData(NULL).

The other is a bit more subtle, and it is tied to MAP_INVALIDATE_RANGE_BIT. This may seem a bit redundant, but it is important. It explicitly tells the driver up front “the range I am mapping - it does not need to contain valid data that I can read” - it is a signal to the driver that it is free to replace every single byte in that range with whatever is in your CPU-visible mapped buffer area upon unmap (or explicit flush).

The freedom this provides to the driver, if you have also set the WRITE bit but not the READ bit, is that it can hand back a pointer to completely uninitialized scratch memory - which may well be driver allocated for write-through uncached access etc. By opting into invalidation of the range, you eliminate any need for the driver to put a copy of valid data in that range prior to returning the pointer. If an implementor wanted to keep system-memory images of buffers to a minimum, this would let that driver provide scratchpad memory for maps using these bits (write + invalidate-range) - and then transfer those bits to the final destination later, perhaps via DMA.

Restated more simply, think of MAP_INVALIDATE_RANGE_BIT as a “promise to write the whole range, nothing but the range, and never read from the range” bit.
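
To make that concrete, here is a minimal sketch of both bits in the ring-buffer usage described above (the buffer handle, sizes, and helper name are placeholders):

```c
#include <string.h>

static GLintptr cursor = 0;   /* current write offset into the ring */

/* Append 'bytes' of vertex data to a ring VBO of 'ring_bytes' total,
 * returning the offset to draw from. INVALIDATE_RANGE is the "promise
 * to write the whole range, nothing but the range, and never read
 * from the range"; INVALIDATE_BUFFER on wrap is the orphan, with the
 * same effect as glBufferData(..., NULL). */
GLintptr ring_append(GLuint vbo, GLsizeiptr ring_bytes,
                     const void *src, GLsizeiptr bytes)
{
    GLbitfield access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT;

    if (cursor + bytes > ring_bytes) {
        cursor = 0;                          /* back to offset 0: orphan */
        access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT;
    }

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, cursor, bytes, access);
    memcpy(dst, src, bytes);
    glUnmapBuffer(GL_ARRAY_BUFFER);

    GLintptr offset = cursor;
    cursor += bytes;
    return offset;
}
```

In practice you would usually also set GL_MAP_UNSYNCHRONIZED_BIT on the non-wrapping path so the map itself never blocks, but that is orthogonal to the invalidation semantics above.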

Great info in this topic, thanks everyone! Quick question…on OpenGL implementations without MapBufferRange support (e.g., OpenGL ES), are there any good alternative ways to implement the dynamic vertex ring buffer that Rob suggested? It sounds like glBufferSubData has some pitfalls in the general case (and with the style of workload described). Is it best to stick with standard non-VBO vertex arrays in this case? Thanks!