VBOs: Various questions

I understand that glBufferSubDataARB and glMapBufferARB won’t return until the GPU has finished using the corresponding bit of data. You can, however, discard the buffer (I believe it has to be the entire buffer) and then map it, which avoids any synchronisation issues (glBufferData with NULL, then glMapBuffer). I also understand that glBufferDataARB is less costly than glMapBufferARB and doesn’t suffer from being locked out by the GPU.

I would have thought that it would be glUnmapBuffer that had to synchronise with the GPU and that glMapBuffer wouldn’t need to wait for the GPU to finish using the data. Why is the synchronisation on glMapBuffer?

Will there still be synchronisation issues if you call glMapBuffer, asking for read only access?

If there’s no synchronisation with the GPU on mapping read only buffers but there is for write/read-write, wouldn’t it be better for glMapBuffer to automatically invalidate/discard the buffer in use by the GPU, for write/read-write access? Why wouldn’t you call glBufferData with NULL followed by glMapBuffer, as advised by NVIDIA’s VBO white paper?

Presumably, if I wanted to change just one vertex colour in a large VBO, I would be better off doing a glBufferSubData than a ‘discard and map’, or even a glBufferData. But if I wanted to refresh an entire vertex buffer with completely new data, I’d want to do a glBufferData above all. If so, then surely there’s a point at which it becomes more efficient to do a complete upload (glBufferData or discard and map) than a glBufferSubData, due to the latter being locked out? Has anyone investigated this and produced any performance comparisons?

Finally, when would you want to do the ‘discard and map’ (glBufferData with NULL, then glMapBuffer) over doing the plain old upload (glBufferData)?

Why is the synchronisation on glMapBuffer?
Because glMapBuffer may want to return the actual memory pointer of the data. The user may want to poke at it anywhere. Given that the memory may currently be in use, and thus having the user poke at it might be bad, glMapBuffer has to provoke a sync operation.

Will there still be synchronisation issues if you call glMapBuffer, asking for read only access?
I think the better question is, why are you trying to read from a buffer object to begin with (unless you’re using PBO or something)?

Most of the topics discussed here are implementation specific. I speak with knowledge of ATI’s current implementation, so bear in mind that other hardware vendors may have done things differently.

Currently glMapBuffer always causes an idle unless the data is unspecified (i.e. you called BufferData with a null data pointer). Korval explains why there is a sync at map time. I will tell you that reading back from a VBO is not currently optimized in ATI’s implementation and will be slow. This may change in the future. I advise avoiding it if you can, even if implementations are optimized, since no matter how fast we make it, it will almost always be slower than the app keeping a copy for itself.

Here are some of the ways you can update data (skipping obvious ones) and the situations I would use them for. (Hopefully this will answer the rest of the questions sprinkled throughout your post.)

BufferSubData() - This will not cause any syncing with the VPU. This is usually the best way to update a portion of a buffer object and is even good for updating the entire buffer.
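As a rough illustration of the single-colour update raised in the question, here is a minimal sketch. The `Vertex` layout, the helper names and the `USE_REAL_GL` switch are assumptions made for the example, not anything from this thread; the GL part uses the core OpenGL 1.5 names (append `ARB` for the extension entry points) and needs a current context plus whatever extension loading your platform requires. Whether the call avoids a stall is still up to the driver, as noted above.

```c
#include <stddef.h>

#ifdef USE_REAL_GL              /* assumption: prototypes come from glext.h */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#endif

/* Hypothetical interleaved layout: position followed by an RGBA colour. */
typedef struct {
    float         pos[3];
    unsigned char rgba[4];
} Vertex;

/* Byte offset of the colour of vertex `index` from the start of the buffer. */
size_t color_byte_offset(size_t index)
{
    return index * sizeof(Vertex) + offsetof(Vertex, rgba);
}

#ifdef USE_REAL_GL
/* Overwrite one vertex colour in place; the rest of the buffer is untouched. */
void set_vertex_color(GLuint vbo, size_t index, const unsigned char rgba[4])
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER,
                    (GLintptr)color_byte_offset(index), /* where to write */
                    (GLsizeiptr)4,                      /* four colour bytes */
                    rgba);
}
#endif
```

Only four bytes cross the API here, which is the main attraction over re-specifying or mapping a large buffer for such a small change.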

glMapBuffer() - This will cause the sync you are worried about. The only time you would want to use this is if you know you haven’t used the buffer object in many frames, so it’s safe to change without worrying about an update, but in most cases I would just advise using BufferSubData(). There are a few other cases where it’s useful, but these aren’t performance cases so I’m not going to mention them.

BufferData() with a null data pointer followed by MapBuffer() - This will not cause an idle since the current surface bound to the buffer object is undefined (i.e. we just allocated a new one). Usually this is the same performance as just calling BufferData() with a pointer to the data, with a slight increase in overhead from the extra entry point call. The place where this method is best is if you are generating geometry on the fly and would rather put it directly into the VBO and never place it in an app controlled buffer.
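The "generate straight into the VBO" case might look roughly like the sketch below. `write_quad`, `upload_quad` and the `USE_REAL_GL` switch are hypothetical names for the example; the GL calls are the core OpenGL 1.5 ones and assume a current context with the entry points already available. The generator only ever writes through the pointer, which matches the GL_WRITE_ONLY mapping.

```c
#include <stddef.h>

#ifdef USE_REAL_GL              /* assumption: prototypes come from glext.h */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#endif

/* Hypothetical generator: writes a quad as two triangles (x, y pairs)
 * directly into dst and returns the number of floats written. */
size_t write_quad(float *dst, float x, float y, float w, float h)
{
    const float v[12] = {
        x,     y,       x + w, y,       x + w, y + h,   /* first triangle  */
        x,     y,       x + w, y + h,   x,     y + h    /* second triangle */
    };
    size_t i;
    for (i = 0; i < 12; ++i)
        dst[i] = v[i];          /* write only, never read from dst */
    return 12;
}

#ifdef USE_REAL_GL
/* Discard the old storage, then map and fill the fresh storage in place. */
int upload_quad(GLuint vbo, float x, float y, float w, float h)
{
    float *dst;

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* NULL data: contents become undefined, so nothing needs to be waited on */
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)(12 * sizeof(float)),
                 NULL, GL_STREAM_DRAW);
    dst = (float *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (dst == NULL)
        return 0;               /* mapping failed */
    write_quad(dst, x, y, w, h);
    /* GL_FALSE means the contents were lost and must be regenerated */
    return glUnmapBuffer(GL_ARRAY_BUFFER) == GL_TRUE;
}
#endif
```

If the data already sits in an application array, passing that pointer to BufferData() directly is simpler and, per the post above, usually performs the same.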

As a bonus performance tip I’ll recommend trying not to resize a buffer object frequently. While the performance hit for this isn’t that bad, it can cause a noticeable performance drop if abused badly. The reason this is bad is because we have to constantly find chunks of memory that fit your new requested size and this can lead to memory fragmentation.
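One common way to keep resizes rare, which is not something the post above prescribes, is to track a capacity on the application side and only re-specify the buffer when the data no longer fits, growing by doubling. A minimal sketch, where `grow_capacity`, `StreamBuffer`, `stream_upload`, the 256-byte starting size and the `USE_REAL_GL` switch are all assumptions for the example:

```c
#include <stddef.h>
#include <stdint.h>

#ifdef USE_REAL_GL              /* assumption: prototypes come from glext.h */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#endif

/* Capacity to use for `needed` bytes: keeps `current` if it is large enough,
 * otherwise doubles until it fits. Never shrinks. */
size_t grow_capacity(size_t current, size_t needed)
{
    size_t cap = current ? current : 256;   /* arbitrary starting size */
    while (cap < needed) {
        if (cap > SIZE_MAX / 2)
            return needed;                  /* doubling would overflow */
        cap *= 2;
    }
    return cap;
}

#ifdef USE_REAL_GL
typedef struct {
    GLuint id;          /* buffer object name from glGenBuffers */
    size_t capacity;    /* bytes currently allocated, 0 if none yet */
} StreamBuffer;

/* Upload `bytes` of data, re-specifying the store only when it must grow. */
void stream_upload(StreamBuffer *b, const void *data, size_t bytes)
{
    size_t cap = grow_capacity(b->capacity, bytes);

    glBindBuffer(GL_ARRAY_BUFFER, b->id);
    if (cap != b->capacity) {
        /* the only place the buffer object changes size */
        glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)cap, NULL, GL_DYNAMIC_DRAW);
        b->capacity = cap;
    }
    glBufferSubData(GL_ARRAY_BUFFER, 0, (GLsizeiptr)bytes, data);
}
#endif
```

The trade-off is some unused memory at the tail of the buffer in exchange for far fewer size changes; draw calls then have to use the actual vertex count rather than the capacity.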

Thanks very much. That’s helped clear a few things up.