Updating persistent buffers

Hi.
In my code I have a case where I’d like to update only a portion of a persistent buffer (the buffer is mapped to CPU-visible memory with GL_MAP_UNSYNCHRONIZED_BIT) - e.g. I have an array of structures and on the CPU I update only a few of these structs.

I know that I should synchronize writes and reads in these cases, and that’s what I do for my per-frame buffers - I have a buffer 3x the size and just cycle between its ranges, making sure a range isn’t still in use with a fence.
But those buffers have new data every frame, whereas here only a small portion of the data CAN change.
If I wanted to synchronize the access in a similar fashion as with the per-frame buffers, I’d have to write all the data every frame, which kind of sucks tbh.
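
(For reference, a minimal sketch of that kind of fenced per-frame ring - the loader, names, sizes and the coherent persistent mapping are illustrative assumptions, not my actual code:)

```cpp
// Minimal sketch of a 3x fenced ring over one persistently mapped buffer.
// GLEW is just an example loader; names and sizes are illustrative.
#include <GL/glew.h>
#include <cstring>

constexpr GLsizeiptr kRangeSize = 64 * 1024;
constexpr int        kRanges    = 3;

GLuint buf             = 0;
char*  mapped          = nullptr;      // persistent mapping of the whole buffer
GLsync fences[kRanges] = {};           // one fence per range
int    current         = 0;

void initRing() {
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    glBufferStorage(GL_ARRAY_BUFFER, kRangeSize * kRanges, nullptr, flags);
    mapped = static_cast<char*>(
        glMapBufferRange(GL_ARRAY_BUFFER, 0, kRangeSize * kRanges, flags));
}

void writeFrameData(const void* src, size_t bytes) {
    // Make sure the GPU is done with this range before overwriting it.
    if (fences[current]) {
        glClientWaitSync(fences[current], GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(-1));
        glDeleteSync(fences[current]);
        fences[current] = nullptr;
    }
    std::memcpy(mapped + current * kRangeSize, src, bytes);

    // ... issue the draws that source from offset current * kRangeSize ...

    fences[current] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    current = (current + 1) % kRanges;
}
```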

If there is a solution for that then I’d appreciate an answer, but what bothers me more is what happens if I don’t synchronize the write. By that I mean: what if I just memcpy my data to the pinned memory and the driver issues a DMA while a rendering call uses the buffer that I’m updating?
Is this even defined behaviour, or is there no simple answer and thus it’s better not to do it, as it might cause crashes etc.?

Please read this wiki page:

and then follow up if it doesn’t answer all your questions.

The fact that you’re using GL_MAP_UNSYNCHRONIZED_BIT with GL_MAP_PERSISTENT_BIT indicates some confusion on what these are even for. You typically wouldn’t use them together. See the wiki page for details.

And to answer one of your questions, yes. With either buffer object streaming mechanism (UNSYNCHRONIZED + INVALIDATE_RANGE or PERSISTENT + COHERENT), you can use the stream buffer as a GPU-side “cache” and not have to re-stream the data to the GPU after you’ve put it there in the first place. Just tell the GPU to re-read the data from the location you streamed it to last time.

Use a generational index (e.g. orphan count or sync count) to determine whether your cached GPU-side data is still current and can still be used.
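
(A minimal sketch of the generational-index idea, with made-up names - the point is just that a cached GPU-side copy stays usable until the ring wraps or the buffer is orphaned past it:)

```cpp
// Illustrative only: a "generation" counter that is bumped every time the
// stream buffer is orphaned or the ring wraps. A cached GPU-side copy is
// trusted only while its generation still matches.
#include <GL/glew.h>
#include <cstdint>

struct CachedUpload {
    GLintptr offset     = 0;   // where in the stream buffer the data was written
    uint64_t generation = 0;   // generation at the time of the write
};

uint64_t g_generation = 0;     // ++ on every orphan / ring wrap

bool stillResident(const CachedUpload& c) {
    // Conservative: anything written before the last wrap is treated as stale.
    return c.generation == g_generation;
}

// Usage sketch:
//   if (stillResident(obj.upload))
//       drawFromOffset(obj.upload.offset);          // re-use the GPU-side copy
//   else
//       obj.upload = streamToBuffer(obj.cpuData);   // re-upload, record new generation
```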


Okay I think I figured it out.
Just like the page says - don’t write to memory that might be in use by the GPU, because it’s undefined behaviour.
Given that and what you’ve said about GL_MAP_UNSYNCHRONIZED_BIT with GL_MAP_PERSISTENT_BIT, I’ll map the buffer with GL_MAP_PERSISTENT_BIT and GL_MAP_FLUSH_EXPLICIT_BIT and use it in this fashion (rough sketch after the list):

  • allocate the memory on the GPU for the array of structs, but make the buffer 2-3x the size of the CPU buffer
  • write the initial data for the structs
  • when data in one entry changes:
    • write the updated data to the end of the GPU-side buffer (we made it 2-3x larger)
    • add the current entry’s slot to a free list, so we can reuse it for updates later, once it’s no longer used by any rendering commands
    • create a fence for the free list of entries created during this frame
    • issue a memory barrier glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT) so the updates are visible to the GPU (this is wrong, due to an error in the OpenGL wiki)
    • call glFlushMappedBufferRange on the updated portion of the buffer so the updates are visible to the GPU
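
Roughly like this (a sketch with made-up names and a GL_SHADER_STORAGE_BUFFER target chosen only as an example - not the actual code):

```cpp
// Sketch of the update path described above. Assumes the buffer was created
// with glBufferStorage(..., GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT) and
// mapped with GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_FLUSH_EXPLICIT_BIT.
// Names, the Entry payload and the binding target are illustrative.
#include <GL/glew.h>
#include <utility>
#include <vector>

struct Entry { float data[16]; };                    // placeholder per-object payload

GLuint     gpuBuffer   = 0;
Entry*     mapped      = nullptr;                    // persistent FLUSH_EXPLICIT mapping
GLsizeiptr writeCursor = 0;                          // next never-used slot (extra 2-3x region)
std::vector<GLsizeiptr> freeSlots;                   // slots whose fences have signaled
std::vector<GLsizeiptr> pendingFrees;                // slots retired this frame
std::vector<std::pair<GLsync, std::vector<GLsizeiptr>>> inFlight;   // fence -> retired slots

GLsizeiptr allocateSlot() {
    // Reclaim slots whose fences have already signaled (no blocking).
    while (!inFlight.empty()) {
        GLenum r = glClientWaitSync(inFlight.front().first, 0, 0);
        if (r != GL_ALREADY_SIGNALED && r != GL_CONDITION_SATISFIED)
            break;                                   // oldest fence not done yet
        glDeleteSync(inFlight.front().first);
        freeSlots.insert(freeSlots.end(),
                         inFlight.front().second.begin(), inFlight.front().second.end());
        inFlight.erase(inFlight.begin());
    }
    if (!freeSlots.empty()) {
        GLsizeiptr s = freeSlots.back();
        freeSlots.pop_back();
        return s;
    }
    return writeCursor++;    // still room in the oversized buffer (overflow not handled here)
}

GLsizeiptr updateEntry(GLsizeiptr oldSlot, const Entry& newData) {
    GLsizeiptr newSlot = allocateSlot();
    mapped[newSlot] = newData;                       // plain CPU write into mapped memory

    // Make the CPU write visible to the GPU before any draw reads this slot.
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, gpuBuffer);
    glFlushMappedBufferRange(GL_SHADER_STORAGE_BUFFER,
                             newSlot * GLsizeiptr(sizeof(Entry)), sizeof(Entry));

    pendingFrees.push_back(oldSlot);                 // reuse once the GPU is done with it
    return newSlot;                                  // point subsequent draws at the new slot
}

void endFrame() {
    // One fence covers everything retired this frame.
    if (!pendingFrees.empty())
        inFlight.push_back({glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0),
                            std::exchange(pendingFrees, {})});
}
```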

@Dark_Photon what do you think?

That barrier is needed for the CPU to see what the GPU has modified, so it’s backwards from what you want. You instead need to flush the range the CPU wrote to in order for the GPU to read it (assuming you’re not just using a coherent mapping).

Note that this was an error on the Wiki page (which itself was copied from an error in the OpenGL specification that has since been corrected).
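
To spell out the two directions (a generic summary with made-up function names, nothing specific to your code):

```cpp
#include <GL/glew.h>

// CPU -> GPU: after writing into a GL_MAP_FLUSH_EXPLICIT_BIT mapping, flush the
// range you wrote (with the buffer bound to `target`) so later GL commands that
// read the buffer see the new data.
void publishCpuWrite(GLenum target, GLintptr offset, GLsizeiptr length) {
    glFlushMappedBufferRange(target, offset, length);
}

// GPU -> CPU: after the GPU writes the buffer (e.g. from a compute shader),
// issue this barrier, then fence and wait, before reading the mapping on the CPU.
void publishGpuWrite() {
    glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);
}
```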


I see. I’ll edit my reply in case someone stumbles on it in the future.

Overall, sounds fine. Just a few suggestions:

  1. Do you need a free list? Just update the generational index in your state data to point to the latest version, and forget about the old one. It’ll get reused later when you wrap the ring buffer.

  2. If your number of buffer content updates per flush is 1, consider just using MAP_COHERENT with MAP_PERSISTENT and get rid of the flush.

  3. If you’re not comfortable with this sync object stuff, just use MAP_UNSYNCHRONIZED and orphan the buffer when it’s full (see the sketch after this list). Then there’s no need for sync objects.
    However, you should avoid doing this if your app is very dependent on the NVIDIA driver’s “Threaded Optimization = ON” option for good perf. Whether that option helps or hurts is hit-and-miss, depending on the app and its GL usage.
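
A sketch of option 3, the classic orphaning pattern (names are illustrative; note this needs mutable storage allocated with glBufferData, not immutable glBufferStorage):

```cpp
// Classic UNSYNCHRONIZED streaming with orphaning - no sync objects.
// Requires mutable storage (glBufferData), not immutable glBufferStorage.
#include <GL/glew.h>
#include <cstring>

GLuint     streamBuf  = 0;
GLsizeiptr streamSize = 1 << 20;   // e.g. 1 MiB
GLintptr   streamHead = 0;

GLintptr streamData(const void* src, GLsizeiptr bytes) {
    glBindBuffer(GL_ARRAY_BUFFER, streamBuf);

    if (streamHead + bytes > streamSize) {
        // Buffer full: orphan it. The driver hands us fresh storage while the
        // GPU keeps reading the old allocation, so no fence is needed.
        glBufferData(GL_ARRAY_BUFFER, streamSize, nullptr, GL_STREAM_DRAW);
        streamHead = 0;
    }

    void* dst = glMapBufferRange(GL_ARRAY_BUFFER, streamHead, bytes,
                                 GL_MAP_WRITE_BIT |
                                 GL_MAP_UNSYNCHRONIZED_BIT |
                                 GL_MAP_INVALIDATE_RANGE_BIT);
    std::memcpy(dst, src, bytes);
    glUnmapBuffer(GL_ARRAY_BUFFER);

    GLintptr offset = streamHead;
    streamHead += bytes;
    return offset;                 // draws source from this offset
}
```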

Related:

Oh also…

  1. While you can fence on every frame that you update the buffer, you don’t have to. You could instead fence on pages (subregions) of your buffer object.
    With this, you don’t have any per-frame operations and may not need to fence at all on most update frames. This can all be handled completely inside your buffer object update method (rough sketch below).
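
Something along these lines, perhaps (a sketch of page-level fencing on a ring; the page size, names, and the single-page-write assumption are all illustrative):

```cpp
// Page-level fencing on a ring buffer: a fence is dropped only when the write
// cursor leaves a page, so most update frames issue no fence at all.
#include <GL/glew.h>
#include <vector>

constexpr GLsizeiptr kPageSize = 64 * 1024;

struct Page { GLsync fence = nullptr; };

std::vector<Page> pages;          // sized to bufferSize / kPageSize at init
GLsizeiptr        cursor  = 0;    // current write position in the ring
size_t            curPage = 0;

// Call before memcpy'ing `bytes` at `cursor` (assumes a write never spans pages).
void beginWrite(GLsizeiptr bytes, GLsizeiptr bufferSize) {
    if (cursor + bytes > bufferSize)
        cursor = 0;                                        // wrap the ring

    size_t page = size_t(cursor / kPageSize);
    if (page != curPage) {
        // Leaving the old page: fence it, so we know when the GPU has finished
        // with everything that was streamed into it.
        if (pages[curPage].fence)
            glDeleteSync(pages[curPage].fence);
        pages[curPage].fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        curPage = page;
    }
    // Re-entering a page fenced on a previous lap: wait before overwriting it.
    if (pages[page].fence) {
        glClientWaitSync(pages[page].fence, GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(-1));
        glDeleteSync(pages[page].fence);
        pages[page].fence = nullptr;
    }
    // ... write at `cursor`, then cursor += bytes ...
}
```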