glBufferData variant with retained data copying

I understand the motivation for the post, but you can probably also see that there are likely as many different access/modify/draw patterns as there are applications.

There are not that many buffer APIs. There is the VBO itself, and there are copying and non-copying APIs for putting data into it. (The copying ones being BufferData and BufferSubData, the non-copying one being MapBufferRange.)
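
To make that concrete, here is a minimal sketch of those three entry points. The buffer is assumed to already be bound to GL_ARRAY_BUFFER, and the names (totalSize, offset, chunkSize, chunk, initialData) are placeholders:

```c
/* Copying APIs: the driver copies out of your pointer before returning. */
glBufferData(GL_ARRAY_BUFFER, totalSize, initialData, GL_DYNAMIC_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, offset, chunkSize, chunk);

/* Non-copying API: you write directly into driver-owned memory. */
void *dst = glMapBufferRange(GL_ARRAY_BUFFER, offset, chunkSize,
                             GL_MAP_WRITE_BIT);
memcpy(dst, chunk, chunkSize);
glUnmapBuffer(GL_ARRAY_BUFFER);
```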

In general, ISTM that the best way to avoid blocking is to think ahead of time: how can I structure my data deliveries so that the GPU never wants to access the thing I am modifying? That situation usually leads to one side waiting, or to one side getting the wrong data at the wrong time.

As alluded to above, in the best case you have a stream of commands in flight that allows the GPU to complete work on one buffer and switch over to a different buffer without skipping a beat, but this requires more storage. It’s not all that different from ping-ponging in DMA sound hardware: you try to have buffer N+1 filled up and ready for consumption well before buffer N is consumed. The hardware is really only interested in the current buffer, but it can’t hop smoothly to the next chunk of work until that next buffer is ready.
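
A rough sketch of that multi-buffer scheme in GL terms; NUM_BUFFERS, BUF_SIZE, rendering, genVertices(), and vertexCount are all hypothetical placeholders:

```c
GLuint vbo[NUM_BUFFERS];
glGenBuffers(NUM_BUFFERS, vbo);
for (int i = 0; i < NUM_BUFFERS; ++i) {
    glBindBuffer(GL_ARRAY_BUFFER, vbo[i]);
    glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);
}

int cur = 0;
while (rendering) {
    /* Fill buffer N+1 while the GPU may still be consuming buffer N. */
    int next = (cur + 1) % NUM_BUFFERS;
    glBindBuffer(GL_ARRAY_BUFFER, vbo[next]);
    glBufferSubData(GL_ARRAY_BUFFER, 0, BUF_SIZE, genVertices());

    /* Draw out of the buffer that was filled previously. */
    glBindBuffer(GL_ARRAY_BUFFER, vbo[cur]);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    cur = next;
}
```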

If that kind of storage investment is too high, then you can make the opposite trade and give back the space at the cost of time, but then you may go slower.

With the CopyBufferSubData API, you might be able to set up a cascade: while the GPU is consuming data from a finished buffer/mesh, you have a separate buffer mapped and are writing new sections of data into it, followed by a series of CopyBufferSubData calls that complete in order to make the final delivery of the updates into the destination buffer.
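
Something like the following, assuming a GL 3.1 / ARB_copy_buffer context; staging, meshVbo, STAGING_SIZE, updates, and the offset/size arrays are placeholders:

```c
/* CPU side: write the new sections into a mapped staging buffer. */
glBindBuffer(GL_COPY_READ_BUFFER, staging);
void *p = glMapBufferRange(GL_COPY_READ_BUFFER, 0, STAGING_SIZE,
                           GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(p, updates, STAGING_SIZE);
glUnmapBuffer(GL_COPY_READ_BUFFER);

/* GPU side: queue the in-order deliveries into the destination buffer. */
glBindBuffer(GL_COPY_WRITE_BUFFER, meshVbo);
for (int i = 0; i < numUpdates; ++i)
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                        srcOffset[i], dstOffset[i], updateSize[i]);
```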

On the other hand, if analysis shows that you are changing 75% or more of the vertices in the buffer per draw, then just orphan it (glBufferData(NULL)) and re-fill it; the GPU can keep drawing out of the nameless orphan while you fill up the next one.
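
In code, the orphan-and-refill idiom looks roughly like this (BUF_SIZE and newVertices are placeholders):

```c
glBindBuffer(GL_ARRAY_BUFFER, vbo);
/* Passing NULL detaches the old storage; any queued draws keep reading
   from the now-nameless block while we fill the fresh allocation. */
glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, BUF_SIZE, newVertices);
```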

> I understand the motivation for the post, but you can probably also see that there are likely as many different access/modify/draw patterns as there are applications.

I believe you missed the point of Dark Photon’s post. He wants driver developers to inform users as to what the optimal usage patterns are, so that they can code their applications properly.

There are substantive questions about ill-defined portions of the specification. The specific meaning of STREAM vs. DYNAMIC vs. STATIC, for example: how much respecifying of vertex data makes something count as STREAM instead of DYNAMIC? Because the specification does not explain what the implementation does with these hints, it is impossible to know which one to use for your scenario.
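
For reference, the hint in question is just the last argument of BufferData; the comments paraphrase the spec’s informal wording, which is all the guidance it gives:

```c
glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);  /* modified once, used many times   */
glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW); /* modified repeatedly, used many times */
glBufferData(GL_ARRAY_BUFFER, size, data, GL_STREAM_DRAW);  /* modified once, used at most a few times */
```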

There is also the issue of mapping the buffer vs. using BufferData(NULL) and BufferSubData. There is no guidance on which of these is the correct way to do things.
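
The two update paths in question, side by side (BUF_SIZE and verts are placeholders; which path a given driver prefers is exactly the undocumented part):

```c
/* Path 1: orphan with BufferData(NULL), then deliver via a copying call. */
glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, BUF_SIZE, verts);

/* Path 2: map the buffer and write in place, invalidating old contents. */
void *dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, BUF_SIZE,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(dst, verts, BUF_SIZE);
glUnmapBuffer(GL_ARRAY_BUFFER);
```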

These are all things that have an overall effect on performance. But there is little guidance on the proper way to stream vertices. There is some lore floating around, but nothing concrete.

Hmm, I always believed STREAM doesn’t keep a copy in system RAM after use (you know, the copy kept for restoring state after a GPU reset, e.g. on a resolution change). DYNAMIC looks like it would keep the data in RAM and allow DMA out of it on first use, then copy it to VRAM after that.