I understand the motivation for the post, but you can probably also see that there are likely as many different access/modify/draw patterns as there are applications.
There are not that many buffer APIs. There is the VBO itself, and there are copying and non-copying APIs for putting data into it. (The copying ones being glBufferData and glBufferSubData, the non-copying one being glMapBufferRange.)
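A minimal sketch of the two upload paths, assuming a current GL 3.x context; `vbo`, `verts`, and `size` are hypothetical names for an existing buffer object, source data, and byte count:

```c
glBindBuffer(GL_ARRAY_BUFFER, vbo);

/* Copying path: the driver copies out of your pointer at call time. */
glBufferSubData(GL_ARRAY_BUFFER, 0, size, verts);

/* Non-copying path: you write directly into driver-owned memory. */
void *dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
memcpy(dst, verts, size);
glUnmapBuffer(GL_ARRAY_BUFFER);
```

The non-copying path saves a memcpy by the driver, but it also exposes you to the blocking hazards discussed below if the GPU still wants the range you map.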
In general ISTM that the best way to avoid blocking is to think ahead of time: how can I structure my data deliveries so that the GPU never wants to access the thing I am modifying? When that collision happens, either one side waits, or the other side gets the wrong data at the wrong time (*).
As alluded to above, in the best case you have a stream of commands in flight that allows the GPU to complete work on one buffer and switch over to a different buffer without skipping a beat - but this requires more storage. It's not all that different from ping-ponging in DMA sound hardware: you try to have buffer N+1 filled up and ready for consumption well before buffer N is consumed. The hardware is really only interested in the current buffer, but it can't hop smoothly to the next chunk of work until the following buffer is ready.
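The ping-pong idea above might look something like this; `BUF_SIZE`, `fill_vertices`, and `draw_current_mesh` are hypothetical placeholders for your own sizes and routines:

```c
#define NUM_VBOS 2   /* assumption: two buffers is enough slack */

GLuint vbos[NUM_VBOS];
int cur = 0;

void init_stream(void) {
    glGenBuffers(NUM_VBOS, vbos);
    for (int i = 0; i < NUM_VBOS; ++i) {
        glBindBuffer(GL_ARRAY_BUFFER, vbos[i]);
        glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);
    }
}

void frame(void) {
    int next = (cur + 1) % NUM_VBOS;

    /* CPU fills buffer N+1... */
    glBindBuffer(GL_ARRAY_BUFFER, vbos[next]);
    fill_vertices();          /* hypothetical: map/write/unmap here */

    /* ...while the GPU draws from buffer N. */
    glBindBuffer(GL_ARRAY_BUFFER, vbos[cur]);
    draw_current_mesh();      /* hypothetical draw call */

    cur = next;
}
```

If two buffers still stall, widening to three or four adds more slack at the cost of proportionally more memory.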
If that kind of storage investment is too high, then you can trade time for space instead - use less memory, accepting that you may go slower.
With the glCopyBufferSubData API, you might be able to set up a cascade: the GPU consumes data from a finished buffer/mesh while you have a separate buffer mapped and are writing new sections of data into it - followed by a series of glCopyBufferSubData calls which complete in order to make the final delivery of the updates into the destination buffer.
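A sketch of that staging cascade, assuming GL 3.1+ for glCopyBufferSubData; `staging`, `dest`, `STAGING_SIZE`, `write_updates`, and the `ranges` array are all assumptions standing in for your own bookkeeping:

```c
glBindBuffer(GL_COPY_READ_BUFFER, staging);
glBindBuffer(GL_COPY_WRITE_BUFFER, dest);

/* CPU writes land in the mapped staging buffer... */
void *p = glMapBufferRange(GL_COPY_READ_BUFFER, 0, STAGING_SIZE,
                           GL_MAP_WRITE_BIT);
write_updates(p);         /* hypothetical CPU-side fill */
glUnmapBuffer(GL_COPY_READ_BUFFER);

/* ...then GPU-side copies deliver each dirty range into the
   destination buffer, in order, without the CPU touching it. */
for (int i = 0; i < num_ranges; ++i)
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                        ranges[i].src_offset, ranges[i].dst_offset,
                        ranges[i].size);
```

The copies are queued like any other GL command, so they slot in behind the draws already reading from `dest`.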
On the other hand, if analysis shows that you are changing 75% or more of the vertices in the buffer per draw, then just orphan it (glBufferData with a NULL data pointer) and re-fill it; the GPU can keep drawing out of the nameless orphan while you fill up the next one.
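Orphaning is just a re-specification of the data store; a sketch, where `vbo`, `BUF_SIZE`, and `new_verts` are hypothetical:

```c
glBindBuffer(GL_ARRAY_BUFFER, vbo);

/* Same size and usage as before, NULL data: the driver detaches the
   old storage (the "nameless orphan") that in-flight draws still read
   from, and hands you fresh storage under the same buffer name. */
glBufferData(GL_ARRAY_BUFFER, BUF_SIZE, NULL, GL_STREAM_DRAW);

/* Re-fill the fresh storage without waiting on the GPU. */
glBufferSubData(GL_ARRAY_BUFFER, 0, BUF_SIZE, new_verts);
```

The key point is that the size and usage hint match the original allocation, which lets the driver recycle storage from a pool rather than truly reallocating each time.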