Can someone explain, slowly and in more detail, why GL_MAP_UNSYNCHRONIZED_BIT can be bad with a multithreaded driver and can cause a performance bubble (a stall between the client and server threads)? I am referring to the talk from John McDonald about persistent mapped buffers (see 4:10). My brain is too slow to understand his explanation (English is not my mother tongue). ^^
Thanks in advance.
The problem is not the bit itself. The problem is that the operation of mapping itself requires some time.
OpenGL allows you to do some rather unfortunate things to buffers. For example, you can reallocate them (calling glBufferData again on an existing buffer does this), forcing the driver to allocate new video memory.
However, reallocating buffer storage is itself an OpenGL command. So it may not have finished, or even started, its execution by the time you get around to mapping the buffer. And commands issued prior to the buffer reallocation may also not have started their execution. Those commands need to see the old memory, so the OpenGL server thread won’t necessarily allocate the new memory immediately. It has to do it at a later time.
The problem is that, while all of those other commands can execute asynchronously, glMapBufferRange must return a pointer right now. Therefore, whatever work may be outstanding on that buffer (a pending memory allocation, etc.) must be completed before a pointer can be returned. That’s where the performance stall comes from.
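To make the stall concrete, here is a toy model in plain C++ (no OpenGL involved). It is only a sketch of the idea, not how any real driver is written: ToyDriver, reallocAsync and map are made-up names. The "server" thread executes queued commands, reallocAsync returns immediately like an ordinary GL command, and map has to wait for everything queued before it because it must hand back a valid pointer.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Toy model only: a "server" thread drains commands that the "client" enqueues.
class ToyDriver {
public:
    struct Mapping { unsigned char* ptr; std::size_t size; };

    ToyDriver() : worker_([this] { run(); }) {}
    ~ToyDriver() {
        { std::lock_guard<std::mutex> lock(m_); quit_ = true; }
        cv_.notify_all();
        worker_.join();
    }

    // Asynchronous, like a buffer reallocation: returns before the work is done.
    void reallocAsync(std::size_t bytes) {
        enqueue([this, bytes] {
            std::this_thread::sleep_for(std::chrono::milliseconds(1)); // pretend it is slow
            storage_.assign(bytes, 0);
        });
    }

    // Synchronous, like a map call: it must return a valid pointer, so the
    // client blocks here until every previously queued command has run.
    Mapping map() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return queue_.empty() && !busy_; });
        return Mapping{storage_.data(), storage_.size()};
    }

    std::size_t executedCount() {
        std::lock_guard<std::mutex> lock(m_);
        return executed_;
    }

private:
    void enqueue(std::function<void()> cmd) {
        { std::lock_guard<std::mutex> lock(m_); queue_.push(std::move(cmd)); }
        cv_.notify_one();
    }

    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return quit_ || !queue_.empty(); });
            if (queue_.empty()) return;  // quit requested and nothing left to do
            std::function<void()> cmd = std::move(queue_.front());
            queue_.pop();
            busy_ = true;
            lock.unlock();
            cmd();                       // execute outside the lock
            lock.lock();
            busy_ = false;
            ++executed_;
            idle_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;    // wakes the server when work arrives
    std::condition_variable idle_;  // wakes the client when the queue has drained
    std::queue<std::function<void()>> queue_;
    std::vector<unsigned char> storage_;
    bool busy_ = false;
    bool quit_ = false;
    std::size_t executed_ = 0;
    std::thread worker_;            // declared last so it starts after the other members exist
};
```

If the client calls reallocAsync(64), then reallocAsync(256), then map(), the two reallocations return at once, but map() does not return until the server thread has executed both of them. That wait is the bubble. In this toy, nothing map() could be told about the buffer's contents would remove the wait, because the pointer itself is not known until the pending reallocation has run.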
The unsynchronized bit gets around stalls due to the contents of the buffer still being in use. But it doesn’t get around issues related to the storage itself potentially being in the middle of being modified. Granted, I don’t understand why the implementation can’t avoid the stall even if you’re not reallocating buffers and such. Or why it doesn’t avoid it if you use unsynchronized with invalidate. But that’s the general idea of the problem.
Notice that ARB_buffer_storage actually does two different things. It provides persistent mapping capabilities, but it also creates a new way to allocate storage for the buffer which, once allocated, can never be reallocated. And it ties the two elements together; you cannot map persistently from a buffer that is not immutable.
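For reference, a minimal sketch of what that looks like in code. It assumes a current OpenGL 4.4 (or ARB_buffer_storage) context and a function loader such as GLEW; createPersistentBuffer is a hypothetical helper name, not part of any API.

```cpp
#include <GL/glew.h>  // assumption: any loader that exposes OpenGL 4.4 entry points

// Hypothetical helper; requires a current OpenGL 4.4 context.
void* createPersistentBuffer(GLsizeiptr size, GLuint* outBuffer) {
    const GLbitfield flags =
        GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    glGenBuffers(1, outBuffer);
    glBindBuffer(GL_ARRAY_BUFFER, *outBuffer);

    // Immutable storage: the size and flags are fixed for the buffer's lifetime,
    // so a later glBufferData call on this buffer is an error.
    glBufferStorage(GL_ARRAY_BUFFER, size, nullptr, flags);

    // Map once and keep the pointer for as long as the buffer lives.
    return glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);
}
```

Since the map happens once at startup, any cost it has is paid once instead of every frame. You still have to make sure yourself (with fence sync objects, for example) that you don't overwrite a region the GPU is still reading.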
It’s possible that a driver could remove the stall if you do an unsynchronized map for an immutable storage buffer. And therefore, if buffers had been immutable to begin with, we wouldn’t have had this problem. But that’s OpenGL.