…When an application maps a buffer, it is given a pointer to the memory. When the application finishes reading from or writing to the memory, it is required to “unmap” the buffer before it is once again permitted to use that buffer as a GL data source or sink. Mapping often allows applications to eliminate an extra data copy otherwise required to access the buffer, thereby enhancing performance…
Does anyone know how to eliminate this extra copy and get enhanced performance on NVIDIA GPUs? All 90 series or later NVIDIA drivers seem to always copy the VBO between video and system memory after a map/unmap, killing any performance gains. Is there any way to keep the VBO pinned in video memory? The usage hints seem to have no effect.
If I’ve understood things correctly, when you draw from a buffer to OpenGL (using glVertexPointer etc), the OpenGL implementation has to copy the data into an internal buffer before returning from the draw call (glDrawElements or similar). It can then upload the data from that internal buffer to the video card while your app does other things.
The alternative would be to wait for the upload to complete before returning, as the OpenGL implementation has no other way to be sure that the memory you’ve pointed to is still valid after the draw call returns.
Using a VBO, the OpenGL implementation effectively gives you direct access to this internal buffer, and so it doesn’t have to copy data into it, since you fill it directly.
I believe this is the eliminated copy the text mentions, since afaik there’s no way for an app to directly access video memory.
Is there any way to keep the VBO pinned in video memory? The usage hints seem to have no effect.
The implementation allows this behavior from a driver when you map/unmap it. This allows a driver to create VBOs for memory that you can’t actually map, or those that are currently in use (which may be your problem).
If you had read the document that Timothy Farrar referenced in his post, you would have discovered your conclusion was false. VBO is a simpler alternative to VAR. VAR allows explicit mapping of video memory, as described in the paper available on this page: http://developer.nvidia.com/object/Using_GL_NV_fence.html
Similar experience here on NVIDIA with map/unmap (slower), but I’ve found that a null glBufferData call of a fixed (max) size (to dump the old contents), followed by a glBufferSubData call each time to reload, works fastest, particularly with PBOs. The intuition I read was that if you tell GL you don’t care about the old contents and then provide an update, yes, it has to copy the data to the GL driver immediately, but it doesn’t stall on previous uses of the buffer.
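For reference, here is a rough sketch of that respecify-then-upload sequence. It assumes a buffer is already bound to the target. The `UploadApi` table, `respecify_and_upload`, and the `fake_*` functions are hypothetical names made up for illustration; in a real program the two entry points would be glBufferData and glBufferSubData, and the fakes exist only so the sketch runs without a GL context.

```c
#include <stddef.h>

/* GL enum values as defined in the standard GL headers, repeated here so
   the sketch compiles without them. */
enum { ARRAY_BUFFER = 0x8892, STREAM_DRAW = 0x88E0 };

/* Hypothetical table of entry points; in a real program these would be
   glBufferData and glBufferSubData. */
typedef struct {
    void (*buffer_data)(unsigned target, ptrdiff_t size,
                        const void *data, unsigned usage);
    void (*buffer_sub_data)(unsigned target, ptrdiff_t offset,
                            ptrdiff_t size, const void *data);
} UploadApi;

/* Re-specify the bound buffer at its fixed maximum size with no data
   (telling GL the old contents are not needed), then upload the new data
   starting at offset 0. Returns 0 if the data does not fit, 1 otherwise. */
int respecify_and_upload(const UploadApi *gl, ptrdiff_t max_size,
                         const void *src, ptrdiff_t size)
{
    if (size > max_size)
        return 0;
    gl->buffer_data(ARRAY_BUFFER, max_size, NULL, STREAM_DRAW);
    gl->buffer_sub_data(ARRAY_BUFFER, 0, size, src);
    return 1;
}

/* Fake entry points that only record what was requested. */
ptrdiff_t fake_allocated, fake_uploaded;
int fake_null_respecs;

void fake_buffer_data(unsigned target, ptrdiff_t size,
                      const void *data, unsigned usage)
{
    (void)target; (void)usage;
    fake_allocated = size;
    if (data == NULL)
        fake_null_respecs++;
}

void fake_buffer_sub_data(unsigned target, ptrdiff_t offset,
                          ptrdiff_t size, const void *data)
{
    (void)target; (void)offset; (void)data;
    fake_uploaded = size;
}

const UploadApi fake_gl = { fake_buffer_data, fake_buffer_sub_data };
```

Whether this actually beats map/unmap depends on the driver; the sketch only shows the call order being described.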
Those claiming glBufferData or glBufferSubData is faster are still living in a single threaded / single CPU world. Mapping allows drawing from one VBO to occur in parallel with filling another. What could be faster?
You missed the point. Why keep the driver/GPU busy transferring vertex and index data when it could be receiving drawing commands? Mapping allows the vertex and index data transfers to occur with minimal driver/GPU activity. The thread filling the VBOs does not need an OpenGL context because it is not interacting with the driver.
No worries. Always good to get misconceptions cleared up anyway.
I must admit I never had a use for mapping a VBO, so I skipped those parts when reading about them. I just assumed the driver uploaded the data using DMA or something, and thus that it was faster for it to do that from an internal buffer instead of having the app wait for the upload to complete.
In regards to the GPU waiting when using glMapBuffer(), this is probably the most important thing to glean from the NVIDIA doc:
To solve this conflict we just need to call glBufferDataARB() with a NULL pointer. Then calling glMapBuffer() tells the driver that the previous data aren’t valid. As a consequence, if the GPU is still working on them, there won’t be a conflict because we invalidated these data. The function glMapBuffer() returns a new pointer that we can use while the GPU is working on the previous set of data…
Basically, always ensure the GL driver doesn’t have to block waiting for the GPU to flag that it is finished with the previous frame’s VBO.
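A minimal sketch of the sequence from that excerpt, assuming the buffer is already bound. `BufferApi`, `orphan_and_fill`, and the `fake_*` functions are hypothetical names for illustration only; in a real program the three entry points would be glBufferData (or glBufferDataARB), glMapBuffer, and glUnmapBuffer. The in-memory fake just lets the sketch run without a driver and says nothing about how a real driver allocates memory.

```c
#include <stddef.h>
#include <string.h>

/* GL enum values as defined in the standard GL headers, repeated here so
   the sketch compiles without them. */
enum { ARRAY_BUFFER = 0x8892, STREAM_DRAW = 0x88E0, WRITE_ONLY = 0x88B9 };

/* Hypothetical table of the three entry points the idiom needs. */
typedef struct {
    void  (*buffer_data)(unsigned target, ptrdiff_t size,
                         const void *data, unsigned usage);
    void *(*map_buffer)(unsigned target, unsigned access);
    unsigned char (*unmap_buffer)(unsigned target);
} BufferApi;

/* Invalidate the bound buffer's old contents, then map and fill it.
   Returns 1 on success, 0 if mapping failed or unmap reported that the
   contents were lost. */
int orphan_and_fill(const BufferApi *gl, const void *src, ptrdiff_t size)
{
    /* NULL data pointer: the previous contents are no longer needed. */
    gl->buffer_data(ARRAY_BUFFER, size, NULL, STREAM_DRAW);

    void *dst = gl->map_buffer(ARRAY_BUFFER, WRITE_ONLY);
    if (dst == NULL)
        return 0;

    memcpy(dst, src, (size_t)size);
    return gl->unmap_buffer(ARRAY_BUFFER) ? 1 : 0;
}

/* Tiny in-memory fake so the sketch can be exercised without a driver. */
unsigned char fake_store[64];
int fake_orphan_count;

void fake_buffer_data(unsigned target, ptrdiff_t size,
                      const void *data, unsigned usage)
{
    (void)target; (void)size; (void)usage;
    if (data == NULL)
        fake_orphan_count++;
}

void *fake_map_buffer(unsigned target, unsigned access)
{
    (void)target; (void)access;
    return fake_store;
}

unsigned char fake_unmap_buffer(unsigned target)
{
    (void)target;
    return 1;
}

const BufferApi fake_gl = { fake_buffer_data, fake_map_buffer, fake_unmap_buffer };
```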
Can I assume that the same is true for PBOs? From what I’ve understood, which I obviously can’t rely on anymore, PBOs and VBOs are essentially the same in the way the buffers are handled. Is this correct?
Why keep the driver/GPU busy transferring vertex and index data when it could be receiving drawing commands?
Because you wanted to upload data to the GPU. That requires talking to the driver.
OK, let’s say you do this two-thread rendering thing, where you have a mapped pointer in one thread and you’re rendering in another. What happens if the driver in the rendering thread suddenly decides that it needs to pull your buffer out of video memory and put it into main memory to make room for a texture?
There must always be communication between the mapped buffer and the driver.
Mapping allows the vertex and index data transfers to occur with minimal driver/GPU activity. The thread filling the VBOs does not need an OpenGL context because it is not interacting with the driver.
What if you’re rendering from that buffer when you decide to go mapping it? The driver has to ensure that the previous rendering command finishes before mapping it. And that requires access to the context.
Now, GL 3.0 will offer an ultimate form of mapping which provides you with absolutely no guarantees on anything; it just hands you a pointer and you’re expected to ensure that the data isn’t being read from/etc. But GL 2.1 doesn’t have any such concept.
And even in GL 3.0, it won’t be some magical process that can happen without the driver’s consent; it will still need to know about it.
Can I assume that the same is true for PBOs?
They’re all just buffer objects. The fact that one gets bound to a gl*Pointer slot and the other gets bound to a PACK/UNPACK slot is fairly irrelevant to how you access the data.
What happens to a mapped buffer when a screen resolution change or other such window-system-specific system event occurs?
RESOLVED: The buffer’s contents may become undefined. The application will then be notified at Unmap time that the buffer’s contents have been destroyed. However, for the remaining duration of the map, the pointer returned from Map must continue to point to valid memory, in order to ensure that the application cannot crash if it continues to read or write after the system event has been handled.
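In practice, the "notified at Unmap time" part means checking the return value of glUnmapBuffer, which is GL_FALSE when the contents were lost. One possible way to handle that is sketched below, assuming the buffer is already bound and that refilling is cheap enough to simply repeat. `MapApi`, `fill_with_retry`, and the `fake_*` functions are hypothetical names for illustration; the real entry points would be glMapBuffer and glUnmapBuffer, and the fake deliberately reports a loss on its first unmap so the retry path runs.

```c
#include <stddef.h>
#include <string.h>

/* GL enum values as defined in the standard GL headers, repeated here so
   the sketch compiles without them. */
enum { ARRAY_BUFFER = 0x8892, WRITE_ONLY = 0x88B9 };

/* Hypothetical table of the two entry points used below. */
typedef struct {
    void *(*map_buffer)(unsigned target, unsigned access);
    unsigned char (*unmap_buffer)(unsigned target);
} MapApi;

/* Fill the bound buffer through a mapping, and refill it if unmap reports
   that the contents were destroyed while mapped. Returns the number of
   attempts used, or 0 if mapping failed or every attempt was lost. */
int fill_with_retry(const MapApi *gl, const void *src, size_t size,
                    int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        void *dst = gl->map_buffer(ARRAY_BUFFER, WRITE_ONLY);
        if (dst == NULL)
            return 0;
        /* The pointer stays valid for the whole map, even after a
           system event, so this write cannot crash. */
        memcpy(dst, src, size);
        if (gl->unmap_buffer(ARRAY_BUFFER))
            return attempt;
    }
    return 0;
}

/* Fake driver: the first unmap reports lost contents, later ones succeed. */
unsigned char fake_store[64];
int fake_unmaps;

void *fake_map_buffer(unsigned target, unsigned access)
{
    (void)target; (void)access;
    return fake_store;
}

unsigned char fake_unmap_buffer(unsigned target)
{
    (void)target;
    return ++fake_unmaps > 1;
}

const MapApi fake_gl = { fake_map_buffer, fake_unmap_buffer };
```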
Where did you get this fact? Once the driver produces a pointer it is valid for all threads. No driver activity is involved in using the pointer.
Excerpt from ARB_vertex_buffer_object:
…The expectation is that an application might map a buffer and start filling it in a different thread, but continue to render in its main thread (using a different buffer or no buffer at all)…