VBO performance drop after some BufferData calls


We discovered strange VBO performance behaviour when updating its data via a glBufferData call with the same size but a NULL pointer, and then mapping it, writing the new data, and unmapping it.
The issue was found on a G80, WinXP, 178.13 drivers.

The background is that we decided to test some stripping issues and made a simple app to test performance on a rectangular grid of tessellated batches.
We have a limited number of VBOs (one per cell); each cell consists of an array buffer and an element array buffer (about 20K simple vertices, each vertex 24 bytes).
We have a slider controlling some parameter (height, for example, or the form of the geometry). Changing it causes the VBO to be recreated without changing its size.

At first, we deleted and created the VBO anew (with glDeleteBuffers and glGenBuffers calls). Everything worked as expected; changing this local parameter didn't seem to affect FPS.
Then we decided to update the VBO without deleting it. For each update we do:

  1. Call glBufferData(size, NULL) (the size remains the same) to orphan the buffer and avoid a CPU stall
  2. Map it
  3. Write data
  4. Unmap
    Rather common scheme, I hope.

And we got strange situation:

  1. After about 9-10 iterations, FPS dropped by a factor of 2.5 for GL_STATIC_DRAW (the number of iterations depends on the data size)
  2. After about 3-4 iterations, FPS dropped by a factor of 2.5 for GL_DYNAMIC_DRAW (the number of iterations depends on the data size)
  3. If we change the data size (for example, add one more vertex), everything returns to normal. But after that, every following VBO update with the size remaining constant causes performance to drop instantly, after a single update.
  4. If we pass the data right in the glBufferData call, without mapping/unmapping, everything is fine.

Any suggestions?

Appreciating your replies,

Have you tried just using glBufferSubData (instead of mapping/unmapping), or simply not calling glBufferData at all? According to the docs, glBufferData creates a new data store for the buffer object currently bound to the target, and any pre-existing data store is deleted. I would expect that if your size is constant, you don’t want this behavior…

It’s apparently not good to recreate buffers with glBufferData(size, NULL). It’s the same as uploading textures every frame with glTexImage2D(): the driver has to check for memory and perform many operations to ensure your usage hints (STATIC/DYNAMIC/…) are satisfied. That’s why you have a few alternatives like map/unmap or glBufferSubData calls, which are optimized for exactly that purpose. Creating the buffer with the DYNAMIC hint will place the data in the proper memory for the fastest updating possible. As for a buffer whose size changes dynamically, I have no idea how to handle that other than providing a big enough buffer at startup and a vertex limit for updating it.

Nobody says recreating the buffer each frame is our usual way of working. Sure, it’s no good.

I only pointed out such a problem. We don’t recreate the buffer each frame, but if it has to be changed (and we don’t know its new size in advance, whether it will be equal to the previous one or not), then what should we do?
Imagine we sometimes have to recreate the buffer for some reason - for example, the incoming data has changed. So we have two options: completely recreate the buffer by deleting it and creating a new one, or reset the current buffer (with map/unmap, or without).
And we are doing all these things according to the spec and NVIDIA’s VBO usage recommendations.

I only wanted to point out that such a usage scenario causes performance to drop, and that any step away from this absolutely correct usage scenario returns everything to normal.

DataSingleton const* singleton = DataSingleton::Create();

if (vertex_buffer_id == 0)
   glGenBuffersARB(1, reinterpret_cast<GLuint *>(&vertex_buffer_id));
glBindBufferARB(GL_ARRAY_BUFFER, vertex_buffer_id);
// Orphan the old data store: same size, NULL pointer.
glBufferDataARB(GL_ARRAY_BUFFER, singleton->vertices_size, NULL, GL_STATIC_DRAW);
Vertex * vertices = reinterpret_cast<Vertex *>(glMapBufferARB(GL_ARRAY_BUFFER, GL_WRITE_ONLY));
memcpy(vertices, &singleton->vertices.front(), singleton->vertices_size);
glUnmapBufferARB(GL_ARRAY_BUFFER); // unmap before unbinding; drawing from a mapped buffer is invalid
glBindBufferARB(GL_ARRAY_BUFFER, 0);

You can use the GLExpert tool to get information about the location where the VBO resides. It is possible that the driver tries to be too smart and moves the buffer to more CPU-friendly (and slower for the GPU) memory if it detects that you are updating it more than a few times. NVIDIA drivers have this habit.

Thanks, Komat, will try out this tool.
We had the same thought - the driver being too smart. Looks like that is the case.