VBO performance drop with newest drivers/hardware

We recently upgraded our equipment.
My workstation went from Quadro FX 3800 to Quadro 4000.

In the process, the application I develop has seen performance drops.
I've been able to identify where they come from: loading VBOs is much slower than it used to be.

The newer the drivers, the worse it is, from what I have been able to gather. (There is a very big difference in behavior between 260.19 and 295.20.)

Here's a sample of the code.
Has something been deprecated, or is something known to perform less efficiently on Fermi GPUs?

    // Build a temporary copy of the vertex data, camera-relative in x/y
    GLfloat* dummy = new GLfloat[x.size()*3];
    for (size_t i = 0; i < x.size(); i++)
    {
        dummy[3*i    ] = x[i] - camera->xOffset();
        dummy[3*i + 1] = y[i] - camera->yOffset();
        dummy[3*i + 2] = z[i];
    }

    // Bind it
    glBindBuffer(GL_ARRAY_BUFFER, vboData);

    // Allocate storage for it
    glBufferData(GL_ARRAY_BUFFER, x.size()*3*sizeof(GLfloat),
                 0, GL_DYNAMIC_DRAW);

    // Get a pointer to that storage....
    GLfloat* positionBuffer = (GLfloat*) glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);

    // If valid, fill it with the data, then unmap before drawing
    if (positionBuffer)
    {
        memcpy((void*)positionBuffer, (void*)dummy, x.size()*3*sizeof(GLfloat));
        glUnmapBuffer(GL_ARRAY_BUFFER);
        positionBuffer = NULL;
    }

    delete[] dummy;

    // Unbind the buffer....
    glBindBuffer(GL_ARRAY_BUFFER, 0);

Thanks for any help and advice…

Something peculiar I noticed: the application performs
much better (or, more accurately, at previously observed levels) inside gDEBugger!

How often do you execute this code? Once, once per frame, or several times per frame?

Do you really need a copy of the values in CPU memory? If not, you could write directly into positionBuffer and would not need memcpy() at all.

Have you tried using glMapBufferRange() instead, with buffer invalidation? Calling glBufferData() then probably isn't needed more than once, at initialization.
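As a sketch of that suggestion, reusing the names from the earlier snippet (`vboData`, `x`, `y`, `z`, `camera`) and assuming the buffer storage was already sized once with glBufferData():

```cpp
// Sketch: streaming vertex data with glMapBufferRange() and invalidation.
// GL_MAP_INVALIDATE_BUFFER_BIT tells the driver the old contents may be
// discarded, so it can hand back fresh memory instead of stalling.
GLsizeiptr bytes = x.size() * 3 * sizeof(GLfloat);

glBindBuffer(GL_ARRAY_BUFFER, vboData);

GLfloat* ptr = (GLfloat*) glMapBufferRange(GL_ARRAY_BUFFER, 0, bytes,
                   GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (ptr)
{
    for (size_t i = 0; i < x.size(); ++i)
    {
        ptr[3*i    ] = x[i] - camera->xOffset();  // write straight into the
        ptr[3*i + 1] = y[i] - camera->yOffset();  // mapping: no temporary
        ptr[3*i + 2] = z[i];                      // array, no memcpy()
    }
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
glBindBuffer(GL_ARRAY_BUFFER, 0);
```

This also folds the fill loop and the upload into one pass over the data.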

I execute this code when the user asks for different datasets… so it doesn't happen too often; it is not per-frame code.

Before I posted, I was writing directly into positionBuffer, I changed it to see if it was faster this way.

At the beginning of this method there were two lines:

glDeleteBuffers(1, &vboData);
glGenBuffers(1, &vboData);

Removing them has given me back the performance I was used to.

So I have a (partial) solution, but I’d like to know what’s going on. It seems something is different in the way the new Nvidia cards handle buffers on the GPU, as this piece of code was never a problem up to now (on nvidia or amd hw).
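For the record, a sketch of the reuse pattern, with the glGenBuffers() call moved out of the upload path (same names as in the earlier snippet):

```cpp
// Create the buffer object once, e.g. in initialization -- not per upload:
//     glGenBuffers(1, &vboData);
//
// Then, for each new dataset, just re-specify the storage. glBufferData()
// with a NULL data pointer "orphans" any previous contents, letting the
// driver reuse or replace the memory without a delete/create cycle.
glBindBuffer(GL_ARRAY_BUFFER, vboData);
glBufferData(GL_ARRAY_BUFFER, x.size()*3*sizeof(GLfloat), NULL, GL_DYNAMIC_DRAW);
```

Deleting and regenerating the buffer name each time forces the driver to tear down and rebuild its internal bookkeeping, which is plausibly what got more expensive.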

I still have massive problems when resizing the window… (deleting/creating the framebuffer each frame)

You can react to resize in different ways. First, you can resize as soon as you get the event; if you do this, it may be bad for resize performance. An alternative strategy is to take note that a resize is needed, do nothing at that point, and at the beginning of your render code check whether a resize is pending and only then resize.

I am currently using the latter strategy, but as a result there is no rendering until the window resize is finished (resize widget released). There are ways to improve this. You could combine the first and second strategies by limiting how often a resize is handled, so that any resize arriving too soon after the previous one is deferred to the next render instead. Or you could grab the latest framebuffer as a texture and render that, though then you'd get some aspect-ratio issues. Or just render a gray window while resizing.
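The deferred/throttled combination could be sketched like this in plain C++ (the class and its names are hypothetical; the caller's framebuffer re-creation would go where `shouldApply()` returns true):

```cpp
#include <chrono>

// Sketch: coalesce resize events and apply them at most once per interval,
// from the render loop rather than from the event handler.
class ResizeThrottler {
public:
    explicit ResizeThrottler(std::chrono::milliseconds minInterval)
        : minInterval_(minInterval), pending_(false) {}

    // Called from the window-system event handler: just record the request.
    void onResizeEvent(int w, int h) {
        pendingW_ = w; pendingH_ = h; pending_ = true;
    }

    // Called at the start of each render pass. Returns true (and fills w/h)
    // when the caller should actually re-create its framebuffer now.
    bool shouldApply(int& w, int& h) {
        if (!pending_) return false;
        auto now = std::chrono::steady_clock::now();
        if (now - lastApply_ < minInterval_) return false;  // too soon: defer
        lastApply_ = now;
        pending_ = false;
        w = pendingW_; h = pendingH_;
        return true;
    }

private:
    std::chrono::milliseconds minInterval_;
    std::chrono::steady_clock::time_point lastApply_{};
    bool pending_;
    int pendingW_ = 0, pendingH_ = 0;
};
```

A burst of events during a drag then collapses into occasional framebuffer re-creations, while the final size is still picked up on the next render after the drag ends.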

Thanks for the good tips, tksuoran.
I know I can work around the performance issue, but the essence of my question was more: how can a simple driver update create such problems?

It turns out that giving up on glMapBuffer and using glBufferData instead gave back smooth results on every configuration I tried.
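Concretely, with the names from my earlier snippet (`dummy` being the client-side array already filled with the camera-relative positions), the upload step reduces to a single call:

```cpp
// Upload the freshly built vertex data in one call; the driver takes its
// own copy, so there is no map/unmap pair and no mapped-buffer stall.
glBindBuffer(GL_ARRAY_BUFFER, vboData);
glBufferData(GL_ARRAY_BUFFER, x.size()*3*sizeof(GLfloat), dummy, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
```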

I thought the glMapBuffer technique was well established…
It’s a bit worrying…

What is “the glMapBuffer technique”? There are many ways to map buffers for streaming, and some of them may not achieve the performance you want. The OpenGL Wiki has a page on this issue.