VBO vs vertex array performance on 8800 cards

I suspect I’m running into a driver issue here, so I hope I put this in the right forum. I’m trying to tune performance for a new screensaver that draws implicit surfaces with around 8000 vertices at most (VBO size < 200 KB). They’re broken down into tristrips with an average length of about 4 vertices.

For vertex array draw:
glDrawElements(GL_TRIANGLE_STRIP, triStripLengths[i], GL_UNSIGNED_INT, &(indices[start_vert]));

For VBO draw:
glMultiDrawElements(GL_TRIANGLE_STRIP, (const GLsizei*)(&(triStripLengths[0])), GL_UNSIGNED_INT, (const GLvoid**)(&(vbo_index_offsets[0])), num_tristrips);

I draw each surface between about 5 and 20 times, so I expected VBOs to outperform vertex arrays by quite a bit. On a Quadro FX 3400 under WinXP I get about 40-50% better performance with VBOs. The same is true of a 6600GT under WinXP and Fedora 11.

However, I get about 50% better performance from vertex arrays using an 8800GTS under Windows Vista and Fedora 7, and about 100% better performance on an 8800GTX under WinXP.

Anyone know of any hangups with the 8800 hardware or drivers? This behavior is very inconsistent with the older NVidia cards I have tried.

As an aside, upgrading to the newest Windows driver (197.45) for both XP and Vista adds a new vertex array performance problem: I get about 10 slow frames (0.5-1Hz) when I start the screensaver before they suddenly start rendering at full speed (15-30Hz).

GL_TRIANGLE_STRIP is really suboptimal for both methods; you need to be using GL_TRIANGLES (and setting up your indexes for that) instead. This will be particularly true with newer hardware. It’s the same principle that applies to hard disks and network connections: a few large actions (or one) will generally outperform a high number of small actions.

As it is, you’re making about 2000 draw calls per pass, which is way, way too many and does not play nicely with modern hardware at all. It looks like you can reduce this to one draw call, so that’s the first thing to do.
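Here is a rough sketch of how the strips could be flattened into one GL_TRIANGLES index list on the CPU. The function name stripsToTriangles and its parameters are made up for illustration (they are not from the original code), and it assumes all strips are stored back to back in a single index array, with one length per strip.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand consecutive triangle strips into a single
// GL_TRIANGLES index list. stripIndices holds all strips back to back,
// stripLengths holds the number of indices in each strip.
std::vector<std::uint32_t> stripsToTriangles(
    const std::vector<std::uint32_t>& stripIndices,
    const std::vector<std::size_t>& stripLengths)
{
    std::vector<std::uint32_t> triangles;
    std::size_t start = 0;
    for (std::size_t len : stripLengths) {
        // A strip of n indices produces n - 2 triangles.
        for (std::size_t i = 0; i + 2 < len; ++i) {
            const std::uint32_t a = stripIndices[start + i];
            const std::uint32_t b = stripIndices[start + i + 1];
            const std::uint32_t c = stripIndices[start + i + 2];
            // Drop degenerate triangles (sometimes used to stitch strips).
            if (a == b || b == c || a == c) continue;
            if (i % 2 == 0) {
                triangles.push_back(a);
                triangles.push_back(b);
                triangles.push_back(c);
            } else {
                // Every second triangle in a strip has flipped winding,
                // so swap the first two indices to keep faces consistent.
                triangles.push_back(b);
                triangles.push_back(a);
                triangles.push_back(c);
            }
        }
        start += len;
    }
    return triangles;
}
```

The resulting list can be uploaded once to the element array buffer and drawn with a single glDrawElements(GL_TRIANGLES, ...) call per pass. With strips of about 4 vertices the index list grows from 4 to 6 indices per strip, which is a small price compared to thousands of tiny draws.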

It’s also possible that by using a GL_UNSIGNED_INT index format you’re sending your VBO through a software emulation path; try using GL_UNSIGNED_SHORT instead. Although this would be expected to affect older hardware more, I’d still advocate using GL_UNSIGNED_SHORT all the time (even if it means breaking the batch) for better compatibility with customers’ machines.
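With around 8000 vertices, every index fits in 16 bits, so the existing 32-bit index list could be narrowed without splitting anything. A minimal sketch, assuming the indices live in a std::vector; narrowIndicesTo16 is a hypothetical name, not something from the original code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: copy 32-bit indices into a 16-bit buffer.
// Returns false (and leaves out empty) if any index does not fit,
// in which case the batch would have to be split or left as 32-bit.
bool narrowIndicesTo16(const std::vector<std::uint32_t>& in,
                       std::vector<std::uint16_t>& out)
{
    out.clear();
    out.reserve(in.size());
    for (std::uint32_t idx : in) {
        if (idx > 0xFFFFu) {
            out.clear();
            return false;
        }
        out.push_back(static_cast<std::uint16_t>(idx));
    }
    return true;
}
```

The narrowed buffer would then be drawn with GL_UNSIGNED_SHORT as the index type. Whether this actually helps depends on the hardware and driver, so it is worth timing both variants.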

Thanks mhagain. I was definitely hitting a slow path with GL_TRIANGLE_STRIP VBOs on the 8800s. Switching to GL_TRIANGLES helped that situation a lot, and improved the performance on the other cards a bit too.

I’ve read that unsigned short should be faster, but it has never helped me for some reason. It’s probably just my particular use case.

Disable NVIDIA PowerMizer. This feature lowers the GPU clock, and it takes ages to ramp back up to full speed.
In the NVIDIA Control Panel, set “Power management mode” to “Prefer maximum performance”.