Ideal vertex buffer sizes for various vendors

I work mainly with Quadro variants of the GeForce4, a few Wildcats, and some 9700s. I have a fairly flexible batching system, and I’d like to know what maximum vertex buffer size gives the best performance on most vendors’ hardware. I’ve reached the point where huge batches are clearly slowing down the 9700s, so I’ve tried limiting them based on vertex count and index count, futzing with the 65k mark. I’d appreciate some estimates, based on your observations.

Does the number of indices called with DrawElements matter more or less than the size of the vertex buffer being accessed?

Data is not interleaved, and ranges from 1 to 10 million vertices viewable at any given time.

Before ordering the indices for cache efficiency, I’m getting about 60M verts/s, with triangle lists, on the 9700. Seems pretty low. Sometimes it’s actually faster to do ARB_fp lighting, and lighten the load on the transform / lighting side.
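One rough way to judge how much index reordering could help is to estimate the average number of vertex transforms per triangle with a simulated post-transform cache. The sketch below assumes a simple FIFO cache; the function name `estimateAcmr` and the default of 16 entries are illustrative assumptions, not measured values for any of the cards mentioned here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Estimate the average cache miss ratio (vertex transforms per triangle)
// for an indexed triangle list, using a simulated FIFO post-transform cache.
// cacheSize is an assumed value; real hardware differs per chip.
double estimateAcmr(const std::vector<std::uint32_t>& indices,
                    std::size_t cacheSize = 16)
{
    if (indices.size() < 3) return 0.0;

    std::deque<std::uint32_t> cache;   // front = oldest entry
    std::size_t misses = 0;

    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) == cache.end()) {
            ++misses;                  // vertex would have to be transformed
            cache.push_back(idx);
            if (cache.size() > cacheSize) cache.pop_front();
        }
    }
    const std::size_t triangles = indices.size() / 3;
    return static_cast<double>(misses) / static_cast<double>(triangles);
}
```

A result near 3.0 means almost every index misses the cache, while a well-ordered mesh should come out much lower. Comparing the number before and after reordering gives a hardware-independent sanity check.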

[This message has been edited by CatAtWork (edited 10-02-2003).]

DirectX typically uses half-words (2 bytes) for indices.

This means that hardware is likely optimized for using GL_UNSIGNED_SHORT for your index data type, which means the vertex array itself shouldn’t be wider than 65k verts.

In addition, I’d try to keep the total number of verts < 65k (actually, we limit artists to 32k verts; ours is a real-time application after all, intended to run on a GeForce 2 MX) :)
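As a rough illustration of keeping each batch within the range of GL_UNSIGNED_SHORT, the sketch below splits a triangle list with 32-bit indices into batches that each reference at most 65536 unique vertices, remapping to 16-bit local indices. The `Batch` struct and `splitForShortIndices` are hypothetical names, and the greedy in-order split is only one possible strategy, not anything a particular driver requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// One draw batch: a local vertex set plus 16-bit indices into it.
struct Batch {
    std::vector<std::uint32_t> sourceVertices; // original vertex index per local slot
    std::vector<std::uint16_t> indices;        // triangle list, local indices
};

// Split a triangle list with 32-bit indices into batches of at most
// maxVerts unique vertices each (maxVerts <= 65536 so indices fit in 16 bits).
std::vector<Batch> splitForShortIndices(const std::vector<std::uint32_t>& indices,
                                        std::size_t maxVerts = 65536)
{
    std::vector<Batch> batches;
    if (maxVerts < 3 || maxVerts > 65536) return batches; // unsupported limit

    std::unordered_map<std::uint32_t, std::uint16_t> remap; // global -> local
    Batch current;

    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        // Count how many vertices of this triangle are new to the batch.
        std::size_t fresh = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            bool seen = remap.count(indices[i + k]) != 0;
            for (std::size_t j = 0; j < k && !seen; ++j)
                seen = indices[i + j] == indices[i + k];
            if (!seen) ++fresh;
        }
        // Start a new batch if this triangle would exceed the limit.
        if (current.sourceVertices.size() + fresh > maxVerts) {
            batches.push_back(std::move(current));
            current = Batch{};
            remap.clear();
        }
        // Append the triangle, assigning local slots to unseen vertices.
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t global = indices[i + k];
            auto it = remap.find(global);
            if (it == remap.end()) {
                const auto local =
                    static_cast<std::uint16_t>(current.sourceVertices.size());
                it = remap.emplace(global, local).first;
                current.sourceVertices.push_back(global);
            }
            current.indices.push_back(it->second);
        }
    }
    if (!current.indices.empty()) batches.push_back(std::move(current));
    return batches;
}
```

Each batch's `sourceVertices` says which original vertices to copy into that batch's arrays, and `indices` can then be drawn with GL_UNSIGNED_SHORT. Vertices shared across a batch boundary get duplicated, so ordering triangles for locality first keeps the duplication down.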

If geometry throughput is of importance to you, I suggest using ARB_vertex_buffer_object. It’s likely to increase throughput compared to regular vertex arrays, or compared to LockArraysEXT.

I don’t know about ideal, but the maximum size for a single buffer on the 9700 is 32 MB.
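To put the 32 MB figure in terms of vertex counts, here is a small worked calculation. It assumes MB means 2^20 bytes and that each attribute array (the data is not interleaved) goes in its own buffer; `maxVerticesPerBuffer` and `kBufferLimit` are illustrative names only.

```cpp
#include <cstddef>

// Largest vertex count whose attribute data fits in one buffer.
// bufferBytes: per-buffer limit (32 MB is the figure quoted for the 9700).
// bytesPerVertex: size of one element of the attribute stored in that buffer.
constexpr std::size_t maxVerticesPerBuffer(std::size_t bufferBytes,
                                           std::size_t bytesPerVertex)
{
    return bytesPerVertex == 0 ? 0 : bufferBytes / bytesPerVertex;
}

// 32 MB, taking MB as 2^20 bytes (an assumption).
constexpr std::size_t kBufferLimit = 32u * 1024u * 1024u;
```

For example, `maxVerticesPerBuffer(kBufferLimit, 12)` covers a positions-only array of three floats per vertex and gives roughly 2.8 million vertices, so a 10-million-vertex scene would need several buffers regardless of the 65k index limit.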

This is a military viz application, so I can’t really skimp on vertex detail. I could force Maya to export several LODs, but I get some of the data from a FEM package and from this horribly reverse-engineered file format.

Currently using display lists, as all of the geometry is static. I tried VBO, and it still has issues on both ATi and NVidia hardware. Any reason to use VBO over display lists?

Originally posted by CatAtWork:
Any reason to use VBO over display lists?

From my experience with NVIDIA drivers, VBOs are significantly faster than display lists for models with a moderate number of large geometries.

However, for a large number of smaller geometries, display lists, or sometimes even neither display lists nor VBOs, work fastest. This suggests that the NVIDIA OpenGL driver isn’t yet well optimized for large numbers of VBOs, a situation which should improve over time.


Originally posted by CatAtWork:
I tried VBO, and it still has issues on both ATi and NVidia hardware.

What issues are you having?

VBOs not being faster than display lists, mainly. There was also an issue with NVidia’s 44 or 45 Detonator drivers, in which drawing my data to a depth texture with VBOs caused a crash in nvoglnt.

In one specific scene, upgrading from 44.03 to the beta 52.10 drivers boosted my frame rate from 80 fps to 230 fps. I was rendering about 100K polys without any big fragment effects. (The video card is a GeForce FX 5900 Ultra.)

The 45.23 drivers just killed the frame rate in nearly all situations (fill rate, transfer, even raster limitations!)