Don’t bet on it. “Vertex arrays” typically beat “classic VBO” batches when the batches are small, on NVidia at least. Binding buffer objects isn’t cheap!
Another thing you can try is putting your batches in display lists. That’ll give you the fastest perf for a given set of batches.
The only way I’ve made batches fly like NVidia display lists, with VBOs or otherwise, is to use NV bindless extensions with VBOs, but that is unfortunately still NVidia-only (more details on this below). Hoping for something in OpenGL 4.2 along these lines, to get rid of a bunch of the CPU-side memory access inefficiency in the driver when submitting small VBO batches.
One other option for static VBOs is to use VAOs. That’ll speed you up some, but it won’t get you to the performance of display lists or bindless+VBOs (the latter two are pretty much equals).
I draw up to 20 particle systems containing up to 400 particles each. Under the heaviest load, I’m using around 20-250 particles in each of the 20 systems for a total of about 2000 particles.
In addition to the above, another option you can try is putting all of your batches in one VBO. Then when you need to draw another batch, there’s no need to bind another buffer with the VBO path.
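Here is a minimal sketch of the bookkeeping for that single-VBO layout, in plain C with no GL calls. `BatchRange` and `pack_batches` are hypothetical names, not from any library. It assumes every batch shares one vertex format, so each batch's `first` can be handed to glDrawArrays while the one buffer stays bound:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: where one batch lives inside the shared buffer. */
typedef struct {
    size_t first;  /* index of the batch's first vertex in the shared buffer */
    size_t count;  /* number of vertices in the batch */
} BatchRange;

/* Lay out n batches back to back in one buffer.
   counts[i] is the vertex count of batch i; ranges[i] receives its placement.
   Returns the total vertex count, i.e. how many vertices the shared VBO must hold. */
size_t pack_batches(const size_t *counts, size_t n, BatchRange *ranges)
{
    assert(counts != NULL && ranges != NULL);
    size_t next = 0;
    for (size_t i = 0; i < n; ++i) {
        ranges[i].first = next;
        ranges[i].count = counts[i];
        next += counts[i];
    }
    return next;
}
```

With this layout you'd bind the buffer and set the pointers once, then issue one draw call per batch using `ranges[i].first` and `ranges[i].count`. For dynamic data like particles you'd probably size each slot for its maximum count rather than its current one, so the offsets don't move every frame.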
Another option of course is to use larger batches, but don’t get too big as this hits your culling efficiency. There’s a balancing act here depending on the CPU and GPU horsepower you’ve got to work with.
To draw each particle system with vertex arrays:
To draw with VBOs:
I find it odd that you are rendering one with DrawArrays and one with DrawElements. I also find it odd that you are rendering one with a dedicated interleaved array call and in another case rendering interleaved arrays by making the appropriate pointer and enable calls.
If you are trying to do an apples-to-apples comparison here, you should be using the same batch registration technique, same type of batch, and same batch data with both vertex arrays (client arrays) and VBOs (server arrays).
I have tested on Linux with a GeForce 6800GT and 8800GTS with current drivers. I really expect VBOs to be faster, but they aren’t yet. Anyone have any suggestions? Is it possible that my hardware is just too old?
It’s not necessarily that. …though it is getting old. …especially that 6800.
It’s typical to see this when you’re CPU bound. This is more likely to happen when you have smaller batches. A fast GPU paired with a slow CPU aggravates the problem.
On NVidia, try display lists (one display list per batch; only put the batch in the display list; no state changes – that is, only put your buffer binds, pointer calls, pointer enables, and batch calls in the display list) and baseline all your performance measurements as a percentage of that. Then try client arrays and VBOs+bindless. You’ll likely find you can get the display list performance without the display list compile times from VBOs+bindless:
I’ve posted on my batch perf experiences here in various posts, but here’s one thread where I give some example code showing how to render batches with bindless VBOs side-by-side with plain “classic” VBOs. Just change the #ifdef to switch from one to the other.
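For the baselining suggested above, a tiny helper along these lines is enough. It's only a sketch: `percent_of_baseline` is a hypothetical name, and it assumes you're timing the same set of batches in milliseconds for each draw path:

```c
#include <assert.h>

/* Express a draw path's speed as a percentage of the baseline path's speed.
   Both arguments are times (e.g. ms per frame) for drawing the same batches.
   100 means as fast as the baseline; 50 means it took twice as long. */
double percent_of_baseline(double baseline_ms, double path_ms)
{
    assert(baseline_ms > 0.0 && path_ms > 0.0);
    return 100.0 * baseline_ms / path_ms;
}
```

Feed it the display list time as `baseline_ms` and the client array, classic VBO, or bindless time as `path_ms`, and you can compare paths across machines without caring about the absolute numbers.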
Bindless of course currently isn’t a good option unless you can presume an NVidia G80+ GPU (GeForce 8 or better), or are open to a run-time switch on which draw path to use based on the available OpenGL extensions. Your GeForce 6800 wouldn’t support this path. But it would support display lists. Also, VBOs were even slower on older GPUs (IIRC from years back). Though this may have been primarily due to CPUs and CPU memory being slower then.
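A run-time switch like that could look roughly like the sketch below. The function and enum names are hypothetical. I'm assuming "bindless" means the GL_NV_vertex_buffer_unified_memory and GL_NV_shader_buffer_load extensions, and that the extension list is the space-separated string you get from glGetString(GL_EXTENSIONS) in a compatibility context:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical draw path selector. What the fallback actually is
   (display lists, client arrays, classic VBOs) is up to the application. */
typedef enum { DRAW_PATH_FALLBACK, DRAW_PATH_BINDLESS_VBO } DrawPath;

/* Return 1 if `name` appears as a whole space-separated token in `ext_list`.
   A bare strstr() is not enough: it would also match a longer extension
   name that merely starts with `name`. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;
        p += len;  /* skip this partial match and keep searching */
    }
    return 0;
}

/* Pick the bindless path only when both assumed extensions are present. */
DrawPath choose_draw_path(const char *ext_list)
{
    if (has_extension(ext_list, "GL_NV_vertex_buffer_unified_memory") &&
        has_extension(ext_list, "GL_NV_shader_buffer_load"))
        return DRAW_PATH_BINDLESS_VBO;
    return DRAW_PATH_FALLBACK;
}
```

Call `choose_draw_path` once after context creation and cache the result, so the per-batch draw code only branches on the enum.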