Originally posted by Madoc:
Seems strange that DrawArrays should be slower than immediate mode. It should still alleviate a good deal of CPU work and I thought it facilitated DMA transfers. With some of the HW I used to work with waaay back, DrawArrays was actually the fastest method, faster than DrawElements. It’s been far too long since I used it so I can’t say about any recent experiences with it.
If you render a lot of static geometry with a single DrawArrays call, it should be faster, I agree.
If you’re changing pointers frequently and rendering with lots of DrawArrays calls, it may well be slower.
If your geometry is dynamic, and you build the whole array up front before drawing any of it, then you may not be getting the CPU/GPU parallelism that you would with immediate mode.
Mainly, I wanted to point out that “arrays are faster” is not a simple truism. In order to make things faster, the feature/mechanism must be widening a bottleneck that is currently limiting performance.
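To make the parallelism point concrete, here is a toy cost model in C. It is only a sketch under simplifying assumptions: the frame is split into equal batches, each batch costs a fixed CPU time to generate and a fixed GPU time to draw, and the driver starts drawing a batch as soon as it is submitted. The function names and the numbers you feed in are hypothetical, not measurements from any driver.

```c
#include <assert.h>

/* Toy model, not a measurement: a frame is split into `batches` equal
   batches; each costs cpu_ms to generate and gpu_ms to draw. */

/* Whole array is built first and drawn afterwards, so CPU work and
   GPU work do not overlap at all. */
double serialized_frame_ms(int batches, double cpu_ms, double gpu_ms)
{
    assert(batches > 0);
    return batches * cpu_ms + batches * gpu_ms;
}

/* Batches are submitted as they are generated, so the GPU draws batch i
   while the CPU builds batch i+1 (a plain two-stage pipeline). */
double overlapped_frame_ms(int batches, double cpu_ms, double gpu_ms)
{
    double slower = cpu_ms > gpu_ms ? cpu_ms : gpu_ms;
    assert(batches > 0);
    /* fill the pipe once, then the slower stage sets the pace */
    return cpu_ms + (batches - 1) * slower + gpu_ms;
}
```

With a single batch the two functions agree, which matches the static-geometry case: there is nothing to overlap, so the array path loses nothing. The gap only opens up when CPU-side generation is a real share of the frame.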
You made me want to bring up another question. There have been a few discussions about large vs many small VBOs. You said the cost was in the gl*Pointer calls. What I didn’t find clear is whether the cost of these calls is greater when a different VBO is bound or if it’s the same even under the same VBO.
In other words, as an example, would we be well off binding a single VBO and then specifying different offsets through gl*Pointer calls (possibly maintaining smaller index formats), or should we minimise the number of gl*Pointer calls, use larger indices, and rely on DrawRangeElements to reduce the index sizes?
This will vary some among implementations, but on NVIDIA hardware, the performance will be mostly driven by the number of gl*Pointer calls, not so much by how many VBOs are involved.
Too many VBOs and you pay some (marginal) penalty for more frequent VBO state changes. Too few VBOs and you pay a (potentially very high) penalty for forcing a coherent CPU/GPU view of an unnecessarily large chunk of memory. Forcing this coherency requires either synchronization stalling or lots of in-band data copying. This is a real waste if that coherency is not essential.
Small VBOs solve the coherency problem and make driver-side memory management much easier. In the long term, I expect one or two attribs for a few hundred vertexes per VBO to be “free”. And it will never hurt (though it may not help much) to pack multiple attributes (perhaps from multiple objects) into a single VBO – if they are static or nearly static. This is probably a good idea if you have lots of static objects with very few vertices, though if you don’t render these things all at the same time, immediate mode may be better still.
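For the packing case, the bookkeeping behind the two options in the question can be sketched in plain C. This is an illustration only: the PackedObject struct and the function names are hypothetical, the objects are assumed to share one vertex stride, and no GL calls are made. With a VBO bound, byte_offset is the value you would hand to the gl*Pointer calls, and first_vertex/last_vertex are the start/end values you would hand to DrawRangeElements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-object record; these names are made up for the sketch. */
typedef struct {
    unsigned vertex_count;  /* input: vertices in this object */
    size_t   byte_offset;   /* output: offset to pass to gl*Pointer */
    unsigned first_vertex;  /* output: 'start' for DrawRangeElements */
    unsigned last_vertex;   /* output: 'end' for DrawRangeElements */
} PackedObject;

/* Lay objects out back to back in one buffer; returns total vertex count. */
unsigned pack_objects(PackedObject *objs, size_t n, size_t stride)
{
    unsigned next = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(objs[i].vertex_count > 0);
        objs[i].first_vertex = next;
        objs[i].last_vertex  = next + objs[i].vertex_count - 1;
        objs[i].byte_offset  = (size_t)next * stride;
        next += objs[i].vertex_count;
    }
    return next;
}

/* Option A: re-issue gl*Pointer per object with byte_offset. Indices stay
   local to the object, so 16-bit indices work if the object alone fits. */
int fits_16bit_rebased(const PackedObject *o)
{
    return o->vertex_count <= 65536u;
}

/* Option B: set the pointers once. Indices are relative to the start of
   the buffer, so 16-bit indices work only while last_vertex <= 65535. */
int fits_16bit_shared(const PackedObject *o)
{
    return o->last_vertex <= 65535u;
}
```

Option A buys small indices at the price of more gl*Pointer calls, which is the cost described above as dominant; option B avoids those calls but pushes objects past the first 65536 vertices onto 32-bit indices.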
Does that help?
edit: clarification …
[This message has been edited by cass (edited 12-18-2003).]