glMultiDrawElements -> only marginal gain

DraKKaR · February 3, 2005, 2:19pm

I’ve been testing the glMultiDrawElements function.

I have a mesh made of thousands of strips. Each strip contains 10-40 triangles.

glMultiDrawElements lets me render all of those thousands of strips in a single call. Unfortunately, with the latest drivers I only get a marginal gain over using thousands of calls to glDrawElements!
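For reference, here is a minimal sketch of the bookkeeping the single call needs, assuming all strips sit back to back in one index buffer bound as an element array VBO. The helper name build_multidraw_offsets and its parameters are made up for illustration; only glMultiDrawElements itself is a real GL entry point, and it appears only in the comment because it needs a live context.

```c
#include <stddef.h>

/* Hypothetical helper: for strips stored back to back in one index buffer,
 * compute the byte offset at which each strip starts.
 * counts[i]   = number of indices in strip i (a strip of n triangles has n + 2)
 * index_size  = bytes per index, e.g. 2 for GL_UNSIGNED_SHORT */
void build_multidraw_offsets(const int *counts, size_t strip_count,
                             size_t index_size, size_t *byte_offsets)
{
    size_t offset = 0;
    for (size_t i = 0; i < strip_count; ++i) {
        byte_offsets[i] = offset;
        offset += (size_t)counts[i] * index_size;
    }
}

/* With a GL context and an element array buffer bound, each byte offset is
 * cast to a pointer and passed in the `indices` array, roughly:
 *
 *   glMultiDrawElements(GL_TRIANGLE_STRIP, counts, GL_UNSIGNED_SHORT,
 *                       offsets_as_pointers, strip_count);
 *
 * The per-strip loop it replaces issues one glDrawElements per entry. */
```

Both arrays only need to be built once for a static mesh, so any difference between the two paths comes down to per-call overhead in the driver.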

I tested this on some NVIDIA cards. I am not fill-rate bound, because the framerate doesn’t change when I change the window size. I don’t use textures or lighting, only plain indexed triangle strips.

Furthermore, glMultiDrawElements does not let you specify the range of vertices used (as glDrawRangeElements does)…
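To make the range point concrete: glDrawRangeElements takes extra start and end arguments, the smallest and largest index value referenced by the draw. A rough sketch of computing them for one strip, assuming unsigned short indices (the IndexRange type and index_range function are illustrative names, not part of GL):

```c
#include <stddef.h>

/* Hypothetical result type: smallest and largest vertex index in a strip. */
typedef struct {
    unsigned short start;
    unsigned short end;
} IndexRange;

/* Scan one strip's indices and return the range of vertices it touches.
 * An empty strip yields {0, 0}. */
IndexRange index_range(const unsigned short *indices, size_t count)
{
    IndexRange r = { 0, 0 };
    if (count == 0)
        return r;
    r.start = r.end = indices[0];
    for (size_t i = 1; i < count; ++i) {
        if (indices[i] < r.start) r.start = indices[i];
        if (indices[i] > r.end)   r.end   = indices[i];
    }
    return r;
}

/* With a GL context, the result would feed a call along the lines of:
 *
 *   glDrawRangeElements(GL_TRIANGLE_STRIP, r.start, r.end,
 *                       count, GL_UNSIGNED_SHORT, indices);
 */
```

glMultiDrawElements has no matching parameters, so this per-strip hint cannot be passed along with the single call.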

What do you think about this? Is glMultiDrawElements really worth it?

Obli · February 3, 2005, 11:26pm

I have also heard that it does not provide much of a gain, but I can’t test this myself because I would need a more intensive test case.

If you have CPU power to spare, saving CPU time obviously won’t give you any speedup.
The first time I tried standard vertex arrays on a Pentium 1 machine I got a +200% speedup. Now the same program on my 1 GHz+ Athlon does not show any performance increase.
Make sure your CPU load is high enough to have a real impact on performance. If it isn’t, this kind of measurement doesn’t tell you much.
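One rough way to check this is to time only the code that issues the draw calls and compare it with the total frame time. The sketch below assumes a hypothetical submit callback that wraps your draw calls; noop_submit is just a stand-in so the snippet compiles without a GL context. Keep in mind that GL calls can return before the GPU has finished, so this captures the CPU-side submission cost only.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: issues the draw calls for one frame. */
typedef void (*submit_fn)(void *user);

/* Stand-in callback that does nothing; replace with real draw submission. */
void noop_submit(void *user)
{
    (void)user;
}

/* Average processor time, in seconds, spent inside submit() per frame.
 * clock() reports processor time on POSIX systems. */
double average_submit_seconds(submit_fn submit, void *user, int frames)
{
    if (frames <= 0)
        return 0.0;
    clock_t begin = clock();
    for (int i = 0; i < frames; ++i)
        submit(user);
    clock_t end = clock();
    return (double)(end - begin) / CLOCKS_PER_SEC / frames;
}

/* Share of the frame time that goes to CPU-side submission.
 * Returns 0 when the frame time is not positive. */
double submit_fraction(double submit_seconds, double frame_seconds)
{
    return frame_seconds > 0.0 ? submit_seconds / frame_seconds : 0.0;
}
```

If that fraction is small, cutting the number of calls is unlikely to change the framerate, which would match what the first post describes.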

DraKKaR · February 4, 2005, 1:41am

Ok, I understand. So, on modern systems, it seems better to use multiple glDrawElements (specifying the correct range of vertices) than using a single glMultiDrawElements (without the ability to specify the correct range of vertices).

Am I right? Or did I misunderstand something?

Obli · February 5, 2005, 1:11am

Rule of thumb: it’s always good to maximize batch size.
So MultiDraw should be better than Draw. Anyway, don’t expect any performance increase if you’re not CPU bound.
Example:
Rendering 30M static vertices on a 3 GHz+ machine using VBO. The whole render loop basically does
Render mesh
Render mesh
Render mesh
Render mesh
It’s likely you are GPU/transform bound.
So it doesn’t matter whether you send a single batch of 30M vertices or 30 batches of 1M vertices each, because the CPU time you save is thrown away.
Whether you get more speed or not depends on a variety of factors, which you should investigate by doing “pipeline balancing”.
It’s not a feature like VBO, where you turn it on and magically get +200% performance.
Got it?