I’m trying to optimize the driver overhead of a certain vegetation rendering system. Problem is this: I have a large number of instances of the same mesh that need to be rendered (on the order of 100-200 worst case). I need some way to reduce the impact of making 100 calls to glDrawElements. Right now I’m binding the VBO, then the shader, and then looping over the instances, setting the transform for each and then calling draw. A pretty basic approach.

What I’m looking for here ideally is some sort of instancing, where I could pass an array of matrices or something to a single draw call and say “ok now draw 97 of these”. To my knowledge OpenGL has no such functionality.

Merging my instances into a single array is not really a great option, at least not on a large scale, since the list changes frequently.

Does anyone have any suggestions? Thank you!

You could try display lists, but I have no idea how well they work in combination with VBOs, or whether they will actually accelerate anything on today’s hardware.

On G80-series hardware there is an extension that does exactly what you’ve described.
According to that table, it is supported.
I haven’t actually worked with it myself, so I can’t say whether there are any caveats.

EXT_draw_instanced is interesting… would certainly provide benefit to people with that hardware. Ideally I’d like more support than that, but maybe I’ll just have to live with it.

Depending on how 3d-ish the mesh is, you might be able to get away with rendering once to a texture, and then just texture-mapping it 100+ times onto a simpler surface.

Unfortunately, for now it is the only way to get “fair” instancing with OpenGL.
Older hardware may get support for it in the future, but I think nVidia will push their new generations forward and neglect the older ones, which is quite understandable :frowning:

When I tried EXT_draw_instanced on the g80 it was about half as fast as just looping over DrawElements and setting transforms.

It sounds like you’re assuming your bottleneck is on the CPU, and are thus trying to optimize out draw calls. Perhaps your bottleneck is on the GPU? If that was the case, then I could see how it could lower performance. Or… it could very well just be an unoptimized implementation in the driver.

Kevin B

Definitely CPU limited in my app - if I measure the time spent simply making all those calls, it’s quite bad. I would love for the bottleneck to be the GPU :stuck_out_tongue:

It’s not that I was actually trying to reduce any CPU bottleneck - OpenGL draw calls are pretty fast if you aren’t changing much state - I just wanted to see if instancing had reasonable performance yet. I was hoping it would at least be the same speed as simply looping over a draw call and setting a uniform.

edit: EXT_draw_instanced didn’t even work correctly until recent drivers, so it could just be a work in progress at this point

You could try vertex shader constant based instancing. Store multiple copies of your model in the vertex buffer (for example 16 copies, so you can draw 16 instances at once) and store an ID for each instance in the vertices. Then you put your per-instance data into a uniform array and look it up in the vertex shader with the ID.
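To make the buffer layout concrete, here is a minimal CPU-side sketch of that idea. The `Vertex` layout, the names `replicateMesh` and `drawCallCount`, and the shader text are all made up for illustration; the shader assumes old-style GLSL with a `vec4` offset per instance, which may not match what your per-instance data actually looks like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: position plus the number of the copy it belongs to.
struct Vertex {
    float x, y, z;
    float instanceId;  // used in the vertex shader to index the uniform array
};

struct BatchedMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Replicate one mesh `copies` times, tagging every vertex with its copy number
// and offsetting the indices so each copy addresses its own vertices.
BatchedMesh replicateMesh(const std::vector<Vertex>& src,
                          const std::vector<std::uint16_t>& srcIndices,
                          std::size_t copies) {
    assert(copies > 0);
    assert(src.size() * copies <= 65536);  // must stay addressable with 16-bit indices
    BatchedMesh out;
    out.vertices.reserve(src.size() * copies);
    out.indices.reserve(srcIndices.size() * copies);
    for (std::size_t c = 0; c < copies; ++c) {
        const std::size_t base = c * src.size();
        for (Vertex v : src) {
            v.instanceId = static_cast<float>(c);
            out.vertices.push_back(v);
        }
        for (std::uint16_t i : srcIndices) {
            out.indices.push_back(static_cast<std::uint16_t>(base + i));
        }
    }
    return out;
}

// Draw calls needed for `instances` when one call covers up to `copies` of them.
std::size_t drawCallCount(std::size_t instances, std::size_t copies) {
    assert(copies > 0);
    return (instances + copies - 1) / copies;
}

// Illustrative vertex shader only (assumed names, array size must match `copies`).
const char* const kInstancingVertexShader = R"(
uniform vec4 instanceOffset[16];
attribute float instanceId;
void main() {
    vec4 pos = gl_Vertex;
    pos.xyz += instanceOffset[int(instanceId)].xyz;
    gl_Position = gl_ModelViewProjectionMatrix * pos;
}
)";
```

With 16 copies, 200 instances would take 13 draw calls instead of 200. For the last, partially filled batch you would draw only the first `remaining * srcIndices.size()` indices.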

I don’t think the multiple draw calls are a big problem with OpenGL. But you might want to look into the “Pseudo Instancing” demo from nVidia. Instead of setting a uniform for each instance, they update vertex attributes and use those attributes as the per-instance data.
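Roughly, the pseudo-instancing loop has the shape below. This is only a sketch, not the code from the nVidia demo: `InstanceTransform` and `drawPseudoInstanced` are hypothetical names, and the two callbacks stand in for the real GL calls (something like `glVertexAttrib4fv` and `glDrawElements`) so the loop can be shown without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance data: a 3x4 affine transform stored as three rows,
// each row being (x, y, z, translation).
struct InstanceTransform {
    float rows[3][4];
};

// setAttrib(index, ptr) stands in for a call such as glVertexAttrib4fv;
// draw() stands in for glDrawElements on the already-bound VBO and shader.
// The three rows go to attributes firstAttrib .. firstAttrib + 2.
template <class SetAttrib, class Draw>
std::size_t drawPseudoInstanced(const std::vector<InstanceTransform>& instances,
                                unsigned firstAttrib,
                                SetAttrib setAttrib,
                                Draw draw) {
    // Assumption: attribute 0 carries the vertex position, so it is not used here.
    assert(firstAttrib > 0);
    std::size_t drawCalls = 0;
    for (const InstanceTransform& t : instances) {
        for (unsigned r = 0; r < 3; ++r) {
            setAttrib(firstAttrib + r, t.rows[r]);  // "current" attribute, no uniform update
        }
        draw();
        ++drawCalls;
    }
    return drawCalls;
}
```

Note that this still issues one draw call per instance. The saving, if any, comes from replacing the per-instance uniform update with cheaper current-attribute updates, and the vertex shader has to rebuild the transform from the three attribute rows.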

I gained 30% in performance in my terrain engine by reducing the number of draw calls, at the cost of passing a new index array each time.

What Humus proposes is a good and backward compatible idea.
If your meshes aren’t complex and there are plenty of them, then you could also try rebuilding the vertex array entirely on the CPU and performing a single draw call - this also allows mixing different objects.
Note that you don’t have to rebuild this vertex array every frame.
You could also split the area covered with vegetation into sectors, each with its own vertex array. This way you only need to update the outer sectors once every n frames. You will have one draw call per sector. This solution is also OpenGL 1.1 compatible :slight_smile:
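A minimal sketch of the CPU-side merge for one sector could look like this. The `Placement` struct (uniform scale plus translation) and the function names are assumptions made for the example; a real system would probably apply a full matrix and carry normals and texture coordinates along as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical per-instance placement: uniform scale followed by a translation.
struct Placement {
    Vec3 offset;
    float scale;
};

struct MergedMesh {
    std::vector<Vec3> positions;
    std::vector<std::uint32_t> indices;
};

// Append one pre-transformed copy of `mesh` to `out`, rebasing its indices.
void appendInstance(MergedMesh& out,
                    const std::vector<Vec3>& mesh,
                    const std::vector<std::uint32_t>& meshIndices,
                    const Placement& p) {
    const std::uint32_t base = static_cast<std::uint32_t>(out.positions.size());
    for (const Vec3& v : mesh) {
        out.positions.push_back({v.x * p.scale + p.offset.x,
                                 v.y * p.scale + p.offset.y,
                                 v.z * p.scale + p.offset.z});
    }
    for (std::uint32_t i : meshIndices) {
        assert(i < mesh.size());
        out.indices.push_back(base + i);
    }
}

// Build the merged array for one sector; the result is drawn with one call
// and only needs rebuilding when the sector's instance list changes.
MergedMesh buildSector(const std::vector<Vec3>& mesh,
                       const std::vector<std::uint32_t>& meshIndices,
                       const std::vector<Placement>& placements) {
    MergedMesh out;
    out.positions.reserve(mesh.size() * placements.size());
    out.indices.reserve(meshIndices.size() * placements.size());
    for (const Placement& p : placements) {
        appendInstance(out, mesh, meshIndices, p);
    }
    return out;
}
```

Keeping one `MergedMesh` per sector means a change to the instance list only forces a rebuild of the sectors it touches, rather than of the whole vegetation set.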

Originally posted by AlexN:
When I tried EXT_draw_instanced on the g80 it was about half as fast as just looping over DrawElements and setting transforms.
This makes no sense. Can you send me a test app which shows this slowdown? The extension is intended to solve exactly this problem.

Pseudo-instancing is the recommended approach for older hardware (i.e. sending the xform via current state).