Instancing Performance

I’ve implemented and tested geometry instancing on a bunch of simple, four-polygon tree models, with the tree positions packed into a texture buffer. My 285 takes close to 16 ms per frame to render about 10 batches of 5,000 trees.

I’ve also tried merging all the tree geometry together, so I have the same number of batches but just call good ol’ glDrawElements. If the tree groups are compiled into display lists, the frame time is just 5 ms; if not, it’s about 10 ms.

So, in this simple case, I’ve found that instancing performs worse than straight GL calls. Have others reached a similar conclusion? Is there a batch size or model polygon count for which instancing would outperform regular GL calls for static objects?

First, what kind of instancing are you using?

Second, what hardware and drivers are you using? Have you tested with other hardware?

GeForce GTX 285. I don’t see any reason to test with a lesser card.

I’m using the instancing described in EXT_draw_instanced. I store the per-instance data in a texture buffer object (EXT_texture_buffer_object) and access it in the vertex shader with gl_InstanceID.

“My 285 takes close to 16 ms a frame to render about 10 batches of 5,000 trees.”

Are you rendering 5,000 instances per draw call and making 10 such calls, or rendering 10 instances per call across 5,000 draw calls?

Also, since you have a GL 3.x-capable card, are you using VAOs for your vertex data?

Yes, 5,000 instances per draw call, and I’m making 10 calls. That’s only 20,000 triangles per call, though. I need to test with a more complex model.

I don’t have the code in front of me now, but I doubt it’s using VAOs.

Wait, each instance is only 4 triangles? Then the per-instance overhead is what’s going to dominate performance. You should at least have enough triangles per instance to fill up the post-transform cache.

I am not entirely certain whether this is “normal behavior” or whether it still holds today, but when I used instancing I found that rendering up to some number x of instances per call was fine, while rendering more than x instances became slower again.

I was rendering more complex models, though.