Instancing Performance

ViolentHamster · September 11, 2009, 1:21pm

I’ve implemented and tested geometry instancing on a bunch of simple, 4-polygon tree models. I’m packing the tree positions in a texture buffer. My 285 takes close to 16 ms a frame to render about 10 batches of 5,000 trees.

I’ve also merged all the tree geometry together, so I have the same number of batches, but I’m just calling the good ol’ glDrawElements. If the tree groups are compiled into display lists, the frame time is just 5 ms. If not, then the frame time is about 10 ms.
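For readers comparing the two approaches, the merged-geometry path can be sketched roughly as follows. This is only an illustration, not the code from this post: the `Vec3` and `Mesh` types and the `mergeInstances` function are hypothetical names, and it assumes each tree differs only by a translation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical minimal types for the sketch.
struct Vec3 { float x, y, z; };

struct Mesh {
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;
};

// Bake one copy of the tree per position into a single mesh, so the whole
// group can be drawn with one plain glDrawElements call.
Mesh mergeInstances(const Mesh& tree, const std::vector<Vec3>& positions) {
    Mesh merged;
    merged.vertices.reserve(tree.vertices.size() * positions.size());
    merged.indices.reserve(tree.indices.size() * positions.size());
    for (const Vec3& p : positions) {
        // Indices of this copy must point past all vertices added so far.
        const std::uint32_t base =
            static_cast<std::uint32_t>(merged.vertices.size());
        for (const Vec3& v : tree.vertices) {
            merged.vertices.push_back({v.x + p.x, v.y + p.y, v.z + p.z});
        }
        for (std::uint32_t i : tree.indices) {
            merged.indices.push_back(base + i);
        }
    }
    return merged;
}
```

The trade-off is memory: the merged buffers grow linearly with the number of trees, whereas instancing keeps a single copy of the model plus one position per tree.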

So, in this simple case, I’ve found that instancing performs worse than straight GL calls. Have others reached a similar conclusion? Is there a batch size or model polygon count for which instancing would outperform regular GL calls for static objects?

Alfonse_Reinheart · September 11, 2009, 2:10pm

First, what kind of instancing are you using?

Second, what hardware and drivers are you using? Have you tested with other hardware?

ViolentHamster · September 11, 2009, 5:02pm

GeForce 285. I don’t see any reason to test with a lesser card.

I’m using the instancing described in EXT_draw_instanced. I store the per instance data in a texture buffer object (EXT_texture_buffer_object) and access it in the vertex shader with gl_InstanceID.
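A rough sketch of that setup, for anyone following along. It is not the actual code: `TreeInstance`, `packInstanceTexels`, and the shader's uniform and attribute names are made up for illustration, one RGBA32F texel per instance is an assumed layout, and the shader is written against GLSL 1.40 (where `samplerBuffer`, `texelFetch`, and `gl_InstanceID` are core) rather than the EXT extensions named above.

```cpp
#include <string>
#include <vector>

// Hypothetical per-instance record: one world-space position per tree.
struct TreeInstance {
    float x, y, z;
};

// Pack positions as one RGBA32F texel per instance (assumed layout).
// The w component is padding so each instance is exactly one texel.
std::vector<float> packInstanceTexels(const std::vector<TreeInstance>& trees) {
    std::vector<float> texels;
    texels.reserve(trees.size() * 4);
    for (const TreeInstance& t : trees) {
        texels.push_back(t.x);
        texels.push_back(t.y);
        texels.push_back(t.z);
        texels.push_back(1.0f);
    }
    return texels;
}

// Vertex shader that fetches the instance's texel by gl_InstanceID and
// offsets the model-space vertex by it. All names here are illustrative.
const std::string kInstancedVertexShader = R"glsl(#version 140
uniform mat4 mvp;
uniform samplerBuffer instancePositions;
in vec3 position;
void main() {
    vec3 offset = texelFetch(instancePositions, gl_InstanceID).xyz;
    gl_Position = mvp * vec4(position + offset, 1.0);
}
)glsl";
```

In GL 3.1 terms, the packed array would be uploaded to a buffer bound to GL_TEXTURE_BUFFER, attached to a texture with glTexBuffer using GL_RGBA32F, and each batch drawn with glDrawElementsInstanced. The EXT path uses the corresponding EXT-suffixed entry points.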

Alfonse_Reinheart · September 11, 2009, 5:28pm

“My 285 takes close to 16 ms a frame to render about 10 batches of 5,000 trees.”

Are you rendering 5,000 instances in a single draw call, and you’re making 10 of them, or are you rendering 10 instances in 5,000 draw calls?

Also, since you have a GL 3.x-capable card, are you using VAOs for your vertex data?

ViolentHamster · September 11, 2009, 5:45pm

Yes, 5,000 instances in a single draw call and I’m making 10 calls. It’s only 20,000 triangles per call though. I need to test with a more complex model.

I don’t have the code in front of me now, but I doubt it’s using VAOs.

Alfonse_Reinheart · September 11, 2009, 6:00pm

Wait, each instance is only 4 triangles? The per-instance overhead is what’s going to dominate performance there. You should at least have enough triangles per instance to fill up the post-transform cache.

Jan · September 12, 2009, 2:17am

I am not entirely certain whether this is “normal behavior”, or whether it still holds today, but I think that when I used instancing I found that rendering up to some number x of instances per call was fine, while rendering more than x instances became slower again.

I rendered more complex models though.

Jan.