English is not my first language, so please forgive my approximate use of English…
I need to draw a lot of instances of the same geometry.
For now my geometry is a “cross billboard”: two billboards facing in perpendicular directions. I use them to draw “cheap” far-away trees…
I’ve tried two approaches, batching and instancing.
-By batching I mean I have 1 VBO with multiple instances in it, each one transformed to a different position. (This should give the optimum drawing speed, but it is not very dynamic and requires a lot of CPU time to create the batches. I want to avoid that…)
-By instancing I mean I have 1 VBO with the geometry and one VBO with a lot of 4x3 transformation matrices. I use “glVertexAttribDivisorARB” to send one matrix to each instance.
Something like this:
glBindBufferARB(GL_ARRAY_BUFFER_ARB, positionBuffer );
glEnableVertexAttribArrayARB( attribLocationX );
glVertexAttribPointerARB ( attribLocationX, 4, GL_FLOAT, GL_FALSE, sizeof(Vector4D)*3, 0 );
glVertexAttribDivisorARB( attribLocationX,1 );
glEnableVertexAttribArrayARB( attribLocationY );
glVertexAttribPointerARB ( attribLocationY, 4, GL_FLOAT, GL_FALSE, sizeof(Vector4D)*3, (void*)(sizeof(Vector4D)));
glVertexAttribDivisorARB ( attribLocationY,1 );
glEnableVertexAttribArrayARB( attribLocationZ );
glVertexAttribPointerARB ( attribLocationZ, 4, GL_FLOAT, GL_FALSE, sizeof(Vector4D)*3, (void*)(sizeof(Vector4D) * 2));
glVertexAttribDivisorARB ( attribLocationZ,1 );
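To make the stride and offsets used above explicit, here is a minimal sketch of the per-instance layout. It assumes Vector4D is four tightly packed floats (its definition is not shown here); InstanceMatrix, instance_stride and row_offset are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Assumption: Vector4D is four tightly packed 32-bit floats. */
typedef struct { float x, y, z, w; } Vector4D;

/* Hypothetical helper type: one per-instance 4x3 matrix stored as
   three consecutive Vector4D rows (X, Y, Z attributes). */
typedef struct { Vector4D row[3]; } InstanceMatrix;

/* Byte distance between two consecutive instances in the matrix VBO.
   This is the stride passed to every glVertexAttribPointerARB call. */
size_t instance_stride(void)
{
    return sizeof(InstanceMatrix);
}

/* Byte offset of row i (0 = X, 1 = Y, 2 = Z) inside one instance.
   This is the value cast to a pointer for the last argument. */
size_t row_offset(int i)
{
    return offsetof(InstanceMatrix, row) + (size_t)i * sizeof(Vector4D);
}
```

Under these assumptions the stride is 48 bytes and the three rows start at bytes 0, 16 and 32, which matches sizeof(Vector4D)*3 and the sizeof(Vector4D) multiples used for the pointer arguments.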
I’ve got pretty weird results here, and I would be happy if someone has comments or wants to share their knowledge about efficient instancing.
On a Quadro 5600 with the latest drivers, in a tiny viewport so that the limiting factor should be vertex throughput:
TEST1 - 100 batches of 1000 cross-billboards drawn with indexed primitives take 4 milliseconds (250 FPS) to draw:
glDrawElements ( GL_TRIANGLES, 1000 * 6 * 2, GL_UNSIGNED_INT, 0 );
TEST2 - 100 batches of 1000 instanced cross-billboards drawn with non-indexed QUADS take around 5 ms (200 FPS) to draw:
glDrawArraysInstancedEXT(GL_QUADS, 0, 4 * 2, 1000);
TEST3 - 100 batches of 1000 instanced cross-billboards drawn with non-indexed TRIANGLES take 8 ms (125 FPS) to draw, which is also acceptable:
glDrawArraysInstancedEXT(GL_TRIANGLES, 0, 6 * 2, 1000);
TEST4 - now 100 batches of 1000 instanced cross-billboards drawn with indexed TRIANGLES take 25 ms (40 FPS) to draw, which is not good at all:
glDrawElementsInstancedEXT(GL_TRIANGLES, 6*2,GL_UNSIGNED_INT, 0, 1000);
What the #$@% is going on!
It seems that glDrawArraysInstancedEXT can be almost as fast as batched geometry (the extra cost in TEST3 comes from sending 6 vertices per billboard, where TEST1 and TEST2 use 4 vertices per billboard).
BUT glDrawElementsInstancedEXT is very slow…
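The vertex-count reasoning behind these timings can be checked with a bit of arithmetic. This is only a sketch: vertices_per_frame and fps_from_ms are hypothetical helper names, and the counts for the indexed tests assume each indexed vertex is transformed only once, which I have not verified.

```c
/* Vertices processed per frame:
   batches * instances per batch * vertices per cross-billboard. */
long vertices_per_frame(long batches, long instances, long verts_per_instance)
{
    return batches * instances * verts_per_instance;
}

/* Convert a frame time in milliseconds to frames per second. */
double fps_from_ms(double ms)
{
    return 1000.0 / ms;
}

/* A cross-billboard is 2 quads:
   - QUADS or indexed triangles: 4 * 2 = 8 vertices
     (assumption: indexed vertices are not transformed twice)
   - non-indexed TRIANGLES:      6 * 2 = 12 vertices */
enum { VERTS_QUADS_OR_INDEXED = 4 * 2, VERTS_RAW_TRIANGLES = 6 * 2 };
```

With 100 batches of 1000 instances this gives 800,000 vertices per frame for TEST1, TEST2 and TEST4, and 1,200,000 for TEST3. The 1.5x vertex count of TEST3 is roughly in line with its 8 ms against the 5 ms of TEST2, but it does not explain the 25 ms of TEST4.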
This is bad because I also need to draw more generic geometry (houses, …), and since vertex reuse is a must there, indexed geometry should be used.
These results are what I get on my NVIDIA Quadro card…
Is there a known caveat to using indexed geometry with instancing?
Do the figures look the same on other graphics boards?
Any advice on efficient instancing?
Thank you for your time!