I’m working with OpenGL to recognize Brazilian Sign Language, and I need some help with performance because I have to render a lot of meshes.
In my research I need to render 64 groups of objects. Each group has 36 objects with different meshes, organized hierarchically, and each group has its own viewport and transformation tree. To render the whole scene I need 2240 calls to glDraw*, and it’s very slow. I thought about using instanced rendering, but since each group contains objects with different meshes I don’t know how to do that (all the documentation says that instancing is for rendering the same mesh many times). I saw other possible solutions such as glMultiDrawElementsBaseVertex which, if I’m not wrong, can render many different meshes stored in one VBO, but I don’t understand how I could transform each mesh in the shader (with instancing I could use gl_InstanceID to select which matrix to use). It now occurs to me that, since some objects within a group are identical, I could use instancing inside the group; then instead of calling glDraw* 36 times per group I might call it 3 or 4 times. The image shows my object group.
Having many meshes does not stop you from putting them all in the same VBO, so glMultiDrawElementsBaseVertex works well for this. You could put a unique index into the vertex structure and use it in the shader to look up a matrix in an array. Also have a look at ARB_multi_draw_indirect.
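To make the suggestion above concrete, here is a minimal CPU-side sketch of packing several meshes into one vertex array and one index array, while recording the per-mesh counts, index offsets and base vertices that glMultiDrawElementsBaseVertex takes. The `Vertex`, `Mesh` and `PackedMeshes` types and the `packMeshes` function are hypothetical names made up for illustration, and the sketch assumes 32-bit indices and one matrix per mesh; it makes no OpenGL calls itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: position plus the index of the matrix to apply.
struct Vertex {
    float x, y, z;
    std::uint32_t matrixIndex;  // read in the vertex shader to pick a matrix
};

// Hypothetical CPU-side mesh with indices that start at 0 for each mesh.
struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Shared buffer contents plus the per-mesh arrays for the multi-draw call.
struct PackedMeshes {
    std::vector<Vertex> vertices;            // contents of the single VBO
    std::vector<std::uint32_t> indices;      // contents of the single index buffer
    std::vector<std::int32_t> counts;        // number of indices per mesh
    std::vector<std::size_t> byteOffsets;    // byte offset of each mesh's indices
    std::vector<std::int32_t> baseVertices;  // first vertex of each mesh in the VBO
};

PackedMeshes packMeshes(const std::vector<Mesh>& meshes) {
    PackedMeshes out;
    for (std::size_t m = 0; m < meshes.size(); ++m) {
        const Mesh& mesh = meshes[m];
        // Record where this mesh starts before appending its data.
        out.counts.push_back(static_cast<std::int32_t>(mesh.indices.size()));
        out.byteOffsets.push_back(out.indices.size() * sizeof(std::uint32_t));
        out.baseVertices.push_back(static_cast<std::int32_t>(out.vertices.size()));
        for (Vertex v : mesh.vertices) {
            // Assumption: one matrix per mesh, so the mesh number is the index.
            v.matrixIndex = static_cast<std::uint32_t>(m);
            out.vertices.push_back(v);
        }
        for (std::uint32_t i : mesh.indices) {
            assert(i < mesh.vertices.size());
            // Indices stay mesh-relative; the base vertex is added at draw time.
            out.indices.push_back(i);
        }
    }
    return out;
}
```

After uploading `vertices` and `indices` to their buffers, `counts` and `baseVertices` can be passed as the `count` and `basevertex` arrays, and each entry of `byteOffsets` cast to `const void*` fills the `indices` pointer array. The vertex shader would then read `matrixIndex` as an integer attribute and use it to index a matrix array.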
Are you sure that making the draw calls is your main bottleneck? Based on the numbers you’re giving, I’m not convinced that this is the case. What platform are you working on? What’s your current frame rate?
To reach 60 fps with 2240 draw calls per frame, you would need a throughput of 134,400 draw calls per second. That’s not a huge number. I would expect most relatively high-performance platforms to manage a few million draw calls per second if your state changes between calls are fairly minimal (say, a couple of bind calls).
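The arithmetic behind that estimate can be written out as a small sketch. The function names are made up for illustration, and the per-call time is simply the average that follows from the same two numbers, not a measurement.

```cpp
#include <cstdint>

// Draw calls per second needed to sustain a given frame rate.
constexpr std::int64_t requiredDrawCallsPerSecond(std::int64_t drawCallsPerFrame,
                                                  std::int64_t framesPerSecond) {
    return drawCallsPerFrame * framesPerSecond;
}

// Average time available per draw call, in microseconds, at that rate.
constexpr double microsecondsPerDrawCall(std::int64_t drawCallsPerFrame,
                                         std::int64_t framesPerSecond) {
    return 1000000.0 / static_cast<double>(drawCallsPerFrame * framesPerSecond);
}

// 2240 calls per frame at 60 fps gives the 134,400 calls per second quoted above.
static_assert(requiredDrawCallsPerSecond(2240, 60) == 134400,
              "2240 * 60 should be 134400");
```

At that rate each draw call has roughly 7.4 microseconds on average, including whatever state changes go with it.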
While optimizing the number of draw calls could give you some improvement, I suspect that your main performance limit is elsewhere. Before spending a lot of time optimizing, you may want to use profiling tools to find out what limits your performance. This can start simply, e.g. by checking whether you’re CPU limited or GPU limited by watching CPU usage while you’re rendering. If you’re CPU limited, you can use CPU profiling tools to see where the time is spent. If you’re GPU limited, your platform might have GPU profiling tools, or you can try simple experiments like making the window smaller, simplifying your shaders, or reducing the number of vertices, to narrow down where your bottleneck is.
If you really need to update your transformation matrix for each draw call, that could be fairly expensive. I assume you’re using uniform variables in your shader for the transformation matrix? Depending on the hardware architecture and how well the driver on your platform is optimized, updating uniforms can be fairly expensive. If you share my suspicion that this is a possible bottleneck, you could confirm or refute it by skipping the per-object transformation updates and seeing whether your rendering gets much faster.
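One rough way to run that experiment is to time the same frame with and without the per-object uniform updates. The helper below is only a sketch: `averageFrameMs` is a hypothetical name, and it assumes you can wrap one frame of your rendering in a callback.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock milliseconds per call of renderFrame over `frames` runs.
// Assumption: with real OpenGL code the callback ends with glFinish() (or a
// buffer swap), because draw calls are queued and execute asynchronously.
double averageFrameMs(const std::function<void()>& renderFrame, int frames) {
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        renderFrame();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / frames;
}
```

Calling it once with the normal frame and once with a variant that skips the matrix updates gives two averages to compare. If the second is much lower, the uniform updates are a likely bottleneck; if the two are close, the limit is probably elsewhere.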