I’m creating a debug draw class to render debug shapes for my engine. The debug rendering is very slow; you can see the code here: https://pastebin.com/DYT9kVLA
If you look at the Render method, it’s obvious that rendering this way is slow.
I have been looking for optimizations and found some forum posts and a GDC video about glMultiDrawElementsIndirect.
I tried to work out how to change the rendering loop to use glMultiDrawElementsIndirect, but I’m totally lost.
Can someone help me change the rendering loop code to use glMultiDrawElementsIndirect? If there is another way to optimize the rendering, that’s fine too. (I just want to make the rendering faster; it doesn’t matter what I need to use.)
First, there’s no advantage here to using glMultiDrawElementsIndirect() compared to glMultiDrawElements(). The main reason for using the indirect variant is if the parameters are being generated on the GPU (e.g. in a compute shader).
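For reference, here is a rough sketch of what the indirect variant consumes, which may make the comparison clearer. The five-field record layout is the one the OpenGL specification defines for glMultiDrawElementsIndirect; `ShapeRange` and `buildCommands` are hypothetical names made up for this illustration, and putting the draw number in `baseInstance` is just one common way to look up per-draw data, not something the original code does.

```cpp
#include <cstdint>
#include <vector>

// Record layout read by glMultiDrawElementsIndirect:
// five tightly packed 32-bit values per sub-draw.
struct DrawElementsIndirectCommand {
    std::uint32_t count;         // number of indices in this sub-draw
    std::uint32_t instanceCount; // number of instances (1 for a plain draw)
    std::uint32_t firstIndex;    // offset into the element array, in indices
    std::int32_t  baseVertex;    // value added to every fetched index
    std::uint32_t baseInstance;  // first instance value for instanced attributes
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "record must be tightly packed");

// Hypothetical: where one debug shape lives inside the shared vertex/index buffers.
struct ShapeRange {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Builds one command per shape on the CPU.
std::vector<DrawElementsIndirectCommand> buildCommands(const std::vector<ShapeRange>& shapes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(shapes.size());
    std::uint32_t drawId = 0;
    for (const ShapeRange& s : shapes) {
        // Assumption: baseInstance carries the draw number so a shader could
        // index a per-draw array through an instanced attribute.
        cmds.push_back({s.indexCount, 1u, s.firstIndex, s.baseVertex, drawId++});
    }
    return cmds;
}
```

The resulting array would be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and drawn with glMultiDrawElementsIndirect(mode, GL_UNSIGNED_INT, nullptr, drawcount, 0). Since these commands are built on the CPU anyway, this mostly shows why the indirect path buys little here.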
Also, there probably isn’t much reason to use glMultiDrawElements() over glDrawElements(). Either way, you cannot change the values of uniforms within a single draw call. So the main change required to optimise the rendering will be to either replace the uniforms with attributes, or to store the uniforms in an array and add an integer attribute containing an index. Once that’s done, rather than creating and populating a VBO for each set of primitives, you need to create and populate a single VBO (and element array) containing all of the data, so that it can be rendered with a single draw call.
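A minimal sketch of the CPU side of that idea, using the "replace uniforms with attributes" option. It assumes positions are already transformed to world space before being added, so no per-shape matrix uniform is needed; `DebugVertex` and `DebugBatch` are hypothetical names, not taken from the pastebin code.

```cpp
#include <cstdint>
#include <vector>

// One interleaved vertex. The colour used to be a per-shape uniform;
// here it travels with every vertex as an attribute.
struct DebugVertex {
    float position[3];
    float color[4];
};

// Accumulates all debug lines for a frame into one vertex array and one
// index array, so they can be uploaded once and drawn with one call.
class DebugBatch {
public:
    void addLine(const float a[3], const float b[3], const float rgba[4]) {
        const std::uint32_t base = static_cast<std::uint32_t>(vertices_.size());
        vertices_.push_back(makeVertex(a, rgba));
        vertices_.push_back(makeVertex(b, rgba));
        indices_.push_back(base);
        indices_.push_back(base + 1);
    }

    // Empties the batch but keeps the allocated storage for the next frame.
    void clear() {
        vertices_.clear();
        indices_.clear();
    }

    const std::vector<DebugVertex>& vertices() const { return vertices_; }
    const std::vector<std::uint32_t>& indices() const { return indices_; }

private:
    static DebugVertex makeVertex(const float p[3], const float c[4]) {
        return DebugVertex{{p[0], p[1], p[2]}, {c[0], c[1], c[2], c[3]}};
    }

    std::vector<DebugVertex> vertices_;
    std::vector<std::uint32_t> indices_;
};
```

Once per frame, the two arrays would be uploaded to a single VBO and element array, and everything drawn with one glDrawElements(GL_LINES, indexCount, GL_UNSIGNED_INT, nullptr). Boxes, spheres and so on can be expressed as more calls to addLine.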
You’re making a lot of glGet* calls at runtime; these can involve round-trips to the GPU, which can cause pipeline stalls.
You’re creating and destroying GL objects at runtime. This is very slow; instead create them once-only at startup and reuse them at runtime, modifying them if necessary.
Likewise you’re allocating and freeing heap memory via new/delete at runtime; also potentially a slow operation.
glLineWidth with values other than 1 might not be hardware-accelerated.
You’re unbinding objects after use, which defeats any work the driver might be able to do to optimize state changes.
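To illustrate the "create once and reuse" point, here is a rough sketch of the bookkeeping for a buffer object that is created at startup and only reallocated when a frame needs more space than it has ever needed before. `GrowOnlyBuffer`, `UploadAction` and the 1024-byte starting size are assumptions made for this example; the GL calls are only named in comments so the snippet stands on its own.

```cpp
#include <cstddef>

// What the caller should do with the (already created) GL buffer this frame.
enum class UploadAction { Reallocate, Update };

// Tracks the capacity of a GL buffer that lives for the whole program.
class GrowOnlyBuffer {
public:
    // Decides how to upload `bytes` of data for the current frame.
    UploadAction plan(std::size_t bytes) {
        if (bytes > capacity_) {
            // Grow geometrically so reallocation quickly becomes rare.
            std::size_t newCapacity = capacity_ ? capacity_ : 1024;
            while (newCapacity < bytes) {
                newCapacity *= 2;
            }
            capacity_ = newCapacity;
            // Caller: glBufferData(target, capacity(), nullptr, GL_DYNAMIC_DRAW),
            // then glBufferSubData(target, 0, bytes, data).
            return UploadAction::Reallocate;
        }
        // Caller: glBufferSubData(target, 0, bytes, data) into existing storage.
        return UploadAction::Update;
    }

    std::size_t capacity() const { return capacity_; }

private:
    std::size_t capacity_ = 0;
};
```

After the first few frames nearly every upload takes the Update path, so no GL objects are created or destroyed while rendering. The same idea applies to CPU-side scratch memory: a std::vector that is cleared each frame keeps its capacity, which avoids repeated new/delete.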
Basically I’d advise trashing this code and starting again, bearing these observations in mind. If you deliberately set out to write underperforming code it wouldn’t be as bad. I don’t mean to be rude here, just realistic - this is almost like a collection of the worst things you can do in terms of performance.