Hey! I’m trying to improve the graphics performance of my app (http://audulus.com if you’re curious). Here’s a screen shot:
I’ve built a scene graph to represent the 2D UI. The UI is mostly procedural (few pre-rendered images) and a lot of it is animated. I’m wondering if folks around here have some advice on optimization strategies for this kind of thing. More info over on Stack Overflow:
From your comments on Stack Overflow you are CPU bound. Most advice from an OpenGL viewpoint will push you toward being CPU bound, because being CPU bound implies you are already doing the right things to optimise the GPU side.
You need to profile the cpu code to find where your bottlenecks are; modify these until you are gpu bound and then refocus on the OpenGL.
The basics for improving OpenGL performance are:
minimise state changes, such as the active texture and blend mode
Thanks for the reply! It’s true, I’m CPU bound, but the vast majority of the CPU rendering time is spent inside OpenGL calls. With texture caching off, 44% is spent in glDrawArrays.
One problem I face is that I can’t reorder the drawables to minimize state/shader changes because correct rendering requires them to be rendered back-to-front (this is 2D graphics after all, no z-buffer). In my test document, there are 780 drawables and 96 shader changes per frame. The shader changes are due to the interleaving of paths and text in the rendering order. I could try to combine the path and text shaders into one multi-purpose shader, but that would be fairly ugly.
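To make the cost of interleaving concrete, here is a minimal sketch that counts shader switches over an ordered draw list. The shader names and the draw list are hypothetical stand-ins, not the app's real data; the point is only that with a fixed back-to-front order, the number of switches depends on how often adjacent drawables use different shaders, so a single combined shader removes them all.

```python
from itertools import groupby

def count_shader_changes(draw_list):
    """Count shader switches after the first bind, given shaders in draw order."""
    # groupby collapses each run of consecutive equal shaders into one group.
    runs = [shader for shader, _ in groupby(draw_list)]
    return max(len(runs) - 1, 0)

# Hypothetical back-to-front order with paths and text interleaved.
interleaved = ["path", "text", "path", "text", "path"]
# Same order, but every drawable uses one combined (hypothetical) shader.
combined = ["ui"] * len(interleaved)

print(count_shader_changes(interleaved))  # 4
print(count_shader_changes(combined))     # 0
```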
Another possibility is to convert the text into paths, so almost all drawables are paths. This would be a lot of work, as I’d have to upgrade my path renderer (or replace it with another one) so it can handle text (the tessellation is currently very limited), plus generate the paths from the glyphs in the first place.
Just because you are in 2D does not mean you cannot use a z-buffer to control render order. If you give objects an artificial depth based on their z-order you can use a depth buffer to draw in any order. This will allow you to cut down your shader flipping.
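A rough sketch of the artificial-depth idea, assuming the usual depth test where a smaller depth value is nearer, and assuming the drawables are opaque (blended drawables would still need ordering). The function name and the (shader, z_index) tuples are made up for illustration.

```python
def depth_for(z_index, count):
    """Map a z-order index (0 = back-most) to a depth strictly between 0 and 1.

    Larger z_index means nearer the viewer, so it gets a smaller depth.
    """
    return (count - z_index) / (count + 1)

# Hypothetical drawables as (shader, z_index), listed back to front.
drawables = [("path", 0), ("text", 1), ("path", 2), ("text", 3)]
count = len(drawables)

# With depth carried per drawable, the draw order can follow the shader
# instead of the z-order, so each shader is bound once.
by_shader = sorted(drawables, key=lambda d: d[0])
for shader, z_index in by_shader:
    print(shader, z_index, depth_for(z_index, count))
```

The depth value would be written as the z coordinate of each drawable's vertices, with depth testing enabled.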
If you are using textures for text I assume you are using a texture atlas for characters so that there is only 1 texture for all of your font.
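For reference, a texture atlas can be laid out with a very simple "shelf" packer: place glyph rectangles left to right and start a new row when the current one is full. This is only a sketch with made-up names and sizes; real font atlases usually add padding between glyphs and sort by height first.

```python
def pack_shelves(sizes, atlas_width):
    """Place (w, h) rectangles left to right in rows ("shelves").

    Returns (positions, used_height), where positions[i] is the (x, y)
    of the top-left corner of rectangle i inside the atlas.
    """
    positions = []
    x = y = shelf_height = 0
    for w, h in sizes:
        if w > atlas_width:
            raise ValueError("rectangle is wider than the atlas")
        if x + w > atlas_width:
            # Current shelf is full: move down and start a new one.
            y += shelf_height
            x = 0
            shelf_height = 0
        positions.append((x, y))
        x += w
        shelf_height = max(shelf_height, h)
    return positions, y + shelf_height

def uv_rect(pos, size, atlas_width, atlas_height):
    """Convert a pixel rectangle to normalised (u0, v0, u1, v1) coordinates."""
    (x, y), (w, h) = pos, size
    return (x / atlas_width, y / atlas_height,
            (x + w) / atlas_width, (y + h) / atlas_height)

# Hypothetical glyph sizes in pixels.
glyphs = [(10, 12), (8, 12), (10, 14)]
positions, used_height = pack_shelves(glyphs, atlas_width=20)
print(positions, used_height)
```

Each glyph quad then samples the one shared texture through its own UV rectangle, so drawing text needs no texture rebinds.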
Looking at your screen you should only need 2 shaders - one for lines and one for text and other images. Depending on your images you may also be able to get away with 1 texture atlas for all your images and text font.
You could also look at rendering your path as a series of thin triangles. Now you only need 1 very simple shader that just renders textured triangles and you colour lines with a simple solid filled colour texture. This is what I do.
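As an illustration of the thin-triangle approach, here is a minimal sketch that expands one line segment into two triangles by offsetting its endpoints along the segment normal. The function name is hypothetical, and joins between consecutive segments are not handled here.

```python
import math

def segment_to_triangles(p0, p1, width):
    """Expand the segment p0 -> p1 into two triangles forming a quad of the given width.

    Returns six (x, y) vertices, or an empty list for a zero-length segment.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        return []
    # Unit normal to the segment, scaled to half the line width.
    nx = -dy / length * width / 2
    ny = dx / length * width / 2
    a = (x0 + nx, y0 + ny)
    b = (x0 - nx, y0 - ny)
    c = (x1 + nx, y1 + ny)
    d = (x1 - nx, y1 - ny)
    # Two triangles sharing the edge b-c, both with the same winding.
    return [a, b, c, c, b, d]

print(segment_to_triangles((0, 0), (10, 0), 2))
```

The six vertices can go straight into a vertex buffer drawn as plain triangles, with a texture coordinate pointing into the solid colour texture.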
You can also try packing multiple line objects into 1 buffer to cut down on your draw calls. I assume this is running on an iPad. I don’t have any experience with them but 780 calls looks like a lot for such a device. Packing improves draw times but complicates deletion and modification.
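A sketch of the bookkeeping that packing implies: concatenate each object's vertices into one flat array and remember where each object starts. The object names and data here are invented for illustration. The recorded ranges are what make later modification possible, and they are also where the extra complexity comes from.

```python
def pack_vertices(objects):
    """Concatenate per-object (x, y) vertex lists into one flat float list.

    Returns (flat, ranges), where ranges[name] = (first_vertex, vertex_count).
    """
    flat = []
    ranges = {}
    for name, verts in objects.items():
        # Two floats per vertex, so the vertex index is half the float count.
        ranges[name] = (len(flat) // 2, len(verts))
        for x, y in verts:
            flat.extend((x, y))
    return flat, ranges

# Hypothetical line objects, each already converted to triangle vertices.
objects = {
    "wire_a": [(0, 0), (1, 0), (1, 1)],
    "wire_b": [(2, 2), (3, 2), (3, 3)],
}
flat, ranges = pack_vertices(objects)
print(len(flat) // 2, ranges)
```

The whole packed buffer could then be drawn with a single glDrawArrays call covering all vertices. Changing one object means rewriting just its range (for example with glBufferSubData), while deleting one leaves a hole that has to be compacted or reused.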
Right, I should have been more specific in that it’s the alpha-blending I do that requires back-to-front rendering. Otherwise I could have used a z-buffer.
I’m not using a texture atlas, and that’s an excellent point – thanks! Each text drawable is rendered into its own texture, resulting in a lot of small textures.
Now that I think about it a bit more, it might not be that bad to combine the path and text shaders into one. My path shader anti-aliases the paths (I found this looked nicer and was faster than FSAA), but I think I could extend it to do texturing easily enough.
As a test of what combining shaders would do, I simply disabled the text. I get a modest improvement in CPU frame time, but since there is no change in GPU frame time, I would assume that the improvement mainly comes from fewer GL calls. Does that sound right?
Yes, I have found that about 80% of the benefit comes from cutting down OpenGL calls and 20% from everything else; this is why I suggested grouping objects into buffers where possible. The idea is to keep the GPU fully occupied, and this can be quite hard with lots of calls and little other processing.
You might also try reusing a VBO and loading it with data as needed. I don’t know if this is faster than binding a different VBO for each render call.
My data is quite static so I can create large buffers to cut down on render calls.