Performance hit & glUniform

I’m currently calling glUniformMatrix4fv every time I render something on screen, to update the modelview/projection matrices in the vertex shaders of my shader programs. While working on a GUI application that renders a bunch of buttons, a panel, and a bunch of labels, I noticed a huge performance hit from my code’s std::map calls. I removed those, but didn’t get the performance gains I was expecting: the application runs at around 40 fps. That doesn’t sound so bad, but considering all I’m doing is rendering ~400 textured triangles with very simple shaders, I’m disappointed. My terrain-mesh application, which renders a terrain made up of close to 9,000 multi-textured tris, runs faster.

I’m assuming it’s because rendering the terrain mesh takes only about 2 glUniform calls, while rendering a line of text takes (3+n) calls, where n is the number of characters. The data sent per character is a 4x4 modelview matrix describing the character’s position; the other three uniforms are color (4f), projection (4x4f), and texture unit (1i).

Does anyone have any thoughts on this, or can offer an alternative that is easier than generating & storing the vertices & texture coordinates and colors for each line of text I have at runtime?

I’m quite worried because eventually I want to have many models, about 50-100, running around on screen. And buffering & updating individual vertex data for each of them just doesn’t seem practical, or video memory efficient.

For a 2D GUI you don’t really need matrices, and updating a matrix for each character in a line of text is definitely overkill; a position is just a simple x/y, so the modelview can be left at identity.
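To sketch what that might look like (the uniform and attribute names here are my own, not from anyone’s actual code): a 2D GUI vertex shader can take a pixel-space offset and the viewport size as small uniforms, so positioning an element costs one glUniform2f instead of a full 4x4 matrix upload.

```glsl
// Hypothetical 2D GUI vertex shader: no matrices, just a pixel-space
// offset mapped to clip space using the viewport size.
uniform vec2 u_offset;    // element position in pixels
uniform vec2 u_viewport;  // window size in pixels

attribute vec2 a_position;  // quad vertex in pixels, relative to u_offset
attribute vec2 a_texcoord;
varying vec2 v_texcoord;

void main() {
    vec2 px = a_position + u_offset;
    // pixels -> [0,1] -> [-1,1] clip space; y is flipped so (0,0) is top-left
    vec2 clip = px / u_viewport * 2.0 - 1.0;
    gl_Position = vec4(clip.x, -clip.y, 0.0, 1.0);
    v_texcoord = a_texcoord;
}
```

With this, the projection uniform only needs updating on window resize, and per-element positioning is a single cheap vec2 upload.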

Actually, yes it does. 40 fps is 25 ms/frame. That’s tons of time for GL rendering if you do it properly. We have a hard 60 fps requirement (16.6 ms/frame) and we render huge numbers of batches and triangles.

…but considering all I’m doing is rendering ~400 textured triangles with very simple shaders, I’m disappointed.

I would be too. Somewhere you’ve got a horrible performance sink, and you need to run a profiler or comment some things out selectively to figure out what your primary bottleneck is.

I’m assuming it’s because to render the terrain mesh, I make about 2 calls to glUniform.

First rule of performance optimization: don’t assume! Measure, measure, measure!

In my experience glUniform isn’t slow, except on ancient cards that are now 6-7 generations old (where it was sometimes recompiling the shaders when the uniforms changed).

…or can offer an alternative that is easier than generating & storing the vertices & texture coordinates and colors for each line of text I have at runtime?

How many vertices and draw calls are we talking about here?

Are you using client arrays, VBOs, or display lists?

Are you leaving the data on the GPU, or are you updating it every frame?

What’s your ms/frame if you comment out your draw calls?

Be sure to disable sync to vblank.

You can take a look at GLIM (link in my signature); I use it to render text and other frequently changing data, and it works very well. It’s as easy to use as immediate mode, but it generates a VBO and is therefore executed in a single draw call.

Jan.

Just FYI, there is a profile config item:

“Do you want to view other user’s signatures with their posts?”

which some folks have set to no to keep text thread-relevant. You may have to enable it temporarily to see this sig.

How many vertices and draw calls are we talking about here?

My text-rendering code creates a single VBO for the vertices and a single VBO for the texture coordinates. The code re-uses these for every character drawn, scaling the quad to the glyph’s width and height and translating it to the glyph’s draw position, relative to its location in the text string and the string’s draw position. There’s one draw call per character: glDrawArrays(GL_QUADS, 0, 4).

Are you using client arrays, VBOs, or display lists?

Nothing but VBOs

Are you leaving the data on the GPU, or are you updating it every frame?

Not sure what data you are talking about. The vertex data and texture coordinate data remains the same, that is, I don’t use any calls to glBufferSubData() or glBufferData after the initialization call. I’m updating the uniform values in my shaders using glUniform*() every frame.

What’s your ms/frame if you comment out your draw calls?

If I comment out all draw calls related to rendering text, other than the small FPS counter I have, the code renders at about 300-400 FPS, which still seems kind of slow given that all that’s being rendered is a single quad. With nothing being rendered but the FPS counter, it jumps to around 1000+ FPS.

Be sure to disable sync to vblank.

Forced off in the nVidia control panel.

There’s one draw call per character to glDrawArrays(GL_QUADS,0,4).

… That’s pretty horrible. Especially considering that you have to call gl*Pointer to rebind the position/texcoord arrays for every character.

You would be far better off simply generating the positions/texture coordinates for all of the glyphs of each string. Upload them to a buffer object and render with them. Change the data only when the string changes. You can still position this string where you wish, but you’re not constantly rendering single characters.
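As a rough sketch of that approach (the `GlyphVertex` struct, the fixed-width glyph metrics, and the one-column-per-ASCII-code atlas layout are all my own assumptions, not anyone’s actual code): build one position/texcoord array for the whole string on the CPU, upload it to a buffer object once, and redraw it with a single glDrawArrays until the text changes.

```cpp
#include <cstddef>
#include <vector>

// One vertex of a glyph quad: 2D position plus texture coordinate.
struct GlyphVertex { float x, y, u, v; };

// Hypothetical fixed-width font metrics for this sketch.
const float kGlyphW = 8.0f, kGlyphH = 16.0f;

// Build two triangles (6 vertices) per character for the whole string.
// In real code the u/v range would come from the glyph's slot in the
// font texture atlas; here each ASCII code maps to a 1/256-wide column.
std::vector<GlyphVertex> BuildTextVertices(const char* text, float x, float y)
{
    std::vector<GlyphVertex> verts;
    for (const char* p = text; *p; ++p, x += kGlyphW) {
        float u0 = (unsigned char)*p / 256.0f, u1 = u0 + 1.0f / 256.0f;
        GlyphVertex q[4] = {
            { x,           y,           u0, 0.0f },
            { x + kGlyphW, y,           u1, 0.0f },
            { x + kGlyphW, y + kGlyphH, u1, 1.0f },
            { x,           y + kGlyphH, u0, 1.0f },
        };
        // Split the quad into two triangles: 0-1-2 and 0-2-3.
        int idx[6] = { 0, 1, 2, 0, 2, 3 };
        for (int i : idx) verts.push_back(q[i]);
    }
    return verts;
}

// Upload once when the string changes, then draw with one call per frame:
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(GlyphVertex),
//                verts.data(), GL_DYNAMIC_DRAW);
//   ...set up the pointers once...
//   glDrawArrays(GL_TRIANGLES, 0, (GLsizei)verts.size());
```

This replaces n draw calls and n matrix uploads per string with one draw call and a CPU-side rebuild only when the text actually changes.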

I see. So the goal for efficiency is to store everything you can on the graphics card, and keep data transfer between the graphics card and the CPU to a minimum during the main loop. Would this mean storing all of my renderable game components in their own buffers in video memory? That is, having a “gui_vertex_buffer” of sorts, and selecting offsets & lengths of the data to render based on which GUI components are visible?

But this still irks me, because if I store a model in video memory in local coordinates and I want to render many of them, what other way is there to render them without telling the GPU where each one is located?
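The “gui_vertex_buffer” idea above can be sketched as a simple sub-allocator (the `GuiVertexBuffer` and `BufferRange` names are mine, purely for illustration): each component reserves a contiguous range of vertices in one shared VBO, and drawing a visible component is then a glDrawArrays over that range with no buffer rebinding between components.

```cpp
// Hypothetical sketch: sub-allocate ranges of one shared GUI vertex
// buffer. Each component records its [first, count] range at load time;
// drawing it later is glDrawArrays(GL_QUADS, range.first, range.count)
// against the single bound VBO -- no per-component buffer switches.
struct BufferRange { int first; int count; };  // measured in vertices

class GuiVertexBuffer {
public:
    // Reserve `count` vertices; the caller uploads its data into this
    // range (e.g. with glBufferSubData) and keeps the range for drawing.
    BufferRange Allocate(int count) {
        BufferRange r{ next_, count };
        next_ += count;
        return r;
    }
    // Total size to pass to the initial glBufferData allocation.
    int TotalVertices() const { return next_; }
private:
    int next_ = 0;
};
```

The visibility pass then walks the visible components and issues one draw per range (or merges adjacent ranges into fewer calls), with all vertex data resident in video memory.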

All the glUniform calls are relatively fast, except that it has been reported that on some nVidia drivers, when certain values are sent to the shader, the driver recompiles and re-optimizes it. This is obviously a problem for games. The values in question are 0.0, 0.5, and 1.0, and there is no workaround other than avoiding those exact numbers. Has nVidia solved this issue in recent drivers? Unknown.
In the OpenGL wiki on GLSL, this problem is mentioned with nVidia drivers. Does anyone know if this information is still valid?

I ran a test in one of my other test applications, which renders the large terrain plus a 300-400 poly model, and had it render the FPS string 40 times per frame. Without any other changes, I see no significant performance hit.

Pretty sure this is a GeForce 7 and earlier thing. Killed with G80 (GeForce 8) I believe.