The tests I made weren’t conclusive enough to give me a good answer, so I’m asking this question here.
I know that for fast rendering, everything matters: render from front to back, switch VAOs/VBOs, texture objects and materials as rarely as possible, clear buffers as rarely and as efficiently as possible (i.e. clear depth+stencil in a single call), and change other OpenGL states (blending, masks…) as rarely as possible.
But as you also know, such a renderer is not easy to achieve: if we prioritize the rendering order, this might imply switching textures and material states more often, for example. And vice versa: if we prioritize textures, materials and VAO/VBO bindings, then the order in which we render might be compromised.
I am not talking about spatial sorting and other techniques that reduce what is sent to GL; I assume those are already done efficiently.
What I would like to know here is, in your experience, which of the following matter most for reducing the time needed to render a frame:
- rendering order (front to back),
- don’t switch VBOs often,
- render all objects sharing the same textures and materials together, regardless of the order or the VAO/VBO bindings,
- clear the stencil buffer even if the current scene won’t use stencil tests (in my engine, for various reasons, I can’t be 100% sure that stencil won’t be used in a frame),
- enable/disable blending only once or twice,
- change GL masks only once or twice,
- switch shaders as rarely as possible.
From the tests I made (on a single computer), I could deduce that clearing the color buffer has very little impact, that having a dozen VBOs instead of a single one also has little impact, and that rendering from front to back has a small impact (I think thanks to occlusion queries). But I haven’t yet tried giving more priority to reducing texture or material switches.
The thing I noticed that slows everything down very quickly is font rendering (using wgl/glx fonts with display lists).
What do you think about this?
In my personal experience, the things that influence rendering speed the most seem to depend on the driver.
VBOs, for instance, can hurt performance very badly depending on whether you use map/unmap or glBufferSubData to update them. I have encountered situations where traditional glBegin/glEnd worked much better than VBOs.
The second thing I noticed is shader switching. I would rather group batches by shader first, then by texture. I think the cost comes from setting the uniforms for the shader currently in use.
Thanks for the comment. Yes, unfortunately many things depend on how the graphics card manufacturers implement them.
I would have appreciated more comments, but it seems my question was silly.
I don’t think it is. I’ve been researching ways to improve performance for my diploma thesis, and personal conversations plus a host of reading have brought me to a conclusion similar to Alfonse’s in a different thread about batching: it depends on the concrete use case, and no single measurement can account for the behaviour under different circumstances.
As Jason Gregory, lead developer at Naughty Dog (the Uncharted series), notes in Game Engine Architecture, the only sure way to verify that a presumed optimization is worth the trouble is to measure its actual impact. A single approach will never be right for every case. Still, when developing for PC, differences in hardware across vendors make it virtually impossible to come up with one optimal approach. Add to that, consoles are a completely different topic. I remember an alumnus of my university, who has been a developer at BlueByte (the Settlers series), saying in a seminar that “developing for the PC is a nightmare in contrast to consoles.”
Also, GL implementations (and, to be fair, Direct3D implementations) may cut the execution time of application calls significantly on next-generation hardware. Assumptions about performance therefore need to be revised from time to time: something that wasn’t feasible in the past may look much more attractive a few years later.
In tests I ran, for instance, the number of texture-binding calls was completely insignificant with regard to frame time, at least while it stayed below a certain amount.
The worth of doing any kind of optimization depends entirely on where your bottleneck is, and that will obviously vary from program to program. The steps given by the OP are fairly simple techniques that anyone can apply, and they are intended to help ensure that your bottleneck is not in the driver/API layer (or is as small as possible there); the bottleneck could still be CPU-side in your own code, or in the hardware, in which case they will be quite ineffective. Obviously, the requirements of your program supersede them: if your program requires you not to clear stencil, for example, then you just won’t clear stencil, and as for the performance cost, you’ll have to take it on the nose.
You’re all right. I think when I wrote this topic I didn’t entirely realize that this is all about optimization. I was rather looking for “how to render efficiently while avoiding pitfalls that I might not know about but that other people might”.
In fact, I wanted to have “the most efficient forward renderer”, relatively speaking, before moving to a deferred technique, and without going deeply into optimization, simply because I am not at the stage to deal with that, and because the 3% I might gain won’t change my world.
I am currently quite satisfied with my renderer (for instance, it is twice as fast as a year ago, even if that doesn’t say much), but like all forward renderers it suffers when dealing with more than a single light… The program is GPU-limited of course (at least for now, and hopefully it will remain so), which is why I was asking these questions about certain GL calls.
I understand that different graphics cards behave differently, and that different drivers for the same graphics card in the same machine can also behave differently. It also makes sense that, for example, binding a shader can be more costly on one system than on another, and the same must hold for how fast or efficiently a driver updates the depth buffer. Each of these things could yield a very small improvement, or a very small slow-down, depending on the machine it runs on.
What I was really looking for, and it seems there is no answer to this, was to know, or at least get an overview of, how costly the commands listed above are relative to each other. Maybe some of them were known to be more costly and others very light.
But it seems they are all (quite) lightweight, and their costs are too machine-dependent for people to have noticed any common statistics (except, most certainly, for shader binds and uniform updates, which look heavier).
Thanks for the interest.