What is the next step? (Optimization)

View frustum culling and outdoor occlusion culling are working. What is the next step to speed up rendering? The environment is outdoors.

Best Regards

LOD, and sorting/batching.

Do you use the GL_ARB_occlusion_query for occlusion?

LOD I know what it is, but what is the second one, sorting/batching (of what)? Any recommended literature?

No I don’t.

I think it means state sorting.
Basically you have to minimize state changes.
You can achieve this by grouping objects that use the same shader.
Shaders are the most expensive switch; changing textures is also an expensive operation, but grouping objects by texture is nearly an impossible task. To minimize texture switches you can use a texture atlas.
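A minimal sketch of the grouping idea (the `Drawable` record and its fields are made up for illustration, standing in for whatever your engine submits per draw call): sort the draw list by shader ID first, then texture ID, so identical states end up adjacent and the renderer only switches when the key actually changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw record: the IDs stand in for glUseProgram/glBindTexture targets.
struct Drawable {
    uint32_t shader;   // most expensive switch: primary sort key
    uint32_t texture;  // cheaper switch: secondary sort key
    uint32_t mesh;
};

// Sort so that all drawables sharing a shader (and then a texture) are adjacent.
void sortByState(std::vector<Drawable>& draws) {
    std::sort(draws.begin(), draws.end(),
              [](const Drawable& a, const Drawable& b) {
                  if (a.shader != b.shader) return a.shader < b.shader;
                  return a.texture < b.texture;
              });
}

// Count how many shader/texture switches a draw list would cost in submission order.
int countSwitches(const std::vector<Drawable>& draws) {
    int switches = 0;
    for (size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].shader != draws[i - 1].shader) ++switches;
        else if (draws[i].texture != draws[i - 1].texture) ++switches;
    }
    return switches;
}
```

Running `countSwitches` before and after `sortByState` on the same list is a quick way to see how many switches the grouping saves.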

Rosario answered your question, I think, but as to sources you can read? Most of the stuff I’ve seen is high-level and scattered across many sources.

But this isn’t rocket science. The main things you need to know are: what are the “GL states” I need to change between my batches, and which are the most expensive to change on my hardware? E.g. changing a render target (FBO) can be very expensive, changing shaders is pretty expensive, and changing uniforms is pretty cheap.

This tells you how you need to group objects in your rendering to minimize the most expensive state changes. You can use build-time tools to help group things (best case), and/or run-time (CULL time) techniques (good, if you can make it fast!). For run-time grouping, strongly prefer a radix sort technique, where just by looking at the state you know exactly which “bin” it needs to go in. That’s about as cheap as you can get short of not having to sort at runtime at all (i.e. build-time sorting).
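A sketch of the radix-style binning idea, under the assumption that your state fits in a small packed key (the `Batch` struct and the key layout are invented for illustration): each batch lands directly in the bin named by its key, with no comparisons at all.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical batch record; 'stateKey' packs shader/texture/etc. into one
// integer so the bin index can be read straight off the key.
struct Batch {
    uint32_t stateKey;   // e.g. (shaderId << 8) | textureAtlasId
    uint32_t firstIndex;
};

// One-pass bucket ("radix") grouping: O(n), versus O(n log n) for a
// comparison sort. Each batch goes directly into the bin its key names.
std::vector<std::vector<Batch>> binByState(const std::vector<Batch>& batches,
                                           uint32_t numBins) {
    std::vector<std::vector<Batch>> bins(numBins);
    for (const Batch& b : batches)
        bins[b.stateKey % numBins].push_back(b);  // key -> bin, no sorting
    return bins;
}
```

Rendering then walks the bins in order, so every batch within a bin shares the same state and the expensive switches happen once per bin at most.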

Also, when you need to change pipeline state, rather than considering every single state change you need to make with lots of complex logic (expensive), group state changes into “state groups” (aka “state sets”, “materials”, whatever) so that you can skip the entire state-change process with a single “if” test (checking whether the active GL state is already the desired GL state for the next batch). I.e. just compare a pointer or an integer, and if it’s the same, you’re done changing state!
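A minimal sketch of the state-group idea (the `StateGroup` fields and the cache are invented names, not any particular engine's API): batches point at shared, deduplicated group objects, so one pointer compare decides whether any state work is needed at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "state group": one object describing all the pipeline state a
// material needs. Batches share pointers to deduplicated groups.
struct StateGroup {
    uint32_t shader;
    uint32_t texture;
    bool     depthTest;
};

// Track the currently-applied group by pointer; if the next batch wants the
// same group, a single compare skips the whole state-change path.
struct StateCache {
    const StateGroup* active = nullptr;
    int applyCount = 0;  // how many times we actually had to change state

    void apply(const StateGroup* desired) {
        if (desired == active) return;  // the single "if": same group, done
        // ...a real renderer would diff and apply GL state here...
        ++applyCount;
        active = desired;
    }
};
```

The pointer compare only works if groups are deduplicated at build time (two batches with identical state must share one `StateGroup`), which is part of why build-time grouping is the best case.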

The other main thing: at the individual GL state change level, don’t tell GL to change a state to what it already was. This is sometimes termed state tracking, duplicate state change elimination, or lazy state changes. You can do this with simple wrappers around GL calls.
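A sketch of such a wrapper for one call, with a stub standing in for the real driver entry point (`fakeBindTexture` is a stand-in for `glBindTexture`, purely so the logic can be shown without a GL context): the wrapper remembers what is bound per unit and only forwards when the binding actually changes.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-in for the real GL call; counts how often the driver is actually hit.
static int driverCalls = 0;
static void fakeBindTexture(uint32_t unit, uint32_t id) {
    (void)unit; (void)id;
    ++driverCalls;
}

// Lazy wrapper: filters out duplicate state changes before they reach GL.
struct LazyBinder {
    std::unordered_map<uint32_t, uint32_t> bound;  // unit -> currently bound id

    void bindTexture(uint32_t unit, uint32_t id) {
        auto it = bound.find(unit);
        if (it != bound.end() && it->second == id) return;  // already bound: skip
        fakeBindTexture(unit, id);
        bound[unit] = id;
    }
};
```

One caveat worth knowing: shadowing state like this assumes nothing else touches GL behind the wrapper's back, so either route every call through the wrappers or re-sync the shadow state after foreign code runs.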

What occlusion culling algorithm do you use?


Identify the specific biggest bottleneck you have and tackle that. You could spend a lot of time optimizing something that is only causing a 1% to 5% slow down otherwise.

There are many tools and methods for profiling applications, but sometimes the crudest and most primitive are the best, so just selectively comment out parts of your renderer and do before/after speed comparisons, recording the results in a spreadsheet. You should very quickly be able to zero in on what’s causing the most slow down in your app.
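For the crude before/after comparisons, a tiny timing helper is all you need (a sketch; the idea of wrapping each renderer stage in a lambda is mine, not something from this thread):

```cpp
#include <cassert>
#include <chrono>

// Crude stage timer for before/after comparisons: wrap a renderer stage in a
// callable, comment its body in or out between runs, and compare the numbers.
template <typename Fn>
double millisecondsFor(Fn&& stage) {
    auto t0 = std::chrono::steady_clock::now();
    stage();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Usage would be along the lines of `double ms = millisecondsFor([&]{ drawTerrain(); });` (where `drawTerrain` is whatever stage you're isolating), with the results going into your spreadsheet. Note that this measures CPU-side time only; GPU work is asynchronous, so for GPU stages you'd need a glFinish() before each timestamp (skewing results) or GL timer queries.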

Don’t go beating your head against a wall trying to optimize something that is already as fast as it’s going to get, either, even if that something is your biggest bottleneck.

Learn a little about the underlying hardware and what formats/etc work best with it. This includes formats such as 16-bit vs 32-bit indexes, texture formats, texture sizes and so on. Because OpenGL doesn’t expose you directly to the hardware, you occasionally find OpenGL code (and examples) that uses suboptimal formats and performs poorly, when all it needs is some minor tweaks to get its speed up. A specific example: I had awful trouble with glTexSubImage2D some time ago, but switching it to use a type of GL_UNSIGNED_INT_8_8_8_8_REV and a format of GL_BGRA resolved everything, and even on modern NVIDIA hardware it spends one sixth the time in the driver compared to the more commonly found GL_UNSIGNED_BYTE/GL_RGBA.

Most GPUs work relatively similarly these days, and the days of hardware-specific hacks for specific cards or vendors being the norm are thankfully behind us, but these can occasionally still crop up, so keep an eye on the forums for reports (along the lines of “X is really slow with the new Atividiatel driver”).

Above all, remember - if it’s “fast enough” it’s fast enough! If you’re already getting 60 FPS with your heaviest load on your target hardware, there’s no need to waste time optimizing further. You’re done.

Good stuff. Two things I’d add to the “identify the bottleneck” task.

  1. While at any given time you may be most bottlenecked in one place, over the course of a frame, you may be bottlenecked in different places at different times. So try to “turn off” as much stuff as you can without affecting the bottleneck.

  2. In a pipelined system, work your way backwards in the pipe. That is, test the last thing first (e.g. shrink your window to see if you’re fill bound). If so, you’ve found it. If not, move on to the previous stage.

It helps to line up the worst case you can possibly find and set things up so your app comes up with that exact same worst case every time. It’s easier to get to the bottom of it when you know nothing is subtly different from run to run.

> Above all, remember - if it’s “fast enough” it’s fast enough! If you’re already getting 60 FPS with your heaviest load on your target hardware, there’s no need to waste time optimizing further. You’re done.

Can a 3D engine ever be fast enough? I find that sometimes tinkering is worthwhile.

If you build a faster engine, it’s because you want to put more polygons, more FX, more animation, and more complex materials in your scene. At that point you will find that some part is slower than another and worth optimizing.

So basically no, it’s never enough. :smiley: