This seems to be a poorly documented topic; I have spent hours researching it without finding a reasonable answer.
Across all the sources I have researched (these very forums, Stack Overflow, Reddit and even the Wiki), there is one general recommendation for getting good performance out of OpenGL: limit the number of state changes.
I have managed to find a benchmark (from Nvidia) that shows various OpenGL operations and how expensive they are, although I am not sure how up to date it is. (I can’t post links, otherwise I would place it here.)
So the most basic optimization, which seems to be suggested everywhere, is to sort your draws by shader, texture and buffer, ordered by how expensive each change is according to the benchmark, so that each frame has as few state changes as possible.
This is pretty straightforward and makes sense.
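A minimal sketch of that sorting idea, assuming (as is commonly reported, but to be confirmed against the benchmark) that program changes cost more than texture changes, which cost more than buffer binds. `DrawItem`, `sortKey` and `countStateChanges` are hypothetical names, and the IDs simply stand in for GL object names; the bit widths of the key are an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-draw record; the IDs stand in for GL object names.
struct DrawItem {
    std::uint32_t program;
    std::uint32_t texture;
    std::uint32_t buffer;
};

// Pack the state into one 64-bit key with the most expensive state in the
// highest bits, so a plain sort groups by program, then texture, then buffer.
// Assumes the IDs fit in 16/24/24 bits respectively.
std::uint64_t sortKey(const DrawItem& d) {
    assert(d.program < (1u << 16) && d.texture < (1u << 24) && d.buffer < (1u << 24));
    return (std::uint64_t(d.program) << 48) |
           (std::uint64_t(d.texture) << 24) |
           std::uint64_t(d.buffer);
}

void sortDraws(std::vector<DrawItem>& draws) {
    std::sort(draws.begin(), draws.end(),
              [](const DrawItem& a, const DrawItem& b) { return sortKey(a) < sortKey(b); });
}

// Count the binds a renderer would issue walking the list in order,
// skipping any bind whose state is already current.
int countStateChanges(const std::vector<DrawItem>& draws) {
    int changes = 0;
    const DrawItem* prev = nullptr;
    for (const DrawItem& d : draws) {
        if (!prev || d.program != prev->program) ++changes;
        if (!prev || d.texture != prev->texture) ++changes;
        if (!prev || d.buffer != prev->buffer) ++changes;
        prev = &d;
    }
    return changes;
}
```

For example, alternating two programs over four draws costs 6 binds unsorted and 4 after sorting, because the repeated program switches collapse into one.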
However, I have run into a pretty big gap when it comes to managing VBO state changes.
I have not found any reasonable documentation or benchmarks on how to do this efficiently, or on what is considered the most “modern, up to date” way of handling them.
Generally, there are two approaches:
- Use relatively many VBOs and bind them as needed (letting the driver handle their memory management).
- Use a single VBO for every unique combination of vertex attributes your objects might have, bind it once, put all the data you need into it, and draw from it with batched draw commands.
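A sketch of the CPU-side bookkeeping for the second approach, assuming every mesh shares one vertex buffer and one index buffer. The command struct mirrors the five-field layout the OpenGL spec defines for `glMultiDrawElementsIndirect` (GL 4.3); the resulting array would be uploaded to a `GL_DRAW_INDIRECT_BUFFER` and drawn with one call. `MeshInfo` and `packMeshes` are hypothetical, and meshes are simply appended back to back with no allocator.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the indirect command layout used by glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "five tightly packed 32-bit fields");

// Hypothetical mesh description: only the counts matter for packing.
struct MeshInfo {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Place meshes back to back in the shared buffers and emit one command each.
// Indices stay mesh-local; baseVertex shifts them to where the mesh's
// vertices landed in the big VBO.
std::vector<DrawElementsIndirectCommand> packMeshes(const std::vector<MeshInfo>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    std::uint32_t nextVertex = 0, nextIndex = 0;
    for (const MeshInfo& m : meshes) {
        assert(nextVertex <= static_cast<std::uint32_t>(INT32_MAX));  // baseVertex is signed
        cmds.push_back({m.indexCount, 1, nextIndex, static_cast<std::int32_t>(nextVertex), 0});
        nextVertex += m.vertexCount;
        nextIndex += m.indexCount;
    }
    return cmds;
}
```

Appending only works while nothing is ever removed or resized; as soon as meshes come and go, the offsets have to come from an allocator instead, which is where the difficulty described below starts.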
The second approach, as I mentioned earlier, seems to be a massive rabbit hole, because using one large VBO isn’t as simple as it sounds.
From what I have understood, the basic idea is that you allocate a large chunk of GPU memory and manage it on your own.
So you need some kind of memory manager or allocation system that “allocates memory” within the buffer quickly and efficiently.
Each time you want to add or modify vertex data, the allocator finds a suitable offset within the VBO and the data is placed there.
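To make that concrete, here is a minimal first-fit suballocator over byte offsets inside one big buffer, assuming the caller uploads the data itself (for example with `glBufferSubData` at the returned offset) and remembers each block's size for release. `BufferSuballocator` is a hypothetical name, and first-fit is chosen only because it is short, not because it is the best policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// First-fit suballocator: hands out offsets only, never touches GL.
class BufferSuballocator {
public:
    explicit BufferSuballocator(std::size_t capacity) { free_[0] = capacity; }

    // Returns an aligned offset, or std::nullopt if no free block fits.
    std::optional<std::size_t> allocate(std::size_t size, std::size_t alignment = 16) {
        assert(size > 0 && alignment > 0 && (alignment & (alignment - 1)) == 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            std::size_t start = it->first, len = it->second;
            std::size_t aligned = (start + alignment - 1) & ~(alignment - 1);
            std::size_t pad = aligned - start;
            if (pad + size > len) continue;
            free_.erase(it);
            if (pad > 0) free_[start] = pad;  // alignment gap stays free
            if (pad + size < len) free_[aligned + size] = len - pad - size;  // tail stays free
            return aligned;
        }
        return std::nullopt;
    }

    // Returns a block to the free list and merges it with free neighbours.
    void release(std::size_t offset, std::size_t size) {
        auto it = free_.emplace(offset, size).first;
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;
                free_.erase(it);
            }
        }
    }

    std::size_t largestFreeBlock() const {
        std::size_t best = 0;
        for (const auto& entry : free_) best = std::max(best, entry.second);
        return best;
    }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> length of each free block
};
```

Even this small version already has to decide how to handle alignment padding, when to merge neighbours and what to do when no block fits (here it just fails), which are exactly the trade-offs that come up next.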
Needless to say, writing a fast and efficient memory allocator isn’t trivial. In practice, especially in operating systems, multiple algorithms are used, chosen by heuristics based on metadata in order to get the best allocation performance. You have to deal with metadata overhead, external fragmentation, internal fragmentation and allocation speed, and the result is always a compromise between all of those.
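As a small worked example of that compromise, assume a hypothetical size-class scheme that rounds every request up to a power of two (minimum 64 bytes). Identical classes make freed blocks interchangeable, which limits external fragmentation, but every request pays for the rounding as internal fragmentation. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Round a request up to the next power-of-two size class.
// Assumes minClass is a power of two and request is far below SIZE_MAX.
std::size_t sizeClass(std::size_t request, std::size_t minClass = 64) {
    assert(request > 0 && minClass > 0 && (minClass & (minClass - 1)) == 0);
    std::size_t cls = minClass;
    while (cls < request) cls <<= 1;
    return cls;
}

// Bytes wasted inside the block handed out for this request.
std::size_t internalWaste(std::size_t request) {
    return sizeClass(request) - request;
}
```

A 3000-byte mesh lands in the 4096-byte class and wastes 1096 bytes, roughly a quarter of the block, while a first-fit allocator like the one above wastes nothing per block but can leave the free space scattered into pieces too small to reuse.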
I could spend hundreds of hours perfecting such a system, and even then it would be questionable whether the performance gain outweighs simply using a separate VBO for almost everything.
What I find absolutely astonishing is that there is literally zero documentation, zero discussion, zero anything about this. There are no general-purpose libraries for managing VBO memory, and no “best practices” or guidelines that warn you about possible pitfalls.
All of this is a big red flag to me, and I am not even sure this is the right way to go.
Further information regarding this topic would be appreciated.
Thanks.