Efficiency of VAO with VBO for every Model

So, I’ve been trying to find out the best way to render ~1000 models on screen. I see two general approaches to this. Some people pack their entire scene into a single giant VBO and load that in. I don’t understand how you would go about using different MVP matrices for each model, if they are all packed into one VBO.

I saw someone who made one VAO, then had a different VBO for each model. They bound all the VBOs to the VAO, and then just called DrawArrays for each VBO, passing any new attributes between drawing each VBO.

My question is: how efficient is this? I have been under the impression that issuing many draw calls is a major slowdown, or am I mistaken? I know that binding is expensive as well.

What would you suggest for someone that needs to achieve what I need?


It’s perfectly possible to have a single large VBO and draw 1000 models with different MVPs; you just use multiple draw calls and vary the parameters to your draw functions. E.g.

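glBindBuffer (GL_ARRAY_BUFFER, largeVBO);
/* glDrawArrays takes (mode, first vertex, vertex count), not (begin, end); */
/* GL_TRIANGLES is assumed here */
glLoadMatrixf (mvp1);
glDrawArrays (GL_TRIANGLES, model1First, model1Count);
glLoadMatrixf (mvp2);
glDrawArrays (GL_TRIANGLES, model2First, model2Count);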

Remember that a VBO is just dumb storage - it doesn’t define anything else (not even vertex format), and you’re never obliged to draw the entire VBO in any given draw call.
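To make the "one large VBO, many sub-ranges" idea concrete, here is a minimal CPU-side sketch of the bookkeeping, assuming tightly packed positions of 3 floats per vertex. The names ModelRange and pack_model are made up for illustration; they are not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping: where one model lives inside the shared array. */
typedef struct {
    int first;  /* index of the model's first vertex ("first" in glDrawArrays) */
    int count;  /* number of vertices ("count" in glDrawArrays) */
} ModelRange;

/* Append n vertices (3 floats each) to the shared CPU-side array and
 * return the range to draw later. *used counts vertices already stored. */
static ModelRange pack_model(float *shared, int capacity, int *used,
                             const float *verts, int n)
{
    ModelRange r;
    assert(*used + n <= capacity);  /* caller must size the array up front */
    memcpy(shared + (size_t)*used * 3, verts, (size_t)n * 3 * sizeof(float));
    r.first = *used;
    r.count = n;
    *used += n;
    return r;
}
```

Once every model is packed, the shared array would be uploaded a single time with glBufferData, and each model drawn with glDrawArrays (GL_TRIANGLES, r.first, r.count) after setting its own matrix.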

With 1000 models I’m going to guess that it’s highly likely you’re drawing the same model many times over; if that’s the case then I’d suggest that you look at using instancing as an alternative approach. However, I’m also going to guess that you’ll find that vertex setup and draw calls are very probably less of a bottleneck than fillrate, as those models will probably cover quite a good portion of the screen and have significant overdraw.
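If instancing does fit your case, the CPU side mostly comes down to building one matrix per instance. Here is a rough sketch, assuming column-major 4x4 matrices and a simple grid layout; make_translation and build_instance_matrices are hypothetical helper names, and the grid is only there to have something to fill in.

```c
#include <assert.h>
#include <string.h>

/* Write a column-major 4x4 translation matrix into m[16]. */
static void make_translation(float m[16], float x, float y, float z)
{
    memset(m, 0, 16 * sizeof(float));
    m[0] = m[5] = m[10] = m[15] = 1.0f;  /* identity diagonal */
    m[12] = x;                           /* translation lives in column 3 */
    m[13] = y;
    m[14] = z;
}

/* Fill out[] with columns * rows matrices, one per instance, placing the
 * instances on a grid in the XZ plane (an assumed layout for illustration). */
static void build_instance_matrices(float *out, int columns, int rows,
                                    float spacing)
{
    int i, j;
    assert(columns > 0 && rows > 0);
    for (j = 0; j < rows; ++j)
        for (i = 0; i < columns; ++i)
            make_translation(out + 16 * (j * columns + i),
                             i * spacing, 0.0f, j * spacing);
}
```

The resulting array would go into a second VBO, exposed to the vertex shader as four vec4 attributes with glVertexAttribDivisor (location + k, 1) on each (OpenGL 3.3 or ARB_instanced_arrays), and the whole batch drawn with one glDrawArraysInstanced call.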

In my experience, a separate VBO per model (without VAOs or NV bindless) performs poorly; it’s even worse than client arrays. I’d expect one VBO per model with one VAO shared by all of them (what I think you’re talking about above) to perform nearly the same.

A separate VBO per model combined with one VAO per model (not just one shared across all of them) can speed that up some, but you still end up with a lot of cache thrashing just getting the vertex arrays bound and enabled with a bazillion VAOs floating about.

For best results in my experience, either use NVIDIA bindless for the vertex attribute and index list binds/enables (particularly in the case where you have a lot of static VBOs with smallish batches), OR use a streaming VBO approach (similar to what client arrays probably do under the hood), as those two approaches avoid nearly all of the overhead of binding many VBOs to render a frame.

Otherwise, lots of VBO binds can kill your performance.
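For the streaming VBO approach mentioned above, the core of it is just a write cursor into one big buffer that wraps around when it fills up. Here is a minimal sketch of that cursor logic, assuming a single buffer and orphaning on wrap; StreamCursor and stream_reserve are made-up names, and real implementations vary quite a bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side cursor for one streaming VBO of `capacity` bytes. */
typedef struct {
    size_t capacity;  /* total size of the buffer store, in bytes */
    size_t cursor;    /* next free byte offset */
} StreamCursor;

/* Reserve `size` bytes and return the byte offset to write them at.
 * *orphan is set to 1 when the request did not fit and the cursor wrapped
 * to 0; the caller would then re-specify the store, for example with
 * glBufferData (GL_ARRAY_BUFFER, capacity, NULL, GL_STREAM_DRAW),
 * before writing, so the driver need not wait on draws still in flight. */
static size_t stream_reserve(StreamCursor *s, size_t size, int *orphan)
{
    size_t offset;
    assert(size <= s->capacity);  /* a single batch must fit in the buffer */
    *orphan = 0;
    if (s->cursor + size > s->capacity) {
        s->cursor = 0;
        *orphan = 1;
    }
    offset = s->cursor;
    s->cursor += size;
    return offset;
}
```

Each batch is then written at the returned offset (glBufferSubData, or glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT) and drawn from there, so the same VBO stays bound for the whole frame.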