What are the performance implications of having dedicated VAOs for each VBO?

I know it is possible to “reuse” a VAO for multiple VBOs, but I haven’t been able to find any best-practice material on this, and I have never really used the feature. What are the performance implications of having a dedicated VAO for each VBO created?

Because they would be initialized only when the VBOs are initialized, I don’t believe their setup speed would be a problem; similarly, I can’t imagine them having much memory associated with them. However, I could see the state change being a possible source of slowdown. Would that be significant? If so, are there ways to mitigate it?

Broadly speaking, what you want to avoid is having lots of vertex formats. Indeed, it’d be best for everyone if you pretended that buffers can’t be attached to VAOs at all, and that vertex buffer bindings are actually context state.

So, step 1: use the separate attribute format API (core in OpenGL 4.3, or ARB_vertex_attrib_binding) for manipulating VAOs. Create a VAO for each distinct vertex format you use (that is, the state set by glVertexAttribFormat, glVertexAttribBinding, glEnableVertexAttribArray, and glVertexBindingDivisor).
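One way to get "one VAO per distinct vertex format" is to key VAOs by a description of the format. The following is only a sketch under assumptions: `AttribDesc`, `VertexFormat`, and `FormatVaoCache` are hypothetical names, the actual GL calls are left to a caller-supplied callback (so the block compiles without a GL context), and the binding divisor is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical description of one attribute. The fields mirror the arguments
// of glVertexAttribFormat and glVertexAttribBinding.
struct AttribDesc {
    std::uint32_t location;
    std::int32_t  components;      // 1..4
    std::uint32_t type;            // e.g. the value of GL_FLOAT
    bool          normalized;
    std::uint32_t relativeOffset;  // offset within one vertex, in bytes
    std::uint32_t bindingIndex;    // which vertex buffer binding feeds it

    bool operator<(const AttribDesc& o) const {
        return std::tie(location, components, type, normalized, relativeOffset, bindingIndex)
             < std::tie(o.location, o.components, o.type, o.normalized, o.relativeOffset, o.bindingIndex);
    }
};

// A format is its attribute list. Callers are assumed to list attributes in a
// consistent order (say, by location) so that equal formats compare equal.
using VertexFormat = std::vector<AttribDesc>;

class FormatVaoCache {
public:
    // The callback is where a real renderer would create the VAO, bind it, and
    // for each attribute call glVertexAttribFormat, glVertexAttribBinding and
    // glEnableVertexAttribArray. No buffer is attached at this point.
    using CreateFn = std::function<std::uint32_t(const VertexFormat&)>;

    explicit FormatVaoCache(CreateFn create) : create_(std::move(create)) {
        assert(static_cast<bool>(create_));
    }

    // Returns the VAO for this format, creating it only the first time.
    std::uint32_t vaoFor(const VertexFormat& format) {
        auto it = vaos_.find(format);
        if (it != vaos_.end()) return it->second;
        std::uint32_t vao = create_(format);
        vaos_.emplace(format, vao);
        return vao;
    }

    std::size_t distinctFormats() const { return vaos_.size(); }

private:
    CreateFn create_;
    std::map<VertexFormat, std::uint32_t> vaos_;
};
```

With this shape, every model asks for `vaoFor(itsFormat)` at load time, and two models with the same layout end up sharing a VAO regardless of which buffer their vertices live in.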

Step 2: sort your models by vertex format, so that each vertex format has many models that are drawn together. Try to combine different models into the same buffer object where possible, but if you need to change buffer objects, you can do so via glBindVertexBuffer or glBindVertexBuffers.

This is not about how much memory a VAO represents; it’s about how much changing vertex formats costs during rendering. Some hardware doesn’t actually have vertex fetching hardware, so a vertex format is really a piece of vertex shader code that gets grafted to your shader right before rendering. If you keep the VAO (and program) the same as much as possible, then the system won’t have to stitch together new vertex format shader code.

Changing vertex buffer bindings is cheap; changing vertex formats is expensive.
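To make the sorting advice concrete, here is a small sketch that orders a draw list by format first and buffer second, and counts how many switches of each kind a renderer would make walking the list. `DrawItem`, `sortByFormatThenBuffer`, and `countSwitches` are hypothetical names, the IDs merely stand in for GL object names, and the model assumes the buffer is re-bound after every VAO switch (the "bindings are context state" view above).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record; the IDs stand in for GL object names.
struct DrawItem {
    std::uint32_t formatVao;     // VAO holding the vertex format (expensive to switch)
    std::uint32_t vertexBuffer;  // buffer passed to glBindVertexBuffer (cheap to switch)
    std::uint32_t modelId;       // illustration only
};

struct SwitchCounts {
    int formatSwitches = 0;  // where glBindVertexArray would be called
    int bufferSwitches = 0;  // where glBindVertexBuffer would be called
};

// Group draws by format first, then by buffer within each format.
inline void sortByFormatThenBuffer(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return std::tie(a.formatVao, a.vertexBuffer)
                              < std::tie(b.formatVao, b.vertexBuffer);
                     });
}

// Count the switches needed to draw the list in its current order.
inline SwitchCounts countSwitches(const std::vector<DrawItem>& items) {
    SwitchCounts counts;
    bool first = true;
    std::uint32_t currentFormat = 0;
    std::uint32_t currentBuffer = 0;
    for (const DrawItem& item : items) {
        if (first || item.formatVao != currentFormat) {
            // New format: switch VAO, then bind this item's buffer to it.
            ++counts.formatSwitches;
            ++counts.bufferSwitches;
            currentFormat = item.formatVao;
            currentBuffer = item.vertexBuffer;
        } else if (item.vertexBuffer != currentBuffer) {
            // Same format, different buffer: only the cheap bind is needed.
            ++counts.bufferSwitches;
            currentBuffer = item.vertexBuffer;
        }
        first = false;
    }
    // Every format switch in this model also binds a buffer.
    assert(counts.bufferSwitches >= counts.formatSwitches);
    return counts;
}
```

Calling `countSwitches` before and after `sortByFormatThenBuffer` on your own draw list gives a rough idea of how many expensive format changes the sort removes. It says nothing about actual timings, which depend on the driver and hardware.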

Thank you for the comprehensive answer. This sheds light on a lot, including further efficiencies I can make in my code. Thanks :)

Other than the number of GL calls, there isn’t much of a difference. Think of all the changes you make as deferred until a draw call is made. At a draw call, all the changed state is collected and compiled into a command buffer that updates the GPU’s state, and validating those state changes takes CPU time as well.

So, in general: don’t update state if you don’t have to.
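One common way to act on that on the application side is to remember the last value you bound and skip the call when it has not changed. This is a minimal sketch, not something from the posts above: `BindCache` is a hypothetical helper, the real GL call is supplied as a callback, and whether it helps in practice depends on how much redundant-state filtering the driver already does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper that forwards a bind only when the ID actually changes.
class BindCache {
public:
    using BindFn = std::function<void(std::uint32_t)>;

    explicit BindCache(BindFn bind) : bind_(std::move(bind)) {
        assert(static_cast<bool>(bind_));
    }

    // Returns true if the bind was forwarded, false if it was redundant.
    bool bind(std::uint32_t id) {
        if (valid_ && id == current_) return false;
        bind_(id);
        current_ = id;
        valid_ = true;
        return true;
    }

    // Call this if code outside the cache may have changed the binding.
    void invalidate() { valid_ = false; }

private:
    BindFn bind_;
    std::uint32_t current_ = 0;
    bool valid_ = false;  // false until the first bind goes through
};
```

In a renderer, the callback for a VAO cache would wrap `glBindVertexArray`; the important caveat is that any code path that binds behind the cache's back has to call `invalidate()`.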