Instanced rendering for everything

I’m wondering if anyone has any thoughts on whether it’s fine to use instanced rendering for all meshes or not.

In my application, I render a non-instanced mesh like so:

```c
vkCmdBindIndexBuffer(...);    // Bind indices
vkCmdBindVertexBuffers(...);  // Bind vertex attributes like position, normal & color
vkCmdBindDescriptorSets(...); // Bind a single model matrix & other mesh properties

vkCmdDrawIndexed(...);        // Instance count is 1
```

and an instanced mesh like this:

```c
vkCmdBindIndexBuffer(...);    // Bind indices
vkCmdBindVertexBuffers(...);  // Bind vertex attributes like position, normal & color
vkCmdBindVertexBuffers(...);  // Bind a set of model matrices, one per instance
vkCmdBindDescriptorSets(...); // Bind mesh properties

vkCmdDrawIndexed(...);        // Instance count is variable
```

So the only difference is an extra vkCmdBindVertexBuffers call, and the instance count passed to vkCmdDrawIndexed becomes a variable. Would it be reasonable to use the second approach for everything, even for meshes that will only ever have one instance? It would really simplify my rendering logic because I would not need to maintain two separate pipelines, shaders, etc. Or is there too much performance overhead in doing things that way?

In the past the answer to something like this would have been more obvious, but what’s the modern-day consensus on this with the Vulkan API?

@CasualKyle Hi,

  1. You can set all the buffers with a single vkCmdBindVertexBuffers call instead of 2 calls; see its parameter uint32_t bindingCount.
  2. Setting up instance buffers may have some additional overhead, so you should use instanced calls for drawing multiple meshes.

Instancing only works if you’re rendering the same mesh repeatedly. So I’m not sure what it is you expect to gain from doing this.

If your idea is that you’re going to use the base instance index to select which matrix to fetch, that’s probably going to be better than using a per-object descriptor set. But that’s only viable if you can remove all other per-object data from the descriptor set too. And since the storage for per-instance data in vertex buffers is pretty limited, you’d likely have to use mechanisms like arrays of samplers (for per-object textures) and arrays stored in SSBOs (for other per-object data), with the instance index being used to fetch which array element to pick.

And if you’re doing that, there’s no point in using a vertex buffer for the per-instance data. The base instance would just be a per-draw constant you use to fetch values from SSBOs and sampler arrays. So you wouldn’t need an instance buffer at all, and thus you would avoid the extra vkCmdBindVertexBuffers call.
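To make the base-instance idea concrete, here is a rough C sketch that emulates on the CPU what the vertex shader would do. The array stands in for the SSBO, and the lookup relies on the fact that in Vulkan GLSL gl_InstanceIndex already includes the draw’s firstInstance. The ObjectData layout (a 4x4 model matrix plus a texture index into a sampler array) and both function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object record as it might sit in an SSBO (std430-style):
 * a column-major 4x4 model matrix plus an index into an array of samplers.
 * Padded so each element is a multiple of 16 bytes. */
typedef struct {
    float    model[16];
    uint32_t texture_index;
    uint32_t pad[3];
} ObjectData;

/* Emulates the shader-side rule in Vulkan:
 * gl_InstanceIndex = firstInstance + (0-based instance number within the draw). */
static uint32_t instance_index(uint32_t first_instance, uint32_t local_instance)
{
    return first_instance + local_instance;
}

/* Emulates the shader fetching its per-object data from the SSBO. */
static const ObjectData *fetch_object(const ObjectData *ssbo, size_t ssbo_len,
                                      uint32_t first_instance,
                                      uint32_t local_instance)
{
    uint32_t idx = instance_index(first_instance, local_instance);
    assert(idx < ssbo_len); /* an out-of-range read would be a bug on the GPU too */
    return &ssbo[idx];
}
```

On the GPU, the equivalent would be something like indexing a hypothetical `objects[]` SSBO array with gl_InstanceIndex, with no per-draw descriptor set and no instance vertex buffer.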

Also, you should try to put meshes in the same buffers whenever feasible. So ideally, you wouldn’t be binding vertex buffers for each mesh, only for each group of meshes. For example, if you’re streaming vertex data, then all meshes in each streaming block should use the same buffer (so long as they use the same vertex format too).
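As a sketch of what sharing buffers can look like: if several meshes with the same vertex format are appended into one vertex buffer and one index buffer, each draw only needs offsets into them. The first_index and vertex_offset fields below correspond to the firstIndex and vertexOffset parameters of vkCmdDrawIndexed; the MeshSizes/MeshRange types and pack_meshes are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes of one mesh before packing (hypothetical input). */
typedef struct {
    uint32_t vertex_count;
    uint32_t index_count;
} MeshSizes;

/* Where a mesh ended up inside the shared buffers. These map onto
 * vkCmdDrawIndexed(cmd, indexCount, instanceCount, firstIndex,
 *                  vertexOffset, firstInstance). */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;   /* offset into the shared index buffer, in indices */
    int32_t  vertex_offset; /* added to each index before fetching a vertex */
} MeshRange;

/* Lays meshes out back to back and reports the total vertex and index
 * counts, so the caller knows how large the shared buffers must be. */
static void pack_meshes(const MeshSizes *meshes, size_t n, MeshRange *out,
                        uint32_t *total_vertices, uint32_t *total_indices)
{
    uint32_t v = 0, i = 0;
    for (size_t m = 0; m < n; ++m) {
        out[m].index_count   = meshes[m].index_count;
        out[m].first_index   = i;
        out[m].vertex_offset = (int32_t)v;
        v += meshes[m].vertex_count;
        i += meshes[m].index_count;
    }
    *total_vertices = v;
    *total_indices  = i;
}
```

With this layout each mesh can keep its original 0-based indices, because vertexOffset shifts them onto the right vertices at draw time, and the buffers are bound once per group rather than once per mesh.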

I see, good to know!

@Alfonse_Reinheart I guess I was not clear or did not give enough information in my question, but that’s okay because I think I understand the approach you’re suggesting. At the end of the day, what I’m really after is a rendering implementation that works the same regardless of whether I’m rendering a mesh once or repeatedly.

So instead of using a vertex buffer to hold the per-instance data, I could use an SSBO and retrieve the per-instance data by indexing into an array stored in the SSBO with the instance index. And even if there is only one item in the array (because only one instance is being rendered), that is totally fine?

You could argue: yes, just implement the rendering logic so that when a mesh is rendered, it renders n instances of the mesh, and who cares whether n is 1 or some arbitrary number. Vulkan makes this pretty easy, as vkCmdDrawIndexed takes an arbitrary instance count. The problem is that, from what I’ve read online about other graphics libraries, using an instanced rendering approach for meshes with 1 instance is a bad idea because it has too much overhead, and I’m not sure whether that also applies to Vulkan.

So I should have phrased my question more broadly: Is it reasonable to just implement my renderer so that it will render n instances of a mesh even if n is 1?

Why would there be only one item in the array? You shouldn’t change arrays for each mesh. Ideally, they should all use the same per-draw SSBO, even if they’re using different vertex data. Just give each draw a different range of indices to use.

You don’t want to be binding new descriptor sets for each draw if you can avoid it.

There is no non-instanced draw call in Vulkan. Every Vulkan draw function must account for the possibility of multiple instances. So this reasoning does not apply to Vulkan.

Oh I see, all the instance data for all the meshes could be in one large array.

With one array, wouldn’t the memory layout have to look something like this:

mesh 1 instance data 1
mesh 1 instance data 2
mesh 1 instance data 3
mesh 2 instance data 1
mesh 3 instance data 1
mesh 3 instance data 2

Then you’d bind an SSBO with this array before rendering any meshes. Then, when you call vkCmdDrawIndexed to render each mesh, the firstInstance parameter would just be the index of the current mesh’s first entry in the array: 0 for mesh 1, 3 for mesh 2, and 4 for mesh 3. That way, in the shader, gl_InstanceIndex is simply the index of the instance data in this one large array.
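The firstInstance values in this layout are an exclusive prefix sum over the per-mesh instance counts. A small C sketch (the function name is hypothetical) that reproduces the 0, 3, 4 values for the layout above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given how many instances each mesh has, compute where each mesh's block
 * starts in the single packed instance array (an exclusive prefix sum).
 * first_instance[m] would be passed as firstInstance to vkCmdDrawIndexed and
 * instance_counts[m] as instanceCount. Returns the total number of entries. */
static uint32_t compute_first_instances(const uint32_t *instance_counts,
                                        size_t mesh_count,
                                        uint32_t *first_instance)
{
    uint32_t running = 0;
    for (size_t m = 0; m < mesh_count; ++m) {
        first_instance[m] = running;
        running += instance_counts[m];
    }
    return running;
}
```

For instance counts {3, 1, 2} this yields {0, 3, 4} and a total of 6 entries, matching the six rows listed above.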

I can see how this approach doesn’t require binding a descriptor set for each draw. Am I understanding your suggestion correctly?

I’ve implemented the approach laid out in my previous post and it has been a very positive change. It does accomplish what I set out to achieve. It has made my rendering logic simpler: on the CPU side by handling an arbitrary number of mesh instances, and on the GPU side by binding only one descriptor set that contains all the instance data.
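A hypothetical C sketch of what such a frame loop can look like. Instead of real Vulkan calls it records commands into a small log (the CmdLog, Mesh and record_frame names are made up for illustration), so it runs without a device: one descriptor-set bind for the set holding the instance SSBO, then one indexed draw per mesh, where a mesh with a single instance goes through exactly the same path as one with many.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a command buffer: records what would be issued. */
typedef enum { CMD_BIND_DESCRIPTOR_SET, CMD_DRAW_INDEXED } CmdType;

typedef struct {
    CmdType  type;
    uint32_t index_count, instance_count, first_index, first_instance;
    int32_t  vertex_offset;
} Cmd;

typedef struct {
    Cmd    cmds[64];
    size_t count;
} CmdLog;

typedef struct {
    uint32_t index_count, first_index;
    int32_t  vertex_offset;
    uint32_t instance_count; /* 1 is fine: same code path as n > 1 */
} Mesh;

static void record(CmdLog *out, Cmd c)
{
    assert(out->count < sizeof out->cmds / sizeof out->cmds[0]);
    out->cmds[out->count++] = c;
}

/* One bind (would be vkCmdBindDescriptorSets), then one draw per mesh
 * (would be vkCmdDrawIndexed); firstInstance walks through the array. */
static void record_frame(CmdLog *out, const Mesh *meshes, size_t n)
{
    record(out, (Cmd){ .type = CMD_BIND_DESCRIPTOR_SET });
    uint32_t first_instance = 0;
    for (size_t m = 0; m < n; ++m) {
        record(out, (Cmd){ .type           = CMD_DRAW_INDEXED,
                           .index_count    = meshes[m].index_count,
                           .instance_count = meshes[m].instance_count,
                           .first_index    = meshes[m].first_index,
                           .vertex_offset  = meshes[m].vertex_offset,
                           .first_instance = first_instance });
        first_instance += meshes[m].instance_count;
    }
}
```

The point of the structure is that the number of binds stays at one per frame (per pipeline, in practice) no matter how many meshes are drawn.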

Thank you for the help @Alfonse_Reinheart.

@CasualKyle you seem to be happy with your solution, and that’s what matters ultimately.

I wanted to add the following, since I happened to experiment with various permutations of draw and memory setups, among other things to check their performance implications and how well they fit the way I want my framework to be.

I have found that, for me at least, only a small number of things change (so far, just the model matrix), while I have a small list of per-mesh data (texture IDs) that I don’t anticipate changing (so far at least).

Personally, I have found handing in these model matrices via push constants incredibly handy AND fast. I would not for one moment consider switching to a mapped-memory model; in all my tests it was inferior to push constants and per-object draws.
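A minimal sketch of the push-constant side, assuming one column-major 4x4 float matrix per draw (the PushConstants struct and make_translation helper are hypothetical). The comment shows where the real vkCmdPushConstants call would go; the size check relies on the Vulkan requirement that maxPushConstantsSize is at least 128 bytes on every implementation, so a 64-byte matrix always fits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical push-constant block: one column-major 4x4 model matrix.
 * Per draw it would be uploaded with something like
 *   vkCmdPushConstants(cmd, pipelineLayout, VK_SHADER_STAGE_VERTEX_BIT,
 *                      0, sizeof(PushConstants), &pc);
 * followed by vkCmdDrawIndexed(...). */
typedef struct {
    float model[16];
} PushConstants;

/* Minimum value of maxPushConstantsSize that Vulkan guarantees. */
enum { GUARANTEED_PUSH_CONSTANT_BYTES = 128 };

/* Builds a translation matrix (identity plus offset in the last column). */
static PushConstants make_translation(float x, float y, float z)
{
    PushConstants pc;
    memset(&pc, 0, sizeof pc);
    pc.model[0] = pc.model[5] = pc.model[10] = pc.model[15] = 1.0f;
    pc.model[12] = x; /* column-major: translation lives in elements 12..14 */
    pc.model[13] = y;
    pc.model[14] = z;
    return pc;
}
```

Because the data travels inside the command buffer, there is no separate buffer to map, write and flush for it, which matches the trade-off described above.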

I was initially overly worried about draw calls (for all the wrong reasons, and from comparisons to OpenGL), and I wanted to go with the instance count, but as I said, it turned out that I get really good results the way I chose.

  1. low device-local memory usage for the models I have
  2. a small mesh-specific SSBO (also device-local only, since I don’t write to it)
  3. no mapped-memory flushing and the like, thanks to the matrix push constants.

So I can add new instances of models purely on the CPU side, and they will automatically get rendered in the draw loop or updated according to their transformations.

I tried out indirect drawing and was surprised that, for my use case at least, it did not perform better than push constants plus vkCmdDrawIndexed, and it also came with the requirement of updating memory.
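For context on where that memory update comes from: with indirect drawing, the per-draw parameters are read by the GPU from a buffer (via vkCmdDrawIndexedIndirect), so one record per draw has to be written there first. The struct below is a local mirror of Vulkan’s VkDrawIndexedIndirectCommand (same field order and types), declared here only so the sketch compiles without the Vulkan headers; fill_indirect is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of VkDrawIndexedIndirectCommand (same fields, same order). */
typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedIndirectCommand;

/* Fills one command per mesh. In a real renderer these would be written into
 * a GPU-visible buffer (and flushed if the memory is not host-coherent)
 * before the indirect draw executes. */
static void fill_indirect(DrawIndexedIndirectCommand *cmds, size_t n,
                          const uint32_t *index_counts,
                          const uint32_t *first_indices,
                          const int32_t *vertex_offsets,
                          const uint32_t *instance_counts)
{
    uint32_t first_instance = 0;
    for (size_t i = 0; i < n; ++i) {
        cmds[i] = (DrawIndexedIndirectCommand){
            .indexCount    = index_counts[i],
            .instanceCount = instance_counts[i],
            .firstIndex    = first_indices[i],
            .vertexOffset  = vertex_offsets[i],
            .firstInstance = first_instance,
        };
        first_instance += instance_counts[i];
    }
}
```

Compared with push constants plus direct vkCmdDrawIndexed calls, this trades per-draw command recording for an extra buffer that must be kept up to date.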

Last but not least, I highly recommend recording command buffers and draws on a separate thread or via a thread pool. The gains as the number of object instances increases are incredible. Add frustum culling and you can have a really large set of objects.
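The post doesn’t say which culling method was used; a common choice is a bounding-sphere-versus-frustum-plane test, sketched here in C as an assumption. The planes are assumed to be already extracted from the view-projection matrix and normalized, with normals pointing into the frustum; the types and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plane nx*x + ny*y + nz*z + d = 0 with a unit normal pointing into the
 * frustum, so points inside have a positive signed distance. */
typedef struct { float nx, ny, nz, d; } Plane;

typedef struct { float x, y, z, radius; } Sphere;

/* Returns 0 if the sphere lies entirely behind any plane, 1 otherwise.
 * Conservative: a few objects just outside a frustum corner are kept. */
static int sphere_in_frustum(const Plane planes[6], Sphere s)
{
    for (int i = 0; i < 6; ++i) {
        const Plane *p = &planes[i];
        float dist = p->nx * s.x + p->ny * s.y + p->nz * s.z + p->d;
        if (dist < -s.radius)
            return 0;
    }
    return 1;
}

/* Writes the indices of visible objects into `visible` and returns how many
 * there are; only these would get draws recorded, e.g. with the list split
 * into chunks across worker threads. */
static size_t cull(const Plane planes[6], const Sphere *objs, size_t n,
                   uint32_t *visible)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (sphere_in_frustum(planes, objs[i]))
            visible[count++] = (uint32_t)i;
    return count;
}
```

Since each object is tested independently, the culling pass splits across threads as naturally as the command recording does.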

I would never have dreamed of that performance, since I had bad experiences with OpenGL, which is admittedly my own fault. But Vulkan taught me how to do things right, despite the initial hurdle.
