Disadvantages of using multiple command buffers with one thread


Currently, I'm trying to port an OpenGL engine to Vulkan. I've reached the point where I have to port offscreen rendering to textures, which are then used to compose the final frame.

My current plan is to give every texture target its own command buffer, call vkQueueSubmit once that offscreen target is finished, and then start on the command buffer of the next texture target.

My question: does using multiple command buffers this way have serious disadvantages in terms of performance or synchronization? I'm asking because having all offscreen targets and the window surface share one command buffer would require refactoring I'd like to avoid.

Up to 40 texture targets may be rendered to each frame.


Multiple command buffers are good. You can then trivially parallelize the recording across threads.

40 vkQueueSubmit calls are relatively bad, though. You should batch your submissions and keep it under 10, I think.

You should keep it under two (per queue, per frame). There is almost never a good reason to submit twice to the same queue in the same frame.
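To make the batching advice concrete, here is a minimal sketch. It assumes hypothetical stand-in types (`CommandBuffer`, `Submission`, `Queue`) and a hypothetical helper `recordOffscreen` instead of real Vulkan handles, so it runs without a GPU. Each offscreen target still gets its own command buffer. All of them, plus the final composition buffer, go into one submission. In real code that submission would be a single vkQueueSubmit whose VkSubmitInfo lists every buffer in `pCommandBuffers`/`commandBufferCount`. Batching does not remove the need for synchronization: the composition pass still needs a barrier or render pass dependency before it samples the offscreen textures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a recorded VkCommandBuffer.
struct CommandBuffer {
    std::string target;                 // render target this buffer draws into
    std::vector<std::string> commands;  // recorded commands (illustrative only)
};

// Hypothetical stand-in for one batch (VkSubmitInfo::pCommandBuffers).
struct Submission {
    std::vector<CommandBuffer> buffers;  // executed in submission order
};

// Hypothetical stand-in for a VkQueue; each submit() models one vkQueueSubmit.
struct Queue {
    std::vector<Submission> submits;
    void submit(Submission s) {
        assert(!s.buffers.empty());
        submits.push_back(std::move(s));
    }
};

// Hypothetical helper: records the command buffer for one offscreen target.
CommandBuffer recordOffscreen(std::size_t index) {
    return {"offscreen_" + std::to_string(index),
            {"begin_render_pass", "draw", "end_render_pass"}};
}

// Still one command buffer per target, but a single submit per frame.
void renderFrame(Queue& queue, std::size_t targetCount) {
    Submission batch;
    for (std::size_t i = 0; i < targetCount; ++i) {
        batch.buffers.push_back(recordOffscreen(i));
    }
    // Composition goes last. In real Vulkan it also needs a pipeline barrier or
    // render pass dependency so it waits for the offscreen writes.
    batch.buffers.push_back({"swapchain", {"barrier", "begin_render_pass", "compose", "end_render_pass"}});
    queue.submit(std::move(batch));  // 1 submit instead of targetCount + 1
}
```

With `targetCount = 40` this records 41 command buffers but makes only one submit call, in line with the one-submit-per-queue-per-frame advice above.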

Does the same apply to worker threads that upload buffers etc.? Or should those have their own queue?

You cannot submit to the same queue from different threads at the same time. Therefore, if you have multiple threads building CBs, they must either synchronize their vkQueueSubmit calls or send their CBs synchronously to a thread that does a single vkQueueSubmit call. The latter is generally going to be better than the former, as sending work to a queue can usually be done lock-free, while synchronizing calls to Vulkan objects usually has to use a real mutex.
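The second option (workers hand their CBs to one submitting thread) can be sketched as follows. Everything here is a hypothetical stand-in: `FakeQueue` plays the role of a VkQueue, and `HandOff` is the channel between threads. For brevity `HandOff` uses a mutex and condition variable. As noted above, a lock-free multi-producer queue would usually be the better choice for this role. Only the submitting thread ever touches the queue, so the submit itself needs no lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a recorded VkCommandBuffer.
struct CommandBuffer {
    std::size_t worker;  // which worker thread recorded it
    std::string label;
};

// Hypothetical stand-in for a VkQueue; only the submit thread touches it.
struct FakeQueue {
    std::size_t submitCalls = 0;
    std::vector<CommandBuffer> submitted;
    void submit(std::vector<CommandBuffer> batch) {  // ~ one vkQueueSubmit
        ++submitCalls;
        for (CommandBuffer& cb : batch) submitted.push_back(std::move(cb));
    }
};

// Channel from workers to the submit thread. A mutex is used here for brevity;
// a lock-free multi-producer queue would fill the same role.
class HandOff {
public:
    void push(CommandBuffer cb) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(cb));
        }
        cv_.notify_one();
    }
    // Blocks until `count` buffers have arrived, then takes all of them.
    std::vector<CommandBuffer> takeAll(std::size_t count) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return pending_.size() >= count; });
        std::vector<CommandBuffer> out;
        out.swap(pending_);
        return out;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<CommandBuffer> pending_;
};

// Worker: records its buffers (in real Vulkan, from its own command pool) and
// hands them off instead of calling submit itself.
void workerRecord(HandOff& handOff, std::size_t worker, std::size_t perWorker) {
    for (std::size_t i = 0; i < perWorker; ++i) {
        handOff.push({worker, "cb_" + std::to_string(worker) + "_" + std::to_string(i)});
    }
}

// The calling thread acts as the submit thread: one submit per frame.
void recordAndSubmitFrame(FakeQueue& queue, std::size_t workers, std::size_t perWorker) {
    HandOff handOff;
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back(workerRecord, std::ref(handOff), w, perWorker);
    }
    std::vector<CommandBuffer> batch = handOff.takeAll(workers * perWorker);
    for (std::thread& t : threads) t.join();
    assert(batch.size() == workers * perWorker);
    queue.submit(std::move(batch));
}
```

In this sketch the arrival order of the buffers is nondeterministic. A real renderer would have to put them in dependency order (for example, composition after the offscreen passes) before submitting.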

To clarify: does the overhead of additional buffers come from allocating them or from submitting them? Put differently, can multiple threads make up for the overhead of additional buffers?

Um… what “additional buffers” are we talking about? Can you explain your use case more clearly?

I’m talking about command buffers.

I gathered that. But I don’t understand what you mean here. What “overhead” are you referring to? The overhead of creating them? Of recording commands into them? Of re-binding the same program if they use the same program?

Without the details of how you’re trying to render, it’s not clear what the issue is. Especially since I mainly took issue with the way you submit the command buffers, not the way you build them.

The submission is usually the primary concern.

Recording is not free. But as you say, this is where multithreading in Vulkan mainly comes in. Plus, by API design, the creation of pipelines, framebuffers, and render passes is moved out of the command buffer entirely, and many of these objects can be created outside the main loop.

Command buffers are a pooled resource, so allocating them should not be a big problem. That assumes you do not also create 40 command pools every frame, which might be bad.
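One common way to avoid creating pools every frame is shown below as a sketch, under stated assumptions. Pools are created once at startup: one per recording thread per frame in flight, since a command pool must not be used from two threads at the same time. At the start of each frame, that frame's pools are reset rather than recreated. `CommandPool` and `FramePools` are hypothetical stand-ins. `reset()` models vkResetCommandPool, which returns every buffer from the pool to the initial state so it can be recorded again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a VkCommandPool.
struct CommandPool {
    std::size_t recorded = 0;  // buffers recorded since the last reset
    std::size_t resets = 0;    // how many times the pool was recycled
    void record(std::size_t n) { recorded += n; }
    void reset() { recorded = 0; ++resets; }  // ~ vkResetCommandPool
};

// One pool per (frame in flight, recording thread), created once up front.
class FramePools {
public:
    FramePools(std::size_t framesInFlight, std::size_t threads)
        : threads_(threads), pools_(framesInFlight * threads) {
        assert(framesInFlight > 0 && threads > 0);
    }
    // Call once the GPU is done with this frame slot (in real code: after
    // waiting on the fence of the frame that last used it).
    void beginFrame(std::size_t frameIndex) {
        slot_ = frameIndex % (pools_.size() / threads_);
        for (std::size_t t = 0; t < threads_; ++t) pool(t).reset();
    }
    CommandPool& pool(std::size_t thread) { return pools_[slot_ * threads_ + thread]; }
    std::size_t poolCount() const { return pools_.size(); }
private:
    std::size_t threads_;
    std::size_t slot_ = 0;
    std::vector<CommandPool> pools_;
};
```

With 2 frames in flight and 4 threads, only 8 pools ever exist, no matter how many frames are rendered or how many command buffers each frame records.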

OK, let me rephrase the problem with a simple example:

Is calling vkQueueSubmit with 40 small command buffers (plus building those) significantly more expensive than building only one large command buffer and submitting that?

You are asking the wrong question. The right question is this: what is building these command buffers?

The main reason CBs exist is to enable threaded construction: building work for a GPU from multiple CPU threads. Ignoring secondary CBs for a moment, if you only have “one large command buffer”, that means it was built from a single thread.

Is that OK? That depends on what you’re doing with the other CPU threads. If you’re trying to achieve maximum performance, is that an effective use of your CPU? Only your particular use cases can know for sure, but generally speaking, it isn’t.

Any overhead from submitting multiple CBs is going to be dwarfed (in reasonable cases) by the cost of not using the available CPU resources fully.
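Combining both points (several CBs recorded on several threads, one submit), a minimal sketch could look like the code below. It assumes a hypothetical `CommandBuffer` stand-in and a hypothetical `recordFrameParallel` function. Each thread writes only to its own pre-allocated slots, so recording needs no lock. In real Vulkan each thread would also record from its own command pool. The returned vector is already in submission order and would be handed to a single vkQueueSubmit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded VkCommandBuffer.
struct CommandBuffer {
    std::string target;
    std::size_t recordedBy = 0;  // index of the recording thread
};

// Records one buffer per offscreen target across `threadCount` threads and
// returns all buffers, plus composition, in submission order.
std::vector<CommandBuffer> recordFrameParallel(std::size_t targetCount, std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<CommandBuffer> slots(targetCount + 1);  // last slot: composition
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < threadCount; ++t) {
        threads.emplace_back([&slots, targetCount, threadCount, t] {
            // Strided split: thread t records targets t, t + threadCount, ...
            // Distinct elements per thread, so no data race.
            for (std::size_t i = t; i < targetCount; i += threadCount) {
                slots[i] = {"offscreen_" + std::to_string(i), t};
            }
        });
    }
    for (std::thread& th : threads) th.join();
    // Composition occupies the last slot so submission order matches dependency
    // order; real code still needs a barrier before sampling the offscreen images.
    slots[targetCount] = {"swapchain", 0};
    return slots;
}
```

Because the slot order is fixed in advance, the result is deterministic regardless of which thread finishes first. This avoids the reordering problem of a plain hand-off queue.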
