Disadvantages of using multiple command buffers with one thread

Desperado17 · June 8, 2021, 9:53am

Greetings,

I am currently porting an OpenGL engine to Vulkan. I have reached the point where I have to port offscreen rendering to textures, which are then used to compose the final frame.

My current plan was to give every texture target its own command buffer, then call vkQueueSubmit once the offscreen target is finished, then start with the command buffer of the next texture target.

Now my question: Does using multiple command buffers this way have serious disadvantages in terms of performance or synchronization? I’m asking because having all offscreen targets and the window surface share the same command buffer would require refactoring work I’d like to avoid.

Up to 40 texture targets may be rendered per frame.

Regards.

krOoze · June 9, 2021, 7:37pm

Multiple command buffers are good. You can then trivially optimize the recording with threads.

40 vkQueueSubmit calls per frame are relatively bad though. You should batch the submissions and keep it under 10, I think.

Alfonse_Reinheart · June 9, 2021, 8:22pm

You should keep it under two (per queue, per frame). There is almost never a good reason to submit twice to the same queue in the same frame.

Desperado17 · June 9, 2021, 9:04pm

Does the same apply to worker threads that upload buffers etc.? Or should those have their own queue?

Alfonse_Reinheart · June 9, 2021, 9:26pm

You cannot submit to the same queue from different threads at the same time. Therefore, if you have multiple threads building CBs, they must either synchronize their vkQueueSubmit calls or send their CBs synchronously to a thread that does a single vkQueueSubmit call. The latter is generally going to be better than the former, as sending work to a queue can usually be done lock-free, while synchronizing calls to Vulkan objects usually has to use a real mutex.
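
A rough sketch of the second option, using only the C++ standard library so the threading pattern is visible without any device setup. RecordedCB, recordTarget and recordFrame are hypothetical stand-ins (a real version would hold VkCommandBuffer handles and record with vkBeginCommandBuffer/vkEndCommandBuffer). Each worker writes into slots reserved for it, so the handoff needs no mutex; joining the workers is the only synchronization point.

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded VkCommandBuffer.
struct RecordedCB {
    int targetIndex = -1;  // which offscreen target this CB renders
};

// Hypothetical stand-in for recording one target's commands. In real Vulkan
// code each worker thread would allocate from its own VkCommandPool, since a
// command pool must not be used from two threads at once.
RecordedCB recordTarget(int targetIndex) {
    return RecordedCB{targetIndex};
}

// Record `targetCount` command buffers on `workerCount` threads and return
// them in target order. Only the calling thread touches the result after the
// joins, so it alone would perform the single vkQueueSubmit.
std::vector<RecordedCB> recordFrame(int targetCount, int workerCount) {
    std::vector<RecordedCB> slots(targetCount);
    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w) {
        workers.emplace_back([&slots, w, targetCount, workerCount] {
            // Strided split: worker w records targets w, w + workerCount, ...
            // No two workers ever write the same slot.
            for (int t = w; t < targetCount; t += workerCount) {
                slots[t] = recordTarget(t);
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();  // join makes each worker's writes visible here
    }
    return slots;
}
```

With 40 targets and 4 workers, recordFrame(40, 4) returns 40 entries in target order, ready to be passed to one submit call on the calling thread.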

Desperado17 · June 10, 2021, 8:21pm

For clarification: Does the overhead of additional buffers come from allocating them or from submitting them? Put differently, can multiple threads make up for the overhead of additional buffers?

Alfonse_Reinheart · June 10, 2021, 8:29pm

Um… what “additional buffers” are we talking about? Can you explain your use case more clearly?

Desperado17 · June 10, 2021, 8:29pm

I’m talking about command buffers.

Alfonse_Reinheart · June 10, 2021, 11:55pm

I gathered that. But I don’t understand what you mean here. What “overhead” are you referring to? The overhead of creating them? Of recording commands into them? Of re-binding the same program if they use the same program?

Without the details of how you’re trying to render, it’s not clear what the issue is. Especially since I mainly took issue with the way you submit the command buffers, not the way you build them.

krOoze · June 11, 2021, 1:21pm

The submission is usually the primary concern.

The recording is not free. But as you say, this is the main part where multithreading in Vulkan comes in. Plus, by API design, things like Pipeline, Framebuffer, and Render Pass creation are kept out of command buffer recording, and many of those objects can be created outside of the main loop.

Command buffers are a pooled resource, so the allocation itself should not be a large problem. Well, assuming you do not also create 40 command pools each frame, which might be bad.

Desperado17 · June 11, 2021, 1:32pm

Ok, let’s rephrase the problem with a simple example:

Is calling vkQueueSubmit with 40 small command buffers (plus building those) significantly more expensive than building only one large command buffer and submitting that?

Alfonse_Reinheart · June 11, 2021, 2:25pm

You are asking the wrong question. The right question is this: what is building these command buffers?

The main reason why CBs exist is to enable threaded construction of them. To be able to build work for a GPU from multiple CPU threads. Ignoring secondary CBs for a moment, if you only have “one large command buffer”, that means it was built from a single thread.

Is that OK? That depends on what you’re doing with the other CPU threads. If you’re trying to achieve maximum performance, is that an effective use of your CPU? Only your particular use cases can know for sure, but generally speaking, it isn’t.

Any overhead from submitting multiple CBs is going to be dwarfed (in reasonable cases) by the cost of not using the available CPU resources fully.
