Question regarding buffer uploads and vkQueueWaitIdle

Greetings, Vulkan beginner question:

In the Vulkan tutorial, buffer uploads are done by creating a CPU-side staging buffer and then copying it to the GPU-side buffer.

The method uses a separate command buffer from the one that holds the draw commands.

The copyBuffer method employs endSingleTimeCommands, which calls vkQueueWaitIdle.

Now my question is: Do I really need the separate command buffer and the vkQueueWaitIdle? The staging buffer is finished once it is unmapped, isn’t it? Can’t I just submit all commands to the draw command buffer assuming that vkCmdCopyBuffer is done when the first drawcall using it starts?


No, you need neither. You can record the copy into a command buffer that also contains draw calls, and use something more fine-grained than vkQueueWaitIdle for the synchronization.

No, you must correctly synchronize the vkCmdCopyBuffer and draw calls using the data in the destination buffer. The Synchronization Examples show how to do this with and without queue ownership transfer. Of course with queue ownership transfer you do need more than one command buffer because they get submitted to separate queues.

Even in a single-threaded scenario like in the example?

I assumed that the only reason why they sync at this point is because the staging buffer is on the stack, so they have to upload it before returning.

Yes. The GPU executes independently of the CPU. So you always have at least 2 threads. Plus, with specific exceptions, any set of GPU commands that uses data written by some previous commands requires synchronization between them on the GPU.

But Vulkan still guarantees that commands are completed in the correct order if they are submitted to the same command buffer within the same queue, right? Is Vulkan allowed to start with a vkCmdDraw before a preceding vkCmdCopyBuffer is complete?

No. Synchronization between commands is the application’s responsibility (again, with a few specific exceptions). That’s why there is an entire chapter on how synchronization works in the standard.

Yes, it is allowed to, if there is no correct synchronization between those operations.

Ok, so what if the vkCmdCopyBuffer is recorded in a separate command buffer that is submitted to the queue before the one that has the draw calls? Will that give any guarantees? And if not, is there anything besides vkQueueWaitIdle or fences that will make the draw call wait?

Ok, how about this:

I have two command buffers. The first command buffer includes the vkCmdCopyBuffer and is submitted to the render queue first. The second command buffer includes all of the render commands including the start/end of the render passes. The render passes have a VkSubpassDependency with VK_SUBPASS_EXTERNAL for srcSubpass. That means all draw commands within the render pass should wait for the commands before the vkCmdBeginRenderPass which includes the vkCmdCopyBuffer, right?

Stop trying to weasel out of synchronization. You can’t. That’s what Vulkan is for.

Synchronization doesn’t care which command buffers a set of commands is in. It only cares about the order of the commands as submitted to the queue. By the time synchronization matters, the question of which commands were in which buffers has been made irrelevant.

Not good enough. You need a proper memory barrier as well to ensure the visibility of the data, and that barrier needs to cite the appropriate stages. The default external dependency doesn’t do that.

Again, you need synchronization. Stop trying to side-step that. Just spell it out like everyone else.

I don’t understand. The VkSubpassDependency page even compares its functionality to memory barriers.

"For non-attachment resources, the memory dependency expressed by subpass dependency is nearly identical to that of a VkMemoryBarrier (with matching srcAccessMask and dstAccessMask parameters) submitted as a part of a vkCmdPipelineBarrier (with matching srcStageMask and dstStageMask parameters)."

The reason why I don’t want to use vkQueueWaitIdle is that it uses Linux poll() internally, which strace tells me sometimes takes 10 ms under high load on the thread that builds the buffers. I want to shift this burden to the background Vulkan threads.

That’s because it is a memory barrier. Or rather, it can contain memory barriers.

But you still need to provide the dependency. That’s what I mean by “spell it out”. You have to set it up properly at render pass creation time. When I said that an external dependency wasn’t good enough, I meant that literally: merely having an external dependency isn’t good enough. You have to have the correct external dependency, one which includes appropriate memory barriers for the source and consuming operations.

You don’t have to explain why you don’t want to use vkQueueWaitIdle. You should never call this function in production code, outside of cases where you need to do a teardown of the device (at which point, it should probably be vkDeviceWaitIdle). When I said that you needed explicit synchronization, I did not mean the CPU waiting for a queue to idle.

Ok, so assuming I add the correct stage and access masks to the VkSubpassDependency or use a VkMemoryBarrier instead I could avoid the vkQueueWaitIdle, right?

So long as you do the other things needed to make the range you’ve written available, yes.

For example, if you wrote to non-coherently-mapped memory, after writing to it, you need to invoke vkFlushMappedMemoryRanges, and the range needs to be properly aligned.
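The alignment part is plain arithmetic: the flushed range's offset has to be a multiple of the device's nonCoherentAtomSize limit, and its size has to be a multiple too unless the range ends at the end of the allocation (or VK_WHOLE_SIZE is used). A minimal sketch, assuming the whole allocation is mapped; `FlushRange` and `alignFlushRange` are hypothetical names, and the atom size would come from VkPhysicalDeviceLimits::nonCoherentAtomSize:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: values intended for VkMappedMemoryRange::offset/size.
struct FlushRange {
    uint64_t offset;
    uint64_t size;
};

// Expands [offset, offset + size) outward to atomSize boundaries, clamping the
// end to the allocation size (a range ending there is also permitted).
FlushRange alignFlushRange(uint64_t offset, uint64_t size,
                           uint64_t atomSize, uint64_t allocationSize) {
    assert(atomSize != 0);
    assert(offset + size <= allocationSize);
    uint64_t begin = (offset / atomSize) * atomSize;                        // round down
    uint64_t end = ((offset + size + atomSize - 1) / atomSize) * atomSize;  // round up
    if (end > allocationSize) {
        end = allocationSize;
    }
    return FlushRange{begin, end - begin};
}
```

For instance, a write of 50 bytes at offset 100 with an atom size of 64 would be flushed as offset 64, size 128.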

There are also a bunch of rules you have to follow if you want to write to the memory after you have submitted the queue operation that will read from it (which is odd, but not entirely unreasonable).
