Parallel execution of multiple command buffers in a single queue family


I create a command pool like this:

VkCommandPool commandPool = VK_NULL_HANDLE;

void MainWindow::createCommandPool()
{
	QueueFamilyIndices queueFamilyIndices = findQueueFamilies(physicalDevice);

	VkCommandPoolCreateInfo poolInfo{};
	poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
	poolInfo.queueFamilyIndex = queueFamilyIndices.graphicsFamily.value();

	if (vkCreateCommandPool(logicalDevice, &poolInfo, nullptr, &commandPool) != VK_SUCCESS) {
		throw MakeErrorInfo("Failed to create command pool!");
	}
}

I have two functions that are probably well known to everyone:

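VkCommandBuffer MainWindow::beginSingleTimeCommands()
{
	VkCommandBufferAllocateInfo allocInfo{};
	allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
	allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
	allocInfo.commandPool = commandPool;
	allocInfo.commandBufferCount = 1;

	VkCommandBuffer commandBuffer;
	vkAllocateCommandBuffers(logicalDevice, &allocInfo, &commandBuffer);

	VkCommandBufferBeginInfo beginInfo{};
	beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
	beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

	vkBeginCommandBuffer(commandBuffer, &beginInfo);

	return commandBuffer;
}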

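void MainWindow::endSingleTimeCommands(VkCommandBuffer commandBuffer) {
    vkEndCommandBuffer(commandBuffer);

    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &commandBuffer;

    vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
    vkQueueWaitIdle(graphicsQueue); // block until the GPU has finished this submission

    vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
}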

With all this in mind, I wondered what would happen if multiple threads recorded and submitted commands simultaneously using these two functions.

Question 1. Since I’m currently calling vkQueueWaitIdle(graphicsQueue), I shouldn’t have anything to worry about, because the submissions will execute sequentially, one at a time, right?

Question 2. What if I use a VkFence and vkWaitForFences() instead?

VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;

VkFence executionCompleteFence = VK_NULL_HANDLE;
if (vkCreateFence(logicalDevice, &fenceInfo, nullptr, &executionCompleteFence) != VK_SUCCESS) {
	throw MakeErrorInfo("Failed to create fence");
}

vkQueueSubmit(graphicsQueue, 1, &submitInfo, executionCompleteFence);
vkWaitForFences(logicalDevice, 1, &executionCompleteFence, VK_TRUE, UINT64_MAX);

vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
vkDestroyFence(logicalDevice, executionCompleteFence, nullptr);

Then vkQueueSubmit() could be called simultaneously from several threads. What happens in that case?


All vkQueue* functions require you to synchronize access to the queue in question. That is, you cannot call two such functions from different threads on the same queue at the same time.
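One way to meet that requirement is to route every call on a given queue through a single mutex. The sketch below is only an illustration under stated assumptions: `QueueGate` and `countSubmissions` are hypothetical names that do not exist in Vulkan, and a plain counter stands in for the queue so the example runs without a device. In real code the callable passed to `locked` would wrap `vkQueueSubmit(graphicsQueue, ...)`.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Vulkan): owns the mutex that every thread
// must hold while calling any vkQueue* function on one particular queue.
class QueueGate {
public:
    // Runs `call` while holding the lock and returns whatever it returns.
    template <typename Fn>
    auto locked(Fn&& call) -> decltype(call()) {
        std::lock_guard<std::mutex> guard(mutex_);
        return call();
    }

private:
    std::mutex mutex_;
};

// Demo with a stand-in for the queue: a plain, non-atomic counter that
// unsynchronized access from several threads could corrupt.
int countSubmissions(int threadCount, int submitsPerThread) {
    assert(threadCount >= 0 && submitsPerThread >= 0);

    QueueGate gate;
    int submitted = 0; // protected only by the gate

    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&gate, &submitted, submitsPerThread] {
            for (int i = 0; i < submitsPerThread; ++i) {
                // Real code: gate.locked([&] { return vkQueueSubmit(...); });
                gate.locked([&submitted] { ++submitted; });
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return submitted;
}
```

The lock only makes the host-side calls legal; it says nothing about how the GPU orders or overlaps the submitted work. Command pools carry the same external-synchronization requirement, so threads that record in parallel would typically each need their own pool rather than the single shared `commandPool` from the question.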

Also, never use vkQueueWaitIdle. And never submit more than once per frame to the same queue. If you’re going to throw away virtually all of the benefits of Vulkan, you may as well use OpenGL.

Using a fence instead changes nothing. Also, you should not be creating and destroying fences in the middle of a frame like that.


Never at all? What if I keep it in the same function but never call that function from multiple threads?

Aren’t asynchrony and hence parallelism the advantages of Vulkan over OpenGL?

I can’t think of a circumstance when it would be appropriate. vkDeviceWaitIdle is useful in one instance: when you’re right about to destroy your Vulkan device and all the objects along with it. You have to wait for the device to be done with stuff before you can do that.

But waiting for a queue to idle doesn’t make sense to me. Waiting for a particular batch of work to finish, sure. But by the time you try to wait on that batch, there should be more work already submitted after that batch. So the queue would only ever idle if you can’t feed it fast enough.
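As a rough illustration of waiting on a particular batch without stalling the queue, the sketch below keeps a list of submitted batches and checks them once per frame instead of blocking right after each submit. Everything in it is an assumption made for illustration: `PendingBatch` and `PendingBatches` are hypothetical names, and the two callbacks stand in for real Vulkan calls so the example runs without a device. In real code `isFinished` might test `vkGetFenceStatus(device, fence) == VK_SUCCESS`, and `release` might free the command buffer and recycle the fence.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for one submitted batch of work.
struct PendingBatch {
    std::function<bool()> isFinished; // non-blocking completion check
    std::function<void()> release;    // frees resources the batch was using
};

class PendingBatches {
public:
    void add(PendingBatch batch) { batches_.push_back(std::move(batch)); }

    // Intended to be called once per frame. Releases every batch that has
    // finished, never blocks, and returns how many batches were released.
    std::size_t collectFinished() {
        std::size_t released = 0;
        std::size_t kept = 0;
        for (std::size_t i = 0; i < batches_.size(); ++i) {
            if (batches_[i].isFinished()) {
                batches_[i].release();
                ++released;
            } else {
                // Compact unfinished batches toward the front of the list.
                if (kept != i) {
                    batches_[kept] = std::move(batches_[i]);
                }
                ++kept;
            }
        }
        batches_.resize(kept);
        return released;
    }

    std::size_t pending() const { return batches_.size(); }

private:
    std::vector<PendingBatch> batches_;
};
```

This only covers host-side cleanup. Work on the GPU that consumes the results of an earlier batch still needs its own GPU-side synchronization.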

Didn’t you just say that you would never call it from multiple threads?

If you’re not creating work from multiple threads, where’s the “asynchrony[sic] and hence parallelism” going to come from? You don’t just get these things because you write code using Vulkan. You have to write good code using Vulkan. That isn’t it.

The way you’ve written your code won’t buy you anything on the CPU compared to GL. In fact, it will likely perform worse, as OpenGL implementations bend over backwards to avoid the exact kind of GPU/CPU sync that your vkQueueWaitIdle call does.


What should be there instead of vkQueueWaitIdle(graphicsQueue)?
What happens if I call vkQueueSubmit(graphicsQueue,…) at the same time from different threads?

Nothing. Why do you need the CPU to wait until the GPU has finished the work?

I’m working on a thread right now about this sort of thing.

Undefined behavior. You’re not allowed to do that.


Isn’t it to avoid problems like this:

void init()
{
    VkCommandBuffer commandBuffer = beginSingleTimeCommands();
    VkBufferCopy copyRegion{};
    copyRegion.size = size; // Very big size, very long copy time
    vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
    endSingleTimeCommands(commandBuffer);

    VkCommandBuffer commandBuffer2 = beginSingleTimeCommands();
    VkBufferCopy copyRegion2{};
    copyRegion2.size = size; // Very big size, very long copy time
    vkCmdCopyBuffer(commandBuffer2, srcBuffer, dstBuffer, 1, &copyRegion2);
    endSingleTimeCommands(commandBuffer2);
}

void renderFrame()
{
    // The copy hasn't finished yet
    vkCmdBindVertexBuffers(commandBuffers[i], 0, 1, vertexBuffers, offsets);
    vkCmdDraw(commandBuffers[i], static_cast<uint32_t>(vertexes.size()), 1, 0, 0);
}

Or is it not a problem?

Then put some synchronization in. Since you appear to be submitting these transfer operations on the same queue as the frame rendering, have the transfer operation raise an event after it’s finished, and have your renderFrame function perform a vkCmdWaitEvents on that event (along with the appropriate memory barriers).


Thank you for your answer, I understood the main idea.