Vulkan fence being waited on indefinitely

Hello!

I am currently facing a very weird issue: after resizing my window (which I handle by recreating the swapchain and its dependent resources, and I have fixed all the validation errors stemming from that), a completely unrelated command buffer submission never completes.

Thread 0, Frame 151:
vkBeginCommandBuffer(commandBuffer, pBeginInfo) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x55d4ece282f0
    pBeginInfo:                     const VkCommandBufferBeginInfo* = 0x7ffe1eb04208:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO (42)
        pNext:                          const void* = NULL
        flags:                          VkCommandBufferUsageFlags = 0
        pInheritanceInfo:               const VkCommandBufferInheritanceInfo* = UNUSED

Thread 0, Frame 151:
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, regionCount, pRegions) returns void:
    commandBuffer:                  VkCommandBuffer = 0x55d4ece282f0
    srcBuffer:                      VkBuffer = 0x55d4ece36780
    dstBuffer:                      VkBuffer = 0x55d4ece35370
    regionCount:                    uint32_t = 1
    pRegions:                       const VkBufferCopy* = 0x7ffe1eb040d0
        pRegions[0]:                    const VkBufferCopy = 0x7ffe1eb040d0:
            srcOffset:                      VkDeviceSize = 0
            dstOffset:                      VkDeviceSize = 0
            size:                           VkDeviceSize = 1

Thread 0, Frame 151:
vkEndCommandBuffer(commandBuffer) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x55d4ece282f0

Thread 0, Frame 151:
vkQueueSubmit(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 0x55d4ec4b5af0
    submitCount:                    uint32_t = 1
    pSubmits:                       const VkSubmitInfo* = 0x55d4ed4f0b08
        pSubmits[0]:                    const VkSubmitInfo = 0x55d4ed4f0b08:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO (4)
            pNext:                          const void* = NULL
            waitSemaphoreCount:             uint32_t = 0
            pWaitSemaphores:                const VkSemaphore* = NULL
            pWaitDstStageMask:              const VkPipelineStageFlags* = 0x55d4ed4dcea0
            commandBufferCount:             uint32_t = 1
            pCommandBuffers:                const VkCommandBuffer* = 0x55d4ed4e1210
                pCommandBuffers[0]:             const VkCommandBuffer = 0x55d4ece282f0
            signalSemaphoreCount:           uint32_t = 0
            pSignalSemaphores:              const VkSemaphore* = NULL
    fence:                          VkFence = 0x55d4ece2cbb0

This is the relevant snippet of the API dump, where I perform a simple copy operation that never completes. It is also worth noting that the very same queue is used right before this for a vkQueuePresentKHR call that fails like so:

vkQueuePresentKHR(queue, pPresentInfo) returns VkResult VK_ERROR_OUT_OF_DATE_KHR (-1000001004):
    queue:                          VkQueue = 0x55d4ec4b5af0

If needed, I am also willing to share the full 7 MB dump, but it’s too long to paste here and sadly I cannot include links. If someone could point me to, or give me a hint about, where to start looking for the problem, that would be great too, because I am completely lost.

Thanks in advance!

I also found out that, under the same circumstances, I sometimes get the same blocking behavior when waiting on the fence that should be signaled by vkAcquireNextImageKHR.

Are you properly resetting the fence before reusing it (and not double-using it in another queue)?
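
One common way a fence reset goes wrong around a resize, sketched here as a toy model rather than real Vulkan code: the frame loop resets its fence and then returns early (for example because the swapchain is out of date) without submitting anything, so nothing is left to signal that fence and the next wait never returns. The `ToyFence` type and the `frame_*` functions are hypothetical stand-ins. In a real renderer the commented steps would be `vkWaitForFences`, `vkResetFences` and `vkQueueSubmit`, and this may or may not be what is happening in the code above.

```c
#include <stdbool.h>

/* Toy stand-in for a VkFence: its signaled state, plus whether some
   submitted work is still going to signal it. */
typedef struct {
    bool signaled;
    bool pending_submit;
} ToyFence;

/* A wait with no timeout never returns if the fence is unsignaled
   and no submitted work is going to signal it. */
bool wait_would_hang(const ToyFence *f) {
    return !f->signaled && !f->pending_submit;
}

/* Models the GPU finishing whatever work was submitted with the fence. */
void gpu_finish(ToyFence *f) {
    if (f->pending_submit) {
        f->signaled = true;
        f->pending_submit = false;
    }
}

/* Risky order: reset first, then bail out when the swapchain is out of date. */
void frame_reset_early(ToyFence *f, bool out_of_date) {
    f->signaled = false;       /* vkResetFences */
    if (out_of_date) return;   /* early return: nothing was submitted */
    f->pending_submit = true;  /* vkQueueSubmit(..., fence) */
}

/* Safer order: reset only once a submit is certain to follow. */
void frame_reset_late(ToyFence *f, bool out_of_date) {
    if (out_of_date) return;   /* fence is left signaled */
    f->signaled = false;       /* vkResetFences */
    f->pending_submit = true;  /* vkQueueSubmit(..., fence) */
}

/* Runs one out-of-date frame and reports whether the next wait would hang. */
bool next_wait_hangs(void (*frame)(ToyFence *, bool)) {
    ToyFence f = { true, false };  /* starts signaled, nothing pending */
    frame(&f, true);
    gpu_finish(&f);
    return wait_would_hang(&f);
}
```

In this model `next_wait_hangs(frame_reset_early)` is true and `next_wait_hangs(frame_reset_late)` is false, which is why resetting the fence immediately before the submit that signals it is the safer ordering.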

So I’ve finally solved it. The problem was that when I resized my window, I would immediately destroy resources that were still in use by a vkQueuePresentKHR call whose queue operation was still running.
This undefined behavior is only implied by the following paragraph of the spec, so I may be wrong (implied in the sense that, once the present is submitted to the queue, there must be a window of time in which the resources are in use, and I was modifying them during it).

Queueing an image for presentation defines a set of queue operations, including waiting on the semaphores and submitting a presentation request to the presentation engine. However, the scope of this set of queue operations does not include the actual processing of the image by the presentation engine.

While I am still not sure whether this is the correct solution, or whether I am even doing it right, I have fixed it for now by calling vkQueueWaitIdle on the queue used for vkQueuePresentKHR whenever the window is resized, since vkQueuePresentKHR doesn’t allow you to signal any synchronization primitives.
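
The workaround boils down to an ordering rule: wait for the present queue to go idle, and only then destroy and recreate the swapchain resources. Below is a minimal sketch of that ordering, written with hypothetical hooks (`ResizeHooks`, `recreate_on_resize` and the `trace_*` helpers are illustrative names, not Vulkan API) so it can run without a device. The comments name the Vulkan calls each hook would be assumed to wrap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks; the comments name the Vulkan calls they would wrap. */
typedef struct {
    bool (*wait_idle)(void *ctx);   /* vkQueueWaitIdle(presentQueue) == VK_SUCCESS */
    void (*destroy_old)(void *ctx); /* framebuffers, image views, old swapchain */
    bool (*create_new)(void *ctx);  /* vkCreateSwapchainKHR and dependent objects */
} ResizeHooks;

/* Wait first; destroy nothing if the wait fails. */
bool recreate_on_resize(const ResizeHooks *h, void *ctx) {
    if (!h->wait_idle(ctx)) return false;
    h->destroy_old(ctx);
    return h->create_new(ctx);
}

/* Tracing hooks, only here to make the call order observable. */
typedef struct {
    char log[4];
    size_t n;
    bool wait_ok;
} Trace;

void trace_put(Trace *t, char c) {
    if (t->n + 1 < sizeof t->log) {
        t->log[t->n++] = c;
        t->log[t->n] = '\0';
    }
}

bool trace_wait(void *ctx)    { Trace *t = ctx; trace_put(t, 'W'); return t->wait_ok; }
void trace_destroy(void *ctx) { trace_put(ctx, 'D'); }
bool trace_create(void *ctx)  { trace_put(ctx, 'C'); return true; }

/* Returns "WDC" when the wait succeeds and "W" when it fails. */
const char *demo_order(bool wait_ok) {
    static Trace t;
    const ResizeHooks hooks = { trace_wait, trace_destroy, trace_create };
    t = (Trace){ .wait_ok = wait_ok };
    recreate_on_resize(&hooks, &t);
    return t.log;
}
```

vkDeviceWaitIdle is a broader alternative for the wait step, since it waits on every queue of the device rather than just the present queue.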