What Synchronization is Needed When Reusing Attachments Between Frames?

Looking around online, it’s surprisingly difficult to get a conclusive answer to this seemingly basic question. Let me provide a concrete example to help frame the question. Don’t focus too much on the specifics. A lot of it probably isn’t relevant to the question but may help focus the discussion.

  • I have 3 swapchain images.
  • I have 2 command pools, 2 command buffers, and 2 fences that I alternate using each frame. I’ve seen this referred to as the number of “frames in flight”.
  • I have 1 queue that’s used for both graphics and presentation.
  • At the beginning of each frame I wait on the associated frame in flight fence (i.e. the signal fence of vkQueueSubmit 2 frames ago) and a new vkAcquireNextImageKHR signal fence. I process user input and update my simulation at fixed time steps while waiting. This is the only explicit synchronization I have as far as I can tell; there are no other fences and no semaphores or events or explicit barriers.
  • The goal of 3 swapchain images with 2 frames in flight with the above fences is to minimize input latency while still allowing for some CPU/GPU parallelism, e.g., something like this: www.pandza.xyz/images/swap-chain/graph-latency-frames.svg
  • I have one multisample color image and one multisample depth image that every frame reuses.
  • I have one render pass with one subpass that uses the multisample color and depth images to render a frame and finally resolve to the acquired swapchain image. There are no explicit subpass dependencies.
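
To make the overlap I’m worried about concrete, here is a toy model of the fence bookkeeping described in the list above. It makes no Vulkan calls, and the names FRAMES_IN_FLIGHT, frame_slot, and may_overlap_on_gpu are made up for illustration. It assumes the per-slot fence wait is the only thing ordering one frame’s GPU work against another’s.

```c
#include <stdbool.h>

/* Two command pools, two command buffers, two fences. */
#define FRAMES_IN_FLIGHT 2

/* Slot (command pool / command buffer / fence) that a given frame uses. */
unsigned frame_slot(unsigned long frame) {
    return (unsigned)(frame % FRAMES_IN_FLIGHT);
}

/*
 * Before recording frame N, the CPU waits on the fence signaled by the
 * submit of frame N - FRAMES_IN_FLIGHT. That guarantees frames two or more
 * apart never coexist on the GPU, but it guarantees nothing about frame N
 * versus frame N - 1: both may be executing at the same time.
 */
bool may_overlap_on_gpu(unsigned long frame_a, unsigned long frame_b) {
    unsigned long distance =
        frame_a > frame_b ? frame_a - frame_b : frame_b - frame_a;
    return distance < FRAMES_IN_FLIGHT;
}
```

In this model may_overlap_on_gpu(4, 5) is true while may_overlap_on_gpu(3, 5) is false, which is why I think two consecutive frames could touch the shared multisample color and depth images at the same time.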

The render pass attachments are defined as:

VkAttachmentDescription attachments[] = {
    [0] = {
        .format = swapchain_format,
        .samples = VK_SAMPLE_COUNT_4_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    },
    [1] = {
        .format = swapchain_format,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
    },
    [2] = {
        .format = VK_FORMAT_D16_UNORM,
        .samples = VK_SAMPLE_COUNT_4_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_UNDEFINED,
        .finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    },
};

VkAttachmentReference color_attachments_0[] = {
    {
        .attachment = 0,
        .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    },
};

VkAttachmentReference resolve_attachments_0[] = {
    {
        .attachment = 1,
        .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    },
};

VkAttachmentReference depth_attachment_0 = {
    .attachment = 2,
    .layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
};

This is all to say that I believe it’s possible for two frames to render in parallel if the stars align, so the answer from this thread (community.khronos.org/t/framebuffer-attachments-clarification/7595/9), “you almost certainly already have some form of synchronization between the two render passes”, doesn’t settle the question for my setup.

As described, this all works, and the validation layers don’t complain. If, however, I explicitly add minimal do-nothing external subpass dependencies to override the implicit defaults, e.g.:

VkSubpassDependency dependencies[] = {
    {
        .srcSubpass = VK_SUBPASS_EXTERNAL,
        .dstSubpass = 0,
        .srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
        .dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
        .srcAccessMask = 0,
        .dstAccessMask = 0,
    },
    {
        .srcSubpass = 0,
        .dstSubpass = VK_SUBPASS_EXTERNAL,
        .srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
        .dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
        .srcAccessMask = 0,
        .dstAccessMask = 0,
    },
};

Then the validation layers complain as expected about write-after-write hazards:

SYNC-HAZARD-WRITE_AFTER_WRITE(ERROR / SPEC): msgNum: -33950239 - Validation Error: [ SYNC-HAZARD-WRITE_AFTER_WRITE ] Object 0: handle = 0x1d4edc7e110, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xfdf9f5e1 | vkCmdBeginRenderPass: Hazard WRITE_AFTER_WRITE vs. layout transition in subpass 0 for attachment 0 aspect color during load with loadOp VK_ATTACHMENT_LOAD_OP_CLEAR.
    Objects: 1
        [0] 0x1d4edc7e110, type: 18, name: NULL
SYNC-HAZARD-WRITE_AFTER_WRITE(ERROR / SPEC): msgNum: -33950239 - Validation Error: [ SYNC-HAZARD-WRITE_AFTER_WRITE ] Object 0: handle = 0x1d4edc7e110, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xfdf9f5e1 | vkCmdBeginRenderPass: Hazard WRITE_AFTER_WRITE vs. layout transition in subpass 0 for attachment 2 aspect depth during load with loadOp VK_ATTACHMENT_LOAD_OP_CLEAR.
    Objects: 1
        [0] 0x1d4edc7e110, type: 18, name: NULL

Note: I think these minimalist subpass dependencies also create a hazard where the swapchain image results aren’t made “available” to the presentation engine, but the validation layer is known to be lacking features around swapchain image synchronization (see pull request #2606, “Add swapchain image hazard detection” by jzulauf-lunarg, in KhronosGroup/Vulkan-ValidationLayers on GitHub), and this issue is unrelated to my question about reusing the offscreen color and depth attachments anyway.

Now, if I just make a slight modification to the incoming external subpass dependency by changing the dstStageMask and dstAccessMask:

{
    .srcSubpass = VK_SUBPASS_EXTERNAL,
    .dstSubpass = 0,
    .srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
    .dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT,
    .srcAccessMask = 0,
    .dstAccessMask =
        VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
        VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
},

Then the validation layer complaints go away. But should they? It’s my understanding that .srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT means “don’t wait for anything”, so how could this prevent any memory hazards? Is this an oversight in the validation layers, or is there something more subtle going on here that makes this subpass dependency sufficient? I’m also not clear on whether I should be including more stages in the .dstStageMask or whether VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT implies all later stages as well.

If I were being overly-cautious, my full incoming external subpass dependency would look like:

{
    .srcSubpass = VK_SUBPASS_EXTERNAL,
    .dstSubpass = 0,
    .srcStageMask =
        VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |
        VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT |
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .dstStageMask =
        VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |
        VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT |
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    .srcAccessMask =
        VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
        VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
    .dstAccessMask =
        VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
        VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
        VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
        VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
},

The goal there is to make sure the color and depth attachment writes from the previous frame are completely done before I start reusing the attachments this frame. But looking around online at similar questions and examples, I feel like I’ve seen every permutation between the minimal do-nothing dependency and the overly-cautious one.

If I do need something like this overly-cautious dependency, are the *_READ_BIT flags necessary? I’m not actually trying to read the attachment results from the previous frame since the first thing I do is clear them.

It’s also worth noting that I use VK_ATTACHMENT_LOAD_OP_CLEAR and VK_ATTACHMENT_STORE_OP_DONT_CARE on the attachments in question, so the actual order of the render passes doesn’t matter so long as they don’t interleave in any weird way. Maybe that has something to do with why .srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT is sufficient according to the validation layers? I’m really murky on this bit in particular.

I’ve also read conflicting advice about using external subpass dependencies vs using vkCmdPipelineBarrier. Subpass dependencies are certainly more convenient and they can supposedly help with performance because the driver “has more information”, but I’ve also read that today’s drivers don’t actually do these theoretical optimizations. And the amount of synchronization they do seems overly broad: they act as full memory barriers and apply to every attachment (or just the ones that are changing layouts? Not clear on that). So, for example, the above overly-cautious dependency applies to the swapchain image as well, but I theoretically don’t need it to because I wait on the vkAcquireNextImageKHR fence before recording/submitting this render pass’s commands. But I’ve also read that memory barriers for specific resources are rarely better than full memory barriers!

So should I use a separate vkCmdPipelineBarrier for each of the three incoming layout changes? One for all three of them? Should I use vkCmdPipelineBarrier for just the offscreen color and depth images but rely on the implicit default external subpass dependencies for the swapchain image? There are so many permutations here, and I’ve read arguments for many of them. No one seems to agree on what the right thing to do here is!