Artifacts when resolving MSAA depth buffer and reading in later render pass

I recently added MSAA support to my renderer that uses dynamic rendering. I need to render in two passes:

  1. Main render pass. MSAA toggleable on/off.
  2. Second render pass. No MSAA.
    • Writes to color buffer from previous pass.
    • Reads depth buffer from previous pass to:
      • Do depth tests.
      • For sampling in fragment shaders (to implement e.g. soft particles).

Everything works as expected on the Intel and AMD GPUs I have available for testing, with MSAA both enabled and disabled.

However, on the Nvidia GPU I am testing on, everything works as expected when MSAA is disabled but not when it is enabled. It seems like depth testing does not work reliably in the second render pass. For example, soft particles mostly do not show up. Depending on the resolution, they are visible in some parts of the screen, sometimes with a checkerboard-like pattern, as in the following image:

Disabling depth testing in the soft particle pipelines makes the particles always show up. Sampling the depth buffer in the fragment shader seems to always work. As can be seen in the screenshot, I can also sample the resolved depth buffer without problems when drawing it in a debug overlay in a later render pass.

I do not get any warnings from the validation layer with VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT enabled, on any of the GPUs, regardless of whether MSAA is enabled. So I am not sure what is wrong. Is the validation layer not validating barriers correctly in this case? Or is there maybe a bug in the Nvidia driver?

Below is how I set up my render passes with dynamic rendering (copied from the renderer and modified to make sense “standalone”; hopefully I made no mistakes when I extracted it):

/////////////////////////////

// Main render images and views. VK_SAMPLE_COUNT_1_BIT if MSAA disabled. VK_SAMPLE_COUNT_X_BIT, X > 1, if MSAA enabled.
VkImage     color_image = ...;
VkImageView color_image_view = ...;
VkImage     depth_image = ...;
VkImageView depth_image_view = ...;

// Resolved images and views. Always VK_SAMPLE_COUNT_1_BIT sample. VK_NULL_HANDLE if MSAA disabled.
VkImage     color_resolved_image = ...;
VkImageView color_resolved_image_view = ...;
VkImage     depth_resolved_image = ...;
VkImageView depth_resolved_image_view = ...;

////////////////////////////// Main render pass start...
{
    VkImageMemoryBarrier color_barriers[2];
    u32 color_barriers_count = 1;
    color_barriers[0] = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
    color_barriers[0].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    color_barriers[0].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    color_barriers[0].oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    color_barriers[0].newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    color_barriers[0].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barriers[0].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barriers[0].image = color_image;
    color_barriers[0].subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
    if (color_resolved_image) {
        color_barriers[1] = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
        color_barriers[1].srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
        color_barriers[1].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
        color_barriers[1].oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
        color_barriers[1].newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
        color_barriers[1].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        color_barriers[1].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        color_barriers[1].image = color_resolved_image;
        color_barriers[1].subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
        color_barriers_count++;
    }

    vkCmdPipelineBarrier(command_buffer,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_DEPENDENCY_BY_REGION_BIT,
                         0,
                         nullptr,
                         0,
                         nullptr,
                         color_barriers_count,
                         color_barriers);

    VkImageMemoryBarrier depth_barrier = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
    depth_barrier.srcAccessMask = VK_ACCESS_SHADER_READ_BIT;
    depth_barrier.dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    depth_barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    depth_barrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
    depth_barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    depth_barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    depth_barrier.image = depth_image;
    depth_barrier.subresourceRange = {VK_IMAGE_ASPECT_DEPTH_BIT, 0, 1, 0, 1};

    vkCmdPipelineBarrier(command_buffer,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT,
                         VK_DEPENDENCY_BY_REGION_BIT,
                         0,
                         nullptr,
                         0,
                         nullptr,
                         1,
                         &depth_barrier);

    if (depth_resolved_image) {
        VkImageMemoryBarrier depth_resolve_barrier = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
        depth_resolve_barrier.srcAccessMask = VK_ACCESS_SHADER_READ_BIT;
        depth_resolve_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
        depth_resolve_barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
        depth_resolve_barrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
        depth_resolve_barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        depth_resolve_barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        depth_resolve_barrier.image = depth_resolved_image;
        depth_resolve_barrier.subresourceRange = {VK_IMAGE_ASPECT_DEPTH_BIT, 0, 1, 0, 1};

        vkCmdPipelineBarrier(command_buffer,
                             VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                             VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                             VK_DEPENDENCY_BY_REGION_BIT,
                             0,
                             nullptr,
                             0,
                             nullptr,
                             1,
                             &depth_resolve_barrier);
    }

    VkRenderingAttachmentInfo color_attachment = {.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO};
    color_attachment.imageView = color_image_view;
    color_attachment.imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    if (color_resolved_image) {
        color_attachment.resolveMode = VK_RESOLVE_MODE_AVERAGE_BIT;
        color_attachment.resolveImageView = color_resolved_image_view;
        color_attachment.resolveImageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    }
    color_attachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
    color_attachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
    color_attachment.clearValue = {0.0f, 0.0f, 0.0f, 1.0f};

    VkRenderingAttachmentInfo depth_attachment = {.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO};
    depth_attachment.imageView = depth_image_view;
    depth_attachment.imageLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
    if (depth_resolved_image) {
        depth_attachment.resolveMode = VK_RESOLVE_MODE_MIN_BIT;
        depth_attachment.resolveImageView = depth_resolved_image_view;
        depth_attachment.resolveImageLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
    }
    depth_attachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
    depth_attachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
    depth_attachment.clearValue.depthStencil = {1.0f, 0};

    VkRenderingInfo rendering_info = {.sType = VK_STRUCTURE_TYPE_RENDERING_INFO};
    rendering_info.renderArea.extent = {width, height};
    rendering_info.layerCount = 1;
    rendering_info.colorAttachmentCount = 1;
    rendering_info.pColorAttachments = &color_attachment;
    rendering_info.pDepthAttachment = &depth_attachment;

    vkCmdBeginRendering(command_buffer, &rendering_info);
}
////////////////////////////// Main render pass rendering...
{
    vkCmdEndRendering(command_buffer);

    VkImageMemoryBarrier color_barriers[2];
    u32 color_barriers_count = 1;
    color_barriers[0] = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
    color_barriers[0].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    color_barriers[0].dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    color_barriers[0].oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    color_barriers[0].newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    color_barriers[0].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barriers[0].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barriers[0].image = color_image;
    color_barriers[0].subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
    if (color_resolved_image) {
        color_barriers[1] = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
        color_barriers[1].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
        color_barriers[1].dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
        color_barriers[1].oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
        color_barriers[1].newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
        color_barriers[1].srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        color_barriers[1].dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        color_barriers[1].image = color_resolved_image;
        color_barriers[1].subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
        color_barriers_count++;
    }

    vkCmdPipelineBarrier(command_buffer,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_DEPENDENCY_BY_REGION_BIT,
                         0,
                         nullptr,
                         0,
                         nullptr,
                         color_barriers_count,
                         color_barriers);

    VkImageMemoryBarrier depth_barrier = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
    depth_barrier.srcAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    depth_barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
    depth_barrier.oldLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
    depth_barrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL;
    depth_barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    depth_barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    depth_barrier.image = depth_image;
    depth_barrier.subresourceRange = {VK_IMAGE_ASPECT_DEPTH_BIT, 0, 1, 0, 1};

    vkCmdPipelineBarrier(command_buffer,
                         VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         VK_DEPENDENCY_BY_REGION_BIT,
                         0,
                         nullptr,
                         0,
                         nullptr,
                         1,
                         &depth_barrier);

    if (depth_resolved_image) {
        VkImageMemoryBarrier depth_resolve_barrier = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
        depth_resolve_barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
        depth_resolve_barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
        depth_resolve_barrier.oldLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
        depth_resolve_barrier.newLayout = VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL;
        depth_resolve_barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        depth_resolve_barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        depth_resolve_barrier.image = depth_resolved_image;
        depth_resolve_barrier.subresourceRange = {VK_IMAGE_ASPECT_DEPTH_BIT, 0, 1, 0, 1};

        vkCmdPipelineBarrier(command_buffer,
                             VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                             VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                             VK_DEPENDENCY_BY_REGION_BIT,
                             0,
                             nullptr,
                             0,
                             nullptr,
                             1,
                             &depth_resolve_barrier);
    }
}
////////////////////////////// Main render pass done. Start second render pass that only reads depth buffer from main render pass.
{
    VkImageMemoryBarrier color_barrier = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
    color_barrier.srcAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    color_barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    color_barrier.oldLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    color_barrier.newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    color_barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barrier.image = color_resolved_image ? color_resolved_image : color_image;
    color_barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};

    vkCmdPipelineBarrier(command_buffer,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_DEPENDENCY_BY_REGION_BIT,
                         0,
                         nullptr,
                         0,
                         nullptr,
                         1,
                         &color_barrier);

    VkRenderingAttachmentInfo color_attachment = {.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO};
    color_attachment.imageView = color_resolved_image ? color_resolved_image_view : color_image_view;
    color_attachment.imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    color_attachment.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
    color_attachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;

    VkRenderingAttachmentInfo depth_attachment = {.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO};
    depth_attachment.imageView = depth_resolved_image ? depth_resolved_image_view : depth_image_view;
    depth_attachment.imageLayout = VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL;
    depth_attachment.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
    depth_attachment.storeOp = VK_ATTACHMENT_STORE_OP_NONE;

    VkRenderingInfo rendering_info = {.sType = VK_STRUCTURE_TYPE_RENDERING_INFO};
    rendering_info.renderArea.extent = {width, height};
    rendering_info.layerCount = 1;
    rendering_info.colorAttachmentCount = 1;
    rendering_info.pColorAttachments = &color_attachment;
    rendering_info.pDepthAttachment = &depth_attachment;

    vkCmdBeginRendering(command_buffer, &rendering_info);
}
////////////////////////////// Second render pass rendering...
{
    vkCmdEndRendering(command_buffer);

    VkImageMemoryBarrier color_barrier = {.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
    color_barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    color_barrier.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
    color_barrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    color_barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    color_barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    color_barrier.image = color_resolved_image ? color_resolved_image : color_image;
    color_barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};

    vkCmdPipelineBarrier(command_buffer,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT | VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_DEPENDENCY_BY_REGION_BIT,
                         0,
                         nullptr,
                         0,
                         nullptr,
                         1,
                         &color_barrier);
}
////////////////////////////// Second render pass done.

Another thing I am a bit confused about… maybe not related, or maybe it is.

When using VkRenderPass, I had dependencies set up to handle SYNC-HAZARD-WRITE_AFTER_WRITE, as described in the Synchronization Examples page of the KhronosGroup/Vulkan-Docs wiki on GitHub. I also had similar dependencies for SYNC-HAZARD-READ_AFTER_WRITE to handle depth tests and reading of the depth buffer in the second render pass. When migrating to dynamic rendering, I could not see how to translate these to image barriers. But since everything seemed to work without any warnings from the validation layer, I hoped things were still OK. Everything worked as expected until I added MSAA and tried it on the Nvidia GPU.
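A rough sketch of how such a read-after-write dependency on the depth image might translate into barrier masks under dynamic rendering is shown here. It is not a confirmed fix for the artifacts. The names `SyncScope`, `DepthDependency` and `depth_read_after_write` are hypothetical helpers, not Vulkan API, and the fallback constants exist only so the sketch compiles without the Vulkan SDK. The assumption is that the second pass both depth-tests against the image and samples it, so the destination scope names the fragment test stages and the attachment read access in addition to the fragment shader read that the barriers above already name.

```cpp
#include <cstdint>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Fallback so the sketch compiles without the Vulkan SDK installed.
// The values mirror the corresponding enums in vulkan_core.h.
constexpr uint32_t VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT         = 0x00000080;
constexpr uint32_t VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT    = 0x00000100;
constexpr uint32_t VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT     = 0x00000200;
constexpr uint32_t VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT = 0x00000400;
constexpr uint32_t VK_ACCESS_SHADER_READ_BIT                     = 0x00000020;
constexpr uint32_t VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT          = 0x00000100;
constexpr uint32_t VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT   = 0x00000200;
constexpr uint32_t VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT  = 0x00000400;
#endif

// Hypothetical helper types for illustration (not part of Vulkan).
struct SyncScope {
    uint32_t stages;  // used as VkPipelineStageFlags
    uint32_t access;  // used as VkAccessFlags
};

struct DepthDependency {
    SyncScope src;  // depth writes of the main pass that must complete
    SyncScope dst;  // depth reads of the second pass that must wait
};

// written_by_resolve: true for depth_resolved_image (filled by the attachment
// resolve), false for depth_image when it is rendered to directly (MSAA off).
DepthDependency depth_read_after_write(bool written_by_resolve) {
    DepthDependency dep{};
    if (written_by_resolve) {
        // Assumption: resolves are synchronized as color attachment output
        // writes, which is also what the barriers in the post use.
        dep.src.stages = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
        dep.src.access = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    } else {
        // Depth writes can happen in both fragment test stages.
        dep.src.stages = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |
                         VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
        dep.src.access = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    }
    // The second pass reads depth in two ways: the depth test (attachment
    // read in the fragment test stages) and texelFetch() in the shader.
    dep.dst.stages = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |
                     VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT |
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    dep.dst.access = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                     VK_ACCESS_SHADER_READ_BIT;
    return dep;
}
```

The returned masks would go into the srcStageMask and dstStageMask arguments of vkCmdPipelineBarrier and into srcAccessMask and dstAccessMask of the VkImageMemoryBarrier that transitions the image to VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL.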

Does nobody know what could be wrong here? Do the barriers in my code look correct?

I was leaving it for one of the Vulkan gurus that hang out here (I’m not one of them). If there’s a sync issue, they’ll be more likely to spot it.

My naive read suggests you could probably do more to nail this down through more testing on the NVIDIA GPU. This testing might help to point out your bug (or nail down the nature of a driver bug, if in fact one is coming into play). For instance:

  • Have you tried inserting sledgehammer synchronization between your rendering phases (vkQueueWaitIdle or vkDeviceWaitIdle)? Any difference?
  • What if you skip MSAA resolve on the depth and have your 2nd pass just do texelFetch() lookups into the rendered MSAA depth? Do you get reasonable results?
  • What if you do your own depth resolve? Do you get reasonable results?
  • If you display sample N in your original MSAA depth buffer as color, are the results reasonable?
  • If you display your resolved depth as color, are the results reasonable?
  • Have you tried using a different depth buffer format for the depth buffer?
  • What NVIDIA GPU and driver version are you seeing these results with? Have you tried updating to the latest release and/or beta driver?

Inserting vkQueueWaitIdle() or vkDeviceWaitIdle() would require splitting the work into multiple command buffers. I am not sure this is a useful test since, as mentioned in the first post, I can render the depth buffer in a later render pass in a “debug overlay” (see the screenshot above), even with everything in a single command buffer.
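A coarser test that stays within one command buffer might be a full pipeline barrier between the two render passes, used for debugging only. The sketch below just collects the masks for such a barrier. `FullBarrierMasks` and `full_pipeline_barrier_masks` are hypothetical names, and the fallback constants exist only so the sketch compiles without the Vulkan SDK.

```cpp
#include <cstdint>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Fallback so the sketch compiles without the Vulkan SDK installed.
// The values mirror the corresponding enums in vulkan_core.h.
constexpr uint32_t VK_PIPELINE_STAGE_ALL_COMMANDS_BIT = 0x00010000;
constexpr uint32_t VK_ACCESS_MEMORY_READ_BIT          = 0x00008000;
constexpr uint32_t VK_ACCESS_MEMORY_WRITE_BIT         = 0x00010000;
#endif

// Hypothetical helper type for illustration (not part of Vulkan).
struct FullBarrierMasks {
    uint32_t src_stages;  // srcStageMask for vkCmdPipelineBarrier
    uint32_t dst_stages;  // dstStageMask for vkCmdPipelineBarrier
    uint32_t src_access;  // srcAccessMask for a VkMemoryBarrier
    uint32_t dst_access;  // dstAccessMask for a VkMemoryBarrier
};

// Masks for a barrier that waits for all earlier commands and makes all
// earlier memory accesses available and visible to all later commands.
FullBarrierMasks full_pipeline_barrier_masks() {
    FullBarrierMasks m{};
    m.src_stages = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
    m.dst_stages = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
    m.src_access = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
    m.dst_access = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT;
    return m;
}
```

The access masks would go into a single VkMemoryBarrier passed to vkCmdPipelineBarrier, with dependency flags 0, after vkCmdEndRendering of the main pass and before vkCmdBeginRendering of the second pass. It does not perform layout transitions, so the existing image barriers would still be needed for those. If the artifacts remain even with this in place, missing execution or memory dependencies between the two passes become a less likely explanation.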

As also mentioned, I can sample the depth buffer fine from the fragment shader in the second render pass; what does not work is depth testing. texelFetch() is actually used in the soft particles shader. Here is a screenshot where I have disabled depth testing, render a single large particle, and just write the z value I calculate by sampling the resolved depth buffer with texelFetch():

[Screenshot: depth_test_disabled]

Displaying sample N would require me to change to a sampler2DMS. I am not sure I want to make that change, since I am not sure the test would be useful. Depth testing works in the first render pass with MSAA, and resolving the depth buffer also seems to work (see the previous screenshot and the screenshot in the first post).

I use VK_FORMAT_D32_SFLOAT. Changing to VK_FORMAT_D16_UNORM does not seem to change anything.

The problematic GPU is a GeForce GTX 970, using the latest driver (530.41.03) on Linux.
