Memory synchronization across dependency chain

vmilea · September 17, 2023, 12:50pm

I’m struggling to understand why the validation layer reports write-after-write hazards when synchronization involves dependency chains. The simplest way to explain it is with a transfer command and explicit image barriers. Start with an image in VK_IMAGE_LAYOUT_UNDEFINED. We want to transition it to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, copy the contents of a buffer into it, then transition it to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:

VkImageSubresourceRange imageSubresourceRange = {
    .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
    .baseMipLevel = 0,
    .levelCount = 1,
    .baseArrayLayer = 0,
    .layerCount = 1,
};
VkImageMemoryBarrier preTransferImageMemoryBarrier = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_NONE,
    .dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
    .oldLayout = VK_IMAGE_LAYOUT_UNDEFINED,
    .newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image = image,
    .subresourceRange = imageSubresourceRange,
};
// [1]
vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    0,
    0, nullptr,
    0, nullptr,
    1, &preTransferImageMemoryBarrier);

VkImageSubresourceLayers imageSubresource = {
    .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
    .mipLevel = 0,
    .baseArrayLayer = 0,
    .layerCount = 1,
};
VkBufferImageCopy region = {
    .bufferOffset = 0,
    .bufferRowLength = 0,
    .bufferImageHeight = 0,
    .imageSubresource = imageSubresource,
    .imageOffset = {0, 0, 0},
    .imageExtent = {IMAGE_WIDTH, IMAGE_HEIGHT, 1},
};
// [2]
vkCmdCopyBufferToImage(commandBuffer, stagingBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);

VkImageMemoryBarrier postTransferImageMemoryBarrier = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_NONE,
    .oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    .newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image = image,
    .subresourceRange = imageSubresourceRange,
};
// [3]
vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
    0,
    0, nullptr,
    0, nullptr,
    1, &postTransferImageMemoryBarrier);

The code above behaves as expected:

[1] vkCmdPipelineBarrier() - Transition image from VK_IMAGE_LAYOUT_UNDEFINED to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (layout transition finishes and memory is made visible for VK_ACCESS_TRANSFER_WRITE before VK_PIPELINE_STAGE_TRANSFER).
[2] vkCmdCopyBufferToImage() - Copy the staging buffer to image.
[3] vkCmdPipelineBarrier() - Transition image from VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (copying finishes and VK_ACCESS_TRANSFER_WRITE memory is made available before the layout transition).

Now, I assume that memory dependencies can be chained, provided there is an intersection between the 2nd scope of the first dependency and the 1st scope of the second dependency. So barrier [3] may be split into two, with VK_PIPELINE_STAGE_TRANSFER serving as intersection:

VkImageMemoryBarrier postTransferImageMemoryBarrierA = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_NONE,
    .oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    .newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image = image,
    .subresourceRange = imageSubresourceRange,
};
// [3a]
vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    0,
    0, nullptr,
    0, nullptr,
    1, &postTransferImageMemoryBarrierA);

VkImageMemoryBarrier postTransferImageMemoryBarrierB = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_NONE,
    .dstAccessMask = VK_ACCESS_NONE,
    .oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
    .newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image = image,
    .subresourceRange = imageSubresourceRange,
};
// [3b]
vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
    0, /* dependencyFlags */
    0, nullptr,
    0, nullptr,
    1, &postTransferImageMemoryBarrierB);

Expected behavior:

[3a] vkCmdPipelineBarrier() - Wait for copy to finish (execution finishes and VK_ACCESS_TRANSFER_WRITE memory is made available before VK_PIPELINE_STAGE_TRANSFER).
[3b] vkCmdPipelineBarrier() - Transition image from VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (should form a dependency chain through VK_PIPELINE_STAGE_TRANSFER, so copying has finished and memory is available before the layout transition).

But now we see:
// Validation Error: [ SYNC-HAZARD-WRITE-AFTER-WRITE ] Object 0: handle = 0xcb3ee80000000007, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x5c0ec5d6 | vkCmdPipelineBarrier: Hazard WRITE_AFTER_WRITE for image barrier 0 VkImage 0xcb3ee80000000007[]. Access info (usage: SYNC_IMAGE_LAYOUT_TRANSITION, prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: 0, command: vkCmdCopyBufferToImage, seq_no: 2, reset_no: 1).

So it appears my reasoning is flawed – the final layout transition doesn’t wait for the transfer writes to be made available. Why is that?
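
The chaining assumption in this post can be written down as a small check. This is a minimal sketch, not Vulkan API code: the Stage enum and the BarrierStages struct are hypothetical stand-ins with made-up bit values, and the check ignores the spec's implicit expansion of stage masks to logically earlier or later stages. It only tests whether the second synchronization scope of one barrier overlaps the first synchronization scope of the next.

```cpp
#include <cstdint>

// Hypothetical stand-ins for pipeline stage bits. The values are illustrative
// and are NOT the real VkPipelineStageFlagBits values.
enum Stage : std::uint32_t {
    STAGE_TOP_OF_PIPE    = 1u << 0,
    STAGE_TRANSFER       = 1u << 1,
    STAGE_BOTTOM_OF_PIPE = 1u << 2,
};

// Only the stage masks of a barrier; access masks are left out on purpose.
struct BarrierStages {
    std::uint32_t srcStageMask;  // first synchronization scope
    std::uint32_t dstStageMask;  // second synchronization scope
};

// Returns true when the second scope of `first` intersects the first scope
// of `second`, i.e. the two barriers can form an execution dependency chain
// under the simplified rule stated in the post.
bool formsExecutionChain(const BarrierStages& first, const BarrierStages& second) {
    return (first.dstStageMask & second.srcStageMask) != 0;
}
```

With these definitions, formsExecutionChain({STAGE_TRANSFER, STAGE_TRANSFER}, {STAGE_TRANSFER, STAGE_BOTTOM_OF_PIPE}) models [3a] followed by [3b] and returns true. The check covers execution scopes only; it says nothing about which memory accesses are made available or visible.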

Alfonse_Reinheart · September 17, 2023, 2:40pm

TL;DR: stop doing this:

    .dstAccessMask = VK_ACCESS_NONE,

This is highly dubious in your initial code and completely broken in your second.

So the overall process you say you’re doing is this: you want to copy some data into an image, then use that image as a render target. However, what your dependency says is that you’re going to copy some data into the image, change its layout, and then make that data visible to no subsequent operation, then use the image as a render target.

It may be that you have a load-op in your render pass that does not care about the content of the image (ie: the data you copied in that was not made visible to the load-op). You could be clearing the image, for example. In that case, visibility is unnecessary, as the data you so carefully transferred in is completely discarded by the load-op.

In short: you got lucky. Your bad destination access mask was rendered irrelevant by your load-op.

Your second example breaks your lucky sequence of operations. Your 3a barrier makes the transfer visible to nobody, but it also doesn’t change the image’s layout. 3b tries to change the image’s layout, but that requires having visibility of the data that was transferred in. Which it doesn’t.

See, when a layout transition is part of a barrier, it automatically has visibility of everything in the source access scope of that barrier. But it only has visibility of that stuff. If previous barriers don’t make something visible to the current barrier, then the layout transition can’t see it either. And your NONE destination access mask does not make the copy visible to anybody. Therefore, the layout transition has no visibility of the writes from the transfer. But it needs visibility of those writes to properly transition the image’s layout.

Hence the WAW hazard.

vmilea · September 17, 2023, 4:31pm

“This is highly dubious in your initial code and completely broken in your second.”

Indeed, not making the memory visible would be dubious in a real-world program. For this toy example it doesn’t matter, there are no subsequent commands and the goal was simply to reason about the hazard. I’ve uploaded the whole program here to be clear.

“It may be that you have a load-op in your render pass that does not care about the content of the image (ie: the data you copied in that was not made visible to the load-op). You could be clearing the image, for example. In that case, visibility is unnecessary, as the data you so carefully transferred in is completely discarded by the load-op.”

This is a bit out of scope but still interesting. Wouldn’t visibility be formally necessary even for clear load-op? VK_ATTACHMENT_LOAD_OP_CLEAR uses access type VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT, which implies WAW hazard unless the transferred data was made available and visible.

“Your 3a barrier makes the transfer visible to nobody, but it also doesn’t change the image’s layout. 3b tries to change the image’s layout, but that requires having visibility of the data that was transferred in. Which it doesn’t.”

“Available” and “visible” memory have specific meaning in the spec. Barrier 3a makes the memory available after the transfer. That’s what a layout transition requires, and visibility is implicit. Here is the relevant quote from spec:

When a layout transition is specified in a memory dependency, it happens-after the availability operations in the memory dependency, and happens-before the visibility operations. Image layout transitions may perform read and write accesses on all memory bound to the image subresource range, so applications must ensure that all memory writes have been made available before a layout transition is executed. Available memory is automatically made visible to a layout transition, and writes performed by a layout transition are automatically made available.

“Hence the WAW hazard.”

It’s unclear what you meant because of the availability / visibility mix-up. My naive interpretation is: barriers 3a and 3b form a dependency chain. Because 3a makes the memory available, 3b shouldn’t need to repeat .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT. Then again, the validation error suggests otherwise.
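
To make this interpretation concrete, here is a toy model of the ordering in the spec quote: availability operations first, then the layout transition. It is a sketch of the reading described in this post, assuming the execution dependency chain between the barriers already holds. It is not a description of how the validation layer tracks state, and Access, Layout, ImageState and ImageBarrier are hypothetical stand-ins rather than Vulkan types.

```cpp
#include <cstdint>

// Hypothetical stand-ins; NOT the real VkAccessFlagBits / VkImageLayout values.
enum Access : std::uint32_t { ACCESS_NONE = 0, ACCESS_TRANSFER_WRITE = 1u << 0 };
enum class Layout { TransferDst, ColorAttachment };

struct ImageState {
    Layout layout;
    std::uint32_t unavailableWrites;  // writes that have not been made available yet
};

struct ImageBarrier {
    std::uint32_t srcAccessMask;
    Layout oldLayout;
    Layout newLayout;
};

// Applies one barrier: the availability operation runs first, then the layout
// transition (if any). Returns false when the transition would execute while
// some write is still unavailable, which is the write-after-write situation.
bool applyBarrier(ImageState& image, const ImageBarrier& barrier) {
    image.unavailableWrites &= ~barrier.srcAccessMask;  // availability operation
    if (barrier.oldLayout != barrier.newLayout) {
        if (image.unavailableWrites != 0) {
            return false;  // transition would touch memory with unavailable writes
        }
        image.layout = barrier.newLayout;
    }
    return true;
}
```

In this model, applying 3a (srcAccessMask = ACCESS_TRANSFER_WRITE, no layout change) and then 3b (srcAccessMask = ACCESS_NONE, TransferDst to ColorAttachment) succeeds, while applying 3b alone after the copy reports the hazard. The validation layer evidently does not track availability this way for the split barrier, which is the open question in this thread.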

vmilea · September 18, 2023, 6:30am

I’ve tested a simplified version without image layout transitions. It writes to the same buffer twice:

vkCmdCopyBuffer(commandBuffer, stagingBuffer, deviceBuffer, 1, &region);

VkBufferMemoryBarrier bufferMemoryBarrierA = {
    .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_NONE,
    .buffer = deviceBuffer,
    ...
};
vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    0,
    0, nullptr,
    1, &bufferMemoryBarrierA,
    0, nullptr);

VkBufferMemoryBarrier bufferMemoryBarrierB = {
    .srcAccessMask = VK_ACCESS_NONE, // writes are already available
    .dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
    .buffer = deviceBuffer,
    ...
};
vkCmdPipelineBarrier(commandBuffer,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    VK_PIPELINE_STAGE_TRANSFER_BIT,
    0,
    0, nullptr,
    1, &bufferMemoryBarrierB,
    0, nullptr);

vkCmdCopyBuffer(commandBuffer, stagingBuffer, deviceBuffer, 1, &region);

The validation layer does not report WAW hazard in this case.

Like before, the first barrier makes writes available, while the second barrier makes them visible (in the previous example, if the image served as a color attachment, we would have set dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT and dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT). In terms of the spec (7.1), there is an execution dependency chain between the copy commands. ScopedMemOps1 of barrier A includes the transfer writes, making them available. ScopedMemOps1 of B is empty, but that doesn’t matter since the data is already available. ScopedMemOps2 of B includes transfer writes, making them visible before the second copy command.

So why is there a hazard reported only for layout transitions? The spec clearly says:

When a layout transition is specified in a memory dependency, it happens-after the availability operations in the memory dependency, and happens-before the visibility operations.

That should prevent WAW hazard. It smells like a false positive, so I’ll open an issue in Vulkan-ValidationLayers.
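
The two-barrier buffer case can be traced with a small state machine. This is only a toy model of the available / visible bookkeeping as described in this post, assuming the stage masks already chain. BufferState, applyBarrier and recordWrite are hypothetical names, and the access bits are not the real Vulkan values.

```cpp
#include <cstdint>

// Hypothetical stand-ins; NOT the real VkAccessFlagBits values.
enum Access : std::uint32_t { ACCESS_NONE = 0, ACCESS_TRANSFER_WRITE = 1u << 0 };

struct BufferState {
    std::uint32_t unavailableWrites = 0;  // written, not yet made available
    bool hasAvailableWrites = false;      // some write has been made available
    std::uint32_t visibleTo = 0;          // accesses the available writes are visible to
};

// Availability for writes in the source access scope, then visibility of all
// available writes to the destination access scope.
void applyBarrier(BufferState& buffer, std::uint32_t srcAccess, std::uint32_t dstAccess) {
    if ((buffer.unavailableWrites & srcAccess) != 0) {
        buffer.unavailableWrites &= ~srcAccess;
        buffer.hasAvailableWrites = true;
    }
    if (buffer.hasAvailableWrites) {
        buffer.visibleTo |= dstAccess;
    }
}

// Records a new write. Returns true if it is hazard-free in this model: every
// earlier write must be available and visible to the new access.
bool recordWrite(BufferState& buffer, std::uint32_t access) {
    const bool ok = buffer.unavailableWrites == 0 &&
                    (!buffer.hasAvailableWrites || (buffer.visibleTo & access) != 0);
    buffer.unavailableWrites = access;  // the new write starts out unavailable
    buffer.hasAvailableWrites = false;
    buffer.visibleTo = 0;
    return ok;
}
```

Tracing the sequence above: the first copy is recorded with recordWrite, barrier A is applyBarrier(state, ACCESS_TRANSFER_WRITE, ACCESS_NONE), barrier B is applyBarrier(state, ACCESS_NONE, ACCESS_TRANSFER_WRITE), and the second recordWrite then returns true. Dropping barrier B, or both barriers, makes the second write return false in this model.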
