Synchronize a vkCmdBlitImage before the render pass

#1

Hello,

I’m building a little game (with a homemade engine based on LWJGL), which:

  1. prepares an image with some compute shaders
  2. blits it into a swapchain image
  3. executes a render pass to draw the UI

Step 1 is submitted separately, on a compute queue. Steps 2 and 3 are recorded in the same command buffer and submitted to a graphics queue.

The image is blitted perfectly. Unfortunately, the UI is not drawn every frame, so it appears to blink. Usually it’s barely visible or not visible at all, but on my Linux laptop (Intel GPU) it’s annoying.
I’ve been fighting this problem for a while and tried many synchronization schemes, without success. There are no validation errors.

I tried to isolate the problem: if I only draw the UI, there is no problem.
I added a fence between the compute and graphics submissions: no change.

My feeling is that the blit sometimes happens after (or during) the UI draw. I tried adding a barrier after the blit and wiring the subpass dependencies accordingly, but I’m still missing something.

My question is: what is the proper way to synchronize a blit into a swapchain image followed by a render pass?

But maybe I’m missing something else. The code is quite abstract, but I’ll try to summarize what might be relevant:

```java
// One pre-recorded command buffer per swapchain image
for (int i = 0; i < commandBuffers.size(); i++)
{
    RenderCommandBuffer commandBuffer = commandBuffers.get(i);
    ImageView imageView = configuration.imageViewManager.getImageViews().get(i);

    commandBuffer.startCommand();

    blitToSwapchainImage(commandBuffer, imageView);

    commandBuffer.startRenderPass();

    ui.drawFrame(commandBuffer.getVkCommandBuffer());

    commandBuffer.endRenderPass();
    commandBuffer.endCommand();
}
```

The blitToSwapchainImage method:
srcImage is the image built by the compute shaders.
dstImage is the target swapchain image.

```java
// Prepare the transfer from srcImage to the swapchain image (dstImage)
ImageBarrier barrier = new ImageBarrier(VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT);

// srcImage: VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_ACCESS_TRANSFER_READ_BIT
barrier.addImageBarrier(srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_ACCESS_TRANSFER_READ_BIT);

// dstImage: from VK_IMAGE_LAYOUT_UNDEFINED to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
barrier.addImageBarrier(dstImage, dstImageFormat, 1, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 0, VK_ACCESS_TRANSFER_WRITE_BIT);

barrier.execute(commandBuffer.getVkCommandBuffer());

// Blit the whole srcImage onto the whole swapchain extent
VkImageBlit.Buffer region = VkImageBlit.calloc(1);
region.srcSubresource().aspectMask(VK_IMAGE_ASPECT_COLOR_BIT);
region.srcSubresource().mipLevel(0);
region.srcSubresource().baseArrayLayer(0);
region.srcSubresource().layerCount(1);
region.srcOffsets(0).x(0);
region.srcOffsets(0).y(0);
region.srcOffsets(0).z(0);
region.srcOffsets(1).x(srcImage.getWidth());
region.srcOffsets(1).y(srcImage.getHeight());
region.srcOffsets(1).z(1);
region.dstSubresource().aspectMask(VK_IMAGE_ASPECT_COLOR_BIT);
region.dstSubresource().mipLevel(0);
region.dstSubresource().baseArrayLayer(0);
region.dstSubresource().layerCount(1);
region.dstOffsets(0).x(0);
region.dstOffsets(0).y(0);
region.dstOffsets(0).z(0);
region.dstOffsets(1).x(extent.getWidth());
region.dstOffsets(1).y(extent.getHeight());
region.dstOffsets(1).z(1);

vkCmdBlitImage(commandBuffer.getVkCommandBuffer(), srcImage,
        VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dstImage,
        VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, region, VK_FILTER_NEAREST);
region.free(); // calloc'd off-heap; Vulkan has already copied the region at record time

// Change layouts again before the render pass
ImageBarrier barrierEnd = new ImageBarrier(VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);

// srcImage: VK_IMAGE_LAYOUT_GENERAL, VK_ACCESS_SHADER_WRITE_BIT
barrierEnd.addImageBarrier(srcImage, VK_IMAGE_LAYOUT_GENERAL, VK_ACCESS_SHADER_WRITE_BIT);

// dstImage: from VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL to VK_IMAGE_COLOR_ATTACHMENT_OPTIMAL
barrierEnd.addImageBarrier(dstImage, dstImageFormat, 1, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_COLOR_ATTACHMENT_OPTIMAL,
        VK_ACCESS_TRANSFER_WRITE_BIT, VK_ACCESS_SHADER_WRITE_BIT);

barrierEnd.execute(commandBuffer.getVkCommandBuffer());
```
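The region set-up above maps the whole srcImage onto the whole swapchain extent. A small plain-Java sketch of that offset arithmetic (the class names are hypothetical, and it makes no LWJGL or Vulkan calls) makes two details explicit: the far corner uses z = 1 because a 2D image is one slice deep, and vkCmdBlitImage scales whenever the two rectangles differ in size (here with VK_FILTER_NEAREST).

```java
// Hypothetical plain-Java model of the two corner pairs of a VkImageBlit region.
// It only reproduces the arithmetic; it does not touch LWJGL or Vulkan.
final class BlitRegionSketch {
    static final class Offset3D {
        final int x, y, z;
        Offset3D(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
    }

    final Offset3D[] src = new Offset3D[2];
    final Offset3D[] dst = new Offset3D[2];

    // Corner 0 is the origin; corner 1 is (width, height, 1) because a 2D image is one slice deep.
    static BlitRegionSketch fullImage(int srcW, int srcH, int dstW, int dstH) {
        BlitRegionSketch r = new BlitRegionSketch();
        r.src[0] = new Offset3D(0, 0, 0);
        r.src[1] = new Offset3D(srcW, srcH, 1);
        r.dst[0] = new Offset3D(0, 0, 0);
        r.dst[1] = new Offset3D(dstW, dstH, 1);
        return r;
    }

    // vkCmdBlitImage scales (using the given filter) whenever the rectangles differ in size.
    boolean scales() {
        int srcW = src[1].x - src[0].x, srcH = src[1].y - src[0].y;
        int dstW = dst[1].x - dst[0].x, dstH = dst[1].y - dst[0].y;
        return srcW != dstW || srcH != dstH;
    }
}
```

Independently of how the region is built, the swapchain has to be created with VK_IMAGE_USAGE_TRANSFER_DST_BIT for its images to be valid blit destinations.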




The creation of the render pass:

```java
VkAttachmentDescription colorAttachment = VkAttachmentDescription.calloc();
colorAttachment.format(context.swapChainManager.getColorDomain().getColorFormat());
colorAttachment.samples(VK_SAMPLE_COUNT_1_BIT);
colorAttachment.loadOp(VK_ATTACHMENT_LOAD_OP_LOAD); // keep the blitted image
colorAttachment.storeOp(VK_ATTACHMENT_STORE_OP_STORE);
colorAttachment.stencilLoadOp(VK_ATTACHMENT_LOAD_OP_DONT_CARE);
colorAttachment.stencilStoreOp(VK_ATTACHMENT_STORE_OP_DONT_CARE);
colorAttachment.initialLayout(VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
colorAttachment.finalLayout(VK_IMAGE_LAYOUT_PRESENT_SRC_KHR);

VkAttachmentReference.Buffer colorAttachmentRef = VkAttachmentReference.calloc(1);
colorAttachmentRef.attachment(0);
colorAttachmentRef.layout(VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);

VkSubpassDescription.Buffer subpass = VkSubpassDescription.calloc(1);
subpass.pipelineBindPoint(VK_PIPELINE_BIND_POINT_GRAPHICS);
subpass.colorAttachmentCount(1);
subpass.pColorAttachments(colorAttachmentRef);

VkSubpassDependency.Buffer dependency = VkSubpassDependency.calloc(1);
dependency.srcSubpass(VK_SUBPASS_EXTERNAL);
dependency.dstSubpass(0);
dependency.srcStageMask(VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
dependency.dstStageMask(VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
dependency.srcAccessMask(VK_SHADER_WRITE_BIT);
dependency.dstAccessMask(VK_SHADER_WRITE_BIT);

int attachmentCount = 1;
VkAttachmentDescription.Buffer attachments = VkAttachmentDescription.calloc(attachmentCount);
attachments.put(colorAttachment);
attachments.flip();

VkRenderPassCreateInfo renderPassInfo = VkRenderPassCreateInfo.calloc();
renderPassInfo.sType(VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO);
renderPassInfo.pAttachments(attachments);
renderPassInfo.pSubpasses(subpass);
renderPassInfo.pDependencies(dependency);
```
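As an alternative to a separate post-blit barrier, the external dependency could carry the whole hand-off: its source is the blit’s transfer write, its destination is the color-attachment read done by LOAD_OP_LOAD (plus the UI writes), and initialLayout becomes VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL so the render pass performs the layout transition itself. A plain-Java sketch of those values, assuming a hypothetical class name and constants copied from vulkan_core.h (the same numbers LWJGL’s VK10 exposes):

```java
// Hypothetical plain-Java mirror of VkSubpassDependency; constant values copied
// from vulkan_core.h so the masks can be inspected without a Vulkan device.
final class SubpassDependencySketch {
    static final int VK_SUBPASS_EXTERNAL = ~0;
    static final int VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT = 0x00000400;
    static final int VK_PIPELINE_STAGE_TRANSFER_BIT = 0x00001000;
    static final int VK_ACCESS_COLOR_ATTACHMENT_READ_BIT = 0x00000080;
    static final int VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT = 0x00000100;
    static final int VK_ACCESS_TRANSFER_WRITE_BIT = 0x00001000;
    static final int VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL = 7;

    // Without the separate barrier, the attachment starts in the layout the blit left it in,
    // and the render pass transitions it to COLOR_ATTACHMENT_OPTIMAL for subpass 0.
    static final int INITIAL_LAYOUT_WITHOUT_BARRIER = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;

    int srcSubpass, dstSubpass, srcStageMask, dstStageMask, srcAccessMask, dstAccessMask;

    // Orders the blit (a transfer write) before LOAD_OP_LOAD (a color attachment read
    // in the COLOR_ATTACHMENT_OUTPUT stage) and before the UI's attachment writes.
    static SubpassDependencySketch afterBlit() {
        SubpassDependencySketch d = new SubpassDependencySketch();
        d.srcSubpass = VK_SUBPASS_EXTERNAL;
        d.dstSubpass = 0;
        d.srcStageMask = VK_PIPELINE_STAGE_TRANSFER_BIT;
        d.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        d.dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
        d.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
        return d;
    }
}
```

With LWJGL, the same numbers would go into the existing dependency.srcStageMask(...), dstStageMask(...), srcAccessMask(...) and dstAccessMask(...) calls, together with colorAttachment.initialLayout(VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL).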

#2

So, I am not entirely sure how your ImageBarrier works. It takes different types of parameters each time it is called. E.g. if I take your barrier.addImageBarrier(srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_ACCESS_TRANSFER_READ_BIT);, how does that translate to srcLayout, dstLayout, srcAccessMask, and dstAccessMask?

Your last barrier seems to have the wrong dst layout. You use VK_IMAGE_COLOR_ATTACHMENT_OPTIMAL, which is not a thing (and should not compile). And if you mean VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, then the dst stage and access flags do not match this layout…

Otherwise, VK_ACCESS_SHADER_WRITE_BIT looks odd to me. Are you writing the image as a Color Attachment, or as a Storage Image?
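Whether a stage/access pair is legal is decided by the spec’s “supported access types” table, but a legal pair does not necessarily cover the access that actually follows. A partial, hand-copied model of that table (hypothetical class name; tessellation, geometry and ALL_GRAPHICS left out; constants copied from vulkan_core.h) illustrates that FRAGMENT_SHADER + SHADER_WRITE passes the legality check yet does not cover the color-attachment read performed by LOAD_OP_LOAD.

```java
// Hypothetical, partial copy of the Vulkan "supported access types" table; stage and
// access values copied from vulkan_core.h. Tessellation, geometry and ALL_GRAPHICS
// are omitted to keep the sketch short.
final class AccessStageCheck {
    static final int STAGE_VERTEX_SHADER = 0x00000008;
    static final int STAGE_FRAGMENT_SHADER = 0x00000080;
    static final int STAGE_COLOR_ATTACHMENT_OUTPUT = 0x00000400;
    static final int STAGE_COMPUTE_SHADER = 0x00000800;
    static final int STAGE_TRANSFER = 0x00001000;
    static final int STAGE_ALL_COMMANDS = 0x00010000;

    static final int ACCESS_SHADER_WRITE = 0x00000040;
    static final int ACCESS_COLOR_ATTACHMENT_READ = 0x00000080;
    static final int ACCESS_COLOR_ATTACHMENT_WRITE = 0x00000100;
    static final int ACCESS_TRANSFER_READ = 0x00000800;
    static final int ACCESS_TRANSFER_WRITE = 0x00001000;

    // Stages (within this subset) in which a single access bit can occur.
    static int stagesFor(int accessBit) {
        switch (accessBit) {
            case ACCESS_SHADER_WRITE:
                return STAGE_VERTEX_SHADER | STAGE_FRAGMENT_SHADER | STAGE_COMPUTE_SHADER;
            case ACCESS_COLOR_ATTACHMENT_READ:
            case ACCESS_COLOR_ATTACHMENT_WRITE:
                return STAGE_COLOR_ATTACHMENT_OUTPUT;
            case ACCESS_TRANSFER_READ:
            case ACCESS_TRANSFER_WRITE:
                return STAGE_TRANSFER;
            default:
                throw new IllegalArgumentException("access bit not modelled: " + accessBit);
        }
    }

    // Legality: every access bit must be supported by at least one stage in the mask.
    static boolean legal(int stageMask, int accessMask) {
        if ((stageMask & STAGE_ALL_COMMANDS) != 0) return true;
        for (int bit = 1; bit != 0; bit <<= 1) {
            if ((accessMask & bit) != 0 && (stagesFor(bit) & stageMask) == 0) return false;
        }
        return true;
    }

    // Usefulness: LOAD_OP_LOAD is a color-attachment read in the COLOR_ATTACHMENT_OUTPUT
    // stage, so a barrier feeding that render pass must name both on its dst side.
    static boolean coversColorLoad(int dstStageMask, int dstAccessMask) {
        return (dstStageMask & STAGE_COLOR_ATTACHMENT_OUTPUT) != 0
            && (dstAccessMask & ACCESS_COLOR_ATTACHMENT_READ) != 0;
    }
}
```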

#3

Hello krOoze, thank you for your answer.

[QUOTE=krOoze;44134]So, I am not entirely sure how your ImageBarrier works. It takes different types of parameters each time it is called. E.g. if I take your barrier.addImageBarrier(srcImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_ACCESS_TRANSFER_READ_BIT);, how does that translate to srcLayout, dstLayout, srcAccessMask, and dstAccessMask?
[/QUOTE]

It’s old code, but I find it more readable (maybe?). The arguments are in this order:
(srcLayout, dstLayout, srcAccessMask, dstAccessMask).
If no accessMask, it means 0.

My bad, I made a copy/paste mistake (again, the real code is too abstract). I will edit my post.

[QUOTE=krOoze;44134]
Otherwise, VK_ACCESS_SHADER_WRITE_BIT looks odd to me. Are you writing the image as a Color Attachment, or as a Storage Image?[/QUOTE]

I tried VK_ACCESS_SHADER_WRITE_BIT because the swapchain image will be used during the VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT stage (to draw the UI). I tried to pick something according to this table, but maybe it’s not right?

#4

Ok, I cannot edit the first post.

Basically, after the blit, I record a barrier that moves the swapchain image
from:

  • stage: VK_PIPELINE_STAGE_TRANSFER_BIT
  • layout: VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
  • access: VK_ACCESS_TRANSFER_WRITE_BIT

to:

  • stage: VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
  • layout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  • access: VK_ACCESS_SHADER_WRITE_BIT

I also tried removing this barrier and using the subpass dependency instead, but the result is the same.

#5

The arguments are in this order:
(srcLayout, dstLayout, srcAccessMask, dstAccessMask).
If no accessMask, it means 0.

Well, that I get. The problem is that the call above does not match this signature: it has only two parameters. The layout argument given is presumably the dstLayout, but then what is the srcLayout? It cannot be UNDEFINED, because srcImage contains the data created by the compute shader, which we want to keep…
And if no access mask means 0, that does not make sense either, unless there is a Semaphore between this barrier and the previous compute output.
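One way to remove this ambiguity is a barrier helper that takes every field explicitly, so nothing silently defaults to UNDEFINED or to an empty mask. The class below is a hypothetical stand-in for the engine’s ImageBarrier (not its real API), using plain enums instead of Vulkan constants; the example values assume the compute shader wrote srcImage as a storage image in GENERAL, which is a SHADER_WRITE access, and that the blit then reads it as a transfer source.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the engine's ImageBarrier helper: every field is a
// required argument, so an omitted layout or access mask cannot default silently.
final class ExplicitImageBarrier {
    enum Stage { TOP_OF_PIPE, COLOR_ATTACHMENT_OUTPUT, COMPUTE_SHADER, TRANSFER, BOTTOM_OF_PIPE }
    enum Layout { UNDEFINED, GENERAL, COLOR_ATTACHMENT_OPTIMAL, TRANSFER_SRC_OPTIMAL, TRANSFER_DST_OPTIMAL }
    enum Access { SHADER_WRITE, COLOR_ATTACHMENT_READ, TRANSFER_READ, TRANSFER_WRITE }

    final Stage srcStage, dstStage;
    final Layout oldLayout, newLayout;
    final Set<Access> srcAccess, dstAccess; // an empty set stands for an access mask of 0

    ExplicitImageBarrier(Stage srcStage, Layout oldLayout, Set<Access> srcAccess,
                         Stage dstStage, Layout newLayout, Set<Access> dstAccess) {
        this.srcStage = Objects.requireNonNull(srcStage, "srcStage");
        this.oldLayout = Objects.requireNonNull(oldLayout, "oldLayout");
        this.srcAccess = Set.copyOf(srcAccess); // also rejects null
        this.dstStage = Objects.requireNonNull(dstStage, "dstStage");
        this.newLayout = Objects.requireNonNull(newLayout, "newLayout");
        this.dstAccess = Set.copyOf(dstAccess);
    }

    // srcImage before the blit: written by compute (SHADER_WRITE in GENERAL),
    // about to be read by the transfer stage as the blit source.
    static ExplicitImageBarrier srcImageBeforeBlit() {
        return new ExplicitImageBarrier(
            Stage.COMPUTE_SHADER, Layout.GENERAL, Set.of(Access.SHADER_WRITE),
            Stage.TRANSFER, Layout.TRANSFER_SRC_OPTIMAL, Set.of(Access.TRANSFER_READ));
    }
}
```

The design choice is simply that a missing argument fails loudly at construction time instead of producing a barrier that validates but synchronizes the wrong thing.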

I tried VK_ACCESS_SHADER_WRITE_BIT because the swapchain image will be used during the VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT stage (to draw the UI). I tried to pick something according to this table, but maybe it’s not right?

It is valid, assuming you know what you are doing.

I need you to clarify how exactly you are using dstImage. Is dstImage the swapchain image, or some temporary image?
How exactly are you using dstImage? As a Color Attachment, an Input Attachment, or a Storage Image? You should be able to answer that; each use looks different in the API, as well as in the shader.
Judging by the flags you use, it would seem to be a Storage Image, but that seems suspicious to me and I would not expect that…

#6
  • stage: VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT
  • layout: VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
  • access: VK_ACCESS_SHADER_WRITE_BIT

OK, that’s nonsense.
You probably want VK_ACCESS_COLOR_ATTACHMENT_READ_BIT and VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, which should match your dstLayout == VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL here as well as your loadOp == VK_ATTACHMENT_LOAD_OP_LOAD.

In that case your VkSubpassDependency does not make sense either. You are doing all the work in the pipeline barrier, so this dependency should not be needed (and its stage flags seem wrong again). Alternatively, you can skip your barrier and convert it into the render pass dependency instead…

That assumes dstImage is the Color Attachment of the subsequent Render Pass (and not a Storage Image).
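Spelled out field by field, the post-blit swapchain barrier described above could look like the plain-Java sketch below. The class name is hypothetical and the constants are copied from vulkan_core.h; COLOR_ATTACHMENT_READ covers LOAD_OP_LOAD as suggested above, and COLOR_ATTACHMENT_WRITE is an assumption added because the UI draw also writes the attachment.

```java
// Hypothetical plain-Java record of one image barrier; constant values copied from
// vulkan_core.h. In LWJGL these would fill a VkImageMemoryBarrier and be passed to
// vkCmdPipelineBarrier(cmd, srcStageMask, dstStageMask, 0, null, null, barriers).
final class PostBlitBarrier {
    static final int STAGE_COLOR_ATTACHMENT_OUTPUT = 0x00000400;
    static final int STAGE_TRANSFER = 0x00001000;
    static final int ACCESS_COLOR_ATTACHMENT_READ = 0x00000080;
    static final int ACCESS_COLOR_ATTACHMENT_WRITE = 0x00000100;
    static final int ACCESS_TRANSFER_WRITE = 0x00001000;
    static final int LAYOUT_COLOR_ATTACHMENT_OPTIMAL = 2;
    static final int LAYOUT_TRANSFER_DST_OPTIMAL = 7;

    final int srcStageMask = STAGE_TRANSFER;                // where vkCmdBlitImage writes
    final int srcAccessMask = ACCESS_TRANSFER_WRITE;        // the blit's write
    final int oldLayout = LAYOUT_TRANSFER_DST_OPTIMAL;      // left there by the blit
    final int dstStageMask = STAGE_COLOR_ATTACHMENT_OUTPUT; // where LOAD_OP_LOAD and the UI writes run
    final int newLayout = LAYOUT_COLOR_ATTACHMENT_OPTIMAL;  // matches the render pass initialLayout
    // READ for LOAD_OP_LOAD; WRITE for the UI draw's attachment writes.
    final int dstAccessMask = ACCESS_COLOR_ATTACHMENT_READ | ACCESS_COLOR_ATTACHMENT_WRITE;
}
```

With this barrier in place, the render pass keeps initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and needs no extra external dependency for the blit.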

#7

Yes, my first post is definitely not clear/accurate.
srcImage is the image produced by the compute shaders, in an earlier submission.
trgImage (the dstImage of my first post) is the swapchain image itself; after the blit it is used for drawing (bind pipeline, then vkCmdDrawIndexed).

Before the blit, I record two transitions:

  1. srcImage:
    From:
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_IMAGE_LAYOUT_GENERAL, VK_ACCESS_TRANSFER_WRITE_BIT
    To:
    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_ACCESS_TRANSFER_READ_BIT

  2. trgImage (swapchain image)
    From:
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_IMAGE_LAYOUT_UNDEFINED, 0
    To:
    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT

Then BLIT.

Then two more barriers: one to prepare srcImage for the next frame (srcImage is not used again in the current frame),
and another to prepare the swapchain image for vkCmdDrawIndexed:
From:
VK_PIPELINE_STAGE_TRANSFER_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT
To:
VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, VK_ACCESS_SHADER_WRITE_BIT

After that, I draw the UI, roughly like this:

vkCmdBeginRenderPass
vkCmdBindPipeline
vkCmdSetViewport
vkCmdPushConstants
vkCmdSetScissor
vkCmdDrawIndexed
vkCmdEndRenderPass

I need you to clarify how exactly you are using dstImage. Is dstImage the swapchain image, or some temporary image?
How exactly are you using dstImage? As a Color Attachment, an Input Attachment, or a Storage Image? You should be able to answer that; each use looks different in the API, as well as in the shader.
Judging by the flags you use, it would seem to be a Storage Image, but that seems suspicious to me and I would not expect that…

dstImage is the current swapchain image.
In the subpass, I declare it as a Color Attachment.

#8

[QUOTE=krOoze;44138]OK, that’s nonsense.
You probably want VK_ACCESS_COLOR_ATTACHMENT_READ_BIT and VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, which should match your dstLayout == VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL here as well as your loadOp == VK_ATTACHMENT_LOAD_OP_LOAD.[/QUOTE]

OK, I just tried it: I replaced the swapchain image barrier after the blit with:
From:
VK_PIPELINE_STAGE_TRANSFER_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_ACCESS_TRANSFER_WRITE_BIT
To:
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, VK_ACCESS_COLOR_ATTACHMENT_READ_BIT

Unfortunately, the UI is still blinking :(. I tried removing/changing the subpass dependency, but nothing changed.

[QUOTE=krOoze;44138]
In that case your VkSubpassDependency does not make sense either. You are doing all the work in the pipeline barrier, so this dependency should not be needed (and its stage flags seem wrong again). Alternatively, you can skip your barrier and convert it into the render pass dependency instead…[/QUOTE]

Yes, that’s what I understood, and I tried both: a pipeline barrier, or a subpass dependency. For now I’m sticking with barriers because they are slightly easier to implement and test, but I plan to switch to a subpass dependency later.

Maybe I’m missing something else. Maybe presentation happens too early (before the UI draw has finished)? I only have one semaphore between vkQueueSubmit and vkQueuePresentKHR.

#9

Put a vkDeviceWaitIdle after the frame.

Actually, put vkDeviceWaitIdle and an HC barrier everywhere possible. You make an HC barrier with srcStage = dstStage = ALL_COMMANDS and a VkMemoryBarrier with srcAccessMask = dstAccessMask = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT.
That should rule out synchronization errors.
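A convenient way to try this without editing every call site is a debug switch inside the engine’s barrier helper. The sketch below is hypothetical (SyncDebug is not part of LWJGL or the engine; constants copied from vulkan_core.h). It only widens execution and memory dependencies: image layouts still have to be correct, and a pipeline barrier never orders work across different queues, so vkDeviceWaitIdle (or semaphores) is still needed for that part.

```java
// Hypothetical debug switch for an engine's barrier helper: when forceFullBarriers is on,
// every requested stage/access mask is replaced by the heavy-handed values above.
// Constant values copied from vulkan_core.h.
final class SyncDebug {
    static final int STAGE_ALL_COMMANDS = 0x00010000;
    static final int ACCESS_MEMORY_READ = 0x00008000;
    static final int ACCESS_MEMORY_WRITE = 0x00010000;

    static boolean forceFullBarriers = false;

    // Pass through the requested stage mask unless full barriers are forced.
    static int stageMask(int requested) {
        return forceFullBarriers ? STAGE_ALL_COMMANDS : requested;
    }

    // Pass through the requested access mask unless full barriers are forced.
    static int accessMask(int requested) {
        return forceFullBarriers ? (ACCESS_MEMORY_READ | ACCESS_MEMORY_WRITE) : requested;
    }
}
```

If the blinking survives with the switch on (and with vkDeviceWaitIdle between submissions), a missing execution or memory dependency becomes an unlikely explanation.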

#10

Should be VK_ACCESS_SHADER_WRITE_BIT instead.
This is weird. The layers should be able to catch this. Which SDK version do you have?

#11

This seems insufficient to prepare srcImage for the next frame. Either there must be a Semaphore between them, or the dst should also include VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT and VK_ACCESS_SHADER_WRITE_BIT.

Also, how/where do you clear srcImage?
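The two options above differ only in the dst side of the srcImage barrier recorded after the blit. A hypothetical helper (constants copied from vulkan_core.h) makes the choice explicit. The no-semaphore branch assumes the compute work of the next frame runs on the same VkQueue; a pipeline barrier cannot order work across queues, so with a separate compute queue the semaphore branch is the only option.

```java
// Hypothetical helper choosing the dst side of the srcImage barrier recorded after the blit.
// Constant values copied from vulkan_core.h.
final class SrcImageRelease {
    static final int STAGE_COMPUTE_SHADER = 0x00000800;
    static final int STAGE_BOTTOM_OF_PIPE = 0x00002000;
    static final int ACCESS_SHADER_WRITE = 0x00000040;

    // Returns {dstStageMask, dstAccessMask}.
    static int[] dstSide(boolean semaphoreBeforeNextCompute) {
        if (semaphoreBeforeNextCompute) {
            // A semaphore signalled after this command buffer and waited on before the next
            // compute work provides the ordering; this barrier only has to do the layout transition.
            return new int[] { STAGE_BOTTOM_OF_PIPE, 0 };
        }
        // Same queue, no semaphore: the barrier itself must hold back the next frame's
        // compute writes until the blit has finished reading srcImage (write-after-read).
        return new int[] { STAGE_COMPUTE_SHADER, ACCESS_SHADER_WRITE };
    }
}
```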

#12

[QUOTE=krOoze;44141]Put a vkDeviceWaitIdle after the frame.

Actually put vkDeviceWaitIdle and HC barrier everywhere possible. You make a HC barrier with srcStage = dstStage = ALL_COMMANDS, with VkMemoryBarrier of srcAccessMask = dstAccessMask = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT.
That should rule out synchronization errors.[/QUOTE]

I really did try putting synchronization everywhere (vkDeviceWaitIdle, fences…), but nothing changes the blinking. If I make a bad configuration, the layers report the error.

[QUOTE=krOoze;44142]Should be VK_ACCESS_SHADER_WRITE_BIT instead.
This is weird. The layers should be able to catch this. Which SDK version do you have?[/QUOTE]

Bad copy/paste again; yes, the layers catch that kind of error. Indeed, it’s VK_ACCESS_SHADER_WRITE_BIT.

I didn’t mention srcImage because I didn’t think it could be the problem. Here is the transition I record for srcImage after the blit:
From:
VK_PIPELINE_STAGE_TRANSFER_BIT, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_ACCESS_TRANSFER_READ_BIT
To:
VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, VK_IMAGE_LAYOUT_GENERAL, 0

OK, here is a summary of the full process:

1) Compute execution
Some compute shaders build srcImage. It’s a bit complex, but the image is fully (over)written by the end of this submission.

Between the two submissions, I tried every fence, WaitIdle and barrier I could think of. Nothing changed; the blinking still appears.
But if I add a huge manual sleep, say 200 ms, the blinking disappears. (For reference, the compute submission easily produces srcImage at 120 fps on this laptop, although I use FIFO present mode, i.e. 60 fps.)

2) Graphics
Barriers to prepare srcImage and the swapchain image.
blit srcImage into swapchain image.
Barriers again, as we saw before.
(from here, srcImage is not used anymore)
start render
drawIndex for the UI.
end render

Nothing is shared between the compute and graphics submissions except srcImage. I don’t see how the compute work could affect the graphics work, but… maybe?
srcImage looks good and is rendered perfectly on screen, so at least the blit works.

For this test, the command buffers are recorded only once. The blinking is random and “seems” to depend on the compute workload.

This architecture doesn’t seem too complicated. I don’t understand why only the UI is sometimes not drawn.

#13

Well, anything can be the problem. Mis-synchronization is undefined behavior too. That means it can pretend to work fine, or it can rip the fabric of space and time as we know it, or anything in between.

You are reusing srcImage from the previous frame. How are you making sure the previous frame is no longer reading srcImage while the next frame already wants to overwrite it? dst=VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT means no sync. That only makes sense if there is a Semaphore or Fence between the two frames; is there one?

HmMmMm, maybe make a VK_LAYER_LUNARG_api_dump of the first two frames. That should rule out copy-paste errors and see through your abstractions.

#14

I tried to put Fence and WaitIdle. No change.

Haha, thank you, I didn’t know about the API dump layer. Yeah, it will be more reliable… Thank you very much for your help :’).
The first dump I made was 10,000 lines… So I removed as much as I could from my application while keeping the bug alive (mainly a big piece of computation and part of the UI); the new dump is half the size. Here’s the pastebin:
https://pastebin.com/0nwFZujj
By the way, I made a new release yesterday; if you are curious to see what we are talking about, here’s the link:


Again, thank you very much for your help :oops:…

#15

OK, your Semaphores seem wrong.

You take a signal-pending semaphore from vkAcquireNextImageKHR, and in vkQueueSubmit you wait on it with pWaitDstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT. But your first use of the swapchain image is the layout transition barrier to TRANSFER_DST_OPTIMAL, which has srcStage = COMPUTE.

You also take a signal-pending semaphore from the compute command buffer submission, but again wait on it with pWaitDstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, while the first use is again a barrier with srcStage = COMPUTE.
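The underlying rule is that the semaphore wait and the first barrier touching the image only form a dependency chain if the wait’s destination stages intersect the barrier’s source stages. A conservative, hypothetical check (constants copied from vulkan_core.h) is sketched below; the spec also expands the wait side to logically later stages and the barrier side to logically earlier ones, and this sketch only models the ALL_COMMANDS, TOP_OF_PIPE and BOTTOM_OF_PIPE cases of that expansion, so it may reject some valid pairs but should not accept an invalid one.

```java
// Hypothetical, conservative check that a semaphore wait (pWaitDstStageMask) chains with
// the srcStageMask of the first barrier that touches the image afterwards.
// Constant values copied from vulkan_core.h.
final class WaitChainCheck {
    static final int STAGE_TOP_OF_PIPE = 0x00000001;
    static final int STAGE_COLOR_ATTACHMENT_OUTPUT = 0x00000400;
    static final int STAGE_COMPUTE_SHADER = 0x00000800;
    static final int STAGE_TRANSFER = 0x00001000;
    static final int STAGE_BOTTOM_OF_PIPE = 0x00002000;
    static final int STAGE_ALL_COMMANDS = 0x00010000;

    static boolean formsChain(int waitDstStageMask, int barrierSrcStageMask) {
        // ALL_COMMANDS on either side covers every stage.
        if (((waitDstStageMask | barrierSrcStageMask) & STAGE_ALL_COMMANDS) != 0) return true;
        // A wait at TOP_OF_PIPE extends to every logically later stage.
        if ((waitDstStageMask & STAGE_TOP_OF_PIPE) != 0) return true;
        // A barrier source of BOTTOM_OF_PIPE extends to every logically earlier stage.
        if ((barrierSrcStageMask & STAGE_BOTTOM_OF_PIPE) != 0) return true;
        // Otherwise require a direct overlap (the spec's pipeline-order expansion is not modelled).
        return (waitDstStageMask & barrierSrcStageMask) != 0;
    }
}
```

For the acquire semaphore, one consistent choice under these rules is waiting with pWaitDstStageMask = VK_PIPELINE_STAGE_TRANSFER_BIT and giving the swapchain image’s first barrier srcStage = TRANSFER, since the blit is its first real use.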

And your compute seems mis-synchronized too. The last use of the compute image seems to be the barrier:

TRANSFER => BOTTOM
TRANSFER_READ => 0
TRANSFER_SRC_OPTIMAL => GENERAL

and its first use in the compute submission is again a barrier:

TOP => COMPUTE
0 => SHADER_WRITE
UNDEFINED => GENERAL

And there seems to be nothing in between (no Semaphore waits).
Firstly, one of the barriers is pointless if you then just transition from the UNDEFINED layout. Not to mention that you transition to GENERAL twice.
Secondly, either there must be a Semaphore wait, OR srcStage cannot be TOP_OF_PIPE.
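Putting both points together, the compute command buffer could keep a single srcImage barrier: since the compute pass overwrites srcImage completely, oldLayout = UNDEFINED is fine, and the graphics-side transition back to GENERAL becomes unnecessary. The hypothetical helper below (constants copied from vulkan_core.h) picks srcStage from the semaphore wait stage when there is one, so the barrier chains with the wait, and otherwise from the previous frame’s blit read. The no-semaphore variant assumes both submissions go to the same VkQueue; separate queue families with VK_SHARING_MODE_EXCLUSIVE would also need queue family ownership transfers, which this sketch ignores.

```java
// Hypothetical helper for the first srcImage barrier in the compute command buffer.
// Constant values copied from vulkan_core.h.
final class ComputeAcquire {
    static final int STAGE_COMPUTE_SHADER = 0x00000800;
    static final int STAGE_TRANSFER = 0x00001000;
    static final int ACCESS_SHADER_WRITE = 0x00000040;
    static final int LAYOUT_UNDEFINED = 0;
    static final int LAYOUT_GENERAL = 1;

    final int srcStageMask, dstStageMask, srcAccessMask, dstAccessMask, oldLayout, newLayout;

    // waitStage: the pWaitDstStageMask used when waiting on the graphics semaphore,
    // or null when there is no semaphore and both submissions share one queue.
    ComputeAcquire(Integer waitStage) {
        // With a semaphore, reuse its wait stage so the barrier chains with the wait;
        // without one, wait directly for the previous frame's blit (a TRANSFER read).
        srcStageMask = (waitStage != null) ? waitStage : STAGE_TRANSFER;
        srcAccessMask = 0;             // write-after-read: an execution dependency is enough
        dstStageMask = STAGE_COMPUTE_SHADER;
        dstAccessMask = ACCESS_SHADER_WRITE;
        oldLayout = LAYOUT_UNDEFINED;  // old contents discarded; compute rewrites everything
        newLayout = LAYOUT_GENERAL;    // storage image layout for the compute shaders
    }
}
```

Usage: `new ComputeAcquire(null)` for a same-queue setup without a semaphore, or `new ComputeAcquire(ComputeAcquire.STAGE_COMPUTE_SHADER)` when the compute submission waits on the graphics semaphore with pWaitDstStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT.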