FrameBuffer attachments clarification

VulkanBuilder · November 7, 2018, 4:09pm

I use two swapchain images, two command buffers, two framebuffers, etc.
I allow for up to two frames to be in flight at a time.

If I want to attach a depth buffer, do I need a unique one per framebuffer since in theory the previous one could still be in use while rendering the new one?

I found it hard to locate proper information or examples for this.

Most example code doesn’t allow multiple frames to be ‘in flight’, and the code I have seen sometimes shares the depth buffer, so it’s hard to know if that is valid.

I assume once the above has been answered it will be equal for other attachments as well, so if I attach another color buffer it will also need to be duplicated if the depth buffer has to be.

Can someone please clarify the exact requirements here?

My initial gut feeling is that like everything else that might still be in use you would have to duplicate things.

VulkanBuilder · November 7, 2018, 4:29pm

Some research I already did:

VkQuake seems to share the depth buffer, and other attachments too, between in-flight frames.
vulkan-tutorial.com seems to share the depth buffer.
Various other samples seem to duplicate it.

If attachments can be shared without explicit synchronization then why is that the case?

Alfonse_Reinheart · November 7, 2018, 7:50pm

in theory the previous one could still be in use while rendering the new one?

And how exactly would that happen?

The reason you need multiple presentable images is because one (or more) of them is in the process of being presented. During that time (which could take a while), you still want the GPU to be doing stuff. Since it can’t “do stuff” to the image that’s being presented, it would have to “do stuff” to some other image.

But you’re not presenting your depth buffer, are you? None of the above applies.

The only overlap that could happen is if there were no synchronization between frames. That is, if you don’t prevent the rendering of frame 2 while frame 1 is still being rendered. However, you usually do have such synchronization; after all, you can’t present an image unless you synchronize the presentation command with the queue operation that renders to it. So if the pipeline has already stalled, adding an extra bit of synchronization isn’t going to hurt.

VulkanBuilder · November 8, 2018, 5:45am

As I said I am trying to understand this issue as tutorials are unclear and conflicting in how they handle it. Most actually completely ignore the issue by their choice of per frame synchronization.
So a newcomer learning the API might not remember all of the Vulkan spec details or know exactly how synchronization occurs inside Vulkan.

Take something as simple as where to put the fence while drawing:

The vulkan-tutorial.com page at https://vulkan-tutorial.com/Drawing_a_triangle/Drawing/Rendering_and_presentation#page_Frames_in_flight
puts it just before the new frame is started, right before vkAcquireNextImageKHR.

Others put the fence just before they start using the command buffer again, which seems to have the same effect as vulkan-tutorial. (https://developer.samsung.com/game/usage)

However, the LunarG example at https://vulkan.lunarg.com/doc/sdk/1.1.85.0/windows/tutorial/html/15-draw_cube.html
puts it before vkQueuePresentKHR, and it implies that you need to wait for the command buffer before calling vkQueuePresentKHR.

To my current understanding all of these will work, but LunarG’s will be less efficient, and its stated reason for waiting at the chosen point is wrong.
Is that true? (If so, it’s kind of bad that such a high-profile example is wrong.)

Now back to my original question and the answer above

If I only have two, I might see your point for my given example.
But what if I had three or more buffers and allowed the same number to be ‘in flight’ (this code was meant to be scalable from 1 to x frames in flight)?
Frame 1 would be presented on screen, frame 2 renders and is queued via vkQueuePresentKHR (as far as I understand the queue does not necessarily present right away, it’s kind of in the name), frame 3 starts to render.

I am really trying to understand which call prevents the depth buffer from frame 2 from being overwritten when frame 3 starts (given the queued present of 2 has not been executed yet).

To provide more context I am using code similar to: Rendering and presentation - Vulkan Tutorial
This should clear up exactly how the app code I use does its synchronization (it could be doing it wrong, I am open to that).

I might have misunderstood something which is why I asked for clarification and multiple people do seem to be confused about it.

I also read “The Most Common Vulkan Mistakes”, which mentions “Command queues run independently of each other.”
So, given my synchronization scheme in the app and the above example:

The result of command buffer 1 is being presented, command buffer 2 has been filled and queued.
Command buffer 3 is being filled and then queued.

Since command buffers 2 and 3 run independently of each other, what would prevent both of them from drawing to the depth buffer at the same time if they share one?

Vulkan tutorial also duplicates its UBOs for the in-flight frames, which is what I would expect to be necessary as there are multiple queues possibly rendering at the same time.
Again, this leads me back to the original question: if a depth buffer can be shared safely, why can’t the UBOs?

Leading me to conclude I am missing something, hence the need for clarification.

VulkanBuilder · November 8, 2018, 5:56am

Doh I just realized he wrote “command queues” not “command buffers” in common Vulkan mistakes.
Next he writes “When submitted to queue A, command buffers execute in the specified order”.

So I guess that is the reason why a depth buffer can be shared, since command buffer 1 will be completed before command buffer 2 can start executing!
As long as it’s on the same command queue.

Which again would imply only ‘things’ shared with the CPU need to be duplicated.

Alfonse_Reinheart · November 8, 2018, 6:50am

frame 2 renders and is queued via vkQueuePresentKHR (as far as I understand the queue does not necessarily present right away, it’s kind of in the name), frame 3 starts to render

I am really trying to understand which call prevents the depth buffer from frame 2 from being overwritten when frame 3 starts (given the queued present of 2 has not been executed yet).

As I said, you presumably used some kind of synchronization between frames 2 and 3. Barriers, events, semaphores, fences, an external dependency from the renderpass that renders to it, any of those would work.

Vulkan tutorial also duplicates its UBO’s for the inflight frames which is what I would expect to be necessary as there are multiple queues possible rendering at the same time.

That’s different. With uniform buffers, you’re providing new data to each frame of rendering. And this data must be preserved until the render process that uses that buffer is finished with it. It’s not about preventing Vulkan from writing to the buffer again; it’s about preventing you from writing to the buffer again (or more specifically, allowing you to continue preparing the next frames’ rendering without breaking the current one).

By contrast, the contents of the depth buffer are its own. Its contents are generated by on-GPU processes: the renderpass load/clear operation, subpasses that use it as a depth attachment, etc. Once you’re finished generating an image with the depth buffer, you don’t need it anymore. So that buffer can be immediately reused by another rendering process.

Therefore, as I said, the only potential synchronization issue is that you have to prevent the next frame’s rendering commands from starting until the current frame’s commands have finished with the depth buffer. And since you usually have plenty of other reasons for imposing synchronization between frames, that synchronization should be sufficient.
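
The uniform-buffer half of this argument can be illustrated with a toy model. The sketch below is plain C++, not Vulkan: the names `SubmittedFrame` and `countCorruptedFrames` are made up for illustration, the "GPU" is just a queue of frames that execute later than they are submitted, and the fence wait is simulated. It only shows why data written by the CPU needs one slot per frame in flight, under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model, not Vulkan: a "submitted frame" remembers which uniform slot it
// will read once the simulated GPU eventually executes it.
struct SubmittedFrame {
    int frameNumber;      // value the CPU intended this frame to see
    std::size_t uboSlot;  // slot the frame reads at execution time
};

// Submits `frameCount` frames while allowing up to `framesInFlight` frames to
// be pending, and returns how many frames read overwritten uniform data.
int countCorruptedFrames(std::size_t uboSlots, std::size_t framesInFlight, int frameCount) {
    assert(uboSlots > 0 && framesInFlight > 0);
    std::vector<int> ubo(uboSlots, -1);
    std::deque<SubmittedFrame> pending;  // submitted but not yet executed
    int corrupted = 0;

    auto executeOldest = [&] {
        const SubmittedFrame f = pending.front();
        pending.pop_front();
        if (ubo[f.uboSlot] != f.frameNumber) ++corrupted;  // data changed under the GPU
    };

    for (int frame = 0; frame < frameCount; ++frame) {
        // Stand-in for a fence wait: block until a frame-in-flight slot is free.
        while (pending.size() >= framesInFlight) executeOldest();

        const std::size_t slot = static_cast<std::size_t>(frame) % uboSlots;
        ubo[slot] = frame;                 // CPU writes this frame's uniforms
        pending.push_back({frame, slot});  // "submit"; the GPU reads the slot later
    }
    while (!pending.empty()) executeOldest();
    return corrupted;
}
```

In this model `countCorruptedFrames(1, 2, 4)` reports corrupted frames because the CPU overwrites the single slot while an earlier frame is still pending, whereas `countCorruptedFrames(2, 2, 4)` reports none. A depth buffer has no CPU-written contents to protect, which is why the same reasoning does not force duplicating it.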

VulkanBuilder · November 8, 2018, 8:18am


Hmm… as I posted before your answer.

Uber-general Vulkan’s GPU-side command execution rules:

Command queues run independently of each other.
When submitted to queue A, command buffers execute in the specified order
From http://32ipi028l5q82yhj72224m8j-wpengine.netdna-ssl.com/wp-content/uploads/2016/05/Most-common-mistakes-in-Vulkan-apps.pdf

So, since I use one queue and each command buffer submitted on a queue executes in the specified order, command buffers can reuse attachments like depth buffers, because any earlier command buffer that shares them on the same queue will have finished executing.

Isn’t that the correct reason, rather than any in-app synchronization?

VulkanBuilder · November 8, 2018, 9:11am

So for other interested parties it basically boils down to:

1. Command buffers submitted to a single queue respect the submission order (see Vulkan® 1.0.245 - A Specification). This is the primary reason why it works.
2. The fence protection on the app side prevents the app from reusing the command buffer before it has been executed.

Suslik · August 3, 2019, 8:20am

@VulkanBuilder

Honestly, it is such a relief to find this thread. It seems like everyone is either completely ignoring this topic, or people provide very well-founded and elaborate explanations of how it’s supposed to work that completely contradict each other.

It’s incredibly frustrating trying to find clear official information on how something as basic as a depth buffer is expected to be used with multiple frames in flight.

I came to similar conclusions as you did: it’s OK to use the same gbuffer/depth attachments because frames are submitted in different command buffers and because of that they are executed sequentially. (UPD: this can work, but you have to provide a barrier yourself.) All CPU-coherent resources, however (mapped buffers and images, command buffers), have to be duplicated for every frame in flight because every submitted frame has to have its own set of such resources.

Alfonse_Reinheart · August 3, 2019, 11:09pm

I find myself confused how either of you came to the conclusion that this is in any way true:

I mean, I said very clearly: “you presumably used some kind of synchronization between frames 2 and 3” as well as “the only potential synchronization issue is that you have to prevent the next frame’s rendering commands from starting until the current frame’s commands have finished with the depth buffer”

I don’t know how you came to the erroneous conclusion that GPU commands cannot execute out of order absent some form of synchronization, or that putting things in different command buffers counts as synchronization somehow. But it’s really important that you discard such notions.

I don’t know why you find “very well-founded and elaborate explanations of how it’s supposed to work, that completely contradict to each other,” but the conclusions you’ve come to based on the information presented in this thread are wrong. Those conclusions are not what I’ve said and they’re not what the Vulkan specification has said.

Command buffers respect submission order, but outside of certain specific guarantees (such as blending and fragment writing), commands execute in an undefined order unless explicit ordering operations (barriers, subpasses, etc.) are employed. And those explicit ordering operations are what respect submission order (i.e., everything submitted before a barrier executes before anything submitted after a barrier).

The reason reusing a depth buffer works is because you almost certainly already have some form of synchronization between the two render passes that act on the depth buffer. You would not be able to reuse a depth buffer if it were used in two render passes that do not have said synchronization.

Suslik · August 4, 2019, 1:54am

Yes, I planned to update this post today as I did some more research. I still want to emphasize how little information is available on this subject and how hard it is to decipher what the spec is saying in its implicit synchronization guarantees section. I’ll just sum up the facts that I currently consider to be true:

1) Various sources recommend using a single set of rendertarget resources (depth buffer, gbuffer, etc.) for all frames in flight (Depth buffering - Vulkan Tutorial, GitHub - Novum/vkQuake: Vulkan Quake port based on QuakeSpasm, etc.), and they justify it by saying that two frames in flight queue up on the GPU and are never actually rendered at the same time.
2) Approximately the same number of resources online recommend using a separate set of resources per frame (LunarXchange, API without Secrets: The Practical Approach to Vulkan* - Part 1), and they justify it by saying that multiple frames in flight can in fact be rendered at the same time and so need a separate set of resources.
Vulkan spec states:

Command buffers submitted to different queues may execute in parallel or even out of order with respect to one another. Command buffers submitted to a single queue respect submission order

“commands … respect submission order” in this case means that “commands start in submission order”; it does not imply that later commands start after previous commands end. So this alone is not enough to make it safe to reuse a single depth buffer across frames.
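
To make the "start in order, but may overlap" reading concrete, here is a deliberately crude timeline model in plain C++. It is not Vulkan code and does not reflect how any real GPU schedules work; `Pass`, `schedule`, and `hasOverlap` are hypothetical names, and the one-time-unit spacing between submissions is an arbitrary assumption. It only contrasts the same passes with and without an execution dependency between them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy timeline, not Vulkan: each render pass occupies the shared depth buffer
// for the half-open interval [start, end).
struct Pass {
    int start;
    int end;
};

// Schedules passes submitted one time unit apart. Without a dependency a pass
// begins as soon as it is submitted; with one, it also waits for the previous
// pass to finish.
std::vector<Pass> schedule(const std::vector<int>& durations, bool withDependency) {
    std::vector<Pass> passes;
    int previousEnd = 0;
    for (std::size_t i = 0; i < durations.size(); ++i) {
        assert(durations[i] >= 0);
        int start = static_cast<int>(i);  // submission order fixes the earliest start
        if (withDependency) start = std::max(start, previousEnd);
        passes.push_back({start, start + durations[i]});
        previousEnd = passes.back().end;
    }
    return passes;
}

// True if two passes touch the depth buffer at the same time. Starts are
// non-decreasing, so comparing neighbours is sufficient.
bool hasOverlap(const std::vector<Pass>& passes) {
    for (std::size_t i = 1; i < passes.size(); ++i) {
        if (passes[i].start < passes[i - 1].end) return true;
    }
    return false;
}
```

With three passes of duration 5, `schedule({5, 5, 5}, false)` yields overlapping intervals while `schedule({5, 5, 5}, true)` does not. The dependency plays the role that a barrier, semaphore, or subpass dependency plays in a real application.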

There is a way, however, to provide such safety by having a barrier at the start of a frame that makes sure that the depth buffer is no longer in use. One way to do this is to set up an execution barrier:

        srcAccessPattern.stage = vk::PipelineStageFlagBits::eBottomOfPipe;
        srcAccessPattern.accessMask = vk::AccessFlags();
        srcAccessPattern.layout = vk::ImageLayout::eUndefined;
        dstAccessPattern.stage = vk::PipelineStageFlagBits::eLateFragmentTests | vk::PipelineStageFlagBits::eEarlyFragmentTests;
        dstAccessPattern.accessMask = vk::AccessFlagBits::eDepthStencilAttachmentRead | vk::AccessFlagBits::eDepthStencilAttachmentWrite;
        dstAccessPattern.layout = vk::ImageLayout::eDepthStencilAttachmentOptimal;

Keep in mind that this is a barrier you probably want anyway to initialize the depth buffer; the only difference from what you might already have is this line:

srcAccessPattern.stage = vk::PipelineStageFlagBits::eBottomOfPipe;

as you might have eTopOfPipe here. In this case eBottomOfPipe makes sure that all earlier draw calls that use this depth-stencil attachment have finished executing.

It’s possible to specify a more precise barrier if you keep information about attachment usage from previous frames:

        srcAccessPattern.stage = vk::PipelineStageFlagBits::eLateFragmentTests;
        srcAccessPattern.accessMask = vk::AccessFlagBits::eDepthStencilAttachmentWrite;
        srcAccessPattern.layout = vk::ImageLayout::eDepthStencilAttachmentOptimal;

However, in my rendergraph implementation this information is not stored across frames, so I assume the worst case: that the attachment can be used at any stage in the previous frame.

The takeaway is that both schemes 1) and 2) can be correct under certain conditions, but it’s really important to understand exactly when. If you provide a light barrier such as the one I provided above, it’s OK to reuse a rendertarget from a previous frame. In real-life scenarios this is pretty much what happens in previous-generation graphics APIs such as DX11 when you’re using double-buffering, as you never create a separate set of rendertargets for multiple frames in flight. The easiest way to avoid access hazards, however, is to use a separate set of resources per frame in a round-robin fashion, with a fence on which the CPU waits for the oldest frame to finish rendering before reusing its resources. Keep in mind, however, that this is not very practical if you have a heavy GBuffer (for example at 4K resolution, with MSAA, or without aggressive compression).
If there are multiple frames in flight, CPU-accessible resources (mappable buffers, images, command buffers) always have to be copied for every frame in flight because they have to co-exist simultaneously (as opposed to gbuffer resources, which are rendered from scratch for every frame in flight).
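
The round-robin scheme with a CPU-side fence wait can be sketched as follows. This is a plain C++ model rather than Vulkan code: `FrameSlot`, `FrameRing`, `kFramesInFlight`, and `gpuFinished` are hypothetical names, and the fence is reduced to a boolean. In a real application the wait would be a call such as `vkWaitForFences` on the fence that was passed to the queue submission for that slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFramesInFlight = 2;  // hypothetical limit chosen for the example

// Toy stand-in for one frame's resources (command buffer, mapped buffers, fence).
struct FrameSlot {
    bool fenceSignaled = true;    // starts signaled so the first use does not wait
    std::uint64_t lastFrame = 0;  // last frame recorded into this slot
};

struct FrameRing {
    std::array<FrameSlot, kFramesInFlight> slots{};
    std::uint64_t frameCounter = 0;
    int cpuWaits = 0;  // how often the CPU had to wait for the GPU

    // Picks the slot for the next frame, waiting for its previous use first.
    std::size_t beginFrame() {
        const std::size_t index = frameCounter % kFramesInFlight;
        FrameSlot& slot = slots[index];
        if (!slot.fenceSignaled) {
            // Real code would block on the fence here; the model only counts
            // the wait and pretends the GPU has finished.
            ++cpuWaits;
            slot.fenceSignaled = true;
        }
        slot.fenceSignaled = false;  // reset; the simulated submit signals it later
        slot.lastFrame = frameCounter++;
        return index;
    }

    // Called by the simulated GPU when the work recorded in a slot has completed.
    void gpuFinished(std::size_t index) {
        assert(index < slots.size());
        slots[index].fenceSignaled = true;
    }
};
```

The CPU only stalls when it wraps around to a slot whose previous frame has not completed, so it can run at most `kFramesInFlight` frames ahead of the GPU. Whether the render targets themselves also live in these slots is the design choice discussed above; the model covers only the CPU-written resources.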

Please correct me if I’m wrong somewhere or add additional information because, again, I think it’s incredibly hard and frustrating to find information like this.

Alfonse_Reinheart · August 5, 2019, 1:41pm

Saying it like this makes it seem like this “queue up” process happens automatically, and not because of the synchronization operations that happen between frames. It’s more accurate to remind the person that a bunch of stuff happens between those two frames, some of which will undoubtedly introduce an execution dependency between the two render passes.

I know you go into detail later, but I think it’s important to start with the fact that the synchronization is there, then explain the details later. That way, you forgo any questions about why you need the execution dependency if this “queue up” process handled it.

I disagree. The reason the depth buffer works is because you need a synchronization there for the color buffer. You can’t present until the color buffer is finished, and you don’t start the next frame until the present has been submitted (since that’s what defines the frame), and you don’t start rendering until you have successfully acquired a presentable image. That’s two separate execution dependencies, which is one more than the depth buffer would need.

This is why it’s important to lead with synchronization, not write it off to some vague “queue up” process.

The problem with rules like this is that they can be extremely misleading. Vulkan is a complicated API because writing a rendering system is complicated. Trying to boil it down to ad-hoc rules that say “do X in situation Y” will lead to problems like this.

The thing I take exception to here is the notion that, because something is “mappable” or “CPU-accessible”, that means it must be “copied for every frame in flight”. They only need to be “copied” if you need to modify them while they are being used. That is, if you’re modifying them for frame X+1 while the GPU may be accessing them to render frame X.

If you’re doing streaming, that memory is GPU accessible, but you’re not going to try to stream into a block until the GPU isn’t using it anymore. So there’s no need for a “copy”.

And generally speaking, you wouldn’t “copy” them. You don’t “copy” a command buffer; you just make a new one. If you’re writing vertex data that changes on a per-frame basis, you’re not “copying” the old data at all. You’re generating new vertex data every frame, so you just need a different piece of memory to generate it into.