Updating uniforms via staging?

Hello guys,
I’ve now seen staging used for UBOs instead of just a host-visible buffer for the second time (first on vulkan-tutorial.com and then here: https://www.khronos.org/assets/uploads/developers/library/2016-vulkan-devu-seoul/1-Vulkan-Tutorial_English.pdf).

I want to know what the advantages of this approach are. Is it more efficient to write the data into a host-visible buffer first and then copy it every frame into a device-local buffer that the GPU can access quickly? What is the proper way to deal with uniforms?

Thanks in advance
SH :slight_smile:

I am not sure, but I think it depends.
If you do not have a lot of values, I guess the better way is not to use a uniform buffer at all but push constants.
If you have a lot of values, it is surely better to use a uniform buffer with a staging buffer. But with few values (though still too many for push constants), I guess it can be faster to skip the transfer and just read from the host-visible staging buffer directly.
In any case, you have to profile it; there is no real answer to this kind of question, and it can also depend on which hardware you have.
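
For the small-data case, here is a minimal sketch of the push-constant path (the helper function and the 16-float matrix are illustrative assumptions, not something from the thread; it assumes the pipeline layout was created with a matching VkPushConstantRange):

```c
#include <vulkan/vulkan.h>

// Record a small per-draw uniform (e.g. a 4x4 MVP matrix) as push constants.
// Assumes 'layout' was created with a VkPushConstantRange covering 64 bytes
// in the vertex stage and that 'mvp' points to 16 floats.
void record_push_constants(VkCommandBuffer cmd, VkPipelineLayout layout,
                           const float *mvp)
{
    vkCmdPushConstants(cmd, layout,
                       VK_SHADER_STAGE_VERTEX_BIT,
                       0,                   /* offset */
                       16 * sizeof(float),  /* size, within maxPushConstantsSize */
                       mvp);
}
```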

There is no “the proper way”. The whole point of using a low-level API is that you want to make these kinds of decisions.

One important thing to remember is this: there is no guarantee that the Vulkan implementation will allow you to use host-visible memory for a given UBO (see vkGetBufferMemoryRequirements). So even if you want to use host-visible memory, you have to be ready to fall back to staging if the hardware doesn’t allow it.
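
For reference, that check at buffer-creation time looks roughly like this (a sketch; the function name is illustrative, and what you do on failure is up to you):

```c
#include <stdint.h>
#include <vulkan/vulkan.h>

// Returns the index of a memory type that is allowed for 'buffer' and has all
// of the 'wanted' property flags, or UINT32_MAX if there is none (in which
// case you would fall back to device-local memory plus a staging copy).
uint32_t find_memory_type(VkPhysicalDevice phys, VkDevice dev,
                          VkBuffer buffer, VkMemoryPropertyFlags wanted)
{
    VkMemoryRequirements req;
    vkGetBufferMemoryRequirements(dev, buffer, &req);

    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(phys, &props);

    for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
        if ((req.memoryTypeBits & (1u << i)) &&
            (props.memoryTypes[i].propertyFlags & wanted) == wanted)
            return i;
    }
    return UINT32_MAX;
}
```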

I don’t think this is right (at least I really hope not, but happy to be corrected!).

Section 11.6 says: "If buffer is a VkBuffer not created with the VK_BUFFER_CREATE_SPARSE_BINDING_BIT bit set, or if image is a VkImage that was created with a VK_IMAGE_TILING_LINEAR value in the tiling member of the VkImageCreateInfo structure passed to vkCreateImage, then the memoryTypeBits member always contains at least one bit set corresponding to a VkMemoryType with a propertyFlags that has both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT bit and the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bit set. In other words, mappable coherent memory can always be attached to these objects."

So I think there’s guaranteed to be a host-visible memory type available for UBOs.

Personally, I wouldn’t expect that copying UBOs to device-local memory would yield any gains unless you’re in a situation where they can be copied once and used many times. I would certainly use host-visible memory for a first implementation, then investigate the copy as a possible optimization afterwards - I think doing it the other way around might be premature optimization.
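
That first implementation can be as small as this (a sketch assuming the UBO memory was allocated with HOST_VISIBLE | HOST_COHERENT flags and is persistently mapped once at startup; the function names are illustrative):

```c
#include <string.h>
#include <vulkan/vulkan.h>

// One-time setup: persistently map the host-visible UBO allocation.
void *map_ubo(VkDevice dev, VkDeviceMemory ubo_memory, VkDeviceSize size)
{
    void *mapped = NULL;
    vkMapMemory(dev, ubo_memory, 0, size, 0, &mapped);
    return mapped;
}

// Per frame: just write the new values. With HOST_COHERENT memory no flush
// is needed; non-coherent memory would need vkFlushMappedMemoryRanges.
void update_ubo(void *mapped, const void *uniforms, size_t size)
{
    memcpy(mapped, uniforms, size);
}
```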

Fair enough; I didn’t notice that bullet point.

According to the “Moving forward with Vulkan” presentation (around minute 18), uniform buffers are best updated using vkCmdUpdateBuffer because it does not require an additional CPU to GPU memory transfer.
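
For context, that command looks like this (a sketch; vkCmdUpdateBuffer must be recorded outside a render pass, the data size is limited to 65536 bytes, and both offset and size must be multiples of 4):

```c
#include <vulkan/vulkan.h>

// Inline update of a uniform buffer with CPU data that gets embedded in the
// command buffer at record time.
void record_ubo_update(VkCommandBuffer cmd, VkBuffer ubo,
                       const void *data, VkDeviceSize size)
{
    vkCmdUpdateBuffer(cmd, ubo, 0 /* dstOffset */, size, data);
}
```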

Um, what “additional CPU to GPU memory transfer” are they talking about?

A (presumably) non-device-local staging buffer would involve a copy from non-device-local memory to (presumably) device-local memory.

When using vkCmdUpdateBuffer, that will provoke a copy from user memory to whatever memory a command buffer uses to store this kind of thing. Then it will copy from that memory into the actual buffer. That’s two copies, not one.

Even if we assume that a user cannot generate their data directly into the staging buffer (and therefore must copy from user-memory into the mapped staging buffer), that’s still 2 copies vs. 2 copies.

I would much rather have complete control of the process, rather than rely on the vagaries of the implementation (i.e., the properties of the memory selected for vkCmdUpdateBuffer). That is, after all, what Vulkan is for. Plus, if there is no distinction between memory types, then I wouldn’t need a staging buffer at all, thus eliminating the copy. You can’t do that with vkCmdUpdateBuffer.
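
For concreteness, that controlled staging path might look roughly like this (a sketch with illustrative names; it records the copy plus the transfer-to-shader-read barrier that makes the new data visible to uniform reads):

```c
#include <vulkan/vulkan.h>

// Copy uniform data from a host-visible staging buffer into the device-local
// UBO, then make the transfer write visible to uniform reads in the vertex
// and fragment stages.
void record_staged_ubo_copy(VkCommandBuffer cmd, VkBuffer staging,
                            VkBuffer ubo, VkDeviceSize size)
{
    VkBufferCopy region = { .srcOffset = 0, .dstOffset = 0, .size = size };
    vkCmdCopyBuffer(cmd, staging, ubo, 1, &region);

    VkBufferMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_UNIFORM_READ_BIT,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .buffer = ubo,
        .offset = 0,
        .size = size,
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_TRANSFER_BIT,
                         VK_PIPELINE_STAGE_VERTEX_SHADER_BIT |
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         0, 0, NULL, 1, &barrier, 0, NULL);
}
```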

My understanding of the advantage is that, in the case of vkCmdUpdateBuffer, the data is copied into the command buffer at the moment the command is recorded. It requires only one barrier to wait until the data reaches the actual buffer.
When using staging buffers, the data might be written into the staging buffer directly, but there is still an additional barrier required to wait for the host writes until the data can be copied to the internal memory.

there is still an additional barrier required to wait for the host writes until the data can be copied to the internal memory.

In accordance with section 6.8, calling vkQueueSubmit "defines a memory dependency with prior host operations, and execution of command buffers submitted to the queue."

This makes it abundantly clear that so long as you finish performing the writes on the host before you submit the batches, there is no need for an additional memory barrier.
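
In code terms (a sketch with illustrative names), the ordering that gives you that implicit dependency is simply:

```c
#include <string.h>
#include <vulkan/vulkan.h>

// Host writes that are finished before vkQueueSubmit are covered by the
// implicit memory dependency the submit defines, so no explicit barrier for
// the host writes is needed. 'mapped_staging' is persistently mapped
// HOST_COHERENT memory; non-coherent memory would additionally need
// vkFlushMappedMemoryRanges before the submit.
void write_then_submit(VkQueue queue, void *mapped_staging,
                       const void *uniforms, size_t size,
                       VkCommandBuffer cmd, VkFence fence)
{
    memcpy(mapped_staging, uniforms, size);   /* 1. finish host writes...    */

    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmd,
    };
    vkQueueSubmit(queue, 1, &submit, fence);  /* 2. ...then submit the copy. */
}
```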