Validation error on adjacent VMA buffers

Hi,

I’m getting a synchronization validation error that I’d like to understand better and fix properly. I have 2 (vertex) buffers created with VMA, and according to their VmaAllocationInfo, they end up adjacent to each other in the same deviceMemory:

A) deviceMemory: 0x0000060000000006
offset: 1767168
size: 3072

B) deviceMemory: 0x0000060000000006
offset: 1770240
size: 3072

I fill buffer A (using a staging buffer and vkCmdCopyBuffer) and then use it for drawing without any issues. Then I try to do exactly the same with buffer B, but when recording its vkCmdCopyBuffer, I get the following error:

vkCmdCopyBuffer(): WRITE_AFTER_READ hazard detected. vkCmdCopyBuffer writes to VkBuffer 0x19750000001975, which was previously read by vkCmdDraw.
No sufficient synchronization is present to ensure that a write (VK_ACCESS_2_TRANSFER_WRITE_BIT) at VK_PIPELINE_STAGE_2_COPY_BIT does not conflict with a prior read (VK_ACCESS_2_VERTEX_ATTRIBUTE_READ_BIT) at VK_PIPELINE_STAGE_2_VERTEX_ATTRIBUTE_INPUT_BIT.
Vulkan insight: an execution dependency is sufficient to prevent this hazard.

However, it’s the first time I’m touching buffer B and only buffer A was used for drawing. I have a basic tracking system for accesses and stages, so I do put a barrier before copying data to B (with dstStage = VK_PIPELINE_STAGE_2_COPY_BIT_KHR and dstAccess = VK_ACCESS_2_TRANSFER_WRITE_BIT_KHR), but since this is its first usage, the srcStage & srcAccess are both 0.

The offsets and sizes in the VkBufferMemoryBarrier seem to be correct for each buffer (offset = 0, size = 3072 or VK_WHOLE_SIZE).

I understand the validation error, but is this to be expected? I don’t understand why I should need an execution dependency for a buffer I’ve never used before. I kinda get that buffer A’s memory “state” is somehow affecting B, but I don’t get why. Is there something I’m missing, or have I messed something up somewhere else?

Please also note that if I use VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT in the VmaAllocationCreateFlags, then no validation error occurs. I guess that’s because the 2 buffers then use different deviceMemory objects, but I’d rather not use this as my solution because it’s probably not optimal.

Thanks in advance for any help & insights!

I’m providing some more info that I gathered, in case I get lucky and it rings a bell for someone. I now suspect the stride of the vertices that I’m trying to draw.

For these buffers, the VkBufferUsageFlags are
VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT | VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR

VmaMemoryUsage is just VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE and VmaAllocationCreateFlags is just VMA_ALLOCATION_CREATE_USER_DATA_COPY_STRING_BIT (probably irrelevant).

Calling vkGetBufferMemoryRequirements shows that the alignment is 256 bytes, and I don’t see anything wrong with that. I tried vmaCreateBufferWithAlignment with different alignment values to test:
512 & 1024 : Still getting the validation error.
2048 & 4096 : No error.

A few words about my data: These buffers have 251 vertices with only position data, so my actual data size is 251 * 3 * 4 = 3012 bytes for each, and due to the 256-byte alignment the total buffer size that VMA allocates is 3072 (256 * 12). My intention is to draw every 25th vertex, so there are 11 vertices that can be drawn in that buffer. Their indices in the buffer are 0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250 (last vertex in the buffer).
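To double-check the arithmetic above, here is a minimal C++ sketch. The names (alignUp, drawnIndices and the k... constants) are made up for illustration and are not part of Vulkan or VMA; the numbers are the ones quoted in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round `value` up to the next multiple of `alignment` (hypothetical helper).
constexpr uint64_t alignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

constexpr uint64_t kVertexCount = 251;  // vertices per buffer
constexpr uint64_t kVertexSize  = 12;   // 3 floats * 4 bytes, position only
constexpr uint64_t kAlignment   = 256;  // from vkGetBufferMemoryRequirements
constexpr uint64_t kStep        = 25;   // draw every 25th vertex

constexpr uint64_t kDataSize   = kVertexCount * kVertexSize;      // 3012
constexpr uint64_t kBufferSize = alignUp(kDataSize, kAlignment);  // 3072
constexpr uint64_t kDrawStride = kStep * kVertexSize;             // 300

// Compile-time checks against the figures quoted in the post.
static_assert(kDataSize == 3012, "251 * 12 bytes of vertex data");
static_assert(kBufferSize == 3072, "rounded up to 12 * 256 bytes");
static_assert(kDrawStride == 300, "25 vertices * 12 bytes");

// Indices of the vertices that are actually drawn: 0, 25, ..., 250.
std::vector<uint64_t> drawnIndices(uint64_t vertexCount, uint64_t step) {
    assert(step > 0);
    std::vector<uint64_t> indices;
    for (uint64_t i = 0; i < vertexCount; i += step) {
        indices.push_back(i);
    }
    return indices;
}
```

Calling drawnIndices(kVertexCount, kStep) gives the 11 indices listed above, the last one being 250, so the last drawn vertex is also the last vertex stored in the buffer.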

I then noticed that the validation error occurs only when I use the calculated stride to draw these 11 vertices. That stride is 300 bytes (25 * 12), compared to the default of 12 bytes. If I draw the whole buffer with the default stride, then no validation error occurs.

I don’t know how to debug further yet. To be honest, I’m very suspicious of the last vertex. It is the last piece of data in the buffer, and there is no room after it for a full “stride”. 11 vertices * 300 bytes of stride each gives a perceived total of 3300 bytes, which exceeds the buffer size of 3072 and could indeed be touching the memory of the adjacent buffer (which starts right after the 3072 bytes that the first one occupies). But why would anything treat 3300 bytes as accessed? This thought sounds a bit naive, unless the spec somewhere defines or assumes that every element must be followed by a full stride, but I can’t find anything like that.

Hmm, this sounds like it could be a bug/imprecision in the validation layer, where it treats the full range of (number of vertices * stride) as used, while in reality the range is ((number of vertices - 1) * stride + size of vertex).
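To make the two formulas concrete, here is a rough C++ sketch that plugs in the offsets from the first post. It only models the arithmetic, not what the validation layer actually does internally. Range, overlaps, exactVertexRange, overestimatedVertexRange and overestimateHitsNextBuffer are hypothetical names, and the last helper assumes the next buffer is placed at the first suitably aligned offset after the previous one, which is a guess about the allocator’s placement.

```cpp
#include <cstdint>

struct Range { uint64_t begin; uint64_t end; };  // half-open [begin, end)

constexpr bool overlaps(Range a, Range b) {
    return a.begin < b.end && b.begin < a.end;
}

constexpr uint64_t alignUp(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Bytes really fetched: (count - 1) * stride + vertexSize.
constexpr Range exactVertexRange(uint64_t base, uint64_t count,
                                 uint64_t stride, uint64_t vertexSize) {
    return count == 0 ? Range{base, base}
                      : Range{base, base + (count - 1) * stride + vertexSize};
}

// Suspected over-estimate: count * stride.
constexpr Range overestimatedVertexRange(uint64_t base, uint64_t count,
                                         uint64_t stride) {
    return Range{base, base + count * stride};
}

// Values from the first post (offsets inside the shared deviceMemory).
constexpr uint64_t kOffsetA = 1767168;
constexpr uint64_t kOffsetB = 1770240;
constexpr uint64_t kSize    = 3072;
constexpr Range kBufferB{kOffsetB, kOffsetB + kSize};

static_assert(kOffsetA + kSize == kOffsetB, "B starts right where A ends");
// Exact range: 10 * 300 + 12 = 3012 bytes, stays inside A.
static_assert(!overlaps(exactVertexRange(kOffsetA, 11, 300, 12), kBufferB),
              "real vertex reads never touch B");
// Over-estimate: 11 * 300 = 3300 bytes, spills 228 bytes into B.
static_assert(overlaps(overestimatedVertexRange(kOffsetA, 11, 300), kBufferB),
              "count * stride reaches into B");

// ASSUMPTION: A sits at relative offset 0 and the next buffer starts at
// alignUp(kSize, alignment). Returns true if the over-estimate reaches it.
constexpr bool overestimateHitsNextBuffer(uint64_t alignment) {
    return overestimatedVertexRange(0, 11, 300).end > alignUp(kSize, alignment);
}
```

Under that placement assumption, overestimateHitsNextBuffer is true for 256, 512 and 1024 (the next buffer starts at 3072) and false for 2048 and 4096 (it starts at 4096), which would be consistent with the alignment experiment described earlier.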

It is indeed a bug in the validation layer, as investigated in KhronosGroup/Vulkan-ValidationLayers issue #11093 on GitHub, “Synchronization validation error on adjacent (VMA) buffers when drawing with stride”.