Vulkan Synchronization

Hello,

I implemented double buffering and I have a synchronization problem (flickering, rogue triangles) with buffers that are frequently updated (particles, grass, water waves…).
There are no validation layer errors.

  • Vertex buffers usage: VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT
  • Index buffers usage: VK_BUFFER_USAGE_INDEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT
  • Memory property is VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT

The synchronization works correctly if the vkQueueWaitIdle is executed (see code).
I don’t see the point of adding barriers if a vkQueueWaitIdle must be done anyway.
If I understand correctly, the barrier prevents commands from accessing vertices/indices until they are available to the GPU.

All the buffers requiring a double buffering synchronization are placed in a list (lstCopies in the code).
This code is executed after vkBeginCommandBuffer and before vkCmdBeginRenderPass on cmdPrimary.

void vk_sync_buffers(){
	if(!lstCopies.n) return;

	//util_log(LL_MSG, "%d buffer copies", lstCopies.n);

	BUFCPY					*cp;
	VkMappedMemoryRange 	vkrange{};
	VkBufferCopy 			vkcopy{};
	VkBufferMemoryBarrier 	vkbarrier{};

	vkbarrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
	vkbarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
	vkbarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;

	vkrange.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;

    for(int i = 0; i < lstCopies.n; i++){
    	cp = lstCopies.atp(i);

		vkrange.size = cp->sze;
		vkrange.memory = cp->mem;
		vkFlushMappedMemoryRanges(vkdevice, 1, &vkrange);

		if(cp->vks != cp->vkd){
			vkcopy.size = cp->sze;
			vkCmdCopyBuffer(cmdPrimary, cp->vks, cp->vkd, 1, &vkcopy);
		}

		vkbarrier.buffer = cp->vkd;
		vkbarrier.size = cp->sze;

		vkbarrier.srcAccessMask = VK_ACCESS_2_NONE;

		vkbarrier.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;
		vkCmdPipelineBarrier(	cmdPrimary,
								VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
								VK_DEPENDENCY_BY_REGION_BIT,
								0, NULL,
								1, &vkbarrier,
								0, NULL);
    }
    lstCopies.reset();
    
    //vkQueueWaitIdle(glGraphicsQueue);	//flickering without this wait

}

Didn’t you access the buffer in some way? So maybe you should specify how you did that?

Buffers are being fed via the vkMapMemory pointer, all the buffers are persistently mapped.
They are always being fed before the vk_sync_buffers() call during a frame.
It’s like doing a memcpy to a mapped buffer.
But it is as if the newly copied data is not available at vkCmdDrawIndexed time.
It is not a problem of writing/rendering to wrong buffers, I know this for sure.

The order of operations is:

  • Game writes to the working buffers
  • vk_acquire is called
  • rendering is done
  • vk_flush() is called (the buffers alternate)

//Acquire swap image
void vk_acquire(){

	if(currBufferingFrame->swAcquired) return;

	//Double buffering...
	if(DOUBLE_BUFFERING)
		VKERROR_LOGMSG("vkWaitForFences", vkWaitForFences(vkdevice, 1, &currBufferingFrame->fence, VK_TRUE, VULKAN_TIMEOUT))

	//Get the next output frame buffer
	if(vk_critical(vkAcquireNextImageKHR(vkdevice, glSwapChain, VULKAN_TIMEOUT, currBufferingFrame->semaPresent, NULL, &currBufferingFrame->vkimageindex)))
		return;

	currBufferingFrame->swAcquired = true;

	VKERROR_LOGMSG("vkResetFences", vkResetFences(vkdevice, 1, &currBufferingFrame->fence))
	VKERROR_LOGMSG("vkResetCommandBuffer", vkResetCommandBuffer(currBufferingFrame->cmdPrimary, 0))

	glCurrViewport.width = glCurrExtent.width;
	glCurrViewport.height = glCurrExtent.height;
	glCurrScissor.extent.width = glCurrExtent.width;
	glCurrScissor.extent.height = glCurrExtent.height;

	currBufferingFrame->frameBuffer = lstSwapchainFrameBuffers.at(currBufferingFrame->vkimageindex);
	currBufferingFrame->vkRenderPass.framebuffer = currBufferingFrame->frameBuffer;
	currBufferingFrame->vkRenderPass.renderArea.extent = glCurrExtent;

	vkBeginCommandBuffer(currBufferingFrame->cmdPrimary, &currBufferingFrame->vkBegin);

	//Command vkCmdCopyBuffer must be used outside of render passes and outside of a video coding scope, which is now.
	//Place the dynamic buffer barriers outside of the rendering pass, which is now.
	//Games update and batch before render_3D_begin or render_2D_begin is called, which is now.
	//The buffers requiring synchronization for the rendering of this frame are thus all listed.
	//2D rendering never requires synchronization because updates always happen in the current vbo/ibo IDX_BUFFER.
	currBufferingFrame->vk_sync_buffers();

	vkCmdBeginRenderPass(currBufferingFrame->cmdPrimary, &currBufferingFrame->vkRenderPass, VK_SUBPASS_CONTENTS_INLINE);
	vkCmdSetViewport(currBufferingFrame->cmdPrimary, 0, 1, &glCurrViewport);
	vkCmdSetScissor(currBufferingFrame->cmdPrimary, 0, 1, &glCurrScissor);
}

//Flush swap image to screen
void vk_flush(){
	if(!currBufferingFrame->swAcquired) return;
	currBufferingFrame->swAcquired = false;

	vkCmdEndRenderPass(currBufferingFrame->cmdPrimary);
	vkEndCommandBuffer(currBufferingFrame->cmdPrimary);

	VKERROR_LOGMSG("vkQueueSubmit", vkQueueSubmit(glGraphicsQueue, 1, &currBufferingFrame->vkSubmit, currBufferingFrame->fence))

	//No double buffering, frame buffering is meaningless when VSYNC is on
	if(!DOUBLE_BUFFERING)
		VKERROR_LOGMSG("vkWaitForFences", vkWaitForFences(vkdevice, 1, &currBufferingFrame->fence, VK_TRUE, VULKAN_TIMEOUT))

	vk_critical(vkQueuePresentKHR(glGraphicsQueue, &currBufferingFrame->vkPresent));

	//Provide a new render buffer, same buffer when VSYNC is on
	if(glCurrVsync == VK_PRESENT_MODE_FIFO_KHR)
		IDX_BUFFER = 0;
	else
		IDX_BUFFER++;
	IDX_BUFFER = IDX_BUFFER % MAX_BUFFERS;
	currBufferingFrame = bufferingFrames[IDX_BUFFER];
}

Overall, you’re missing the point. The source access mask is how you tell Vulkan how the memory was modified. You said “NONE”, which tells Vulkan that you didn’t access the memory. But you did access the memory.

You lied to Vulkan; undefined behavior is the result. Tell Vulkan how you accessed the memory, and you may get better behavior.

When I said “specify how you did that”, I meant “correctly inform Vulkan of how you did that”, not tell me how you did that.

I think I’m missing many points.
My new attempt is giving me read-after-write errors, which is just what I am trying to avoid.

SYNC-HAZARD-READ-AFTER-WRITE(ERROR / SPEC): msgNum: -455515022 - Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x56232d070430, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0xe4d96472 | vkQueueSubmit(): Hazard READ_AFTER_WRITE for entry 0, VkCommandBuffer 0x56232f12fca0, Recorded access info (recorded_usage: SYNC_COPY_TRANSFER_READ, command: vkCmdCopyBuffer, seq_no: 7, reset_no: 106). Access info (prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: SYNC_INDEX_INPUT_INDEX_READ, ).

void VK_FRAME::vk_sync_buffers(){
	if(!lstCopies.n) return;

	//util_log(LL_MSG, "%d buffer copies", lstCopies.n);

	BUFCPY					*cp;
	VkMappedMemoryRange 	vkrange{};
	VkBufferCopy 			vkcopy{};
	VkBufferMemoryBarrier 	vkbarrier{};

	vkbarrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
	vkbarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
	vkbarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;

	vkrange.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;

	for(int i = 0; i < lstCopies.n; i++){
		cp = lstCopies.atp(i);

		//flush static memory only
		if(cp->vks == cp->vkd){
			vkrange.size = cp->sze;
			vkrange.memory = cp->mem;
			vkFlushMappedMemoryRanges(vkdevice, 1, &vkrange);
			continue;
		}

		vkbarrier.size = cp->sze;
		vkbarrier.srcAccessMask = VK_ACCESS_NONE;

		//source buffer barrier
		vkbarrier.buffer = cp->vks;
		if(cp->swIBO)	vkbarrier.dstAccessMask = VK_ACCESS_INDEX_READ_BIT;
		else			vkbarrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
		vkCmdPipelineBarrier(	cmdPrimary,
								VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
								VK_DEPENDENCY_BY_REGION_BIT,
								0, NULL,
								1, &vkbarrier,
								0, NULL);

		//destination buffer barrier
		vkbarrier.buffer = cp->vkd;
		vkbarrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
		vkCmdPipelineBarrier(	cmdPrimary,
								VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
								VK_DEPENDENCY_BY_REGION_BIT,
								0, NULL,
								1, &vkbarrier,
								0, NULL);

		//copy source to destination
		vkcopy.size = cp->sze;
		vkCmdCopyBuffer(cmdPrimary, cp->vks, cp->vkd, 1, &vkcopy);

		//destination buffer barrier
		vkbarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
		if(cp->swIBO)	vkbarrier.dstAccessMask = VK_ACCESS_INDEX_READ_BIT;
		else			vkbarrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
		vkCmdPipelineBarrier(	cmdPrimary,
								VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
								VK_DEPENDENCY_BY_REGION_BIT,
								0, NULL,
								1, &vkbarrier,
								0, NULL);
	}
	lstCopies.reset();
	//vkQueueWaitIdle(glGraphicsQueue);	//flickering without this wait

}

I don’t know what that code is trying to do, but it makes even less sense now. You have 3 barrier calls, and every single one of them has the same problem you had in the first piece of code: the source access mask being NONE.

“None” is not how you accessed the data you want to expose to the destination access. There is an actual operation which accessed that data before the barrier. You need to specify what that operation actually is.

//destination buffer barrier
vkbarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;

It is trying to copy a source buffer to a destination buffer and avoid vkQueueWaitIdle, that is all.

(NONE is also used when copying images in, I think, the official tutorials.)

OK, then what are the other two barriers doing?

Barriers exist between two things: source operations and destination operations. Those first two barriers say that the source operation is “NONE”. So… what are they trying to do?

They are setting the source buffer to read only.
The source and destination buffers may not be altered any more after the copy during the current frame rendering.

source buffer => read only
destination buffer => write

copy source to destination

destination buffer => read only

Changing NONE to VK_ACCESS_INDEX_READ_BIT | VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT does not work either.
Using a single VkMemoryBarrier instead of VkBufferMemoryBarrier barriers does not work either.
Perhaps the problem is somewhere else.

Anyway, this topic can be closed or removed.
Thank you.

… that’s not a thing. Images can be placed into read-only layouts by memory barriers, but buffers don’t have a “read only” mode that they get set into.

You seem to have a pretty fundamental misunderstanding of what a memory barrier does.

A memory barrier is placed between two operations. The operation before the barrier does something to some memory, which you want to make available to the operation after the barrier. The barrier needs to be there between those operations to make the memory visible and available to the operation after the barrier.

They don’t “set” something into buffers.

The problem was that I didn’t use a ‘staging’ buffer to send the data to the GPU.
Writing the data to a VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT buffer and then copying it with vkCmdCopyBuffer to a VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT buffer solved the problems.
This was not apparent from the explanation and code I posted in this thread.