Why am I getting this validator message? Memory buffer barrier

Hi,

the situation :

  • a compute shader writing stuff in an SBO
  • a vertex shader reading it and rendering

Case 1 :

  • compute shader and vertex shader ON THE SAME QUEUE ( graphic queue )
  • two separate CMD buffers, one for the “compute” one for the “draw”
  • No memory barriers
  • SBO pre-initialized with some contents
  • draw CMD executed before the compute CMD
  • compute CMD waits for a “graphic semaphore” to know “it’s done using my SBO” and signals a “compute semaphore” to tell “I am done” at vkQueueSubmit()
  • compute semaphore “pre-signalled” at start ( to be in a “signalled state” )
  • draw CMD waits for the “compute semaphore” to know “it’s done computing” and signals a “graphic semaphore” to tell “I am done” at vkQueueSubmit()

( well, I omitted that the ‘draw’ also waits for the usual “image acquired” semaphore of the swapchain and such )

In Case 1, everything seems to go without a problem.

Case 2 - ALL like in Case 1 , except

  • Compute shader is on a specialized “compute queue” ( index ‘2’ )
  • Draw is on the “graphic queue” ( index 1 )

The “compute” cmd buffer is recorded as such :

VkCommandBufferBeginInfo cmd_buf_info = {};

cmd_buf_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
cmd_buf_info.pNext = NULL;
cmd_buf_info.flags = VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT;
cmd_buf_info.pInheritanceInfo = NULL;

PRINTF("Recording COMPUTE command %d\n", i);

result = vkBeginCommandBuffer(DeviceParams.compute_cmd[i][what_group], &cmd_buf_info);
assert(result == VK_SUCCESS);

// "acquire" barrier, recorded before the dispatch
VkBufferMemoryBarrier acquire_buffer_barrier =
{
	VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
	nullptr,
	0,                                      // SRC access mask
	VK_ACCESS_SHADER_WRITE_BIT,             // DST access mask
	DeviceParams.queueFamilyIndex,          // srcQueueFamilyIndex
	DeviceParams.compute_queueFamilyIndex,  // dstQueueFamilyIndex
	Scene->SB1.buffer.buffer,
	0,                                      // offset
	Scene->SB1.total_size                   // size
};

vkCmdPipelineBarrier(
	cmd,
	VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,     // SRC stage
	VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,   // DST stage
	0,
	0, nullptr,
	1, &acquire_buffer_barrier,
	0, nullptr);

vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, Scene->pipeline[view]);

// attention here we could have 1 or 2 descriptor sets, depending on the scene

int desc_num = (Scene->ls_layout_flags & LS_LAYOUT_CS_SET10_TEXTURE) ? 2 : 1;

vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, Scene->pipeline_layout[view], 0, desc_num, &Scene->descriptor_set[cb][view][0], 0, NULL);

vkCmdDispatch(cmd, Scene->comp_dispatch_x, Scene->comp_dispatch_y, Scene->comp_dispatch_z);

// "release" barrier, recorded after the dispatch
VkBufferMemoryBarrier release_buffer_barrier =
{
	VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
	nullptr,
	VK_ACCESS_SHADER_WRITE_BIT,             // SRC access mask
	0,                                      // DST access mask
	DeviceParams.compute_queueFamilyIndex,  // srcQueueFamilyIndex
	DeviceParams.queueFamilyIndex,          // dstQueueFamilyIndex
	Scene->SB1.buffer.buffer,
	0,                                      // offset
	Scene->SB1.total_size                   // size
};

vkCmdPipelineBarrier(
	cmd,
	VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,   // SRC stage
	VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,     // DST stage
	0,
	0, nullptr,
	1, &release_buffer_barrier,
	0, nullptr);

result = vkEndCommandBuffer(DeviceParams.compute_cmd[i][what_group]);
assert(result == VK_SUCCESS);

However when this is run this happens :

ERROR: (COMMAND_BUFFER 0x25268ab0d58) [Validation] [ UNASSIGNED-VkBufferMemoryBarrier-buffer-00004 ] Object: 0x25268ab0d58 (Type = 6) | vkQueueSubmit(): in submitted command buffer VkBufferMemoryBarrier acquiring ownership of VkBuffer (0x183), from srcQueueFamilyIndex 2 to dstQueueFamilyIndex 0 has no matching release barrier queued for execution.

WHY ?

“In my reasoning” the second memory barrier I put, the one called “Release Buffer Barrier”, should do precisely that.

Where is my reasoning wrong ? Why do I get that warning ? And what is “the matching release barrier” for the one I marked as “acquire” ?

I really cannot make “logical sense” out of this.

Thanks in advance.

This is called Queue Family Ownership Transfer. You always need two duplicate barriers (one is called “release” and the other “acquire”). Each queue family needs to know about the transfer individually.

Seems you forgot to put the release barrier into the graphics queue before acquiring, or you forgot to separate the two barriers with a semaphore.

Hi,
wait wait … you mean I need to put “a piece of the barrier” in one queue AND one in the other queue ?

Because in fact I was wondering: as it is, there’s a barrier ONLY in the “compute buffer CMD/queue” and nothing in the “draw buffer CMD/queue” …

So “the validator” actually “looks” inside both queues to throw that error message ?

So I suppose “to make it right” the draw queue/cmd should have a barrier as well, where the VS “waits for the compute to be done” and “tells the compute it’s done using it” ?

Cheers.

Yea, both queues have to have the barrier. You are basically informing both queues that the transfer is happening; other than that, the two together basically define a single barrier (e.g. you can get only one layout transition out of the two). The parameters to both barriers should conventionally be identical (though the driver should ignore the parameters that are useless for the given barrier).

Yea, the validator tracks these things. It is not particularly complex here. If a given resource has never had a release barrier used on it, then it is safe to assume an acquire barrier is an invalid thing to do.
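
To make that bookkeeping concrete, here is a toy model of it. This is only a sketch of the idea, not the validation layer's actual implementation; Transfer and OwnershipTracker are made-up names, and the buffer is identified by a plain integer:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <tuple>

// One ownership transfer: which buffer moves from which family to which.
struct Transfer {
    std::uint64_t buffer;
    std::uint32_t srcFamily;
    std::uint32_t dstFamily;

    // Ordering so that Transfer can be stored in a std::set.
    bool operator<(const Transfer& other) const {
        return std::tie(buffer, srcFamily, dstFamily) <
               std::tie(other.buffer, other.srcFamily, other.dstFamily);
    }
};

// Toy tracker: remembers releases that have been queued and checks that
// every acquire finds a release with the same buffer and family pair.
class OwnershipTracker {
public:
    // A release barrier was queued on the source queue.
    void release(const Transfer& transfer) { pending_.insert(transfer); }

    // An acquire barrier was queued on the destination queue.
    // Returns false when there is "no matching release barrier queued".
    bool acquire(const Transfer& transfer) {
        return pending_.erase(transfer) == 1;
    }

private:
    std::set<Transfer> pending_;
};
```

In this model an acquire from family 2 to family 0 only succeeds after a release from 2 to 0 on the same buffer; a release in the other direction (0 to 2) does not count as a match.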

Barriers do not work across queues, and queues are asynchronous. You need to additionally make sure the release barrier happens-before the acquire barrier via a Semaphore or a Fence.

To make it right, you add the duplicate barrier in the graphics queue, and you add a semaphore if there is not one already. Or you could create the resource with VK_SHARING_MODE_CONCURRENT, and then you only need the semaphore.

Hi,
“this story of the semaphore escapes me a little”. In fact ( taken from an example that I admit I haven’t fully understood ), there’s a semaphore in the compute queue “that is signalled when the compute cmd is done” and one in the “draw queue” that is signalled when the rendering is done.

Now ok, the compute queue/etc. could contain multiple compute shaders running, maybe using the same SBO/etc., and so needing more barriers/etc.

But “if there’s already the compute semaphore the draw waits on” and the “draw semaphore the compute waits on” … it kinda makes you ask “so why do you need the barriers too ?”

Ok, if I understand correctly, “the semaphore” can be signalled only when ALL the various things in the queue have finished their stuff, so I suppose when the bottom of the queue is reached, while the mem barrier is “more fine grained” and is “command by command” ?

Thanks a lot for your explanations, it’s starting to make much more sense to me now.

Cheers.

“Why” is a bit of a design question, and I am not the author of Vulkan. Ownership of a resource is simply per queue family in Vulkan and that is how it is. Presumably it has something to do with limiting the update of cache hierarchies. The layouts could also have different meanings across families. Maybe different families also have different book-keeping/metadata they need to keep up to date.

That’s what VK_SHARING_MODE_CONCURRENT can get you. Though the spec warns you it may have lower performance on some GPUs than you telling the driver explicitly when queue family handoff happens.

Hi,
it’s not over yet … I still have one thing I am cracking my head on and I can’t make sense of it …

So basically I’ve been studying/based my thing on this sample :

Sasha compute_particles.cpp

The “thing I cannot make work” is the part where it goes :

// Build a single command buffer containing the compute dispatch commands
buildComputeCommandBuffer();

// If graphics and compute queue family indices differ, acquire and immediately release the storage buffer, so that the initial acquire from the graphics command buffers are matched up properly
if (graphics.queueFamilyIndex != compute.queueFamilyIndex)
{
	// Create a transient command buffer for setting up the initial buffer transfer state

Fundamentally, if I understand correctly, there’s the problem “of the first command”; it’s a bit of a chicken/egg situation.

The CS waits for the VS to be done rendering before filling (writing) that SBO again; the VS waits for the CS to be done with the SBO before reading from it.

But … there’s the “start condition” where the SBO is let’s say pre-loaded with some stuff at creation but the CS has not run yet.

Now in my case, like in the Sasha example, the VS runs BEFORE the CS.

The point is, it would appear that “no matter what I do”, I always get, one time, an error message that there is “no matching release barrier queued for execution”.

If I do like in the Sasha sample, that is, right after recording the “CS command” I create a “new command in the compute queue” that does the acquire/release and run it before ANY “graphic command” is ever run, then the moment it hits this :

	result = vkQueueSubmit(DeviceParams.compute_queue, 1, &compute_submit_info, VK_NULL_HANDLE);

while trying to execute that “transfert_cmd”, I get this error message :

ERROR: (COMMAND_BUFFER 0x1f9d651ad38) [Validation] [ UNASSIGNED-VkBufferMemoryBarrier-buffer-00004 ] Object: 0x1f9d651ad38 (Type = 6) | vkQueueSubmit(): in submitted command buffer VkBufferMemoryBarrier acquiring ownership of VkBuffer (0x183), from srcQueueFamilyIndex 0 to dstQueueFamilyIndex 2 has no matching release barrier queued for execution.

And … yeah … “but you did not even leave me time to submit anything in the graphic queue”.

If I DO NOT run that part of the code, simply ignoring it totally, then the first time it encounters :

		vkQueueSubmit(Platform.drawQueue, 1, &submit_info, render_fence);

NOTE : at this point “nothing has been submitted in the compute queue yet”, and this error happens :

ERROR: (COMMAND_BUFFER 0x186068004c8) [Validation] [ UNASSIGNED-VkBufferMemoryBarrier-buffer-00004 ] Object: 0x186068004c8 (Type = 6) | vkQueueSubmit(): in submitted command buffer VkBufferMemoryBarrier acquiring ownership of VkBuffer (0x183), from srcQueueFamilyIndex 2 to dstQueueFamilyIndex 0 has no matching release barrier queued for execution.

So first “0 to 2” has “no matching” and/or then “2 to 0” “has no matching” …

It seems like a snake biting its own tail … HOW do you solve this dilemma ? Why does the “use a command and execute it” approach not work, still giving the same error ?

After this first error … “at this point the loop/cycle is started” so you don’t get errors any more … but HOW can you avoid the first one ?

My theory :

  • you need another set of CMDs for “the first frame”.
  • “the first time” one of the two does NOT use any barrier
  • from the second frame onwards you can use the barrier
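
That theory can be played through with a toy model. The sketch below is not Vulkan code and not how the validation layer is implemented; ToyOwnership and countUnmatchedAcquires are hypothetical names, and the family indices 0 and 2 are taken from the error messages above. It leans on the spec's rule that a resource created with exclusive sharing is not owned by any queue family after creation and is implicitly acquired on first use. It also assumes that an unmatched acquire is reported once and execution carries on, which matches the behaviour described above:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

class ToyOwnership {
public:
    // First use after creation: no barrier needed while nobody owns it.
    void firstUse(std::uint32_t family) {
        assert(!owner_ && !pending_);
        owner_ = family;
    }

    // Release half recorded on the current owner's queue.
    void release(std::uint32_t from, std::uint32_t to) {
        pending_ = std::make_pair(from, to);
        owner_.reset();
    }

    // Acquire half recorded on the destination queue. Returns false when
    // no matching release is pending; ownership is taken either way so the
    // simulation can continue after reporting the problem.
    bool acquire(std::uint32_t from, std::uint32_t to) {
        const bool matched =
            pending_ && pending_->first == from && pending_->second == to;
        pending_.reset();
        owner_ = to;
        return matched;
    }

private:
    std::optional<std::uint32_t> owner_;
    std::optional<std::pair<std::uint32_t, std::uint32_t>> pending_;
};

// Draw first, then compute, every frame. Counts acquires without a match.
inline int countUnmatchedAcquires(int frames, bool skipFirstAcquire) {
    const std::uint32_t graphics = 0, compute = 2;
    ToyOwnership sbo;
    int errors = 0;
    for (int frame = 0; frame < frames; ++frame) {
        // Graphics side: acquire (or plain first use), draw, release.
        if (frame == 0 && skipFirstAcquire) {
            sbo.firstUse(graphics);
        } else if (!sbo.acquire(compute, graphics)) {
            ++errors;
        }
        sbo.release(graphics, compute);

        // Compute side: acquire, dispatch, release.
        if (!sbo.acquire(graphics, compute)) ++errors;
        sbo.release(compute, graphics);
    }
    return errors;
}
```

In this model, recording the acquire on the very first draw produces exactly one unmatched acquire and none afterwards, while skipping it on the first frame only produces none.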

Or at this point “there’s still something I am not getting”, and PLEASE LET ME SAY IT : NO, THE Vulkan documentation IS NOT CLEAR AT ALL on the subject, or I am really too thick to understand it !

I am going nuts over these kinds of things.

Any help appreciated.

Cheers.