Most common Vulkan mistakes vs. the world.

AMD gave a talk titled “The most common Vulkan mistakes”. The video is sadly unavailable, but the slides are. I downloaded them to have a look, and I found some things in them to be… dubious.

The most oddball claim, listed as a problem, was this:

Apps often re-record command buffers every frame.

Their solutions were:

‒ Move all parameters that affect the rendering logic to images / SBs / UBs.
‒ Pre-bake all command buffers once per each swapchain image, if necessary.
‒ Use indirect dispatch/draw commands if they improve command buffer reusability
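For what it’s worth, the third bullet means something like the sketch below: the per-draw parameters live in a buffer that the CPU or a compute shader rewrites each frame, so the recorded command buffer itself never changes (“SBs / UBs” presumably meaning storage and uniform buffers). The function and names here are mine, not AMD’s:

[CODE]#include <vulkan/vulkan.h>

// Recorded once; the command buffer stays valid as long as only the
// contents of the indirect buffer change between frames.
// Note: drawCount > 1 requires the multiDrawIndirect device feature.
void recordReusableDraws(VkCommandBuffer cmd, VkBuffer indirectBuffer, uint32_t maxDraws)
{
    vkCmdDrawIndexedIndirect(cmd, indirectBuffer, /*offset*/ 0, maxDraws,
                             (uint32_t)sizeof(VkDrawIndexedIndirectCommand));
}[/CODE]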

My question is this: is AMD insane?

From prior GDC talks about Vulkan, we saw that UE4’s Vulkan implementation constantly rebuilds command buffers, devoting multiple tasks across multiple threads specifically to command buffer building. Each frame.

Yet here’s AMD, saying that this is wrong. That the Unreal Engine is wrong. That we should instead be willing to make Herculean efforts to avoid building command buffers. I have no idea how that would even be possible in most game scenarios, since the nature of what you’re rendering changes on a frame-to-frame basis. AMD seems to believe that we should be using compute shaders to walk through CPU entity data structures or something to decide what to render.

Then there are just oddball statements, like this one from a discussion of “sparse descriptor sets” (descriptor sets with holes in them):

The app is inefficient, dummy bindings negatively affect performance.

The main reason to have dummy bindings is to avoid breaking pipeline layout compatibility with something else. Again, the GDC presentations suggested that layout changes were a huge deal performance-wise.
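For context, a “dummy binding” here is a slot like binding 1 in the sketch below: unused by the shader, but kept in the set layout so it stays identical to a sibling layout and pipeline layout compatibility is preserved. The specific bindings are made up for illustration:

[CODE]#include <vulkan/vulkan.h>

VkDescriptorSetLayout makePaddedSetLayout(VkDevice device)
{
    // Binding 1 is the dummy: this shader never reads it, but keeping the
    // slot means this layout matches another layout that does use it.
    VkDescriptorSetLayoutBinding bindings[3] = {
        {0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,         1, VK_SHADER_STAGE_ALL, NULL},
        {1, VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, 1, VK_SHADER_STAGE_ALL, NULL}, // dummy
        {2, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER,         1, VK_SHADER_STAGE_ALL, NULL},
    };

    VkDescriptorSetLayoutCreateInfo info = {VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO};
    info.bindingCount = 3;
    info.pBindings = bindings;

    VkDescriptorSetLayout layout = VK_NULL_HANDLE;
    vkCreateDescriptorSetLayout(device, &info, NULL, &layout);
    return layout;
}[/CODE]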

I don’t know; is AMD implementing the same Vulkan that everyone else is using?

[QUOTE=Alfonse Reinheart;40237]The most oddball claim, listed as a problem, was this:

Apps often re-record command buffers every frame.

Their solutions were:

‒ Move all parameters that affect the rendering logic to images / SBs / UBs.
‒ Pre-bake all command buffers once per each swapchain image, if necessary.
‒ Use indirect dispatch/draw commands if they improve command buffer reusability

My question is this: is AMD insane?[/QUOTE]

I don’t think that this is aimed at all scenarios, but rather at applications that record the same command buffers over and over again, like the early (?) LunarG samples did (and my samples too ;)). Recording command buffers is pretty fast (no matter what GPU), especially if done in the background using multithreading, so pre-recording everything wouldn’t make any sense and would be complete overkill for complex visibility scenarios.
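For reference, the usual multithreaded setup is one command pool per worker thread, since pools (not the individual command buffers) are what must be externally synchronized. A rough sketch, assuming the device, queue family index and thread count come from elsewhere:

[CODE]#include <vulkan/vulkan.h>
#include <vector>

// One command pool per worker thread: each thread allocates and records
// from its own pool without locks; only vkQueueSubmit is serialized.
std::vector<VkCommandPool> createPerThreadPools(VkDevice device,
                                                uint32_t queueFamily,
                                                uint32_t threadCount)
{
    std::vector<VkCommandPool> pools(threadCount);
    for (uint32_t t = 0; t < threadCount; ++t) {
        VkCommandPoolCreateInfo ci = {VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO};
        ci.flags = VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT;
        ci.queueFamilyIndex = queueFamily;
        vkCreateCommandPool(device, &ci, NULL, &pools[t]);
    }
    return pools;
}[/CODE]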

[QUOTE]I don’t think that this is aimed at all scenarios, but rather at applications that record the same command buffers over and over again, like the early (?) LunarG samples did (and my samples too ;)).[/QUOTE]

Their proposed solutions do not sound like solutions for sample applications. I mean, would your sample applications need to move data into memory objects or use indirect calls just to stop rebuilding command buffers? Earlier in the slides, they also said that with Vulkan, “A frame can be rendered with just two commands!”

AMD seems pretty serious about this.

One mistake I see in just about every example is transitioning the swapchain images from UNDEFINED to PRESENT_SRC right after creating the swapchain, before any of them has been acquired. This is explicitly not allowed.

Instead, you need to either track which images you have already seen, or always transition from UNDEFINED to COLOR_ATTACHMENT_OPTIMAL on first use. The latter is always possible and implies that you don’t care about what was in the image before.
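In code, that second option is just an image memory barrier recorded after acquiring the image; a rough sketch (handle names are placeholders):

[CODE]#include <vulkan/vulkan.h>

// Transition a just-acquired swapchain image from UNDEFINED (contents
// discarded) to COLOR_ATTACHMENT_OPTIMAL before rendering into it.
void transitionAcquiredImage(VkCommandBuffer cmd, VkImage swapchainImage)
{
    VkImageMemoryBarrier barrier = {VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER};
    barrier.srcAccessMask = 0;
    barrier.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    barrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED; // “don’t care” about old contents
    barrier.newLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = swapchainImage;
    barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}[/CODE]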

[QUOTE=Alfonse Reinheart;40239]Their proposed solutions do not sound like solutions for sample applications. I mean, would your sample applications need to move data into memory objects or use indirect calls just to stop rebuilding command buffers? Earlier in the slides, they also said that with Vulkan, “A frame can be rendered with just two commands!”

AMD seems pretty serious about this.[/QUOTE]

No matter what else, you need at least 3 calls to render a scene: acquireNextImage, queueSubmit and queuePresent. And this is not counting getting data to the GPU, which requires fence management and flushing mapped memory.

So even if they don’t count acquireNextImage and queuePresent, you still need at least 3 calls for anything non-static: checking fences, flushing mapped memory and queueSubmit.
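To put that in code: even a frame where everything is pre-recorded still needs something like the sketch below (the semaphores, fence and pre-baked per-image command buffers are assumed to have been created elsewhere):

[CODE]#include <vulkan/vulkan.h>

// Minimal per-frame skeleton: acquire, submit a pre-recorded command
// buffer for that image, present.
void drawFrame(VkDevice device, VkQueue queue, VkSwapchainKHR swapchain,
               VkSemaphore acquired, VkSemaphore rendered, VkFence fence,
               const VkCommandBuffer* prebakedCmdBuffers)
{
    uint32_t imageIndex = 0;
    vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, acquired,
                          VK_NULL_HANDLE, &imageIndex);

    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    VkSubmitInfo submit = {VK_STRUCTURE_TYPE_SUBMIT_INFO};
    submit.waitSemaphoreCount = 1;
    submit.pWaitSemaphores = &acquired;
    submit.pWaitDstStageMask = &waitStage;
    submit.commandBufferCount = 1;
    submit.pCommandBuffers = &prebakedCmdBuffers[imageIndex]; // one per swapchain image
    submit.signalSemaphoreCount = 1;
    submit.pSignalSemaphores = &rendered;
    vkQueueSubmit(queue, 1, &submit, fence);

    VkPresentInfoKHR present = {VK_STRUCTURE_TYPE_PRESENT_INFO_KHR};
    present.waitSemaphoreCount = 1;
    present.pWaitSemaphores = &rendered;
    present.swapchainCount = 1;
    present.pSwapchains = &swapchain;
    present.pImageIndices = &imageIndex;
    vkQueuePresentKHR(queue, &present);
}[/CODE]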

[QUOTE=Alfonse Reinheart;40237]
The main reason to have dummy bindings is to avoid breaking pipeline layout compatibility with something else. Again, the GDC presentations suggested that layout changes were a huge deal performance-wise.
I don’t know; is AMD implementing the same Vulkan that everyone else is using?[/QUOTE]

Descriptor sets are supposed to map to some entities in GPU RAM, so I suppose too many unused NULL pointers in those entities will hurt cache efficiency when used irresponsibly. But a few things in those slides do feel like premature optimization for some really unrealistic use cases. Like, I cannot even imagine how many things may go wrong if we allow multithreaded work submission.

[edit] incorrect post but can’t delete.

[QUOTE=ratchet freak;40240]One mistake I see in just about every example is transitioning the swapchain images from UNDEFINED to PRESENT_SRC right after creating the swapchain, before any of them has been acquired. This is explicitly not allowed.

Instead, you need to either track which images you have already seen, or always transition from UNDEFINED to COLOR_ATTACHMENT_OPTIMAL on first use. The latter is always possible and implies that you don’t care about what was in the image before.[/QUOTE]

It appears they didn’t even know that it was a mistake: “It's not allowed to transition image without first acquiring them” (Issue #2 on GPUOpen-LibrariesAndSDKs/HelloVulkan, GitHub).

That’s how bad habits and workarounds in drivers start.