Vulkan Synchronization (Best practices, Common & Exotic Application, vs. OpenGL)

Introduction

VULKAN gives the developer the possibility/responsibility to use synchronization primitives.

Core questions

Is it right, that OpenGL is really totally hiding synchronization in the driver? If not, can someone give examples, where you also can use synchronization with OpenGL.
Is Synchronization just an annoying sideeffect of VULKAN's new driver architecture and API? If not, where does it really shine?
Are there common scenarios, where to use VULKAN's Synchronization primitives (Boilerplate code) and are there exotic scenarios, where one can really push the limits of his GPU?
Are there best practices like e.g. the reuse and grouping of Synchronization primitives? Are there common pitfalls when using synchronization?

Synchronization is one of the worst-documented features in the VULKAN specification, and I would be glad if you could shed some light on it and provide code snippets that support the understanding of the concepts.

Since VULKAN is also relatively new, I would support and moderate this thread with the intention and hope that it may become a structured overview/discussion providing answers to the above questions. I would appreciate it if the overview collects community knowledge that goes beyond the current VULKAN synchronization tutorials.

Example Semaphores

When to use?

Typically for inter-queue synchronization.

Common application(s)

Synchronization of rendering and presentation engine/queues.

Best practices

Since there are bugs when deleting semaphores, and creating synchronization primitives often uses a lot of resources, I suggest creating as many semaphores as the whole application needs up front and reusing them.

Example:

Create as many pairs of render/present semaphores as we have images in the swapchain.
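
A minimal sketch of that idea follows. It uses stand-in types so that it compiles without the Vulkan SDK: `Device`, `createSemaphore`, `FrameSync` and `FrameSyncPool` are hypothetical names chosen for illustration, not Vulkan API. In real code the handles would be `VkSemaphore` objects created with `vkCreateSemaphore`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a VkSemaphore handle (assumption: a plain integer id).
using Semaphore = std::uint64_t;

// Stand-in device that only counts how many semaphores were ever created.
// Real code would call vkCreateSemaphore here.
struct Device {
    std::uint64_t created = 0;
    Semaphore createSemaphore() { return ++created; }
};

// One pair per swapchain image.
struct FrameSync {
    Semaphore imageAvailable;  // signalled on acquire, waited on by the render submit
    Semaphore renderFinished;  // signalled by the render submit, waited on by present
};

// Creates all pairs once and hands them out round-robin afterwards.
class FrameSyncPool {
public:
    FrameSyncPool(Device& device, std::size_t imageCount) {
        assert(imageCount > 0);
        frames_.reserve(imageCount);
        for (std::size_t i = 0; i < imageCount; ++i) {
            frames_.push_back({device.createSemaphore(), device.createSemaphore()});
        }
    }

    // Returns the next pair; nothing is created after construction.
    const FrameSync& next() {
        const FrameSync& frame = frames_[cursor_];
        cursor_ = (cursor_ + 1) % frames_.size();
        return frame;
    }

private:
    std::vector<FrameSync> frames_;
    std::size_t cursor_ = 0;
};
```

With three swapchain images, the pool creates six semaphores at startup and the fourth call to `next()` returns the first pair again. The sketch leaves out one thing real code has to handle: a semaphore may only be reused once its previous wait has completed, which is usually guarded with a fence per frame.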

Alternatives

Synchronization via Timeouts.

It’s strange that you seem to have forgotten from your Stack Overflow question that Vulkan is not an acronym.

Is it right, that OpenGL is really totally hiding synchronization in the driver?

For the most part, yes.

If not, can someone give examples, where you also can use synchronization with OpenGL.

Async pixel transfers, as the name suggests, are asynchronous, but only somewhat. You can’t break them; if you try to read from/write to the buffer before the pixel transfer completes, you will get a CPU/GPU sync. So it’s still effectively a synchronous API, but you can decide when you’re ready to pay the price for a CPU/GPU sync. Fence sync objects allow you to mitigate this to some degree, letting you test to see when the transfer has completed.

Then, there’s the entire set of incoherent memory accesses: SSBOs, image load/store, etc. There are also ways to do read/modify/writes to textures while rendering to them.

But all of these are fairly advanced techniques. For the majority of users (and pretty much for every tutorial you find online), OpenGL acts as if in a synchronous fashion.

Is Synchronization just an annoying sideeffect of VULKAN’s new driver architecture and API? If not, where does it really shine?

Define “sideeffect[sic]”. You could consider the need for explicit synchronization a mere side-effect of Vulkan’s low-level nature.

Vulkan is a low-level, explicit API. That means it’s intended to be as close to the hardware as reasonable, but not so close that hardware vendors cannot effectively optimize what you’re doing for their specific devices. Having direct control of synchronization is part-and-parcel of being an explicit API.

It “really shines” in that you cannot be ignorant of the cost of what you’re doing.

In OpenGL, you can render to a texture, then immediately read from it. This is incredibly bad performance-wise, but the API makes it appear to be no different from binding any other texture.

Because Vulkan requires explicit synchronization between writes and reads from memory, it is impossible for you to write valid Vulkan code without being at least somewhat aware of the fact that you did something different from just binding a texture for reading. You have to be aware that what you’re doing will not be fast, and therefore, you can take steps to make it fast (putting more work between the write and the read, for example).

Are there common scenarios, where to use VULKAN’s Synchronization primitives (Boilerplate code) and are there exotic scenarios, where one can really push the limits of his GPU?

You should never attempt to write Vulkan code via copy-and-paste. That ought to be true of many APIs, but Vulkan in particular is one where you absolutely need to understand what you’re doing.

Having “boilerplate code” leads to performance traps in your code. For example, you could have some boilerplate texture uploading code that always issues a barrier after the upload. But since that function has no idea how you intend to use that texture, it must be a full barrier, across all stages and across all access types. That’s needlessly bad, performance-wise. Furthermore, if you call this code to upload to two different textures, it will issue two barriers instead of one which covers both textures. That’s also needlessly bad, since the second upload cannot start until the first has ended, even though they’re to two completely different regions of memory.

Really, if you’re going to do that sort of thoughtless boilerplate thing, you may as well go use OpenGL. You won’t have to bother with such things, and you’ll get decent enough performance.

Now, there are ways to do the kind of boilerplate you’re talking about, but they require cooperation with the user. For example, take the image upload case. The uploading function would not issue a barrier, but it would return a VkImageMemoryBarrier, and it will have filled out most of the fields for this structure.

Your job is to fill in specifics like what image layout you want to transition to after the transfer is complete, what your destination access form is, etc. And most importantly, it’s on you to make the actual vkCmdPipelineBarrier call.
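
To make the division of labour concrete, here is a rough sketch of that pattern. It uses stand-in types so it compiles without the Vulkan SDK: `ImageBarrier`, `CommandBuffer`, `uploadImage` and `recordUploads` are hypothetical names for illustration only. In real code the barrier would be a `VkImageMemoryBarrier` and the batch would be recorded with a single `vkCmdPipelineBarrier` call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins for VkImageLayout and VkAccessFlags (assumption: only the
// values this sketch needs).
enum class Layout { Undefined, TransferDst, ShaderReadOnly };
enum Access : std::uint32_t {
    AccessNone = 0,
    AccessTransferWrite = 1u << 0,
    AccessShaderRead = 1u << 1
};

// Stand-in for VkImageMemoryBarrier.
struct ImageBarrier {
    int image = -1;                        // which image the barrier covers
    Layout oldLayout = Layout::Undefined;  // filled in by the upload helper
    Layout newLayout = Layout::Undefined;  // left for the caller
    std::uint32_t srcAccess = AccessNone;  // filled in by the upload helper
    std::uint32_t dstAccess = AccessNone;  // left for the caller
};

// Stand-in command buffer that counts barrier commands, so the effect of
// batching is visible.
struct CommandBuffer {
    int barrierCommands = 0;
    std::vector<ImageBarrier> recorded;

    void pipelineBarrier(const std::vector<ImageBarrier>& batch) {
        assert(!batch.empty());
        ++barrierCommands;
        recorded.insert(recorded.end(), batch.begin(), batch.end());
    }
};

// Upload helper: would record the copy (omitted here) and returns a
// barrier with only the fields it can know about. It issues no barrier.
ImageBarrier uploadImage(CommandBuffer& /*cmd*/, int image) {
    ImageBarrier barrier;
    barrier.image = image;
    barrier.oldLayout = Layout::TransferDst;
    barrier.srcAccess = AccessTransferWrite;
    return barrier;
}

// Caller: uploads two textures, completes both barriers with what it
// knows about their next use, and records them in one barrier command.
int recordUploads(CommandBuffer& cmd) {
    std::vector<ImageBarrier> batch = {uploadImage(cmd, 0), uploadImage(cmd, 1)};
    for (ImageBarrier& barrier : batch) {
        barrier.newLayout = Layout::ShaderReadOnly;
        barrier.dstAccess = AccessShaderRead;
    }
    cmd.pipelineBarrier(batch);
    return cmd.barrierCommands;
}
```

Both uploads end up covered by one barrier command instead of two, and the destination access is as narrow as the caller's actual use rather than a blanket "all stages, all access types".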

Are there best practices like e.g. the reuse and grouping of Synchronization primitives?

The best practice is to do the minimum synchronization that you absolutely need, not to have some general sync that always happens just because. Mindlessly syncing things is bad.

The best practice is to have a strategy for dealing with synchronization, one that’s tailored specifically to the needs of your application.

Are there common pitfalls when using synchronization?

The most common pitfalls are either not doing it when needed, or doing too much synchronization. The easiest way to avoid the former is to use debugging layers. Avoiding the latter however… that’s harder.