My understanding of pipeline stages.

Over the last week I had an epiphany about how I believe pipeline stages and the sync operations work. Given what a mental leap I had to make, I wrote up a bit about it.

https://gist.github.com/ratchetfreak/62b9e795d452f93bcbdcee3664acca0e

The tl;dr of it is that each queue isn’t just a single line of execution; rather, each stage is (mostly) independent. Once I had that realization, combined with my previous experience syncing job queues, the rest just clicked into place.

If this general idea is true, then there ought to be an explicit listing, for each vkCmd of which stages it uses. Yes, the rendering commands will use stages based on what pipeline is currently active. But overall, it would be very handy to be able to look at a command and know what stages it employs.

Section 6.5.2 is a good start (particularly for transfer commands), but it is not sufficient.

[QUOTE=Alfonse Reinheart;40262]If this general idea is true, then there ought to be an explicit listing, for each vkCmd of which stages it uses. Yes, the rendering commands will use stages based on what pipeline is currently active. But overall, it would be very handy to be able to look at a command and know what stages it employs.

Section 6.5.2 is a good start (particularly for transfer commands), but it is not sufficient.[/QUOTE]

And, for each bit of memory a command touches, in which stage it is actually touched (if the command uses more than one stage) and which access masks are appropriate.

I mean, it’s implicit which ones are appropriate when rendering, but I’d like an explicit paragraph stating, for each point in the pipeline, which stage and access mask bit(s) are involved.

I get the impression you overthink it.

What’s there is basically Pipeline, Queue, and Device.

Things across Devices and across Queues are done independently/asynchronously (if not explicitly synchronized by the user, in the case of Queues).

Things in a single Queue are done “synchronously”, but out-of-order, and they can overlap.

There is also direct access to memory. In Vulkan, people are exposed to command dependencies and memory barriers they are unfamiliar with (they are usually shielded from those on the CPU side, e.g. x86 memory barrier instructions).

There are several Pipeline types, and their stages are mashed together in one enum. The Transfer Pipeline, with pretty much one stage. The Host pseudo-Pipeline, with one stage. The familiar Graphics Pipeline, with the programmable stages VS->TCS->TES->GS->FS. And the Compute Pipeline, most likely just reusing the FS stage under the hood. Plus an all-stages bit, which means pretty much the whole pipeline (e.g. it will not start the VS of a dependent command before the FS of the dependency command).

Those are the things that can overlap. Not unlike on a common superscalar processor. There may be more pipelines, and they may overlap in computation.

With a barrier you are just controlling to what point things may overlap (to which stage later commands can progress before some stage of earlier commands finishes). Also (separately), with the same command you declare a memory barrier, just controlling which memory needs to be flushed from cache/local memory and the deadline by which it must be flushed (ready to be read).

Perhaps, but I don’t have much experience programming or optimizing for deeply pipelined architectures. So if you explode the single queue into multiple execution units, each with its own job list, that conceptually run semi-independently (which, after the driver’s optimization, they very well could appear to), then the sync model is easier to understand (for me at least).

To clarify, we are speaking of software/driver architecture. HW-wise, Vulkan can run on any hardware that supports float and int and can emulate the core set of commands/functions.

Well whatever works to understand it…

My favorite interpretation ATM is:

  1. action Command is a workload that needs one or more passes through a pipeline

  2. single Queue is just a concatenated sequence of commands that leads to a (single) Pipeline (of each type)

  3. single Pipeline can pick any command to execute from the Queue, if it follows synchronization elements therein.

  4. single Pipeline does… well pipelining. Its work can overlap. If work progresses to the next stage, new work can begin in the previous stage (again with the exception of synchronization elements - it will either pick the command from Queue that can fill the void, or if it can’t then stalls/executes nops)

That’s not so different from my interpretation TBH. Except that mine is less sequential.

Not to mention “The work involved in performing action commands is often allowed to overlap or to be reordered” will allow the device to appear like my model.

Well there are slight differences between out-of-order pipeline and true parallelism (e.g. latency, startup and finish lag), but those won’t surface on regular use…

Also, things would be called a bit differently, e.g. Mutex/Lock instead of Barrier.