Command Buffer Execution Environment(s)

Hello again.

Another curious question to think about:

Why are command buffers used as execution environments, where state is defined within a command buffer but is not preserved across command buffers?

The device should be the reference for modelling execution states.
Vulkan already exposes the synchronization primitives needed to make this practical.

I believe there is optimization potential in allowing command buffers to update global state that affects all subsequent commands. It would make it possible to submit any sequence of vkCmd calls whose order is not known in advance, without seemingly redundant vkCmd calls.

A global execution state makes using command buffers more intuitive, especially when we are made to believe that a command buffer is a buffer for commands.

Maybe not the most impactful optimization, but an obvious one, and it would make driver algorithms more transparent.

OK: what are those opportunities? What you seem to be talking about is something like D3D12’s “bundle” command lists. But even in D3D12, the pipeline state object’s state is not inherited by command lists. So every bundle has to bind a PSO, which may well be the same PSO that a previously issued bundle bound.

So it’s not really clear what the benefits would be.

My use case is very petty lol: I would like to bind a descriptor set once and have it used by subsequent compute pipelines.

I could make a command buffer with all the pipelines in sequence, but the pipeline permutation is not known in advance (it depends on the parameters of the frame). Making a command buffer for each permutation would add complexity to the architecture of my code and management system.

I could make a unique command buffer for each pipeline-descriptor call, which is what I currently do.

But considering this is for the boot function of my operating system (where minimizing instructions improves the generalization of my code’s reach), my hyper-optimistic mind is concerned that it hurts performance at both startup and run time.

At the very least, from an information standpoint, a global state would simplify the Vulkan API.

Not sure how drivers are handling descriptor bindings, or if descriptor updates even change performance at all.

A driver could silently skip a descriptor bind call if it detects no state change and doesn’t clear the previous state. But that is not an assumption local state supports.

Global state!
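To make the redundant-bind idea above concrete, here is a minimal sketch in plain C. It is a toy model and not Vulkan API: `Recorder`, `recorder_begin`, and `recorder_bind_set` are hypothetical names invented for illustration, and nothing here claims to describe what any real driver does. It only shows that a repeated bind can be filtered out while one command buffer is being recorded, and that the filter has nothing to compare against once a new buffer begins, because state is local to each buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not Vulkan API: records descriptor-set binds for one
 * command buffer and tracks which set is currently bound in that buffer. */
enum { MAX_BINDS = 64 };

typedef struct {
    uint32_t binds[MAX_BINDS]; /* set ids of the bind commands recorded */
    size_t   count;            /* number of bind commands recorded      */
    int      has_set;          /* 0 until a set is bound in this buffer */
    uint32_t bound_set;        /* valid only when has_set is nonzero    */
} Recorder;

/* Beginning a buffer discards the tracked state, mirroring the rule that
 * state does not carry over from one command buffer to the next. */
void recorder_begin(Recorder *r) {
    r->count = 0;
    r->has_set = 0;
    r->bound_set = 0;
}

/* Returns 1 if a bind command was recorded, 0 if it was skipped because
 * the same set is already bound in this buffer. */
int recorder_bind_set(Recorder *r, uint32_t set_id) {
    if (r->has_set && r->bound_set == set_id) {
        return 0; /* redundant within this buffer: nothing recorded */
    }
    assert(r->count < MAX_BINDS);
    r->binds[r->count++] = set_id;
    r->has_set = 1;
    r->bound_set = set_id;
    return 1;
}
```

In this model, binding set 7 twice after `recorder_begin` records one command, while binding the same set again after a second `recorder_begin` records it again, since the new buffer starts with no inherited state.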

I think you have an incorrect idea of what CBs are for in Vulkan.

CBs don’t have to be static and unchanging; indeed, the general usage of them is that you mostly rebuild them every frame.

The primary purpose of splitting a conceptual sequence of commands into separate command buffers is parallelism. That is, you have multiple CPU threads that each need to write part of that sequence, so to avoid inter-thread synchronization, you have each of them write to their own CB. Then a single thread submits all of them in order.

Unless you’re doing that, unless you are iterating over your sequence of GPU tasks on multiple CPU threads, there just isn’t much reason not to simply record all of the commands directly into the CB as you need them.

Each of the hypothetical inheriting CBs you want to create consists of a pipeline binding call, maybe some push constant setup, and one or more dispatch operations, possibly concluding with some form of event setting or other synchronization primitive so that consumers can access it. Instead of building a CB for just those few commands and using execute buffers to execute them later, just write those 3 or so commands when you want to execute them.
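As a rough illustration of recording the commands directly, the sketch below models a single command buffer that is rebuilt each frame for whatever pipeline order that frame needs. It is a plain C model and not Vulkan code: `CmdBuffer`, `Pass`, and `record_frame` are hypothetical names, and the four command kinds merely stand in for `vkCmdBindDescriptorSets`, `vkCmdBindPipeline`, `vkCmdPushConstants`, and `vkCmdDispatch`. Binding the shared descriptor set only once assumes the pipelines use layouts that are compatible for that set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a command buffer; not the Vulkan API. */
enum { MAX_CMDS = 256 };

typedef enum {
    CMD_BIND_DESCRIPTOR_SET, /* stands in for vkCmdBindDescriptorSets */
    CMD_BIND_PIPELINE,       /* stands in for vkCmdBindPipeline       */
    CMD_PUSH_CONSTANTS,      /* stands in for vkCmdPushConstants      */
    CMD_DISPATCH             /* stands in for vkCmdDispatch           */
} CmdKind;

typedef struct { CmdKind kind; uint32_t arg; } Cmd;
typedef struct { Cmd cmds[MAX_CMDS]; size_t count; } CmdBuffer;

/* One compute pass chosen at run time for the current frame. */
typedef struct {
    uint32_t pipeline; /* which compute pipeline to bind       */
    uint32_t param;    /* per-pass value sent as push constant */
    uint32_t groups;   /* workgroup count for the dispatch     */
} Pass;

static void record(CmdBuffer *cb, CmdKind kind, uint32_t arg) {
    assert(cb->count < MAX_CMDS);
    cb->cmds[cb->count].kind = kind;
    cb->cmds[cb->count].arg = arg;
    cb->count++;
}

/* Rebuilds the buffer for this frame: the shared set is bound once, then
 * each pass contributes its three commands in the order the frame chose. */
void record_frame(CmdBuffer *cb, uint32_t shared_set,
                  const Pass *passes, size_t n) {
    cb->count = 0;
    record(cb, CMD_BIND_DESCRIPTOR_SET, shared_set);
    for (size_t i = 0; i < n; i++) {
        record(cb, CMD_BIND_PIPELINE, passes[i].pipeline);
        record(cb, CMD_PUSH_CONSTANTS, passes[i].param);
        record(cb, CMD_DISPATCH, passes[i].groups);
    }
}
```

With this shape, a frame that needs three passes records 1 + 3 × 3 = 10 commands into one buffer, and no per-permutation or per-pipeline command buffers have to be managed.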

CBs aren’t meant to be the Vulkan equivalent of a macro; they’re not just dumb holders of vkCmd calls.

Why are you trying to run a compute shader at OS boot time? Are there boot-up tasks whose algorithms are parallelizable such that a GPU implementation is warranted? What are those tasks?

Um… how? The API itself wouldn’t even change.

Descriptor updates and descriptor binding are separate processes. Also, it seems very odd to suggest a change to improve performance when you don’t even know if it would meaningfully help performance.

I think the reason global state is not standard is that it raises issues with semaphore synchronization effects, and local state is enforced to protect you from driver-side optimizations.

In which case (assuming global state were standard), you could still fall back on the local approach and guarantee your command buffers have the appropriate state.

I love when there are 2 problems and 1 solution solves them both; this is good design by nature.

Just tryna make vulkan more sexy alongside my algorithms; I love it. aha

As the GPU is heavily integrated with the OS, the threads running the GPU are also OS-critical, and they are at the center of integrating CPU and GPU OS functions. The cost of rewriting commands to a buffer each frame could become noticeable, depending on OS parameters and demands. Thus I want to avoid rewrites wherever possible. But I also don’t want to overcomplicate the code, so that people can make changes more easily where they see fit.

Global state would help me do this. But no one will die whether it becomes global or not. lol

The GPU is integrated into the operations of the OS, yes.

I consider the API and how it’s used (the spec) the same body.

Sure, a performance improvement would not be guaranteed by this change, but it’s a simplification on the app side of the API. As I’ve mentioned, it would help me simplify my code and education material. There is no performance guarantee with anything defined by Vulkan; as you put it, a driver can play whatever games it wants with things.

This is an improvement that enables me to do what I want for once. :(

Global state is one of the top things I hated about OpenGL. If state is global, it is not state; it is a variable cesspool.

aha Why put it that way, you make me smell bad.

Thanks for the insight. I realized I don’t like the smell. I convinced myself that, when multiple command buffers are queued, local states better encourage drivers to hit performance opportunities.

Local state is the best option for making the most of the control Vulkan provides (assuming your goal is to get as much work done per unit of time).
Perhaps if Vulkan evolves to model actual core states (and to model more driver-like functionality), a device-based global state would make more sense.