Relative costs of state changes (PipelineLayout compatibility)

A while back there was this great OpenGL talk by Cass Everitt where he discusses the “Relative costs of State Changes” for OpenGL.

I’m sure most of this is still applicable in Vulkan. However, with PSOs a few of those items have been combined. Has anything like the above been published (or is there community/personal experience) that shows the cost of changing to PSOs which are:

  1. Fully PipelineLayout compatible with prior binding
  2. Mostly PipelineLayout compatible with prior binding (sets 0,1,3 compatible, set 2 not)
  3. Not compatible at all with prior binding

Just to get a sense of the relative costs in those circumstances? The specification suggests that the more compatible the layouts are, the more cost is saved, but I’m curious about real-world experience.
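For reference, the spec defines compatibility per set index and as a prefix: two pipeline layouts are "compatible for set N" only if they were created with identical push constant ranges and identically defined descriptor set layouts for sets 0 through N. So in case 2, only sets 0 and 1 stay usable; set 3 counts as incompatible too, because set 2 differs. Below is a minimal sketch of that rule, assuming each VkDescriptorSetLayout is reduced to an integer ID and the push constant ranges to a single ID (`LayoutModel`, `compatible_for_set` and `sets_still_valid` are hypothetical names, not Vulkan API). It models which bindings survive a layout switch, not what a driver charges for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a VkPipelineLayout: equal IDs stand for
 * identically defined descriptor set layouts / push constant ranges. */
#define MAX_SETS 8

typedef struct {
    size_t   set_count;
    unsigned set_layout_ids[MAX_SETS];
    unsigned push_constant_id;
} LayoutModel;

/* True if a and b are "compatible for set n": identical push constant ranges
 * and identically defined set layouts for every set 0 through n. */
static bool compatible_for_set(const LayoutModel *a, const LayoutModel *b, size_t n)
{
    assert(a->set_count <= MAX_SETS && b->set_count <= MAX_SETS);
    if (a->push_constant_id != b->push_constant_id) return false;
    if (n >= a->set_count || n >= b->set_count) return false;
    for (size_t i = 0; i <= n; ++i)
        if (a->set_layout_ids[i] != b->set_layout_ids[i]) return false;
    return true;
}

/* Returns k: sets 0..k-1 bound under `prev` can still be used by a pipeline
 * created with `next`; sets k and above must be rebound. */
static size_t sets_still_valid(const LayoutModel *prev, const LayoutModel *next)
{
    size_t k = 0;
    while (k < MAX_SETS && compatible_for_set(prev, next, k)) ++k;
    return k;
}
```

With layouts {10, 11, 12, 13} and {10, 11, 22, 13}, `sets_still_valid` returns 2: set 3 has the same layout in both, but it sits behind the mismatched set 2.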

I don’t know how to express this but here’s a rough attempt. Looking at the OpenGL bar graph above, from my understanding PSOs encapsulate:

  • Program
  • ROP
  • Vertex Format

Which are all part of the same vkCmdBindPipeline call (ignoring dynamic states for now). In the worst-case scenario, where the prior PipelineLayout is not compatible (e.g. sets [1, N) are different layouts), do the relative state change costs become (in decreasing order):

  1. Render Target
  2. PSO
  3. Texture Bindings (ex. descriptor set 1)
  4. UBO Bindings (ex. descriptor set 2)
  5. Vertex Bindings
  6. Uniform Updates

Where descriptor set 0 would be frame global data and not changed.
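One common way to act on a ranking like this, whatever the real numbers turn out to be, is to sort draws by a packed key with the most expensive state in the highest bits, so the expensive changes happen least often. A minimal sketch, assuming the bit order follows the hypothesized ranking above (render target, PSO, set 1, set 2; vertex bindings left out for brevity). `DrawState`, `make_sort_key` and the other names are hypothetical, and the fields are app-side indices rather than Vulkan handles.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-draw state; each field indexes the app's own tables. */
typedef struct {
    uint16_t render_target; /* assumed most expensive to change */
    uint16_t pipeline;
    uint16_t texture_set;   /* descriptor set 1 */
    uint16_t ubo_set;       /* descriptor set 2 */
} DrawState;

/* Most expensive state in the highest bits, so sorting by key groups draws
 * that share it. */
static uint64_t make_sort_key(const DrawState *s)
{
    return ((uint64_t)s->render_target << 48) |
           ((uint64_t)s->pipeline      << 32) |
           ((uint64_t)s->texture_set   << 16) |
            (uint64_t)s->ubo_set;
}

static int compare_draws(const void *a, const void *b)
{
    uint64_t ka = make_sort_key(a), kb = make_sort_key(b);
    return (ka > kb) - (ka < kb);
}

static void sort_draws(DrawState *draws, size_t n)
{
    assert(draws != NULL || n == 0);
    qsort(draws, n, sizeof *draws, compare_draws);
}

/* How many pipeline binds a given draw order would need. */
static size_t count_pipeline_changes(const DrawState *draws, size_t n)
{
    size_t changes = 0;
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || draws[i].pipeline != draws[i - 1].pipeline) ++changes;
    return changes;
}
```

Sorting four draws that alternate between two pipelines cuts the pipeline binds from four to two. If profiling showed a different ranking, only the shift amounts would need to change.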

But when the PipelineLayouts are fully compatible, should PSO binding be expected to drop significantly in cost, displacing the texture and UBO bindings, or would it still be more expensive than texture bindings and, for the most part, remain at the #2 spot?

Assume the PipelineLayouts are fully compatible, but I create PSOs that effectively disable certain attributes. Say some objects do not require normal mapping, so I’ve turned off the tangent/bitangent attributes. In every other way the PSO is the same; only the vertex format differs. Would binding this PSO cost only as much as a vertex format change, or would it be just as expensive as a Program change?

Thanks

Yeah, I’m pretty sure it’s not. In fact, the structure of pipeline objects basically makes that chart meaningless, since each pipeline encompasses almost all of those things.

It should also be noted that a not-entirely-insignificant part of the state change cost in OpenGL comes from the cost of verifying the new state. Vulkan doesn’t do that.

That being said, 1.3 changes things, since much more of the pipeline’s fixed-function state (cull mode, depth test, primitive topology and so on) can now be dynamic and therefore changed independently. How this affects performance would require profiling, as it would vary from hardware to hardware.

In any case, the main takeaway should be this: don’t do redundant state changes where reasonable, avoid creating needlessly many pipelines that you have to change all the time, and try to have as few descriptor sets as you can get away with. Anything deeper would require specific profiling for specific hardware. None of what you’re asking about can be known a priori in Vulkan.
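A cheap way to avoid redundant state changes is to remember what was last bound in the current command buffer and skip any bind that would not change anything. A minimal sketch, with integer handles standing in for VkPipeline/VkDescriptorSet and counters standing in for the actual vkCmdBindPipeline/vkCmdBindDescriptorSets calls (`BindCache`, `bind_pipeline` and `bind_set` are hypothetical names). It deliberately ignores layout compatibility: a fuller version would forget the cached sets from the first incompatible set index onward whenever the pipeline layout changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bind tracker; handle value 0 means "nothing bound". */
#define MAX_SETS 4

typedef struct {
    uint64_t pipeline;
    uint64_t sets[MAX_SETS];
    unsigned pipeline_binds; /* binds actually issued */
    unsigned set_binds;
} BindCache;

/* Call at the start of each command buffer: bound state does not carry over. */
static void cache_reset(BindCache *c) { *c = (BindCache){0}; }

/* Returns true if a bind was issued, false if it was redundant and skipped. */
static bool bind_pipeline(BindCache *c, uint64_t pipeline)
{
    if (c->pipeline == pipeline) return false;
    c->pipeline = pipeline;
    c->pipeline_binds++; /* real code would record vkCmdBindPipeline here */
    return true;
}

static bool bind_set(BindCache *c, unsigned index, uint64_t set)
{
    assert(index < MAX_SETS);
    if (c->sets[index] == set) return false;
    c->sets[index] = set;
    c->set_binds++; /* real code would record vkCmdBindDescriptorSets here */
    return true;
}
```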

Also:

That’s part of the render pass. Even with render-pass-less (dynamic) rendering in 1.3, you can’t change render targets without ending the render pass. So worrying about that cost is kind of pointless when you already have to incur a heavyweight operation like that.

Isn’t there a concern that this leads back to an issue OpenGL experiences where the Pipeline might need to “re-compile” the underlying shader if, say, you enabled some state which the HW doesn’t support and so it gets “patched” into the underlying shader(s) – or is this a non-issue since the spec requires those dynamic bits to have HW support?

Currently, I’m tinkering around in a Vulkan 1.2 project but there is no requirement preventing me from upgrading. I’ll give Vulkan 1.3 spec a read.

I suppose it’s the “needlessly many pipelines” part where I second-guess myself. A basic example I can think of is normal mapping. However, this could apply to other material settings that can be toggled on/off at offline SPIR-V compilation and runtime pipeline creation. I’ll just focus on normal mapping, though.

Hypothetically, let’s say my scene has roughly 50% of the materials with normal maps and 50% without. I could do the following for the “disabled normal maps” case:

  • Offline, compile SPIR-V with defines to disable normal mapping
    • Vertex attributes for tangent/bitangent omitted from Input declarations
    • Preprocessor omits code which does any normal mapping
  • Runtime, build pipelines
    • Reference the related ShaderModules from the above offline process
    • Omit tangent/bitangent vertex attributes from VertexInputState
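The two halves of that permutation have to agree: the defines handed to the offline compiler decide which vertex inputs the shader declares, and the VertexInputState has to describe matching attributes. A minimal sketch that derives both from one feature bitmask, assuming an interleaved vertex of 32-bit floats (position, normal, UV, plus tangent and bitangent when normal mapping is on). `FEATURE_NORMAL_MAP`, `USE_NORMAL_MAP` and the helper names are hypothetical; the `-D` string is in the form glslc accepts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical feature bits describing one material permutation. */
enum { FEATURE_NORMAL_MAP = 1u << 0 };

/* Attribute sizes, assuming 32-bit float components. */
enum {
    POSITION_BYTES  = 3 * 4,
    NORMAL_BYTES    = 3 * 4,
    UV_BYTES        = 2 * 4,
    TANGENT_BYTES   = 3 * 4,
    BITANGENT_BYTES = 3 * 4,
};

/* Interleaved stride the VertexInputState binding would use. */
static uint32_t vertex_stride(uint32_t features)
{
    uint32_t stride = POSITION_BYTES + NORMAL_BYTES + UV_BYTES;
    if (features & FEATURE_NORMAL_MAP)
        stride += TANGENT_BYTES + BITANGENT_BYTES;
    return stride;
}

/* Preprocessor defines for the offline shader compile (e.g. glslc -D...). */
static void shader_defines(uint32_t features, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if (features & FEATURE_NORMAL_MAP)
        snprintf(out, out_size, "-DUSE_NORMAL_MAP");
}
```

Under these assumptions, the permutation without normal mapping has a 32-byte stride instead of 56.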

In that case, is it an unnecessary permutation? Technically, I could have a single pipeline that always has the vertex attributes enabled, use a default 1x1 normal map texture, and use a flag in a material uniform block that indicates whether normal mapping is enabled. The shader would then only perform normal mapping when that flag is set (a dynamically uniform branch).
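The single-pipeline alternative mostly comes down to always binding some texture in the normal map slot, so the descriptor set layout never changes. A minimal sketch of the CPU side, assuming a `uint` flags field in the material uniform block and a 1x1 RGBA8 UNORM fallback texture holding a flat tangent-space normal (0, 0, 1) encoded as n * 0.5 + 0.5. `MaterialParams`, `make_material` and the flag name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit, mirrored as a uint in the material uniform block. */
enum { MATERIAL_HAS_NORMAL_MAP = 1u << 0 };

typedef struct {
    uint32_t flags;          /* tested in the shader (dynamically uniform branch) */
    uint32_t normal_texture; /* app-side index of the texture to bind */
} MaterialParams;

/* 1x1 RGBA8 UNORM flat normal: (0, 0, 1) * 0.5 + 0.5 ~= (128, 128, 255). */
static const uint8_t FLAT_NORMAL_TEXEL[4] = {128, 128, 255, 255};

/* Decode one UNORM8 channel to [-1, 1] the way the shader would (x * 2 - 1). */
static float decode_normal_channel(uint8_t v)
{
    return (float)v / 255.0f * 2.0f - 1.0f;
}

static MaterialParams make_material(int has_normal_map, uint32_t normal_tex,
                                    uint32_t flat_fallback_tex)
{
    MaterialParams m;
    m.flags = has_normal_map ? MATERIAL_HAS_NORMAL_MAP : 0u;
    /* Always bind something, so every material uses the same set layout. */
    m.normal_texture = has_normal_map ? normal_tex : flat_fallback_tex;
    return m;
}
```

Since 128 is not exactly half of 255, the decoded flat normal is only approximately (0, 0, 1). With the flag check in the shader, the fallback is never used for normal mapping anyway, so that only matters if the branch is dropped.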

Would that be a typical approach to reducing the number of pipelines, extended to the other optional textures/features as well? I’d then keep building in this style until a performance issue actually shows up, at which point I’d at least have specific data on which to base the optimization. Or is there a clear performance issue with the above approach that means it should be avoided?

I guess I don’t have a good sense of when to make the trade off and am veering into premature optimization territory.

I suspect there is a general rule of thumb somewhere in here. For example, if a given pipeline permutation would service >10% of draw calls then it might be approaching a level where the ROI on optimization would be worthwhile. But like you said, profile.
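That rule of thumb is easy to check against real scene data before committing to a permutation. A minimal sketch, assuming each draw can be tagged with the permutation it would use; `heavy_permutations` is a hypothetical helper, and draw count is only one proxy (pixels shaded or GPU time per permutation may matter more).

```c
#include <assert.h>
#include <stddef.h>

/* Marks permutations used by more than threshold_pct percent of draws.
 * draw_perm[i] is the permutation index (0..num_perms-1) of draw i.
 * Returns how many permutations cross the threshold. */
static size_t heavy_permutations(const unsigned *draw_perm, size_t draw_count,
                                 unsigned num_perms, unsigned threshold_pct,
                                 int *is_heavy /* out: num_perms entries */)
{
    size_t counts[64] = {0};
    size_t heavy = 0;
    assert(num_perms <= 64);
    for (size_t i = 0; i < draw_count; ++i) {
        assert(draw_perm[i] < num_perms);
        counts[draw_perm[i]]++;
    }
    for (unsigned p = 0; p < num_perms; ++p) {
        /* Integer form of counts[p] / draw_count > threshold_pct / 100. */
        is_heavy[p] = draw_count > 0 &&
                      counts[p] * 100 > (size_t)threshold_pct * draw_count;
        heavy += (size_t)is_heavy[p];
    }
    return heavy;
}
```

With ten draws split 5/4/1 across three permutations, only the first two exceed 10%; the third sits exactly at the threshold.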

Regarding 1.3, dynamic state, and “many pipelines”, this is also worth a read:

Vulkan tries to discourage it by giving as many tools as possible so the driver does not have to resort to it. Trivial in-place patching should not be a performance problem. Full pipeline recompilations should not be happening on good drivers (might be happening on bad drivers). Sometimes there might be a sentiment to report support for something the HW supports in a strained way. There is always some judgement call to be made by the vendor, which obviously is biased towards exposing more as to look more capable than the HW actually is.

That was an interesting read. I also checked out the linked posts on “shader permutations”, which were also a great read.

This quote from part 2 is where my current struggle is:

In particular it’s a lot more reasonable to try to make sure that all of your permutations have a reasonable register count and occupancy…at least assuming that you have the tools available to determine these things. There are also some things that really do require permutations since they are static properties of the shader itself, and thus can’t be replaced with branching. In particular this includes the set of inputs or outputs of a shader (interpolants, render targets), usage of discard, per-sample execution, and forced early-z.

Since this is a solo hobby project, I’m not looking to create unnecessary work for myself. That said, I’d like to take the time to learn the elegant solutions, or at least understand the trade-offs I’m making more clearly. Given my example above, it seems like it would qualify for its own permutation. In your opinion, would you make that same choice (a separate permutation, given 50% of my materials have no normal map) or would you opt to “fake” the normal map with a 1x1 texture and call it a day?

Anyway thanks for your response and to @Alfonse_Reinheart too. I’ve learned a lot just from spectating on this forum.