Replacing glUniform for Vulkan transition

Greetings,

I am continuing my work on porting an OpenGL ES-based engine to Vulkan, and have now come to setting uniforms within a shader. Currently, when an engine user wants to change a uniform, they write the new value into a central buffer. Shortly before the drawcall, the engine takes the buffer and issues a glUniform* call for every entry.

I want to use GL_KHR_vulkan_glsl to compile the existing GLSL shaders to SPIR-V, so that as little as possible changes for the engine user.

The problem is that GL_KHR_vulkan_glsl doesn’t allow uniforms outside a uniform block, which I currently use heavily. Thus, my intention is to replace the glUniform mechanism with something GL_KHR_vulkan_glsl-compatible. Currently, two methods come to mind:

  1. Just wrap all uniforms in something like a default block, which is then updated from a UBO once per drawcall.

  2. Replace all glUniform calls with push constants.

The generation of the GLSL code is partially automated, at least when it comes to the generation of the declarations of the uniforms. The GLSL code that accesses the uniforms on the other hand is written by the engine user. That means using a method that merely changes the declaration is acceptable if the way they are accessed within the shader does not change.

I expect several hundred drawcalls a frame with up to 12 uniforms of which roughly half may change from drawcall to drawcall.

Can you tell me whether one of the above options is preferable from a performance viewpoint, as far as you can tell, or do you have a different proposal?

Regards

No, the problem is that Vulkan doesn’t allow that.

That’s not going to work; at least, not as you described it. Vulkan isn’t OpenGL; you can’t modify memory, issue a rendering command, then modify the same memory and expect the previous rendering command to not notice. OpenGL will do the synchronization needed to make that work. Vulkan doesn’t; it forces you to be aware that said synchronization is in fact very expensive.

You would need to use a new area of buffer storage, likely specified by a dynamic UBO or a similar mechanism.

Since each object is going to have to have its own range of per-object memory for its draw call(s), you may as well have your “buffering” mechanism actually write that data when you “buffer” the values. So each object has an associated region of per-object storage within the buffer. Then the draw call just sets the dynamic UBO to the proper offset for that draw.

Of course, because you’re constantly changing which regions of memory you’re writing to, there is no “memory” of unchanged values. So if you want to have unchanged values, you’ll have to do that manually by storing a CPU-side copy, applying any changes to it, and memcpying it into the per-object structure.

The push constant equivalent may not need such a thing. But push constant storage tends to be tiny, at most 256 bytes (with many implementations only offering 128).

Thanks for your answer. So how about this:

  • Two UBOs per drawcall: one in host-visible staging memory, one in device-local memory. The UBOs are managed by the engine object representing the shader.
  • When I start setting the uniform data, the staging buffer is mapped and the data is written directly to the mapped staging memory. It is unmapped before rendering starts; unmapping also triggers the synchronization with the device-local memory.
    Or could I leave the staging buffer persistently mapped and trigger the synchronization when command buffer generation starts?
  • After that, the render command buffer is built using the device-side UBO.

I have several scenarios where uniforms take less than 256 bytes. So for that case, I could just replace the uniform declarations by push constant declarations, right?

I could still access the uniforms in the same way inside the shader so I won’t have to change this, right?

OK, since I have now read a little further, I would like to reformulate my question a bit:

Imagine a scenario where I set off ten drawcalls. Between the drawcalls, only the texture binding changes. What is the recommended way in Vulkan to solve this scenario? I can think of the following ways at the moment:

  • Update and synchronize the descriptor set between drawcalls. As far as I know, this is supposedly very expensive.

  • Allocate a new descriptor set every time one variable changes between the drawcalls, delete the sets at the end of the frame via pool reset.

  • One large descriptor set for all potential drawcalls, then vkCmdBindDescriptorSets with pDynamicOffsets.

Any recommendations?

You cannot change a descriptor set while it is part of a command buffer. So you would have to submit a single draw call, wait on the CPU for it to finish, then submit a new draw call.

FYI: you cannot delete a descriptor set until after it has finished being consumed by the GPU. So it would have to be deleted at the beginning of a later frame.

But in any case, the better way to do this is to create a set for each object’s per-object data, and that set should be associated with that specific object. So you only allocate a set if a new object is needed, and deallocate a set if an object goes away.

Dynamic offsets are for buffer resources, not textures. They’re offsets into the buffer objects in question.

So, most of these are not functional solutions to your problem.


Your question is essentially about how to provide per-object data, with a particular look at textures. Generally speaking, there should be some relationship between certain classes of shader data and the objects behind that data. Some shader information potentially changes per-object, while others are scene-based (camera matrix, etc).

For buffer-based data, Vulkan offers two options to avoid having to create and bind independent descriptor sets per-object. These are dynamic buffer descriptors and push constants. Push constants are a tiny amount of storage that shaders can access, but is stored as part of the command buffer. Dynamic descriptors allow you to jump around within a buffer object, so you could give each object a different buffer region to use. You have to re-bind the descriptor set, but you’re not modifying the descriptor set or using a different one.

For textures, things get more complex. Neither of the previous mechanisms work, since they’re for buffer data. So you generally have 3 options: give each object its own set for these kinds of data, give them an array texture so that each object can pick the texture of interest by index from that array, or give them an array of samplers so that each object can pick the texture of interest by index from that array.

In the latter two cases, you need to provide an index on a per-object basis, which the shader will use to fetch the correct texture. So you’ll need to use one of the techniques I mentioned above for feeding per-object buffer data to a shader. Note that the per-object index can serve multiple purposes: you can use it to index into an array of matrices to get the per-object matrix as well as getting the index of the texture. Or any other per-object data.

Array textures have the limitation of requiring all textures in the array to have the same size and format. Arrays of samplers only require that the textures in the array are of the same type (2D, cube-map, etc), but since the index into the array is not a compile-time constant, you must ensure that the Vulkan implementation supports shaderSampledImageArrayDynamicIndexing.

Push descriptors might be a fourth option, but they’re not universally available.


What you are basically saying is “treat the descriptor sets like buffers and keep them on the GPU side for reuse”, right?

Providing descriptor sets on a per object basis would be complicated for me as the descriptors depend on the shader configuration from the pipeline and so far the object stuff has been pretty independent from the rendering stuff. It would come down to caching descriptor sets for every drawcall I assume.

Regarding the dynamic allocation method: currently the system only allows 3 unfinished frames and waits after that. So my plan was to provide 3 descriptor pools and, once a frame finishes, reset its pool, with all descriptor sets created in the meantime being freed. Descriptor sets are then allocated and written whenever the drawcall variables change. How much slower is this compared to always creating everything in advance?

What if my number of UBOs/Textures for a drawcall is very small (<= 4)? Then I could just create an individual descriptor set for every UBO/Texture and bind them all before the drawcall, right?

You should probably avoid that. Otherwise, you’ll be doing a lot of needless pipeline bindings.

Why would you delete an object, for the sole purpose of creating a new one that has the same value as the deleted one? I don’t see how this is a good idea, unless there is no fixed mapping between object and texture, so you’ll be changing textures with some frequency.

Otherwise, it seems to me that you wrote yourself into a situation where it’s just more convenient for how your code is written to do things this way.

Lastly, I wouldn’t bother with any of these, because you should just use one of the mechanisms I outlined for handling this.

I added another sentence which you might not have seen.

… how exactly would that work? Even if we assume that you can bind that many descriptor sets at once, descriptor set indices are hard coded into the pipeline. They cannot be changed after pipeline building. So instead of binding different sets per object, you’re binding different pipelines per-object. Pipelines that differ only by which descriptor set indices they use, not even by how they use them.

I fail to see how that’s an improvement. Just do things the reasonable way.

At the moment, everyone who uses the engine can, in theory, set the texture freely between drawcalls because OpenGL allows this via glBindTexture / glUniform1i. I would like to preserve this for now because it would break the code of a large number of people otherwise.

shaderSampledImageArrayDynamicIndexing is not available btw.

Are you saying that your current rendering engine allowed/expected(?) users to make OpenGL calls behind the engine’s back? If that’s the case, I don’t think there’s anything you can do in general outside of writing a Vulkan wrapper around the OpenGL API. At least, that’s where your thinking is heading.

And that’s a big project.

Even for this specific case, you’d have to expose at least the OpenGL texture APIs, including some way of mapping texture image units to descriptors. That’s going to get very complicated, very quickly.

In any case, if they’re not actually making OpenGL calls behind your back, and your engine therefore knows when the user is playing around with textures, but you allow users to define the association between their conceptual objects and how they get rendered, then you’re just going to have to accept that your abstraction isn’t in an ideal place for Vulkan. You’re going to have to manufacture descriptors in the middle of a frame of rendering and apply them on a per-draw basis. So that means having a lot of descriptor pools and the like.

That’s semi-common for mobile GPUs.

Yes, it’s a mobile GPU.

And they can’t call OpenGL directly; the uniform values (also textures and UBOs) are collected in a central object that is translated into glUniform calls directly before the drawcall.

Ok, so I made my way to push_constants now. I now have one push_constant block in the vertex and one push_constant block in the fragment shader (GLSL), something like this:

Vertex:

layout(push_constant) uniform pushConstants {
    <former vertex uniforms>
};

Fragment:

layout(push_constant) uniform pushConstants {
    <former fragment uniforms>
};

I then use two vkCmdPushConstants calls before each drawcall to update them from different byte ranges. I get the following error in the validation layer:

vkDebug: Validation: 0: Validation Error: [ VUID-vkCmdPushConstants-offset-01796 ] Object 0: handle = 0x555557f81248, type = VK_OBJECT_TYPE_COMMAND_BUFFER; | MessageID = 0x6a367c89 | vkCmdPushConstants(): stageFlags (0x10, offset (0), and size (44), must contain all stages in overlapping VkPushConstantRange stageFlags (0x1), offset (0), and size (64) in VkPipelineLayout 0x3f000000003f[]. The Vulkan spec states: For each byte in the range specified by offset and size and for each push constant range that overlaps that byte, stageFlags must include all stages in that push constant range’s VkPushConstantRange::stageFlags (https://vulkan.lunarg.com/doc/view/1.2.154.0/linux/1.2-extensions/vkspec.html#VUID-vkCmdPushConstants-offset-01796)

First of all, the question: do I have to keep the push_constant blocks from the vertex and fragment shader apart manually using layout offsets, or is there a more convenient way? Or would I have to use one push_constant block that is shared between the vertex and fragment shader? Would this affect performance negatively? Note: I use spirv_cross to get all the resource layouts.

Shaders can share the same push-constant ranges, so if you want them to be distinct, that’s something you have to do. And that includes within the shaders themselves.

Is there any reason to not make all push_constants visible to all shader stages? Like, performance?

Ok, I now moved all push_constants to the vertex shader but then my fragment shader won’t compile.

Do I have to declare the same push_constant block twice, once in the vertex and once in the fragment shader to make it work? And then one vkCmdPushConstants call with both stage flags set?

Edit: Now the validator complains with PushConstantOutOfRange because apparently the push_constants the vertex shader doesn’t use are optimized out and thus not recognized by spirv_cross.

Can anyone give me a code example for declaring and updating one common push_constant block with GL_KHR_vulkan_glsl?

… so what? If “spirv_cross” has optimized those push constants away, and it’s also telling you how many resources your shader can use, the size of the push constant range is presumably part of that data, right? So even if it optimizes some of it out, the range it gives you ought to be fine for the VS. And the FS range would be a different push-constant range.

That is, spirv_cross is supposed to give you a pipeline layout, which should contain accurate push constant ranges. And if those push constant ranges are inaccurate, then that’s a problem with spirv_cross.

It seems more likely to me that there’s something you’re doing wrong here, but we can’t see what it is because we don’t have access to your code or your SPIR-V.

I have now set everything up for both stages: identical push_constant declarations, everything with both stage flags set, one write. Now it seems to work.

Thank you!