Scheduling a secondary device queue?

I’m working on some dynamic global illumination that isn’t quite real-time, but instead spreads the calculation out over a number of frames. Subdividing this task into equal parts to be processed each frame is a bit difficult.

The time it takes the process to complete doesn’t matter, but its impact on the GPU does matter. Ideally I would like to create a secondary device queue, send it a command buffer, and tell it “take no more than 20% of the GPU power to do this, however long it takes”.

The goal is to prevent erratic frame rendering times, but also prevent the “background” queue from hogging the GPU.

Any advice?

It actually looks like queue priority does what I am asking. The spec does not provide any information on the relative meaning of priority values, but the fact that it is a normalized number perhaps implies a percentage of available resources:
https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#devsandqueues-priority
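For reference, priorities are supplied per queue when the logical device is created, via `VkDeviceQueueCreateInfo::pQueuePriorities`. A minimal sketch (the family index here is a placeholder; pick one on your device that supports both graphics and compute):

```c
#include <vulkan/vulkan.h>

/* Sketch: request two queues from the same family, one at full priority
 * for rendering and one at low priority for the background GI work.
 * How (or whether) the implementation honors these is up to the driver. */
float priorities[2] = { 1.0f, 0.2f };  /* render queue, GI queue */

VkDeviceQueueCreateInfo queueInfo = {
    .sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
    .queueFamilyIndex = 0,   /* placeholder: a graphics+compute family */
    .queueCount       = 2,
    .pQueuePriorities = priorities,
};
/* queueInfo then goes into VkDeviceCreateInfo::pQueueCreateInfos. */
```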

Queue priorities are just a hint.

There are no anti-starvation guarantees.

The priority number has no defined meaning, except that higher number means same or higher “priority” (for unspecified meaning of “priority”).

Your GPU only has so many computational resources. They are shared by all queues that use shaders of any form. As such, while the two queues can have separate streams of commands, they cannot fully execute their workloads independently. For the GI queue to process something, it must take some computational resources away from the rendering queue.

Exactly how the GPU does load balancing for these kinds of things will vary from GPU to GPU. The priority is just a suggestion; how exactly it will be respected is up to the implementation.

Yeah, I tried it with queues from two different queue families, one supporting rendering+compute and the other supporting just compute, and I’m still seeing the same error on vkWaitForFences(). It works fine if I lighten the load of the compute shader.
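For anyone trying the same setup, a sketch of picking out a compute-only queue family (one with the COMPUTE bit set and the GRAPHICS bit clear); `physDev` is assumed to be a valid `VkPhysicalDevice`:

```c
#include <vulkan/vulkan.h>
#include <stdint.h>

/* Sketch: return the index of a queue family that supports compute
 * but not graphics, or UINT32_MAX if the device has no such family. */
uint32_t find_compute_only_family(VkPhysicalDevice physDev)
{
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(physDev, &count, NULL);

    VkQueueFamilyProperties props[32];
    if (count > 32) count = 32;  /* enough for current hardware */
    vkGetPhysicalDeviceQueueFamilyProperties(physDev, &count, props);

    for (uint32_t i = 0; i < count; ++i) {
        VkQueueFlags flags = props[i].queueFlags;
        if ((flags & VK_QUEUE_COMPUTE_BIT) && !(flags & VK_QUEUE_GRAPHICS_BIT))
            return i;
    }
    return UINT32_MAX;
}
```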

It’s disappointing I can’t run a big compute shader without stalling out the rendering queue.

GPU is an NVIDIA GeForce GTX 1660.

Should it be disappointing, though?

In order to be able to do that, the compute queue would have to be able to have its own computational resources distinct from those on the graphics queue. Which means that those resources would not be available to graphics operations.

Isn’t that bad? I mean, if you have 10 compute cores, and 3 of them are dedicated to compute operations, but you’re only able to fill 2 of them, while your graphics operation really could be faster if it could get 8 cores… isn’t it better to be able to let the graphics operation fill up that core instead of it going unused?

Dynamic load balancing is always going to give better overall performance compared to static load balancing.

Is your GI operation a single, large dispatch call or is it broken down into multiple dispatch operations? Are those dispatches in separate batches? Basically, is it possible to submit your GI operations in smaller pieces, once every frame?


I figured multiple queues on a GPU would be similar to running multiple threads on a single-core CPU. I can think of many uses for that, but I don’t know anything about the details of how the hardware handles it.