Parallel rendering in multiple threads


In my application, I’m going to use an ImGui-based UI to control the scene. For this, I decided to manage and render the UI and the scene in separate threads. I will also use separate logical resources and command queues for the UI rendering thread and the scene rendering thread.

I will have 3 threads:

  1. The main application thread, which will take the rendered data from the other threads and display it.
  2. A thread for rendering the UI.
  3. A thread for rendering the scene.

How can I implement such a system?
As far as I understand, I should render into a framebuffer and immediately hand it over to the main thread, right?

My second question is how to combine the results of the two renderers.

In this case, the two renderers must not overwrite each other’s output.
For example, if the scene is rendered first and the UI second, the UI should not overwrite the scene. I suppose I can protect against this by using scissor rectangles, right?

If you do not know how to implement it, that somewhat calls into question whether you arrived at that architecture by weighing actual engineering concerns.

While scissors help, you might run into other problems, such as keeping the image layout consistent between threads and handling load-op clears. Of course, you could do those in the “main” thread, but that is suboptimal (N render passes where one would suffice) and defeats the whole purpose of the separation. You would also be limiting which layouts the offshoot threads can use.
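The scissor and load-op issue can be seen in a toy CPU model (this is not Vulkan code). A character grid stands in for the color attachment; each "pass" starts with either a clear or a load, mirroring VK_ATTACHMENT_LOAD_OP_CLEAR and VK_ATTACHMENT_LOAD_OP_LOAD, and draws are clipped to a scissor rectangle as with vkCmdSetScissor. The names `LoadOp`, `Framebuffer`, `beginPass`, `fillRect`, and `renderSceneThenUi` are hypothetical, and image layouts are not modeled at all.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for VK_ATTACHMENT_LOAD_OP_CLEAR / VK_ATTACHMENT_LOAD_OP_LOAD.
enum class LoadOp { Clear, Load };

struct Rect { int x, y, w, h; };

// A tiny "framebuffer" of characters standing in for a color attachment.
struct Framebuffer {
    int width, height;
    std::vector<std::string> rows;
    Rect scissor;

    Framebuffer(int w, int h)
        : width(w), height(h), rows(h, std::string(w, '?')), scissor{0, 0, w, h} {}

    // Starting a pass: a clear load op wipes the whole image,
    // no matter which scissor is set afterwards.
    void beginPass(LoadOp op) {
        if (op == LoadOp::Clear)
            for (auto& row : rows) row.assign(width, '.');
        scissor = {0, 0, width, height};  // default scissor: full image
    }

    void setScissor(Rect r) {
        assert(r.x >= 0 && r.y >= 0 && r.x + r.w <= width && r.y + r.h <= height);
        scissor = r;
    }

    // "Draw" a rectangle, writing only pixels inside the current scissor.
    void fillRect(Rect r, char c) {
        for (int y = r.y; y < r.y + r.h; ++y)
            for (int x = r.x; x < r.x + r.w; ++x) {
                bool inImage = x >= 0 && y >= 0 && x < width && y < height;
                bool inScissor = x >= scissor.x && x < scissor.x + scissor.w &&
                                 y >= scissor.y && y < scissor.y + scissor.h;
                if (inImage && inScissor) rows[y][x] = c;
            }
    }
};

// Scene pass, then a UI pass that starts with the given load op.
Framebuffer renderSceneThenUi(LoadOp uiLoadOp) {
    Framebuffer fb(8, 4);
    fb.beginPass(LoadOp::Clear);
    fb.fillRect({0, 0, 8, 4}, 'S');  // scene covers the whole image

    fb.beginPass(uiLoadOp);
    fb.setScissor({0, 0, 3, 2});     // UI panel in the top-left corner
    fb.fillRect({0, 0, 8, 4}, 'U');  // even an oversized draw stays inside the scissor
    return fb;
}
```

With `LoadOp::Load` the scene survives outside the UI rectangle. With `LoadOp::Clear` the scissor does not help, because the clear happens when the pass begins and covers the whole area, much like a render pass load op clears the entire render area regardless of any scissor.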

Even so, I think ImGui can simply be added as an extra subpass. What do you stand to gain by having it in a separate thread?

So you think I should not bother with multiple threads?
Hmm, yes, perhaps this will not give a noticeable performance increase, but it will significantly increase the complexity of the program.

Should I use render subpasses, then? One subpass for UI rendering and one for scene rendering? Or something else?

Well, multithreading is only useful if you have work that is not tightly coupled, or if you need the asynchronicity. ImGui depends on your swapchain anyway, so I don’t see what could be gained by asynchronicity. Additionally, if it shares virtually all resources (queues, swapchain images, …), then it is tightly coupled as well.

You still need to synchronize access to the swapchain image, so that has to be serialized somehow. You also need to guard access to the queue with a mutex. That looks like pretty serial work, so I am not sure what you hope to gain from the threads.
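The queue point can be illustrated with a small sketch. In Vulkan, the queue passed to vkQueueSubmit must be externally synchronized, so threads sharing one queue typically wrap submission in a mutex. The `Queue` class and `submitFromTwoThreads` below are hypothetical stand-ins that only log submissions; they show that however the work was recorded, the submissions themselves happen strictly one at a time.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a VkQueue: it just logs what was submitted.
class Queue {
public:
    // Every submission takes the same lock, so submits from different threads
    // never overlap (mirroring the external synchronization vkQueueSubmit needs).
    void submit(const std::string& commandBuffer) {
        std::lock_guard<std::mutex> lock(mutex_);
        log_.push_back(commandBuffer);
    }

    std::vector<std::string> log() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return log_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> log_;
};

// A "scene" thread and a "ui" thread both submit to the one shared queue.
std::vector<std::string> submitFromTwoThreads(int framesPerThread) {
    assert(framesPerThread >= 0);
    Queue queue;
    auto worker = [&queue, framesPerThread](const std::string& name) {
        for (int i = 0; i < framesPerThread; ++i)
            queue.submit(name + " frame " + std::to_string(i));
    };
    std::thread scene(worker, std::string("scene"));
    std::thread ui(worker, std::string("ui"));
    scene.join();
    ui.join();
    return queue.log();
}
```

Both threads run, but every submit waits its turn on the same lock, which is the serial bottleneck described above.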

As for one subpass for the UI and one for the scene: IIRC that is how ImGui generally works. It asks you to put it in a subpass somewhere. It is best to just append it at the end of an already existing render pass.

In your division of labor, thread 1 does not need to exist. At least, not in relation to threads 2 and 3.

If you’re going to start threading your rendering processes, you should be thinking not in terms of threads, but in terms of tasks.

You have some number of pre-rendering tasks to perform, tasks that cannot overlap with the rendering tasks (and some tasks that don’t care about overlap with rendering). You then have a UI task and a “render space” task. And finally, you have a submission task that takes the results of the UI and render-space tasks and submits them to the queue (along with presentation requests).

How you divide these tasks among the available threads is up to you. The more tasks you have, the more you will want them parceled out by some dedicated system, and you also need to track the dependencies between tasks.

Considering just the render-space, UI, and submission tasks, the submission task could run on the same thread as either the render-space or the UI task, so long as the other task has already completed. What matters is that submission cannot run until both of the other tasks are done.
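A minimal sketch of that dependency using only the C++ standard library: the render-space and UI tasks are launched with std::async, and the submission step cannot proceed until both futures are ready. `CommandList`, `recordScene`, `recordUi`, `submitFrame`, and `runFrame` are hypothetical placeholders for recording real command buffers and calling vkQueueSubmit.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical placeholder for a recorded command buffer.
struct CommandList {
    std::vector<std::string> commands;
};

// Render-space task (hypothetical contents).
CommandList recordScene() {
    CommandList list;
    list.commands.push_back("draw terrain");
    list.commands.push_back("draw meshes");
    return list;
}

// UI task (hypothetical contents).
CommandList recordUi() {
    CommandList list;
    list.commands.push_back("draw imgui");
    return list;
}

// Submission task: get() blocks until each recording task has finished.
// Scene work goes before UI work so the UI is drawn on top.
std::vector<std::string> submitFrame(std::future<CommandList> scene,
                                     std::future<CommandList> ui) {
    CommandList sceneList = scene.get();
    CommandList uiList = ui.get();
    std::vector<std::string> submitted = sceneList.commands;
    submitted.insert(submitted.end(), uiList.commands.begin(), uiList.commands.end());
    return submitted;
}

std::vector<std::string> runFrame() {
    // The two recording tasks may run concurrently on other threads.
    auto scene = std::async(std::launch::async, recordScene);
    auto ui = std::async(std::launch::async, recordUi);
    // Submission runs on this thread here, but it could just as well run on
    // whichever thread finishes last; either way it depends on both results.
    return submitFrame(std::move(scene), std::move(ui));
}
```

The thread that performs submission does not matter to correctness; only the "wait on both" edge does.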

You could imagine a system where you have many rendering tasks. Indeed, “render the scene” could be so large a task that you want to dynamically break it up into smaller tasks so that each one could happen in its own thread. This means the submission task would wait on all of them.
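Breaking "render the scene" into smaller tasks could look roughly like this. The object list is split into chunks, each chunk is "recorded" in its own task (in Vulkan this would typically mean a secondary command buffer from a per-thread command pool, since command pools are externally synchronized), and the submission step waits on every task while keeping results in a deterministic order. `recordChunk`, `recordSceneInParallel`, and the integer object IDs are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical: "recording" a chunk just returns the IDs of the objects it drew.
std::vector<int> recordChunk(const std::vector<int>& objects,
                             std::size_t begin, std::size_t end) {
    return std::vector<int>(objects.begin() + begin, objects.begin() + end);
}

// Splits the scene into at most `taskCount` chunks, records them in parallel,
// and only returns ("submits") once every chunk has finished.
std::vector<int> recordSceneInParallel(const std::vector<int>& objects,
                                       std::size_t taskCount) {
    assert(taskCount > 0);
    std::size_t chunk = (objects.size() + taskCount - 1) / taskCount;  // round up
    std::vector<std::future<std::vector<int>>> tasks;
    for (std::size_t begin = 0; begin < objects.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, objects.size());
        tasks.push_back(std::async(std::launch::async, recordChunk,
                                   std::cref(objects), begin, end));
    }
    // Wait on all tasks, collecting results in chunk order (comparable to the
    // order of secondary command buffers passed to vkCmdExecuteCommands).
    std::vector<int> submitted;
    for (auto& t : tasks) {
        std::vector<int> part = t.get();
        submitted.insert(submitted.end(), part.begin(), part.end());
    }
    return submitted;
}
```

For example, 10 objects split across 3 tasks gives chunks of 4, 4, and 2, and the concatenated result matches the original order.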

The more you want to take advantage of wide CPU parallelism, the more you’re going to want a dedicated, flexible task-based system that can handle things like this. At that point, the question of how to parcel out the tasks becomes simple.
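As a rough idea of where such a task-based system starts, here is a minimal fixed-size thread pool: tasks go into a shared queue, worker threads pull them, and `submit` returns a std::future so dependent work (such as the submission task) can wait on results. The `TaskPool` class is hypothetical and deliberately simple; real engine job systems usually add work stealing and explicit dependency graphs.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class TaskPool {
public:
    explicit TaskPool(unsigned threadCount) {
        assert(threadCount > 0);
        for (unsigned i = 0; i < threadCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~TaskPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Queues a callable and returns a future for its result.
    template <typename F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only, but std::function needs a copyable
        // callable, so the task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;  // drain queue before exiting
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

Usage: submit the scene and UI tasks to the pool, then have the submitting thread call `get()` on both futures. Note that a task that blocks on another task's future inside a pool this small can deadlock once every worker is blocked, which is one reason real task systems track dependencies explicitly instead of waiting inside tasks.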

Thank you very much for the detailed answer, this topic has become much clearer for me.

I see.
So for now I will not use multithreaded rendering; I will limit myself to subpasses.