Issues with trying to increase performance with multithreaded recording

So I recently made the switch from OpenGL to Vulkan and have started working on a renderer for a game engine I’m building. I got to a point where I could draw a bunch of stuff with relatively good frame times, so I tried increasing the number of things I was drawing from 3 to 3000, and performance dropped from ~1000 fps to ~23 fps.
I should also mention that my command buffers are recorded every frame (to account for changes in push constants and new entities being added to the world). Anyway, I saw some articles saying that if you record commands to multiple secondary command buffers in separate threads, you can increase your performance (at least that’s the message I got). So I implemented a thread pool system that directs separate “jobs” to individual threads, and evenly distributed the draw commands across these threads.
Each thread’s job was to record X draw commands to its own secondary command buffer. After setting all of this up, I found that my frame rates were a lot less consistent than when I was recording with a single thread. For example, a single thread would consistently draw X objects at around 240 fps, while with the work split across multiple threads each frame took a varying amount of time (three frames had frame rates of 150, 220 and 431 fps), and the more threads I sent jobs to, the lower the frame rate would be.
All of my mesh data is stored in a single vertex buffer, and all indices are stored in a single index buffer.
All draw calls draw meshes using offsets into the vertex/index buffer so I don’t have to rebind them when I switch between entities with different models.
Have I understood something wrong or do I need to look for a way to speed up my threading?
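
For reference, the "evenly distributed" split described above could look roughly like the following. This is only a sketch of one possible partitioning scheme, not the code from the original renderer; `DrawRange` and `splitDraws` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [first, first + count) of draw commands for one worker.
// Hypothetical type, not part of Vulkan.
struct DrawRange {
    std::size_t first;
    std::size_t count;
};

// Split drawCount draws across workerCount workers as evenly as possible.
// The first (drawCount % workerCount) workers receive one extra draw.
std::vector<DrawRange> splitDraws(std::size_t drawCount, std::size_t workerCount) {
    assert(workerCount > 0);
    std::vector<DrawRange> ranges;
    ranges.reserve(workerCount);
    const std::size_t base = drawCount / workerCount;
    const std::size_t extra = drawCount % workerCount;
    std::size_t first = 0;
    for (std::size_t i = 0; i < workerCount; ++i) {
        const std::size_t count = base + (i < extra ? 1 : 0);
        ranges.push_back({first, count});
        first += count;
    }
    return ranges;
}
```

With 3000 draws and 7 workers, the first four ranges hold 429 draws and the remaining three hold 428, so no worker gets more than one draw above the average.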

Personally, I think this is an issue with the way I set up my thread pool; however, I also feel like I have misunderstood some concepts of Vulkan as well.

Drawing process (per frame)

  • Begin primary command buffer
  • Begin render pass
  • Start threads that record the secondary buffers
  • Inside each recording thread:
      • Begin secondary command buffer
      • Bind graphics pipeline
      • Bind VBO
      • Bind IBO
      • Draw X entities
      • End Recording
  • Wait for the buffers to finish recording
  • Execute Secondary command buffers
  • End Render Pass
  • End Primary Command buffer
  • Submit Primary Command buffer
  • Wait for frame to finish drawing
  • Reset Command Pools
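
The record, wait, execute part of the steps above can be sketched without any Vulkan calls, which makes the threading easier to test on its own. Everything here is a stand-in: `FakeSecondaryBuffer`, `recordRange` and `recordFrame` are hypothetical names, and threads are spawned per frame only to keep the sketch short (a real renderer would reuse pool threads). The comments note which Vulkan calls each step would correspond to. In particular, Vulkan command pools are externally synchronized, so each recording thread would need its own command pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for a secondary command buffer: the entity indices that were "drawn".
// In Vulkan this buffer would be allocated from a command pool owned by one thread.
struct FakeSecondaryBuffer {
    std::vector<std::uint32_t> draws;
};

// Worker body: record draws [first, first + count) into this worker's buffer only.
// Vulkan counterpart: vkBeginCommandBuffer with
// VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT and inheritance info,
// bind pipeline/VBO/IBO, vkCmdDrawIndexed per entity, vkEndCommandBuffer.
void recordRange(FakeSecondaryBuffer& out, std::uint32_t first, std::uint32_t count) {
    out.draws.clear();
    out.draws.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        out.draws.push_back(first + i);
    }
}

// One frame: start workers, wait for all of them, then "execute" the buffers in order.
std::vector<std::uint32_t> recordFrame(std::uint32_t entityCount, std::uint32_t workerCount) {
    assert(workerCount > 0);
    std::vector<FakeSecondaryBuffer> buffers(workerCount);
    std::vector<std::thread> workers;
    workers.reserve(workerCount);

    // Ceiling division so every entity is covered; trailing workers may get nothing.
    const std::uint32_t perWorker = (entityCount + workerCount - 1) / workerCount;
    for (std::uint32_t w = 0; w < workerCount; ++w) {
        const std::uint32_t first = std::min(w * perWorker, entityCount);
        const std::uint32_t count = std::min(perWorker, entityCount - first);
        workers.emplace_back(recordRange, std::ref(buffers[w]), first, count);
    }

    // "Wait for the buffers to finish recording".
    for (std::thread& t : workers) {
        t.join();
    }

    // "Execute secondary command buffers". Vulkan counterpart: vkCmdExecuteCommands
    // on the primary buffer, inside a render pass begun with
    // VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS.
    std::vector<std::uint32_t> executed;
    for (const FakeSecondaryBuffer& b : buffers) {
        executed.insert(executed.end(), b.draws.begin(), b.draws.end());
    }
    return executed;
}
```

Because each worker writes only to its own buffer and the main thread reads the buffers only after joining, no locking is needed in this sketch.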

Of course, all command buffers are recorded for the next frame, while the current frame is being displayed.

I am still relatively new to Vulkan, so I’m not 100% familiar with all of the concepts involved. Any help would be greatly appreciated, thanks!!!

Hey there!

I’m also experimenting with parallelization in Vulkan. It’s a tricky business indeed.
From what I’ve read you are doing a lot of stuff right, like using a threadpool.
I can’t really point out what your problem is, however I will leave my collection of links for that topic here:

First of all, please watch Adam Sawicki’s video.
DD2018: Adam Sawicki - Porting your engine to Vulkan or DX12
https://youtu.be/6NWfznwFnMs?t=33

He also suggests implementing a framegraph (or rendergraph) as a generic solution for a Vulkan renderer. I can only recommend this. A graph-based abstraction of the entire rendering procedure fits Vulkan nicely.

Other links:
https://community.arm.com/developer/tools-software/graphics/b/blog/posts/multi-threading-in-vulkan

https://developer.nvidia.com/blog/vulkan-dos-donts/

Common mistakes when using Vulkan API:

http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2016/05/Most-common-mistakes-in-Vulkan-apps.pdf


I found a few issues with my thread pool, mostly related to how threads started and how I waited for them to finish. I have worked out these problems; however, my threads are still taking an unexpected amount of time to finish their work. Could this be due to how the thread accesses the data provided to it? (Yes, this is more of a C++ question.) When a job is given to the thread pool, it is supplied with a pointer to the command list, the index in the command list at which the target buffer is located, and how many draw commands to record. Could data access be slowing my threads down?
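
One way to narrow this down is to time the body of each job separately from the time spent waiting for it to be picked up. The sketch below models the job described above (pointer to the command list, target index, draw count) with stand-in types and measures only the recording itself. `FakeCommandBuffer`, `RecordJob` and `runTimed` are hypothetical names, and the loop is a placeholder for the real draw recording.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a command buffer; a real one would be a VkCommandBuffer handle.
struct FakeCommandBuffer {
    std::vector<std::uint32_t> draws;
};

// The data handed to a worker, mirroring the description in the post.
struct RecordJob {
    std::vector<FakeCommandBuffer>* commandList;  // shared list of buffers
    std::size_t targetIndex;                      // which buffer this job records into
    std::size_t drawCount;                        // how many draws to record
};

// Run the job and return how long the recording itself took.
std::chrono::nanoseconds runTimed(const RecordJob& job) {
    assert(job.commandList != nullptr);
    assert(job.targetIndex < job.commandList->size());

    const auto start = std::chrono::steady_clock::now();

    // Resolve the target once instead of going through the pointer on every draw.
    FakeCommandBuffer& target = (*job.commandList)[job.targetIndex];
    target.draws.clear();
    for (std::size_t i = 0; i < job.drawCount; ++i) {
        target.draws.push_back(static_cast<std::uint32_t>(i));  // placeholder for a draw call
    }

    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

If the durations returned here are small and stable while the frame time still varies, the cost is more likely in how jobs are dispatched and waited on than in the data access inside the job.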

Hi
I think this question is difficult to answer without a code snippet from you.
In general, load balancing in a thread pool is a very challenging task. I am currently experimenting with the Taskflow library for this. It has a state-of-the-art work-stealing queue built into it (based on some research papers from 2015) and a nice API in general.

I can only recommend it for this.

Furthermore, you should think of some way to visualize your jobs (see, for example, the talk "GCAP 2016: Parallel Game Engine Design" by Brooke Hodgman on YouTube). Then you can see what is taking so long. Taskflow also has a built-in profiler for this (see its GitHub page).
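
If you are not using Taskflow, a low-effort alternative is to dump job timings in the Chrome Trace Event JSON format, which trace viewers such as chrome://tracing can display as a per-thread timeline. This is a different technique from the one shown in the talk, and `JobSpan` and `toTraceJson` are hypothetical names made up for this sketch. It assumes you already collect a start time and duration per job, in microseconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One measured job. Times are in microseconds relative to any fixed origin.
struct JobSpan {
    std::string name;         // label shown in the viewer (not escaped here, keep it plain)
    int threadId;             // which worker ran the job
    std::int64_t startUs;     // when the job started
    std::int64_t durationUs;  // how long it ran
};

// Serialize spans as a JSON array of "complete" events (ph = "X").
std::string toTraceJson(const std::vector<JobSpan>& spans) {
    std::ostringstream out;
    out << "[";
    for (std::size_t i = 0; i < spans.size(); ++i) {
        const JobSpan& s = spans[i];
        assert(s.durationUs >= 0);
        if (i != 0) {
            out << ",";
        }
        out << "{\"name\":\"" << s.name << "\""
            << ",\"ph\":\"X\""
            << ",\"pid\":1"
            << ",\"tid\":" << s.threadId
            << ",\"ts\":" << s.startUs
            << ",\"dur\":" << s.durationUs
            << "}";
    }
    out << "]";
    return out.str();
}
```

Writing the returned string to a file once per captured frame is enough to see whether workers sit idle between jobs or whether individual jobs are unexpectedly long.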

best regards,
Johannes
