Best way to render with view-frustum culling?

Twanks123 · June 1, 2016, 12:36pm

Hello guys, me again^^

So, in another thread (Recording Command Buffers every frame slow as hell - Vulkan - Khronos Forums) I had the problem that recording command buffers before submitting them to the queue made my rendering engine pretty slow. (Because I implemented view-frustum culling, I have to rebuild them continuously. Before that, I always pre-built them.)
My solution was to re-record them only at a fixed time interval (60 times per second in my case), which gave a reasonable result (I lose about 0.3 ms), but I'm not really happy with this approach.
Because I have one “gigantic” command buffer per swapchain image, I can't do things like multi-threaded command-buffer generation in the future if I want to.

So I thought I'd do it like Sascha Willems in this example: https://github.com/SaschaWillems/Vulkan/blob/master/multithreading/multithreading.cpp
He does it this way: he has one (secondary) command buffer per object and, every frame, executes only those whose objects are inside the view frustum.
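A minimal sketch of the per-frame selection step in the approach described above, assuming each object owns one pre-recorded secondary command buffer and the culling pass sets a visibility flag. `CommandBufferHandle`, `RenderObject` and `collectVisible` are hypothetical names; the handle type is only a stand-in for `VkCommandBuffer` so the sketch compiles without the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VkCommandBuffer so the sketch compiles without
// the Vulkan headers; a real engine would store the actual handle.
using CommandBufferHandle = std::uint64_t;

struct RenderObject {
    CommandBufferHandle secondaryCb; // recorded once, at load time
    bool visible;                    // written by the culling pass each frame
};

// Collects the secondary command buffers of the visible objects, in order.
// The resulting array is what would be passed to
// vkCmdExecuteCommands(primary, count, data) inside the primary buffer.
std::vector<CommandBufferHandle> collectVisible(const std::vector<RenderObject>& objects) {
    std::vector<CommandBufferHandle> out;
    out.reserve(objects.size());
    for (const RenderObject& obj : objects) {
        if (obj.visible) {
            out.push_back(obj.secondaryCb);
        }
    }
    return out;
}
```

Even with this scheme, the primary command buffer that records vkCmdExecuteCommands still has to be re-recorded whenever the visible set changes; only the per-object recording is reused.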

I implemented the same approach in my engine (with a small optimization), and it is a lot slower than re-recording the command buffers at a fixed time interval.
Besides that, if I have too many objects, allocating new command buffers fails with the error “Out of Host Memory”, and having thousands of (secondary) command buffers seems pretty weird to me and not what the Vulkan designers intended.

So what do you think? How should I implement view-frustum culling? Do you have other ideas about what I can do?
Otherwise I will stick with re-recording at a fixed time interval.
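For reference, a common way to do the culling test itself is to check each object's bounding sphere against the six frustum planes. This is a minimal sketch, assuming the planes have already been extracted (e.g. from the view-projection matrix) and normalized with their normals pointing into the frustum; `Vec3`, `Plane`, `Sphere` and `sphereInFrustum` are hypothetical names, not from the original post.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0; n is unit length and points
// towards the inside of the frustum (an assumption of this sketch).
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Conservative test: rejects the sphere only if it lies entirely on the
// outside of at least one plane. Spheres that straddle a plane are kept.
bool sphereInFrustum(const std::array<Plane, 6>& planes, const Sphere& s) {
    for (const Plane& p : planes) {
        if (signedDistance(p, s.center) < -s.radius) {
            return false;
        }
    }
    return true;
}
```

The test is conservative: a few objects near the frustum corners may be drawn even though they are invisible, which is usually an acceptable trade for a cheap per-object check.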

Thanks in advance

krOoze · June 1, 2016, 1:27pm

Hi again,

I am just gonna assume you have a reason not to use Vulkan culling and clipping.

Thousands of CBs does sound excessive as a replacement for re-recording one lil' innocent CB.
BTW, are you sure the recording is the problem now, and not your culling code?
BTW2, how do you feed it the data? Do you by any chance push it (the visible objects) every frame (and then delete them again)?

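Maybe indirect drawing (vkCmdDrawIndirect) could be a good solution (supplying a vertex count of 0 through the indirect buffer in memory for culled objects). Or better yet, multi-draw indirect (drawCount > 1), if the multiDrawIndirect feature is supported.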
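A minimal sketch of that idea, assuming one indirect command per object in a host-visible, mapped buffer: the command buffer is recorded once, and each frame only the buffer contents change, with culled objects getting a vertex count of 0. `DrawIndirectCommand` merely mirrors the field order of `VkDrawIndirectCommand` so the sketch compiles without the Vulkan headers; `MeshRange` and `writeIndirectCommands` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors VkDrawIndirectCommand (four uint32_t fields in this order);
// declared here only so the sketch is self-contained.
struct DrawIndirectCommand {
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstVertex;
    std::uint32_t firstInstance;
};

// Hypothetical per-object range inside a shared vertex buffer.
struct MeshRange {
    std::uint32_t vertexCount;
    std::uint32_t firstVertex;
};

// Writes one command per object into 'mapped' (e.g. persistently mapped
// indirect-buffer memory). Culled objects keep their slot but draw 0
// vertices, so the recorded command buffer never has to change.
void writeIndirectCommands(const std::vector<MeshRange>& meshes,
                           const std::vector<bool>& visible,
                           DrawIndirectCommand* mapped) {
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        mapped[i].vertexCount   = visible[i] ? meshes[i].vertexCount : 0u;
        mapped[i].instanceCount = 1;
        mapped[i].firstVertex   = meshes[i].firstVertex;
        mapped[i].firstInstance = 0; // non-zero needs drawIndirectFirstInstance
    }
}
```

With multiDrawIndirect the whole array is consumed by a single recorded vkCmdDrawIndirect(cmd, buffer, 0, count, sizeof(VkDrawIndirectCommand)); without it, drawCount must be 0 or 1, so you would record one call per object at offset i * stride. The buffer must not be overwritten while a frame that reads it is still in flight, so one copy per frame in flight is a reasonable assumption.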

Twanks123 · June 1, 2016, 1:59pm

I’m a poor little student and my professor wants view-frustum culling; that’s why I implemented it.
The culling code is definitely not the problem.
I don't know exactly what you mean by your second question; can you explain it a bit more? Thanks.

krOoze · June 1, 2016, 2:21pm

I see you enjoy masochism too, going straight for Vulkan instead of copy-pasting some already-invented “wheel” (in OGL, perhaps).

Not a problem as in how many ms of CPU time per render-loop iteration?

My second question assumes you are getting something out of the culling, e.g. not having to push some data to the GPU. What I am asking is whether you are naively streaming your 4K textures (or whatever) to the GPU each frame, because you do not know which ones will be visible (I assume some action has to be done per frame rather than at setup). That's something I would do in a prototype, so I'm just asking to be safe.

Maybe seeing some of your code, pseudocode, or a diagram of your render loop would help the discussion.

Twanks123 · June 1, 2016, 2:52pm

[Image: screenshot of the main-loop code]

This is my main loop. I update the application at a fixed time interval; the update currently updates the objects in the SceneGraph, the input system, and the uniforms of the currently used shader, does the view-frustum culling, and rebuilds the command buffers. The renderer.draw() method just submits the appropriate command buffer and presents the image to the presentation engine.
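Since the code above was only posted as an image, here is a minimal sketch of the loop structure as described: update (scene graph, input, uniforms, culling, command-buffer rebuild) runs at a fixed rate via a time accumulator, while draw (submit and present) runs every iteration. `FixedStepLoop` and `tick` are hypothetical names, not the poster's actual code.

```cpp
#include <cassert>
#include <functional>

// Runs update() at a fixed rate and draw() once per loop iteration.
class FixedStepLoop {
public:
    explicit FixedStepLoop(double stepSeconds) : step_(stepSeconds) {}

    // Call once per loop iteration with the real time elapsed since the
    // previous call. Returns how many update() calls were made.
    int tick(double frameSeconds,
             const std::function<void()>& update,
             const std::function<void()>& draw) {
        accumulator_ += frameSeconds;
        int updates = 0;
        while (accumulator_ >= step_) {
            update();             // e.g. input, scene graph, culling, CB rebuild
            accumulator_ -= step_;
            ++updates;
        }
        draw();                   // e.g. submit current CB and present
        return updates;
    }

private:
    double step_;
    double accumulator_ = 0.0;
};
```

For 60 updates per second you would construct it with `FixedStepLoop loop(1.0 / 60.0);`. Between two updates, draw() resubmits a command buffer built from the previous culling result, so a frame can use a slightly stale visible set; that is the trade-off of re-recording at a fixed interval.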

Performance check:
(VS 2015, Release mode, debug layers disabled)
View-frustum culling enabled (with CB rebuilding): 0.33 - 0.34 ms/frame
View-frustum culling disabled (fully pre-built CB): 0.315 - 0.325 ms/frame

(VS 2015, Debug mode, debug layers enabled)
View-frustum culling enabled (with CB rebuilding): 0.75 - 0.8 ms/frame
View-frustum culling disabled (fully pre-built CB): 0.35 - 0.4 ms/frame
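Taking the midpoint of each reported range (an assumption, since only ranges are given), the overhead of culling plus rebuilding works out as follows:

```latex
\begin{aligned}
\text{Release:}\quad & 0.335\ \text{ms} - 0.320\ \text{ms} = 0.015\ \text{ms} \approx 4.7\%\ \text{of } 0.320\ \text{ms} \\
\text{Debug:}\quad   & 0.775\ \text{ms} - 0.375\ \text{ms} = 0.400\ \text{ms} \approx 107\%\ \text{of } 0.375\ \text{ms}
\end{aligned}
```

So in the Release build the difference is small, while with debug layers enabled the rebuild roughly doubles the per-frame CPU time.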

Alfonse_Reinheart · June 1, 2016, 3:38pm

Never post code as images.

krOoze · June 1, 2016, 4:05pm

I meant your update() and draw(), which indicate the resulting performance.

You could also measure those two separately (it could be interesting to see how much extra you spend in update() and how much you save in draw() thanks to the culling).
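A minimal sketch of measuring the two phases separately with `std::chrono::steady_clock` (monotonic, so suitable for intervals). `PhaseTimer` is a hypothetical helper; usage would be something like `updateTimer.measure([&] { app.update(); });` and `drawTimer.measure([&] { renderer.draw(); });`, where `app` is an assumed name.

```cpp
#include <cassert>
#include <chrono>

// Accumulates CPU wall-clock time spent in one phase, e.g. update() or draw().
class PhaseTimer {
public:
    template <typename F>
    void measure(F&& phase) {
        const auto start = std::chrono::steady_clock::now();
        phase();
        total_ += std::chrono::steady_clock::now() - start;
        ++calls_;
    }

    // Average milliseconds per measured call, or 0 if nothing was measured.
    double averageMs() const {
        if (calls_ == 0) return 0.0;
        return std::chrono::duration<double, std::milli>(total_).count() / calls_;
    }

    long calls() const { return calls_; }

private:
    std::chrono::steady_clock::duration total_{};
    long calls_ = 0;
};
```

This only captures CPU time: the draw() figure includes submit and present (which may block on vsync), not GPU execution time. For the GPU side, timestamp queries (vkCmdWriteTimestamp) would be needed.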