OpenGL -> Vulkan transition. Design considerations

Greetings,

I’m tasked with porting an engine from GLES to Vulkan.

Can you tell me if there is a list somewhere of the Vulkan features that are essential for harvesting the performance benefits the API promises, but that do not map trivially to OpenGL functions and would therefore likely trigger a major engine redesign (assuming the engine is designed around OpenGL)?

My first ideas would be command lists and pipeline objects. Anything else?

Regards.

Vulkan development is not so simplistic as “use this function to get performance”. It’s a matter of how you use the API, in ways that OpenGL simply doesn’t let you.

If your program is a high-performance application that’s blocked on CPU performance, then the principal way Vulkan alleviates this is by allowing you to thread the building of command buffers. Doing so efficiently basically requires “a major engine redesign”.

Oh, it’s easy enough to throw all of your rendering into a single thread separate from the rest of your program. But that’s not scalable; it doesn’t perform better if the user has 8 cores rather than 4. If you want your engine’s CPU performance to be scalable, then you will need to take advantage of threading.
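As a rough sketch of what that threading looks like in Vulkan (all handles here — `device`, `queueFamilyIndex`, `renderPass`, `framebuffer`, `primary` — are placeholders assumed to have been created elsewhere): each worker thread records into a secondary command buffer from its own `VkCommandPool`, since pools are externally synchronized, and the main thread stitches the results together.

```cpp
// Sketch: per-thread command recording; error handling omitted for brevity.
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

void RecordInParallel(VkDevice device, uint32_t queueFamilyIndex,
                      VkRenderPass renderPass, VkFramebuffer framebuffer,
                      VkCommandBuffer primary, uint32_t threadCount)
{
    std::vector<VkCommandPool>   pools(threadCount);
    std::vector<VkCommandBuffer> secondaries(threadCount);
    std::vector<std::thread>     workers;

    for (uint32_t t = 0; t < threadCount; ++t) {
        // Command pools are externally synchronized: one pool per thread.
        VkCommandPoolCreateInfo poolInfo{VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO};
        poolInfo.queueFamilyIndex = queueFamilyIndex;
        vkCreateCommandPool(device, &poolInfo, nullptr, &pools[t]);

        VkCommandBufferAllocateInfo allocInfo{VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO};
        allocInfo.commandPool        = pools[t];
        allocInfo.level              = VK_COMMAND_BUFFER_LEVEL_SECONDARY;
        allocInfo.commandBufferCount = 1;
        vkAllocateCommandBuffers(device, &allocInfo, &secondaries[t]);

        workers.emplace_back([&, t] {
            VkCommandBufferInheritanceInfo inherit{VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO};
            inherit.renderPass  = renderPass;
            inherit.framebuffer = framebuffer;

            VkCommandBufferBeginInfo begin{VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO};
            begin.flags            = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
            begin.pInheritanceInfo = &inherit;

            vkBeginCommandBuffer(secondaries[t], &begin);
            // ... record this thread's slice of the draw calls here ...
            vkEndCommandBuffer(secondaries[t]);
        });
    }
    for (auto& w : workers) w.join();

    // Single cheap step on the main thread: splice the secondaries into the primary.
    vkCmdExecuteCommands(primary, threadCount, secondaries.data());
}
```

The key design point is that recording, the expensive part, scales with core count, while the serial stitch at the end is a single `vkCmdExecuteCommands` call.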

I mean, you can probably get something out of Vulkan by doing a 1:1 mapping. At the very least, you can maybe lower your CPU burden and possibly extend battery life or something. But if you’re really interested in performance, you’re going to have to work for it.

Ok, so the way to go is distributing the command buffer creation over multiple threads, but:

Is command buffer building really so expensive? They are usually preallocated so pushing a command shouldn’t be much more expensive than pushing a small object into a preallocated std::vector. How many commands are we talking about that distributing command generation makes sense?

Or is there a certain function that is related to command buffer generation that takes the bulk of the load?

Hi,

you should watch this:

best regards,
Johannes

For the kinds of applications that, under OpenGL, can saturate a CPU from their rendering commands, yes. Thousands of state changes for drawing different objects, thousands of drawing commands, etc. And that doesn’t even mention the time it takes to walk your scene graph (possibly several times) to retrieve the information needed to issue those commands in the first place.

Being able to thread this work across multiple cores is helpful.

Again, that’s the wrong question. The right question is this: is your program’s performance blocked on the CPU?

Yes, it is. The bulk of the load is in glDrawArrays calls, mostly in driver functions that track state changes.

You don’t happen to know how expensive several Vulkan draw calls are when only the stencil reference value changes (via dynamic state), compared to several glDrawArrays calls with only the stencil reference value changing, do you?
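For reference, this is what that scenario looks like on the Vulkan side (a sketch; `cmd`, `pipeline`, `drawCount`, and `vertexCount` are assumed to exist, and the pipeline must have been created with `VK_DYNAMIC_STATE_STENCIL_REFERENCE` in its dynamic state):

```cpp
// Sketch: stencil reference declared as dynamic state at pipeline creation...
VkDynamicState dynamicStates[] = { VK_DYNAMIC_STATE_STENCIL_REFERENCE };
VkPipelineDynamicStateCreateInfo dynInfo{VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO};
dynInfo.dynamicStateCount = 1;
dynInfo.pDynamicStates    = dynamicStates;
// ... plug dynInfo into VkGraphicsPipelineCreateInfo::pDynamicState ...

// ...then, at record time, only the reference value changes between draws;
// no other state is revalidated.
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
for (uint32_t ref = 0; ref < drawCount; ++ref) {
    vkCmdSetStencilReference(cmd, VK_STENCIL_FACE_FRONT_AND_BACK, ref);
    vkCmdDraw(cmd, vertexCount, 1, 0, 0);
}
```

Because the reference is dynamic, the same pipeline object is reused across all the draws; nothing has to be recompiled or re-resolved per call.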

Then threading your rendering system will help alleviate that bottleneck (along with Vulkan just being a better API).

The reason why I’m confused is that, in theory, Vulkan would still have to assemble the separate command buffers into one sequence and then evaluate the parameters of each draw call. Wouldn’t that last step necessitate all the expensive checks OpenGL does?

No, it wouldn’t. First, Vulkan does exactly zero validity checks. When you do something wrong in OpenGL, you get an error telling you that the command failed. Vulkan doesn’t do that; if you do something wrong, without validation layers telling you otherwise, you could get all kinds of problems, up to and including GPU resets.

Second, the costs associated with submitting command buffers are intended to be as minimal as possible. Not trivial, but nothing like OpenGL’s overhead. In the best case, all the submit call has to do is write a few tokens into the GPU’s command queue, pointing to where to look for command data, that data having already been assembled into a form the GPU can read directly.

In less perfect cases, any fix-up to the command buffer data would just be resolving a few GPU pointers or somesuch. If CB data needs to be copied into command queue memory or something, then the copy can happen with a fast copy or DMA. And so forth.

There are many more optimal ways to submit commands when you are the driver than when you’re living outside of the driver. That’s why the driver is responsible for translating your vkCmd* functions into GPU-readable tokens, and then shoving those GPU-readable tokens into the GPU’s command queue.
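From the application’s side, all of that driver work is hidden behind a single call; submitting a frame’s worth of pre-recorded command buffers reduces to something like this (sketch; `commandBuffers`, `queue`, and `fence` assumed to exist):

```cpp
// Sketch: the translation to GPU-readable tokens happened at record time,
// so the submit itself does no per-draw work.
VkSubmitInfo submit{VK_STRUCTURE_TYPE_SUBMIT_INFO};
submit.commandBufferCount = static_cast<uint32_t>(commandBuffers.size());
submit.pCommandBuffers    = commandBuffers.data();
vkQueueSubmit(queue, 1, &submit, fence);  // no per-draw validation on this path
```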

I see.

Would you mind answering the stencil / dynamic-state question I posted before? That’s a concrete scenario I’m interested in.