Fatal error to free command buffer(s) from pool that has been destroyed or 0x0000? Crash, no layer feedback

drkif · August 31, 2019, 11:58pm

Is it possible that command buffer allocation has a missing error scenario / error return value? I’m still learning Vulkan and am coding an app to let us test every eventuality (without crashing, if possible), and I see that some functions have no result to return because they are just assumed to work (destroyX, freeCommandBuffers). And if the spec says don’t, then don’t, but…

Freeing command buffers from pool handle 0 crashes without layer feedback, as does freeing from a handle to a command pool that was just destroyed. Am I right in thinking this is not actually a driver bug but a rare case of the spec advising “don’t do it” (instead of “do it and test the result”)? The spec says no. I was trying to let users test destroying the command pool and then freeing the buffer, but I zero out the command pool handle, so the crash I get is from freeing from command pool 0x00000000.

Happy to avoid doing it, but I’m writing an app to help teach Vulkan and it’s annoying that I can’t tell whether I’ve got a really painful bug to track down in my own code. Is it better to disallow it and give the user an alert? (…warning what will happen in the field if you do this, or any other bad-pointer or divide-by-zero type of fatal error?)

And finally: I request one queue from the transfer family, then try to create a command pool for the other queue family reported as available (but I haven’t created the device with any queues of that family requested), and I DO get an error. BUT when I then allocate command buffers from that pool that didn’t get created (actually a command pool handle of 0x0000; since creating the command pool did not work, I would think it would not yield a handle, and if it did, it’s not valid), the allocation returns a result, and that result is success. Instead of, um, which error would it be? The spec says you get success, or out of host memory, or out of device memory errors.

(Ubuntu, GeForce 960M, driver 430.40; SDK is 1.1.101.0, latest version available at runtime when creating the device is 1.1.99.0)

Alfonse_Reinheart · September 1, 2019, 2:17am

Welcome to programming in Vulkan! Enjoy your stay

It is disallowed; it’s a violation of the Valid Usage rules. Calling vkDestroyCommandPool is equivalent to calling vkFreeCommandBuffers on every CB in the pool. As such, all of those CBs no longer exist, and it is not valid usage to call vkFreeCommandBuffers on a CB that doesn’t exist.
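To make that lifetime rule concrete, here is a minimal C++ sketch of a toy model, not the Vulkan API. `PoolModel`, `Handle`, and every method name are hypothetical stand-ins invented for illustration; handles are plain integers and 0 is never handed out, loosely mirroring a null handle.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy model only: these are NOT Vulkan types. Handles are plain integers.
using Handle = std::uint64_t;

class PoolModel {
public:
    // Creates a pool and returns its handle (never 0 in this model).
    Handle createPool() {
        Handle pool = next_++;
        pools_.emplace(pool, std::unordered_set<Handle>());
        return pool;
    }

    // Allocates `count` buffers from a live pool.
    // Returns an empty vector if the pool is not known to the model.
    std::vector<Handle> allocate(Handle pool, unsigned count) {
        std::vector<Handle> out;
        auto it = pools_.find(pool);
        if (it == pools_.end()) return out;
        for (unsigned i = 0; i < count; ++i) {
            Handle cb = next_++;
            it->second.insert(cb);
            out.push_back(cb);
        }
        return out;
    }

    // Destroying a pool drops every buffer allocated from it as well.
    void destroyPool(Handle pool) { pools_.erase(pool); }

    bool poolIsLive(Handle pool) const { return pools_.count(pool) != 0; }

    // A buffer is live only while the pool it came from is live.
    bool bufferIsLive(Handle pool, Handle cb) const {
        auto it = pools_.find(pool);
        return it != pools_.end() && it->second.count(cb) != 0;
    }

private:
    Handle next_ = 1;  // 0 is reserved as the "null" value
    std::unordered_map<Handle, std::unordered_set<Handle>> pools_;
};
```

In this model, once `destroyPool(p)` has run, `bufferIsLive(p, cb)` is false for every buffer that came from `p`, so a later "free" of those buffers has nothing valid to refer to.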

Vulkan error codes are not for misuse of the API. Just look at the error codes that VkResult provides. Out of host/device memory, mapping failure, device lost, most of those things are cases that are beyond your control. They’re (usually) not “you did the wrong thing”; they’re mainly for “the system just couldn’t handle that”.

Error codes are for times when the system simply couldn’t process the request. Usage errors are a matter of the valid usage rules, and are only verified by validation layers.

Validation layers not catching invalid command buffer deletion is not surprising. After all, since vkDestroyCommandPool performs the equivalent of calling vkFreeCommandBuffers on all of the command buffers for that pool, there are no actual command buffer objects left for the validation layer to key on to. It’d be like passing a random pointer to the function; the validation layer can’t tell if a pointer used to point to an object.

No, there is no “but”. Once you have violated a Valid Usage rule, you are in Undefined Behavior land. Anything and everything can happen, and there’s no specification to guide you. There can be no specification of an error state for a situation whose behavior is already undefined.

You broke the world. It can’t be broken further.

The idea with valid usage rules is this. OpenGL implementations had to spend a huge amount of time validating inputs. Time that is pointless for a properly functioning application. Vulkan puts the responsibility of validating inputs on the user (where it belongs). This means that if you break a valid usage rule, the program keeps going. And whatever consequences there are will happen to you.

Including… no (apparent) consequences whatsoever.

For example, if an implementation’s command pool/buffers genuinely do not care about the queue family they’re attached to, then it’s not going to bother looking at that value. To that implementation, all pools for all queue families are equal. So your code “works”.

But it doesn’t work for Vulkan. And that’s the point of the validation layers.

If you get a validation layer failure, that (usually) represents a bug in your program which you should fix. It doesn’t matter that your program appears to work correctly in spite of the violation. You violated the Vulkan specification, and that means undefined behavior has happened. Which you shouldn’t do.
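For the queue family case specifically, an application can run its own check before creating the command pool. The C++ sketch below assumes the app keeps a list of the queue family indices it requested at device creation. `familyWasRequested` and `requestedFamilies` are hypothetical names, not part of Vulkan.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `familyIndex` is one of the queue
// family indices the app recorded when it created the logical device.
bool familyWasRequested(const std::vector<std::uint32_t>& requestedFamilies,
                        std::uint32_t familyIndex) {
    return std::find(requestedFamilies.begin(), requestedFamilies.end(),
                     familyIndex) != requestedFamilies.end();
}
```

If this returns false, the app can refuse to create the pool and tell the user why, rather than depending on how a particular driver happens to treat the index.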

drkif · September 1, 2019, 10:11am

Many thx Alfonse! This answers not only these questions but solves many remaining ones. Cheers.

(Was briefly relying on non-zero handle meaning it exists instead of creation/use error return values)
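A small C++ sketch of that change in habit: decide from the reported result, never from whether the handle happens to be non-zero. `Result`, `CreateOutcome`, and `tryGetHandle` are hypothetical stand-ins made up for this example; they are not the real Vulkan types.

```cpp
#include <cstdint>

// Hypothetical stand-ins, NOT the real Vulkan types.
enum class Result { Success, OutOfHostMemory, OutOfDeviceMemory };
using Handle = std::uint64_t;

struct CreateOutcome {
    Result result;
    Handle handle;  // only meaningful when result == Result::Success
};

// Writes the handle to `out` and returns true only if creation reported
// success. The handle value itself is never inspected to make the decision,
// and `out` is left untouched on failure.
bool tryGetHandle(const CreateOutcome& outcome, Handle& out) {
    if (outcome.result != Result::Success) return false;
    out = outcome.handle;
    return true;
}
```

A failed outcome that carries a non-zero, garbage handle is still rejected here, which is exactly the case a "non-zero means it exists" test gets wrong.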

So basically I’m misusing Vulkan in a way that can’t be handled, like dereferencing a nullptr when no guard exists for it, or doing something with the 3rd item in an empty array. Got it. I actually want to, sort of. Thx for clarifying the difference between what validation is there for and this (which I’m only allowing so users can try anything with a Vulkan instance). It is more like instructions for driving on roads: going off-road leads to a near 100% reduction in road signs and traffic, certainly, but also in road and possibly ground. And most of the rules of the road, like “drive on this side”, become meaningless. So yes, once doing something wrong leads to ‘undefined behaviour’ (some internal state is invalid), we can go no further.

90% of my time learning Vulkan so far went into setting things up; that’s where the time is taken and where guards / checks exist. At run time, not so much. Thanks for further clarifying (the Vulkan material DOES cover that, but I can only hold a couple of pages of spec in my head at once, so I need the app I’m writing!)

I see now I MIGHT have to have my app launch a child test app and handle its crash (I would rather not; better to do everything natively, emulate the crash, and explain why that call would fail or, worse still, be this corner case of not being complained about but also not being right).

Or I write this app more like a layer, with data / work in a layer injected between main and the Vulkan API, which emulates (i.e. doesn’t do what would crash and instead says ‘if you had tried to do that it would have crashed because…’) and so plug that gap myself.
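Roughly what I have in mind, as a C++ sketch under my own assumptions: the shim tracks which pool handles are live and returns a verdict instead of forwarding a bad call. `FreeGuard`, `Verdict`, and `explain` are names I made up for this; nothing here is a real Vulkan or layer API, and the real call would only be made on `Verdict::Forward`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>

using Handle = std::uint64_t;  // stand-in for a pool handle, 0 meaning "null"

enum class Verdict { Forward, NullPool, UnknownPool };

class FreeGuard {
public:
    // Called by the app's wrapper after a pool is successfully created.
    void onPoolCreated(Handle pool) { live_.insert(pool); }

    // Called by the app's wrapper when a pool is destroyed.
    void onPoolDestroyed(Handle pool) { live_.erase(pool); }

    // Decides whether a "free command buffers" request may be forwarded.
    Verdict checkFree(Handle pool) const {
        if (pool == 0) return Verdict::NullPool;
        if (live_.count(pool) == 0) return Verdict::UnknownPool;
        return Verdict::Forward;
    }

private:
    std::unordered_set<Handle> live_;  // pools currently known to be alive
};

// Text the teaching app could show instead of making the call.
std::string explain(Verdict v) {
    switch (v) {
        case Verdict::NullPool:
            return "Blocked: the command pool handle is null.";
        case Verdict::UnknownPool:
            return "Blocked: the pool was destroyed or never created, "
                   "so its command buffers no longer exist.";
        case Verdict::Forward:
            return "OK: the call can be forwarded.";
    }
    return "Unknown verdict.";
}
```

This only covers the two crashes from my first post; every other valid usage rule would need its own bookkeeping, which is presumably why the real validation layers are as large as they are.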

Q: Would that be something layers are there for, and something companies might well do in development to validate how they are using shaders, descriptor sets, etc.?

Handling / continuing after an error is tricky. I had missed that slight distinction about implementation specifics: just as layout transitions meant nothing for NVIDIA hardware, queue families can matter to some hardware and be meaningless, hence ignored, on others.

And destroying the pool automatically frees the allocated command buffers; yes, I should have spotted that. Even the api calls layer would not show the extra steps behind the scenes, so I’ll have to embody that in on-screen text explaining what’s going on, and have my app work along those lines: not allow users to do that, and put up a “what would happen” message specifically for those cases.

Will have to read the spec very carefully for issues like this. (Wanted to log every user activity and every vulkan activity, so special handling is what’s needed, I’m trying to wrap and report everything so students (like me) will need these less obvious consequences highlighting). Thx
