Vulkan property enumeration: Should we loop and, if so, should we delay?

The Vulkan API offers us various functions to enumerate properties, for example vkEnumerateInstanceLayerProperties, which returns global layer properties. According to the official documentation, the list of available layers may vary from one call to the next. Quote:

The list of available layers may change at any time due to actions outside of the Vulkan implementation, so two calls to vkEnumerateInstanceLayerProperties with the same parameters may return different results, or retrieve different pPropertyCount values or pProperties contents. Once an instance has been created, the layers enabled for that instance will continue to be enabled and valid for the lifetime of that instance, even if some of them become unavailable for future instances.

As I understand it, we first request the property count, then allocate memory to hold the properties, and then request the properties with the given count and a pointer to the allocated memory. If the number of properties changed between the two calls, I take it we should redo the entire process, correct?

I looked at a lot of example code all over the Internet, yet nowhere do I see any change checking or looping performed. Should we indeed loop, or just assume nothing changes, even though the documentation explicitly warns about it? If we should be looping, should we then also include a loop delay of, say, 10ms or so to give the system time to get things in order? And is there an advised number of attempts or a certain time constraint?

To clarify my thought process, here’s one of my plain-C helper functions (to keep it short, return results have been simplified):

#include <stdint.h>
#include <stdlib.h>
#include <vulkan/vulkan.h>

int vkuGetInstanceLayerProperties(uint32_t* pPropertyCount, VkLayerProperties** pProperties)
{
    VkLayerProperties* props;
    VkResult           err;
    uint32_t           count1;
    uint32_t           count2;
    uint32_t           retries = 2; /* Advised number of attempts? Should we even loop? */

    while (1)
    {
        err = vkEnumerateInstanceLayerProperties(&count1, NULL);
        if (VK_SUCCESS != err)
        {
            return -1;
        }

        /* calloc(0, ...) may legally return NULL, so request at least one element. */
        if (!(props = calloc(count1 ? (size_t)count1 : 1, sizeof(VkLayerProperties))))
        {
            return -1;
        }

        count2 = count1;
        err = vkEnumerateInstanceLayerProperties(&count2, props);
    
        if (VK_INCOMPLETE == err || (VK_SUCCESS == err && count1 > count2))
        {
            free(props);

            if (retries--)
            {
                /* Should we delay code execution here, like Sleep(20)? If so, any advised duration? */
                continue;
            }

            return -1;
        }
        else if (VK_SUCCESS != err)
        {
            free(props); /* do not leak the buffer on a hard error */
            return -1;
        }

        break;
    }

    *pPropertyCount = count1;
    *pProperties    = props;
    return 0;
}

Really, I don’t see the point in bothering. After all, if a layer was available the first time, but the second time it wouldn’t be, you will only care if you’re using that layer. And you will find out that this happens when you try to create an instance that uses that layer.

If you feel the need to loop, then that is where your loop needs to be. After all, even your above looping code doesn’t guarantee that the layer will be available by the time the instance is created.
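One way to read the suggestion above is to handle the failure at instance creation rather than at enumeration. The following is only a minimal sketch under assumptions: `sketch_create_fn`, `sketch_create_with_fallback` and `sketch_fake_create` are hypothetical names, the callback stands in for a wrapper around vkCreateInstance so the snippet compiles without the Vulkan SDK, and the result values merely mirror VK_SUCCESS and VK_ERROR_LAYER_NOT_PRESENT. It falls back to creating the instance without the optional layers instead of looping.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in result codes (values mirror VK_SUCCESS and VK_ERROR_LAYER_NOT_PRESENT). */
enum { SKETCH_OK = 0, SKETCH_LAYER_NOT_PRESENT = -6 };

/* Hypothetical creation callback; real code would wrap vkCreateInstance here. */
typedef int (*sketch_create_fn)(const char *const *layers, size_t layer_count);

/* Tries with the optional layers first, then falls back to no layers.
 * Returns the number of layers actually enabled, or -1 on failure. */
static int sketch_create_with_fallback(sketch_create_fn create,
                                       const char *const *optional_layers,
                                       size_t optional_count)
{
    int res;

    assert(create != NULL);
    res = create(optional_layers, optional_count);
    if (res == SKETCH_OK)
        return (int)optional_count;
    if (res != SKETCH_LAYER_NOT_PRESENT)
        return -1;

    /* A layer vanished between enumeration and creation: retry without it. */
    return create(NULL, 0) == SKETCH_OK ? 0 : -1;
}

/* Test double: pretends that no layers are installed at creation time. */
static int sketch_fake_create(const char *const *layers, size_t layer_count)
{
    (void)layers;
    return layer_count == 0 ? SKETCH_OK : SKETCH_LAYER_NOT_PRESENT;
}
```

This only makes sense for layers the application can live without (validation, for instance). For a layer that is strictly required, reporting the error is probably the more honest outcome.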

Thank you for the reply. I am actually more concerned about the other situation, where more (use-case required) layers become available. But yes, you’re right in that if one wanted to loop, such loop-check should be relocated further into the code. The above just served as a short example to express additional explanation to my words.

As others (on other boards) have mentioned there usually is no need for looping to begin with, as device changes should be exceptionally rare between two rapidly succeeding enum-calls anyway. However, a) the documentation explicitly warns about it, and b) one would be making quite an assumption there about the applicable hardware and about the reasons why something might change - the official documentation makes no statements about that. I assume Vulkan isn’t supposed to be limited only to run on ye typical PC/console components as we know them today.

While the documentation says nothing about current and future devices, be they physical or virtual, I can imagine there are or could be devices that stay suspended until they receive some signal. Such devices could in turn wake up further suspended devices upon activation. This would lead to an initialization process of unknown duration during which the number of layers and extensions might be changing. In this example, which doesn’t seem all that odd given today’s eco-focused, power-limited mobile devices, changing results suddenly aren’t so exceptionally rare anymore.

Perhaps I’m just overthinking it, I’m not sure, but I have the feeling it should not simply be ignored, especially since the specification explicitly warns about it and since there’s a vast array of use-cases on a vast array of varying hardware nowadays, let alone in the future. But never mind, I suppose I’m just taking it too far…

Well I happen to loop in such cases:

(seeing your code I should resize based on count1 > count2 too)

Only the most extreme of corner cases would trigger it. I mean, what are the chances that the exact layer you need gets installed in the exact window between the two commands, which is what would trigger the VK_INCOMPLETE? Besides, the layer could get uninstalled between the query and instance creation anyway, even if you robustly loop, so you would need another retry loop if that fails.

You should not need to retry on count1 > count2. You already got all available layers (but some got uninstalled between the calls).

Delay should not be needed. If it returns VK_INCOMPLETE it means the new internal state is already updated.

Sometimes there’s some actual spec guarantee the result won’t change, and so one query should be enough.

But there’s no harm in doing so. It is as easy as changing if -> while. And it hurts nothing.
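To make the "if -> while" remark concrete, here is a rough sketch of the two-call idiom wrapped in a loop that repeats only while the fill call reports an incomplete result. Everything prefixed `demo_`/`Demo` is a stand-in I made up so the snippet compiles and runs without the Vulkan SDK; `demo_enumerate` only mimics the count/fill behaviour of vkEnumerateInstanceLayerProperties and can simulate a layer being installed between the two calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins (assumptions, not the real API); DEMO_INCOMPLETE mirrors VK_INCOMPLETE. */
typedef enum { DEMO_SUCCESS = 0, DEMO_INCOMPLETE = 5 } DemoResult;
typedef struct { uint32_t id; } DemoLayer;

/* Simulated system state: installed layer count, plus a one-shot hook
 * that "installs" one more layer right after the next count query. */
static uint32_t demo_installed = 3;
static int      demo_grow_once = 0;

/* Mimics the count/fill idiom of vkEnumerateInstanceLayerProperties. */
static DemoResult demo_enumerate(uint32_t *count, DemoLayer *out)
{
    uint32_t i, n;

    if (out == NULL) {
        *count = demo_installed;
        if (demo_grow_once) { demo_installed++; demo_grow_once = 0; }
        return DEMO_SUCCESS;
    }
    n = *count < demo_installed ? *count : demo_installed;
    for (i = 0; i < n; i++)
        out[i].id = i;
    *count = n; /* number of elements actually written */
    return n < demo_installed ? DEMO_INCOMPLETE : DEMO_SUCCESS;
}

/* Re-queries the count and refills for as long as the list grew in between. */
static int demo_get_layers(uint32_t *out_count, DemoLayer **out_layers)
{
    DemoLayer *layers = NULL;
    uint32_t   count  = 0;
    DemoResult res;

    assert(out_count != NULL && out_layers != NULL);
    do {
        DemoLayer *grown;

        if (demo_enumerate(&count, NULL) != DEMO_SUCCESS) { free(layers); return -1; }
        grown = realloc(layers, (count ? count : 1) * sizeof *layers);
        if (grown == NULL) { free(layers); return -1; }
        layers = grown;
        res = demo_enumerate(&count, layers);
    } while (res == DEMO_INCOMPLETE);

    if (res != DEMO_SUCCESS) { free(layers); return -1; }
    *out_count  = count; /* the count written by the fill call */
    *out_layers = layers;
    return 0;
}
```

Note that the loop has no delay and no retry on a shrunken count: a smaller count from the fill call still means everything currently available was returned.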

It is not really specific to VK_INCOMPLETE. The same would apply to e.g. VK_ERROR_OUT_OF_HOST_MEMORY. Though if you are as lazy as me, then you simply write “tough luck user” to the log and quit the app.

Thank you for your reply, it helps improve my understanding.

The reason I performed that count1 > count2 check upon success is that it signals there were changes in the system (some layer was removed), meaning something was going on and more might be due. Hence also the idea of imposing a delay to give the system some time. Also, once the changes have finished, it would make little sense to return an oversized memory block. While the general use-case would be to free the memory soon afterwards, someone for some reason might keep the data around during execution. I believe such generic utility functions shouldn’t make assumptions about what the API client will do with the returned data, but should instead be reliable and return the smallest result possible, especially on (embedded) devices with limited resources. One extra reallocation during initialization, which takes relatively long anyway, hurts the least in such a case.
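If the only concern is the oversized block, a retry is not strictly needed for that. As a small sketch (the helper name `trim_array` is made up for illustration), the buffer can simply be shrunk to the number of elements the fill call actually wrote:

```c
#include <assert.h>
#include <stdlib.h>

/* Shrinks a heap array to the number of elements actually written.
 * Returns the (possibly moved) block; if shrinking fails, the original
 * block is still valid and is returned unchanged. */
static void *trim_array(void *block, size_t elem_size, size_t written)
{
    void *smaller;

    assert(block != NULL && elem_size > 0);
    if (written == 0)
        return block; /* realloc(p, 0) is implementation-defined; keep the block */

    smaller = realloc(block, written * elem_size);
    return smaller != NULL ? smaller : block;
}
```

In the helper from the original post this would be called with count2 after a successful fill call, instead of freeing and starting over.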

In your linked code, a nice approach on your device enumeration btw (while( errorCode == VK_INCOMPLETE )). To prevent the risk of potential process or even system hanging I would certainly include some form of an exit condition there though. A bit as I mentioned in my OP already, perhaps a number of max. attempts or some timeout. I’ve seen too many silly hardware and software issues to be trusting anything but my own code stability when it comes to hardware and/or driver communication. The calloc in my code, rather than malloc, is a subtle hint of that.

Thanks, though I uglified it in the next update into a template so it can be reused (to apply the DRY principle). :P
There’s no point in introducing an exit condition.
If the driver wanted to hang you, it would simply never return from that command. It would not need to make such an elaborate trap.
You never have a choice but to trust an API.