vkQueuePresentKHR CPU busy-loops

vkQueuePresentKHR blocks, as expected (FIFO, 2 images, vsync required), but on all the OSes and hardware (all NVidia) i have access to (which is very limited) it does that by busy-looping the CPU to hell. It neither yields to other threads/processes (the only sane option) nor uses CPU-friendly wait instructions (like what the idle process does in Windows); it is just a plain dumb polling loop. This makes Vulkan unusable.

Which begs the question - what am i doing wrong? As evidently people do use vulkan. I must be missing something.

So far i have used a horrid-hack-workaround by setting an event after present related rendering is done and wait for it at CPU side before proceeding to call vkQueuePresentKHR. Unfortunately, for reasons i can not fathom, Vulkan does not support waiting for events at CPU side. Which makes the hack particularly ugly and unreliable.

Current flow:

Collect data for rendering - among other things it will vkAcquireNextImageKHR setting wImageAvailable.

vkQueueSubmit that contains:

  • early patches of heavy work
  • presentable patch that waits for wImageAvailable and signals wRenderFinished
  • more patches or other work - where the very first thing is signalling an event hackCanPresent

My workaround hack: polling hackCanPresent and guesstimating reasonable delay for SetWaitableTimer/WaitForSingleObject (which does not kill the CPU).

vkQueuePresentKHR that waits for wRenderFinished.

How can i fix this?

You can test events on the CPU with vkGetEventStatus. You can’t block on an event, but you can detect it.
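A minimal sketch of detecting an event without spinning, assuming a hypothetical helper `pollUntil` that takes the status check as a callable. In real code the check would be something like `vkGetEventStatus(device, event) == VK_EVENT_SET`. The interval and timeout are placeholders, not recommendations, and sleep granularity depends on the OS.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of Vulkan): polls `isSet` until it returns
// true or `timeout` elapses, sleeping between checks so the thread gives up
// the CPU instead of spinning. Returns true if the condition was observed,
// false on timeout.
inline bool pollUntil(const std::function<bool()>& isSet,
                      std::chrono::microseconds interval,
                      std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (isSet()) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(interval);
    }
}
```

This is still polling, so it trades latency for CPU time: the event can only be noticed at the next wake-up.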

But if you want to use CPU mutex-like behavior, the thing you ought to be waiting on is a fence. For example, you could wait on the fence from acquiring the image.

Yes, that is what i meant.

So, waiting on fences works properly? Their uses are limited (whole submit only) - but at this point i am willing to do some acrobatics to get this long ignored niggle fixed.

I use infinite timeout for acquiring an image - it never blocks (and if it does then i assume it does the wait properly). I have wondered whether it just returns immediately what it knows to become available next and postpones the actual acquisition. I doubt it. So, i doubt using a fence from that would be of any use. Right?

Using a fence on the whole queue submit would be an option, but would need me to chop it into two submissions (to not include the trailing patches that occasionally are quite time heavy - mostly streaming in new/changed stuff that i likely need in near or immediate future). I will try that, hopefully some time today.

edit: hmm, since the vsync is the root of the issue - i do not actually see what use any fence would be.

What I have is a semaphore from AcquireNextImage(signal) to QueueSubmit(wait), where I have a fence for the command buffers to wait for themselves from past QueueSubmit to complete and then a semaphore from the QueueSubmit(signal) to QueuePresent(wait)… Not sure if it is right, but it seems to work and not spinlock the CPU (I think that’s a problem with NVIDIA drivers; they were spinlocking even for v-sync in OpenGL for me)

I can not remember anything in Vulkan that can wait on a fence - i assume you are doing the wait somewhere?

I was not able to parse what you said :/. Could you elaborate, in a timeline form, what you are doing? Are you on NVidia (the problem, AFAIK, seems to be indeed NVidia specific based on similar complaints found on the web)?

I can not see any way any of the synchronization tools that are available in Vulkan can solve this problem (vsync causing spin-lock in vkQueuePresentKHR). When i concocted my workaround for the base Vulkan framework, some YEARS ago, i assumed it was some driver bug or me being dumb, but it has never been fixed and i have not found a way to un-dumb myself on this issue either. This is not a tiny corner case inconvenience - it is a major bug that should affect essentially everyone (using NVidia at least). That is a tough pill to swallow - therefore i am inclined to assume i am being dumb…

So, please, someone - un-dumb me. How do you make Vulkan usable without ugly hacks?

A few clarifying words on the workaround i use:
I measure frame time and consume the unneeded part with SetWaitableTimer/WaitForSingleObject leaving only a guesstimated safety margin for timer inaccuracies and for vkQueuePresentKHR to do its job in time. Previously mentioned event is used only as a debug helper (Cannot wait on it to consume some of the time as Vulkan does not support waiting on events. Using a fence, as suggested, would help a little, but cannot go the whole way - not even close - i can not, nor wish to, saturate my etalon/target GPU).
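The arithmetic behind that workaround can be sketched as follows. The function name `sleepBeforePresent` and all values are illustrative assumptions. The actual wait in the post uses SetWaitableTimer/WaitForSingleObject, which is not shown here.

```cpp
#include <chrono>

using Micros = std::chrono::microseconds;

// Hypothetical helper: how long to sleep before calling present.
// It is the refresh period minus the time the frame has already taken and
// minus a safety margin for timer inaccuracy; never negative.
inline Micros sleepBeforePresent(Micros refreshPeriod,
                                 Micros frameElapsed,
                                 Micros safetyMargin) {
    const Micros remaining = refreshPeriod - frameElapsed - safetyMargin;
    return remaining > Micros::zero() ? remaining : Micros::zero();
}
```

For a 60 Hz display (about 16667 µs per refresh), a frame that took 4000 µs with a 2000 µs margin would leave 10667 µs to sleep. A frame that already ate most of the period yields zero, and present is called immediately.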

edit: Worth mentioning, my drivers are of course up to date. Last checked a few days ago (currently installed release was from a few weeks ago).

If that were true… what would be the point of the semaphore/fence it signals? The function returns when it knows what image will become available, not when it actually is available.

Since you’re using double-buffered FIFO (for… some reason) if you just acquired and presented image X, then the image that will be available next is image Y. So there’s no reason for vkAcquireNextImage to wait on anything.

The semaphore/fence will be signaled when the image is actually acquired. And it can only be acquired when the presentation engine has finished presenting it. Which means that vsync will have passed.

No, it only affects people who are using double-buffered FIFO. Personally, if an implementation makes triple-buffered mailbox an option, I don’t see a good reason to use FIFO. And if you can’t get mailbox, it’s probably best to triple buffer when using FIFO, assuming your frame time nearly matches vsync.

Options i have tried and used at various times:

  • FIFO, 2 images (if i feel restrict)
  • FIFO, 3 images (to better handle minor GPU over-utilization)
  • FIFO RELAXED, 2-3 images (if i do not particularly care about tearing)
  • MAILBOX, 3 images (if it is winter outside, see note)

note: The last option currently has the problem that i have nothing in place to prevent it from endlessly re-rendering (great if image latency is your primary concern - which i could not care less about). So, not only will it burn the CPU, it also does the same for GPU - which is way worse than the problem i started with.

Can i somehow use a fence from vkAcquireNextImage to prevent unnecessary re-render? I can not think of any way :/. What else could i do to make MAILBOX useful?

If you want to throttle the GPU based on some time, you can just not submit more stuff to it. You can look at a timer and decide that not enough time has passed to bother doing a render loop and instead do something else. Granted, I prefer to avoid mutexes whenever possible and maintain control over when tasks stop and start.
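A minimal sketch of that timer check, assuming a hypothetical `FrameLimiter` class and a caller-chosen minimum interval. It only decides whether to render; what the loop does instead (streaming, sleeping, other work) is up to the application.

```cpp
#include <chrono>

// Hypothetical frame limiter: answers "has enough time passed to render again?"
class FrameLimiter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameLimiter(Clock::duration minInterval) : minInterval_(minInterval) {}

    // Returns true and records `now` if at least minInterval has elapsed since
    // the last accepted frame (or if no frame was accepted yet). Otherwise
    // returns false so the caller can skip the render and do something else.
    bool shouldRender(Clock::time_point now) {
        if (hasLast_ && now - last_ < minInterval_) return false;
        last_ = now;
        hasLast_ = true;
        return true;
    }

private:
    Clock::duration minInterval_;
    Clock::time_point last_{};
    bool hasLast_ = false;
};
```

In a render loop this would be called as `limiter.shouldRender(FrameLimiter::Clock::now())` before acquiring an image, with the interval set to roughly the display refresh period.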

But if you’re looking for an OS-governed way to do that, be aware that such a thing may not actually exist. I don’t know how vsync gets implemented, but I would assume that it is implemented as an interrupt, not an OS synchronization object. And if it is a synchronization object, it would probably be exposed either by the present command or by waiting on the fence from vkAcquireNextImage, both in FIFO mode.

Note that NVIDIA themselves suggest waiting on the fence from acquire.

vkAcquireNextImageKHR has the option:

now I don’t use it because in all the examples and info I have seen it was communicated that it is not necessary and that makes good sense to me if one uses the semaphore option.

but I do use a fence for the command buffer and vkQueueSubmit.
you could always generate new command buffers (ideally releasing them eventually when safe or reset the pool) but it has only helped me in the past when I did mess with a buffer that was still in use. it’s the only fence I use so I don’t see it as a big deal nor is it the thing that would slow down or block the render loop in any remarkable way.

maybe the hack causes a problem which might be solvable another way.

then again I don’t know your code and the need for events - my own workflow is simple enough that I don’t require more synchronization for the render pass. I pretty much stick to the acquire - submit - present with triple buffering. the only thing I had to pay attention to is that I don’t work with the wrong indices.

here is one of a few tutorials out there (with focus on render loop): Mainloop Code - Vulkan Guide

or this one: Introduction - Vulkan Tutorial has code too.

last but not least, did the validation layer reveal anything interesting? it’s always worth a look, frankly, it saved me many times…