Performance hiccups, SwapBuffers stalls

Hello,
I’m having some periodic hiccups in my game that I can’t fix.
The same behaviour shows up on two machines, both running Windows 10:
A desktop with a Core i7-6700 CPU and an AMD 6600 GPU (driver 24.6.1).
A laptop with a Core i5 CPU and an NVIDIA 1050 GPU (use of the dedicated GPU is forced).

I’ve read these posts with similar problems; however, none of the solutions seem to work:

Here are the facts:
On the desktop computer, the game runs at ~150 fps and is clearly GPU bound according to the profiling data.
However, from time to time there is a nasty stutter and the screen appears to be frozen for about one frame.

Every time I have this freeze, the frame rate first drops, then climbs back up:

The same thing occurs on the laptop:

Using an external CPU profiler, I’ve spotted that every freeze coincides with a frame where the SwapBuffers call takes more time than average, followed by a frame where SwapBuffers is almost instantaneous.
Here I’ve highlighted the results for three different frames: the “normal” one, the “high time” one and the “low time” one.
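In case it helps, the timing I log in-game is essentially this (simplified sketch; `present` and `hdc` are illustrative names, not my exact code):

```cpp
#include <windows.h>   // SwapBuffers, HDC
#include <chrono>
#include <cstdio>

// Simplified sketch of the per-frame instrumentation: time the SwapBuffers
// call and log the outliers.
void present(HDC hdc)
{
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    SwapBuffers(hdc);
    // glFinish();  // optionally force a full CPU/GPU sync here (tested below)
    double ms = std::chrono::duration<double, std::milli>(clock::now() - t0).count();

    if (ms > 2.0)    // arbitrary "longer than average" threshold at ~150 fps
        std::printf("slow SwapBuffers: %.2f ms\n", ms);
}
```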

The same behaviour is visible in both windowed and fullscreen modes.
It also occurs with vsync, even though the framerate then fluctuates between 40 and 60 fps (I thought I would get a constant 60 fps, but I might be wrong).

I also did some tests by adding a glFinish after the SwapBuffers call, and while the framerate is much lower (as expected), I still get these spikes.

By forcing the game to run below 60 fps using a sleep, I still get the spikes, although less often.
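The cap itself is nothing fancy, essentially this (simplified sketch; the 20 ms budget is just the value I tested with):

```cpp
#include <chrono>
#include <thread>

// Simplified sketch of the frame cap used for this test: sleep out the
// remainder of a fixed 20 ms budget (~50 fps) after each frame.
void limitFrame(std::chrono::steady_clock::time_point frameStart)
{
    using namespace std::chrono;
    constexpr auto budget = milliseconds(20);

    auto elapsed = steady_clock::now() - frameStart;
    if (elapsed < budget)
        std::this_thread::sleep_for(budget - elapsed);
}
```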

But even with fewer spikes, I still “feel” that some frames freeze from time to time, even when there is no visible surge in the timings.

I guess this is happening because the driver sometimes stalls to keep pace with the CPU frames; how could I mitigate this?

Great first post! Lots of good info here. Based on your attention to detail, I’m sure you can fix it. We just need to figure out what Windoze setting or GL usage or driver behavior is hosing your performance.

Interesting. So probably not GL driver specific. More likely a global Windows setting/behavior thing or an app GL usage thing.

(Blindfold shot over the shoulder:) The first thing that comes to mind when I look at your frame time perf screenshots is this extremely annoying Windoze setting:

Power Plan

  • Control Panel → Hardware and Sound → Power Options :
    • Choose or customize a power plan = High performance

If you instead have this set to Balanced (the default), it will cause intermittently-slow frame submission behavior like what’s shown in your screenshots.

That is, CPU perf cooks along smoothly, with a consistent frame rate. But then you’ll have 1 or 2 frames that run in slow-motion on the CPU – for apparently no reason! – potentially causing frame misses. And then, magically, you’re back to normal frame times.

NOTE: This happens even on a normal desktop PC with no battery that’s plugged into A/C power 100% of the time! Ridiculous behavior. Thanks, Windoze! This dynamic CPU frequency control is brain-dead, at least in this case.
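If you want your app to catch this itself, you can query the active scheme via the PowrProf API; something like this (sketch, with the well-known High performance plan GUID hardcoded and error handling mostly elided):

```cpp
#include <windows.h>
#include <powrprof.h>   // PowerGetActiveScheme; link with powrprof.lib
#include <cstdio>

// Sketch: warn at startup if the active Windows power plan is not
// "High performance" (well-known GUID 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c).
int main()
{
    const GUID highPerf = {0x8c5e7fda, 0xe8bf, 0x4a96,
                           {0x9a, 0x85, 0xa6, 0xe2, 0x3a, 0x8c, 0x63, 0x5c}};
    GUID *active = nullptr;

    if (PowerGetActiveScheme(nullptr, &active) == ERROR_SUCCESS)
    {
        if (!IsEqualGUID(*active, highPerf))
            std::fprintf(stderr, "Power plan is not High performance; "
                                 "expect intermittent CPU downclocking.\n");
        LocalFree(active);   // PowerGetActiveScheme allocates with LocalAlloc
    }
    return 0;
}
```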

Given this, I’m assuming that on the NVIDIA system you’ve tweaked your NV driver settings to match those described in the 2nd post … and that this had no effect on the perf glitch you’ve noticed.

Your frame rate graphs are consistent with one frame taking too long and missing, allowing GPU rendering to get ahead. So your next frame doesn’t have to wait at start or end for swap chain images, and completes in less frame time than normal. Then you’re back to your baseline frame time.
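Illustrative numbers (assumed, not taken from your data): at ~150 fps your baseline is ~6.7 ms/frame; the “high time” frame takes, say, ~13 ms, the GPU queue drains, the next frame’s SwapBuffers returns almost immediately (~1 ms), and then you’re back to ~6.7 ms.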

Or, possibly your OS is knee-capping your CPU’s performance randomly.

Thanks for the kind words.

Yes, and I also tried playing with the various AMD driver settings, without success.

I was indeed in balanced mode. I had huge hopes this would be the silver bullet, but unfortunately it wasn’t :(. I even think it’s worse in high-performance mode; the hiccup is much more frequent than in balanced mode.

The game also runs on Linux, so I’m gonna do some tests to see how it behaves on a different OS.

Thank you.

Alright, I did some benchmarks on Linux and the issue doesn’t seem to be there, or if it is, it’s much less noticeable.
Most of the time I get a stable 7 ± 1 ms, with a small spike from time to time, but nothing as pronounced as on Windows; I don’t think I would notice it if I didn’t have the timing data in front of me.

Sometimes, rarely, there is a drop lasting about ten frames, but I don’t think it’s related; it could be anything.

I’m not sure the comparison with Windows is fair though, as my Linux system has far fewer processes running in the background.

Hmmm, ok. So there’s something else consuming CPU occasionally … on Windows anyway.

I would start by disabling anything in your application that may vary the rendering results frame-to-frame. Get it into a steady state. See if you can reproduce the problem in that state. Make sure it’s not something your render thread is doing that’s causing it.

Then look for any other threads in your app that might be contending with the render thread for CPU or CPU memory. For testing, disable as much heavy background work as you can.

If the problem persists, make sure you’re running in Fullscreen Exclusive mode (fullscreen + focus) with a single GPU feeding the desktop. In GL, this should bypass as much of the DWM render bloat/inefficiency as possible, short of using Vulkan direct display. The DWM composition path can cause slowdowns and misses, depending on the system load and rendering demands, even if your app is doing everything right.

If the problem persists, you might run Process Explorer or Process Hacker and look for processes that are consuming non-trivial CPU, CPU memory, GPU, Disk, or Net bandwidth while your app is running. Then progressively disable more and more needless background processes, starting with those that you know are eating CPU or GPU. For instance, I can tell you one VR vendor’s software (which spawns a ton of processes and threads) eats a significant slice of GPU and CPU performance even when their system is not actively in-use! Shut down all unused junk like that. Antivirus is another good candidate to try, along with Web Browsers, E-mail clients, MSVS and all other dev/debug/profiling environments, etc.

Eventually, you’ll figure out what’s popping up and stealing the CPU, CPU memory and/or GPU from your application, possibly by pre-empting your process off the CPU to service its threads, and causing your render dispatch loop to run more slowly.

For best results, I’d suggest tracking this on a desktop (not a laptop) with an NVIDIA GPU. Why? There’s less chance of odd laptop power-saving / downclocking interactions, and I have no recent experience with AMD GPUs/drivers and their perf characteristics. Failing that, I’d work this on your desktop with AMD, since you think it’s not dependent on the GPU or GL driver.

I was about to tell you I had already disabled all my threads when I noticed one I’d forgotten: a file-system-watching thread that periodically looks for changes in some folders. I disabled it and TAH-DAH, everything is fine now. I wasn’t expecting it to have such an impact, yet it does.

I feel dumb now; I should have double-checked all my threads.

Thank you for your time.


Cool! Nice find!

So, care to share what OS API(s) this “file system watching thread” is calling? Is it one of these?:

Is it a thread you wrote?

Why I ask: I dev on Windows at work and also dev real-time GPU rendering apps. So I’m very interested in what you found that exhibits this behavior!

And I’m sure I’m not alone! Others will find this thread in the future and want to know all the details!

Thanks!

Nope, that’s not a thread from me :)

I use a very basic implementation, not relying on platform-specific code.

Ah! I see. So it polls everything recursively in an entire subdirectory, calling last_write_time() on every file and exists() on files that it’s seen before. Then it sleeps for a few msec before doing it all again.
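Reconstructed as a sketch (using std::filesystem, since you said it’s portable; not your exact code), the loop is essentially:

```cpp
#include <chrono>
#include <filesystem>
#include <iterator>
#include <map>
#include <thread>

namespace fs = std::filesystem;

// Reconstructed sketch of a portable polling watcher (not the actual code):
// scan the whole tree, stat every file, diff against the previous scan.
void watch(const fs::path &root)
{
    std::map<fs::path, fs::file_time_type> seen;
    for (;;)
    {
        // One tight burst of file I/O: recurse over the whole subdirectory
        // and call last_write_time() on every file.
        for (const auto &entry : fs::recursive_directory_iterator(root))
        {
            auto t  = fs::last_write_time(entry.path());
            auto it = seen.find(entry.path());
            if (it == seen.end())
                seen.emplace(entry.path(), t);   // new file
            else if (it->second != t)
                it->second = t;                  // modified file
        }
        // exists() on every previously-seen path to detect deletions.
        for (auto it = seen.begin(); it != seen.end();)
            it = fs::exists(it->first) ? std::next(it) : seen.erase(it);

        // ...then sleep a few msec and do it all again.
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}
```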

Yeah, I can see how that might hit your performance. :) Lots of file I/O kernel calls + disk/cache I/O in one tightly clustered burst.