Performance issue when swapping buffers

At the end of my render code I call glfwSwapBuffers(), a GLFW function that swaps the front and back framebuffers. I surround that call with my performance-measuring code:

performance.start();
glfwSwapBuffers(window);
performance.end();

My update_interval is 0.008333333 sec, which is 1/120 sec (120 FPS). About once per minute, glfwSwapBuffers() takes roughly 10 * update_interval to unblock.
My vsync is off via glfwSwapInterval(0). Follow-up question: how can I check whether my vsync is really off?

Can anyone please give me an idea why this is happening? If you need more information, I will provide it. Please help, thanks.

Your first question was implicit. So let me clarify what I think your questions are:

  1. Why, when I “think” I’ve got VSync OFF, is the call to SwapBuffers() taking 10 * 8.33 msec to return once per minute?
  2. How do I check that VSync is really OFF?

Is that pretty close?

And more info would be good.

  1. What are you ultimately trying to determine with this test?
    (As-is, it’s probably not doing that.)
  2. What OS, GPU, and GL driver are you running on?
  3. By “my update_interval is …120 FPS”, do you mean your display scan-out rate is 120Hz? Or some internal CPU timer you’re running is set for 120Hz?
  4. So you’re saying that calling SwapBuffers() is occasionally blocking for 10 * 8.333 msec once per minute?
  5. And during this period you think you have VSync OFF?
  6. How long does it take to call and return from SwapBuffers() on all of the other frames in that minute?
  7. Is the OS window you’re drawing to full-screen exclusive or no?
  8. What are your driver settings related to frame queuing, swap chain config, and display synchronization?
    (e.g. Vertical Sync, Fixed Refresh display, Max Refresh Rate, Preferred Refresh Rate, Power Management Mode, Low Latency Mode, Triple Buffering, etc.)

I’ll wait for your answers before offering much in the way of advice.

But even before then, you need to know that measuring the CPU time required to merely queue SwapBuffers() alone really isn’t that useful. And the timings you get will depend on the display mode, performance, driver config, driver, and even the OS. Which is what leads me to believe you’re not doing what you ultimately want to be doing here.

Is that pretty close?

Yes.

What are you ultimately trying to determine with this test? (As-is, it’s probably not doing that.)

I want to stop the lag. I have had it for a long time and I am trying to find its main cause.

What OS, GPU, and GL driver are you running on?

OS - Windows 10 x64
GPU - I tested on multiple PCs and it has the same problem on all of them; it’s a very simple 2D game
GL - http://i.imgur.com/V6gA7jU.png, this is from the GPU-Z tool (this site won’t let me post links)

By “my update_interval is …120 FPS”, do you mean your display scan-out rate is 120Hz? Or some internal CPU timer you’re running is set for 120Hz?

No, it’s not monitor related; I am just updating the game logic (calling update() in the game loop) at 120 Hz.

So you’re saying that calling SwapBuffers() is occasionally blocking for 10 * 8.333 msec once per minute?

Yes.

And during this period you think you have VSync OFF?

I learned that SwapBuffers() waits on the monitor refresh rate, so I disabled vsync, but I’m not sure how to confirm this.

How long does it take to call and return from SwapBuffers() on all of the other frames in that minute?

These timings are printed once per second. Way less than 8.333 msec…

glSwapBuffers() : 0.000065100 sec
glSwapBuffers() : 0.000037700 sec
glSwapBuffers() : 0.000050000 sec
glSwapBuffers() : 0.000045800 sec
glSwapBuffers() : 0.000070200 sec
glSwapBuffers() : 0.000035000 sec
glSwapBuffers() : 0.000035400 sec
glSwapBuffers() : 0.000055300 sec
glSwapBuffers() : 0.000040100 sec
glSwapBuffers() : 0.000033000 sec
glSwapBuffers() : 0.000040300 sec
glSwapBuffers() : 0.000035600 sec
glSwapBuffers() : 0.000041100 sec
glSwapBuffers() : 0.000049900 sec
glSwapBuffers() : 0.000051000 sec
glSwapBuffers() : 0.000054100 sec
glSwapBuffers() : 0.000051800 sec
glSwapBuffers() : 0.000116600 sec
glSwapBuffers() : 0.000041200 sec

And when the lag happens:
skipped frames : 6
glSwapBuffers() : 0.065856200 sec

Is the OS window you’re drawing to full-screen exclusive or no?

It’s not fullscreen; always windowed mode.

What are your driver settings related to frame queuing, swap chain config, and display synchronization? (e.g. Vertical Sync, Fixed Refresh display, Max Refresh Rate, Preferred Refresh Rate, Power Management Mode, Low Latency Mode, Triple Buffering, etc.)

Hey, I have had this problem since I started my game/engine more than two years ago. I think I fixed it yesterday. I tested the game now and I have the best performance ever. This was the problem:

GL11.glEnable(GL11.GL_TEXTURE_2D);

I took advice from here:

“One thing that you may see in some web pages is that the first thing we must do is enable the textures in our OpenGL context by calling glEnable(GL_TEXTURE_2D). This is true if you are using the fixed-function pipeline. Since we are using GLSL shaders it is not required anymore.”

I will still focus on this problem for the next month or more; I want to understand it better.

Ok.

Ok. Since this is just some app CPU update rate, it probably doesn’t matter here.

Time your entire frameloop, from just after this point:

SwapBuffers()
glClear( window framebuffer )
<STOP TIMER FOR FRAME N>
<START TIMER FOR FRAME N+1>

If the measured “frame times” have nothing to do with the GPU display rate (the rate at which the GPU is scanning out images to the monitor), then you have VSync OFF.

Moreover, this “frame time” metric is a MUCH better time to capture and profile with than merely how long your app blocks in the driver while queueing SwapBuffers() alone. Please collect these times and report back what you see.
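
In LWJGL/GLFW terms, a minimal sketch of that measurement might look like the following (window, update(), and render() are placeholders for your own handle and methods, not anything from your code):

import static org.lwjgl.glfw.GLFW.*;
import static org.lwjgl.opengl.GL11.*;

// Measure the full CPU frame time, sampled just after
// SwapBuffers() + glClear() of the window.
long frameStart = System.nanoTime();
while (!glfwWindowShouldClose(window)) {
    update();
    render();
    glfwSwapBuffers(window);
    glClear(GL_COLOR_BUFFER_BIT);     // glClear() of the window framebuffer
    long now = System.nanoTime();     // <STOP TIMER FOR FRAME N>
    System.out.printf("frame time : %.6f msec%n", (now - frameStart) / 1.0e6);
    frameStart = now;                 // <START TIMER FOR FRAME N+1>
}

If those printed frame times cluster near 8.3 or 16.7 msec when you believe VSync is OFF, that’s your clue it isn’t.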

It could very well be that the slowdown you occasionally see with queueing SwapBuffers() is just the graphics driver “catching up” on a surge of GL work that you’ve given it to process in that frame, or in a very recent frame. Things like texture or buffer uploads, mipmap regeneration, etc.

Ok, so if your CPU frame queuing is blocking at some point, it’s normally not blocking here. Except for…

So suspiciously close (within 1.5%) to 66.666 msec, which is 4 VBlank periods at 60 Hz (4 * 16.667 msec) or 8 VBlank periods at 120 Hz (8 * 8.333 msec).

Welcome to Windows DWM.

In windowed mode, the image your app obtains and renders to isn’t directly scanned out to the display. It is obtained from and then routed through DWM (Desktop Window Manager) and then composited onto the Windows desktop with all of the other app windows using DWM’s internal VSync rendering. Only at the tail end of that process do you finally get to see your image. In windowed mode, you’re much more at the mercy of DWM queuing and compositing behavior. There are also graphics driver options that you cannot make use of when in windowed mode.

In full-screen exclusive mode however, your rendering takes a fast-path through the display system and graphics driver, and you can make use of some graphics driver options that are only allowed in full-screen exclusive mode (aka flip mode). This is a more efficient, lower overhead path through the display system.

For comparison purposes, I would definitely recommend testing with full-screen exclusive mode as well to see if it has any bearing on the blocking behavior you’re occasionally observing with queueing SwapBuffers(). While it could be that you’ve just given the driver more work to process in that (or an immediately prior) frame that it’s catching up on, the stall might just be occasional latency with DWM in providing your app an image to render to.
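
If you want a quick way to run that test, creating a full-screen window with GLFW is just a matter of passing a monitor handle at creation time. A sketch (LWJGL 3 names; the “Game” title is a placeholder):

import org.lwjgl.glfw.GLFWVidMode;
import static org.lwjgl.glfw.GLFW.*;

// Full-screen on the primary monitor, matching its current video mode.
long monitor = glfwGetPrimaryMonitor();
GLFWVidMode mode = glfwGetVideoMode(monitor);
glfwWindowHint(GLFW_RED_BITS, mode.redBits());
glfwWindowHint(GLFW_GREEN_BITS, mode.greenBits());
glfwWindowHint(GLFW_BLUE_BITS, mode.blueBits());
glfwWindowHint(GLFW_REFRESH_RATE, mode.refreshRate());
long window = glfwCreateWindow(mode.width(), mode.height(), "Game",
                               monitor, 0L);   // non-zero monitor = full-screen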

Your problem could also be explained by some process preemption, where either your app and/or DWM isn’t getting all the CPU cycles it would otherwise for some reason.

It’s spam prevention for new users. Keep using the site, and the ability to post URLs and images will be automatically unlocked for you. For now I’ve fixed up your image links so that they work.

Suggested changes:

  • Power management mode = Prefer Maximum Performance
    (I have hit and debugged performance issues with NVIDIA graphics driver power management dynamically throttling back the GPU clock rates/power. This can lead to annoying, periodic frame time spike behavior like you describe here. Prefer Maximum Performance is most likely to avoid that annoying behavior.)
  • Low Latency Mode = Ultra
    (This probably doesn’t make any difference in windowed mode, but in fullscreen, this helps reduce the amount of queue-ahead the driver does and gives you lower latency between CPU queuing and GPU execution. Essentially, it shortens the length of the “rope” between the CPU and the GPU.)

Are you saying that adding glEnable( GL_TEXTURE_2D ) got rid of your occasional ~64 msec lag in calling SwapBuffers()?

Another thing you might try is plugging in a glDebugMessageCallback() function and printing out the messages the NVIDIA driver is feeding back to your application. It could be you are using the driver inefficiently, and it’s trying to tell you that (as well as possibly how to fix it). For copy/paste code, see:
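
In the meantime, a rough LWJGL 3 sketch of wiring up such a callback (this assumes a GL 4.3+ or KHR_debug-capable context, and is only an illustration, not the copy/paste code referred to above):

import org.lwjgl.opengl.GLDebugMessageCallback;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL43.*;

// Enable synchronous debug output and print every driver message.
glEnable(GL_DEBUG_OUTPUT);
glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS);
glDebugMessageCallback(GLDebugMessageCallback.create(
        (source, type, id, severity, length, message, userParam) ->
                System.err.println("GL DEBUG: "
                        + GLDebugMessageCallback.getMessage(length, message))
), 0L);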

Thank you for all this information. I will research this in the next weeks.

Moreover, this “frame time” metric is a MUCH better time to capture and profile with than merely how long your app blocks in the driver while queuing SwapBuffers() alone. Please collect these times and report back what you see.

These are not consecutive frames.

// data for one frame
update+render : 0.000302500 sec // this is what you asked for, game logic time + frame rendering time
SwapBuffers() : 0.000038400 sec
update interval : 0.008333333 sec // doesn’t change, just for reference
// end

update+render : 0.000386299 sec
SwapBuffers() : 0.000060201 sec
update interval : 0.008333333 sec

update+render : 0.000213700 sec
SwapBuffers() : 0.000033900 sec
update interval : 0.008333333 sec

update+render : 0.000410000 sec
SwapBuffers() : 0.000142100 sec
update interval : 0.008333333 sec

update+render : 0.000526900 sec
SwapBuffers() : 0.000129500 sec
update interval : 0.008333333 sec

update+render : 0.000237899 sec
SwapBuffers() : 0.000038500 sec
update interval : 0.008333333 sec

update+render : 0.000236199 sec
SwapBuffers() : 0.000036000 sec
update interval : 0.008333333 sec

update+render : 0.000270500 sec
SwapBuffers() : 0.000046501 sec
update interval : 0.008333333 sec

It could very well be that the slowdown you occasionally see with queuing SwapBuffers() is just the graphics driver “catching up” on a surge of GL work that you’ve given it to process in that frame, or in a very recent frame. Things like texture or buffer uploads, mipmap regeneration, etc.

My game is currently very simple, so I think it’s not this, but I’m not sure… See the gameplay gif below:

http://gfycat.com/forsakengreedyflicker

It’s an old gif, but the game still looks the same; only the game logic is different.

Are you saying that adding glEnable( GL_TEXTURE_2D ) got rid of your occasional ~64 msec lag in calling SwapBuffers()?

Sorry, I didn’t write that correctly in my post. I meant to say that REMOVING that line got rid of the “occasional ~64 msec lag in calling SwapBuffers()”. But unfortunately that was wrong. The lag is still here; for some reason the change just delayed the first occurrence, but after that it happens once per minute, sometimes ~3 times per minute, the same as before.

Sure thing. And I’ll be interested to see the results of the other suggested tests, as you have time to run them.

One other thing related to NVIDIA Control Panel settings. I would set:

  • Threaded Optimization = OFF
    (Have seen the multithreaded driver cause odd perf problems.)

That’s actually not what I suggested (apologies if I was unclear). We want the total time elapsed from just after SwapBuffers() + glClear() of the window to the exact same point just after the next SwapBuffers() + glClear() of the window.

The glClear() of the window is important.

Ok, so fractions of a millisecond are normally required on the CPU for the app to queue the work for a single frame. We’re missing the glClear() here though.

Note that in general, that doesn’t say anything about the time required on the GPU and in the back-end driver and display system to actually execute this queued work and provide you another window swap chain image to render to.

That’s part of the reason for measuring the full CPU frame time, as described above.

Why does this keep coming up? You’re running VSync OFF here. And as a developer, most of the point of that is to measure how long your entire frameloop takes, without any idle waiting time.

You’re not doing manual sleeps or waits in your frameloop to pace it, are you? If so, remove them, run VSync ON, and let the driver do this pacing for you.
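
With GLFW, that pacing is a single call before the frameloop; for example:

// VSync ON: SwapBuffers() then blocks until the next vertical blank,
// so the driver paces the loop for you.
glfwSwapInterval(1);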

I think I gave you what you asked for.
“update+render” measures one loop iteration, which means EVERYTHING: from key input to updating the game logic to rendering (glClear(), SwapBuffers(), world.render(), gui.render(), etc.). Of course, game initialization happens before this loop, but that doesn’t matter. You can also see it in pseudo-code:

public void loop() {
    long lastTickStart;
    long currentTickStart = System.nanoTime();
    // the game application doesn't leave this while-loop
    while (!window.shouldClose()) {
        lastTickStart = currentTickStart;
        currentTickStart = System.nanoTime();
        long loopDuration = currentTickStart - lastTickStart;

        // "update+render" = loopDuration
        performanceMeasure.setLoopDuration(loopDuration);

        UPDATE();
        checkForLag();
        RENDER();
    }

    Assets.deleteAsset();
    glfwTerminate();
}

You’re not doing manual sleeps or waits in your frameloop to pace it, are you? If so, remove them, run VSync ON, and let the driver do this pacing for you.

I don’t have manual sleeps/waits. My while loop never stops; it loops like a billion times per second, and there is a condition at the start of the UPDATE() method, so the game state is updated only when that condition passes. Here is the tutorial I followed for designing the game loop:
gameprogrammingpatterns.com/game-loop.html

I don’t know if this pattern/design is bad.
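
For reference, the pattern from that tutorial boils down to roughly this (a sketch; the UPDATE()/RENDER() names match my pseudo-code above, the rest is schematic):

// The while-loop spins freely, but game state updates at a fixed 120 Hz.
final double UPDATE_INTERVAL = 1.0 / 120.0;   // 0.008333333 sec
double previous = System.nanoTime() / 1.0e9;
double lag = 0.0;
while (!window.shouldClose()) {
    double current = System.nanoTime() / 1.0e9;
    lag += current - previous;
    previous = current;
    while (lag >= UPDATE_INTERVAL) {   // the condition gating UPDATE()
        UPDATE();
        lag -= UPDATE_INTERVAL;
    }
    RENDER();   // renders every iteration (vsync off)
}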

I just tested full-screen and it has the same lag pattern. Unfortunate.

Ok. I was confused because you itemized out SwapBuffers() separately from update+render. So I think you’re saying here that SwapBuffers() is part of the update+render time and that that’s everything in the frameloop. Sounds good.

Dumb question: Do you see this occasionally blocking SwapBuffers() behavior with VSync ON? That’s really the case that counts.

Also, have you tried getting rid of all of your logic in update+render except for glClear() of the window and SwapBuffers()? Does that get rid of this occasional block in SwapBuffers()?

What I’m considering with that last test is that you might be providing some work to the GL driver that frame (or a very recent frame) that causes some extra CPU-side processing and/or implicit sync, such as shader reoptimization, buffer update sync, etc. Again, patching in a GL debug message callback here might provide clues.
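
For what it’s worth, that stripped-down test is tiny; something like this (GLFW/LWJGL names, with window being your handle):

// Nothing but clear + swap: if the occasional block still shows up here,
// it isn't caused by your game's GL work.
while (!glfwWindowShouldClose(window)) {
    glClear(GL_COLOR_BUFFER_BIT);
    glfwSwapBuffers(window);
    glfwPollEvents();   // keep the window responsive
}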

Great! Just making sure.

Darn. Was worth a shot anyway.

Good news, now there is definitely no more lag!

You solved the problem some time ago, but I made a mistake by not applying the NVIDIA Control Panel settings; I changed them earlier but didn’t click the “Apply” button that pops up. When I set Power Management Mode, Threaded Optimization, and Low Latency Mode correctly, the lag disappears. Everything takes more time now (for example, SwapBuffers() takes about 10 times longer every time), but at least it’s stable.

Old results as we saw before:

0.000033900 sec // SwapBuffers() usually
0.000142100 sec // SwapBuffers() occasionally

New results with the NVIDIA settings:

0.000251900 sec // SwapBuffers() always

Also, have you tried getting rid of all of your logic in update+render except for glClear() of the window and SwapBuffers()? Does that get rid of this occasional block in SwapBuffers()?

This is with the default NVIDIA settings (Power Management Mode, Threaded Optimization, and Low Latency Mode reverted). Yes, I tried it, and it does not get rid of the lag; the same lag pattern is still there, but it’s 50% shorter. What does this mean? I will definitely test again on a better PC next week. Final question: is it a problem with my PC (GeForce GT 710, AMD FX 6100 six-core 3.3 GHz, 8 GB Patriot 1866 CL9 RAM, Windows 10 x64, and an MSI 970A-G46 motherboard) or with my source code?

Excellent!

Yeah, that Power Management mode (PowerMizer, or whatever they’re calling it this year) has had me chasing periodic frame-spike ghosts too. My bet is that this tweak is what got rid of your issue. If not that, then disabling Threaded Optimization did it.

Ok. You may have already gotten the gist, but I wouldn’t put too much stock in how long it takes the driver to merely queue SwapBuffers() on the CPU. This doesn’t necessarily have any meaning. I would look more at how long it takes your entire frame to queue, including SwapBuffers() + glClear() of the window, and infer based on that.

That’s more of what I was just talking about. Don’t try to infer anything from just how long it’s taking the driver to queue SwapBuffers() on the CPU alone. This may be near-instantaneous, or not, depending on how the driver is written, whether you have VSync ON, or G-Sync, or how it implements these, or how many images are in your swap chain, or how many frames of queue-ahead are configured, or how long it took you to queue the frame (and possibly recent frames), or how much GPU work is required to execute that frame (or those frames), or which CPU/GPU/driver you have installed, etc.

I would suggest timing the entire frame as I suggested above, from/to the point immediately after the SwapBuffers() and glClear() of the window. And use that as your primary frame timing statistic.

Nope, no problems with this at all.

I think the problem here is in assuming that there’s any intrinsic meaning behind how long it takes the graphics driver to queue SwapBuffers() alone on the CPU.
