Why does SwapBuffers slow down performance?

I tried to measure performance on a 2 MB Cirrus Logic video card (PCI) and saw that the SwapBuffers function by itself (without any drawing) slows performance down terribly!!!
It is the worst thing that can happen.
How can I optimize it if it is just one function call?
What can I use instead?

I have Windows XP.

Turn off V-sync.

Originally posted by Korval:
Turn off V-sync.

And, ideally, refrain from double-posting to this forum and the OpenGL under Windows one…

BTW, disabling vsync is only really useful for benchmarking. You can’t have more visible frames per second than your monitor has refreshes, so it just wastes time and causes ugly tearing for no good reason.

>>>>BTW, disabling vsync is only really useful for benchmarking.

Not completely true.

>>>>You can’t have more visible frames per second than your monitor has refreshes, so it just wastes time and causes ugly tearing for no good reason.

That is true when you get more fps than the monitor’s refresh rate.
But when you are below it, it is much easier on the eyes to turn off vsync, trust me.

I would really like a “half-vsync”, which would wait only when drawing faster than the refresh rate…

wolfman, you are trying to measure the OpenGL performance of a 2 MB Cirrus Logic graphics card???
This is a joke, right?
That card does not have any 3D acceleration, so you are really benchmarking your CPU…
Your slowdown is simply because your backbuffer is in system memory, and it must be transferred over the PCI bus every frame to a very, very old graphics card with very, very slow video memory.

Turning on v-sync can lower the “framerate” of other subsystems like physics and input. I run games at 60 Hz, but I can feel the difference between 60 Hz and 120 Hz, for example (mouse movement is smoother and faster). Of course, this usually applies to programs that do heavier graphical work than others.

[WARNING: I just revived a very old thread]

Darn, I really should have patented (or whatever) my “half-vsync” idea at the time!!

I would make millions by suing Epic about Gears of War lol :stuck_out_tongue: :

There are hybrid solutions to VSYNC. Gears of War uses VSYNC whenever a frame takes less than 33ms to render and immediately displays the frame if it took more.

This means we VSYNC when above 30 FPS (and hence clamp to 30 FPS) and don’t drop down to ~20 FPS (33ms + 16ms) just because the framerate might be 29 FPS in rare cases.

Taken from :

This will teach you :stuck_out_tongue:

BTW, does anyone know whether Terry Welsh (mogumbo) actually patented his parallax mapping? About every game uses it now…

Excellent idea - so how would you go about implementing that in reality? Does it make sense to use wglSwapIntervalEXT on a per-frame basis?

That is a good question indeed. There was talk (some months ago) about an extension or something to generalize NV fences that might allow doing that.
I cannot find it anymore; if anybody has ideas…

It can be done efficiently with one frame of latency…

if (lastFrameTime < 16 && !vsync)
    wglSwapIntervalEXT(1);   // frame fits in a 60 Hz refresh: enable vsync
else if (lastFrameTime >= 16 && vsync)
    wglSwapIntervalEXT(0);   // frame too slow: disable vsync

Or similar code over the average of the last N frames.

@Zengar: why averaging? On the contrary, you want to be as reactive as possible, otherwise the result would be ruined by tearing.

@knackered: to be tested, but I would say the precision in your snippet is not enough.

The goal is to be certain that when you are below the target refresh rate, there is no vsync at all. From the CPU side, it is hard to know precisely when the GPU is done rendering (without doing flushes, I mean).

Maybe this extension would help, but it looks NV-only for now:

I will have to test, but having a baby doesn’t help with finding time for that :slight_smile: .

I thought that SwapBuffers included an implicit Finish? So you can just measure the FPS after SwapBuffers?

EXT_timer_query is usually pretty good at giving accurate timings with minimal slowdown. But it’s not perfect: adding glFinish() calls between queries can still affect some results.

Maybe you could do a glFinish, then check how much time your frame has already taken, and then decide whether to enable or disable vsync before you actually call SwapBuffers:

t1 = time
… render the frame …
glFinish
tdiff = time - t1
if tdiff < … enable vsync, else disable it
SwapBuffers

Though I am not sure whether that would actually work at all.


Yes, this would effectively reduce the wasted time in SwapBuffers to zero :stuck_out_tongue:

That was my thought. It would mean that SwapBuffers would do nothing more than actually swap the buffers. However, it would either sync to the monitor refresh or it wouldn’t, depending on the time it took to render the frame.

However, one might need a dedicated render thread, since the calling thread will wait for SwapBuffers to return and would thus waste CPU cycles.

Any other ideas? I am not really convinced by my own solution.


Please note that SwapBuffers hasn’t necessarily “finished” everything it should have, as we may be led to believe.

I have a performance-testing piece of code that showed a professional vendor’s implementation (one of the big two, with probably quite common hardware) eating over ten million CPU cycles in glClear(color|z) after a successful swap, unless I added a Sleep() for a while after swapping (an artificial test, meant only to measure the speed and overheads of the implementation). Whether that was spinning on a spinlock or actually doing work, I don’t know; it really was closer to 13e6 IIRC, which at a 2.4 GHz clock is over 5 ms. If the code did Sleep() between the swap and glClear, the clear call “only” took ~13k CPU cycles IIRC (at least it wasn’t anywhere close to even 1e6 cycles).

I wrote this to show that what I previously believed (and likely most of you did too), namely that calling glFinish or swapping accounts for all of the time, may not be true. In the case I observed, the time is somehow “amortized” until after the following glClear completes.

(Should there be interest in trying this code locally, if for nothing else than to gloat “Haaa haaa, you made an error here!” :-), I could probably boil the source down.)

That was my thought. It would mean, that SwapBuffers would be doing nothing more, than actually swapping the buffer.
In case my sarcasm got lost: if you call glFinish, you actually empty the pipeline. This is practically the worst thing that can happen to your performance.

SwapBuffers won’t take much time after glFinish because glFinish has already taken much more time than SwapBuffers would have. You are ruining overall performance just to be able to measure the frame time more accurately…