Why does SwapBuffers slow down performance?

Seems there’s a variation of Heisenberg’s principle at work here: the more precisely you try to measure something, the more intrusive the measurement necessarily becomes, and the more likely you are to adversely affect what you’re measuring.

I read somewhere (I will post a link when I find it) that the behaviour of SwapBuffers depends on the driver you are using.
Very old drivers did do a glFinish when SwapBuffers was called, but in order to increase pipeline performance most modern drivers put the swap command in the command queue and return immediately (i.e. they DON’T do an implicit glFinish).

Some drivers (NVIDIA?) then let you continue writing to the command queue and only block if you try to do a second SwapBuffers before the first one has finished, or if the command queue is full.

Other drivers will block on the next OpenGL command sent after SwapBuffers, if the swap has not completed yet.

TAMLIN: This is why you are measuring 13k CPU cycles in glClear: the driver returns from SwapBuffers early so the CPU can do some non-OpenGL work while waiting for the queued commands to execute (and, optionally, the next vsync).
When you send the next OpenGL command, the swap is still pending, so the driver blocks your thread until it has completed.

glFinish should NEVER be called on modern hardware; it will destroy your performance by stalling the GPU.
The only way to accurately measure frame timings on a pipelined system is with a fence.
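Something like the following is what I mean - just a sketch, assuming a driver with ARB_sync (glFenceSync/glClientWaitSync); on the older hardware discussed here NV_fence offers the same idea. RenderFrame() and hDC are placeholders for whatever your app already has, and the timing uses QueryPerformanceCounter from <windows.h>:

/* Sketch: time a frame with a fence instead of glFinish.
   Assumes GL_ARB_sync; RenderFrame() and hDC are placeholders. */
LARGE_INTEGER freq, t0, t1;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&t0);

RenderFrame();                                   /* issue this frame's GL commands */
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
while (glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                        1000000 /* 1 ms slices */) == GL_TIMEOUT_EXPIRED)
    ;                                            /* wait for *our* commands only, */
glDeleteSync(fence);                             /* not for wherever the driver   */
                                                 /* decided to block on the swap  */
QueryPerformanceCounter(&t1);
double frameMs = 1000.0 * (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;

SwapBuffers(hDC);                                /* the swap stays outside the measurement */

That way the number you get is the GPU time for your own commands, regardless of which call this particular driver chooses to block in.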

No, drivers should not call glFinish. That would kill parallelism between CPU and GPU.
When you call SwapBuffers, the driver pushes all GL commands to the GPU because it is imperative that the current frame be completed. This is a “glFlush”, meaning empty the command queue.

Other drivers will block on the next OpenGL command sent after SwapBuffers, if the swap has not completed yet.
Not on a GL call. GL calls always go to a queue in RAM. They will block on the SwapBuffers call.

V-man: drivers should not call glFinish. That would kill parallelism between CPU and GPU.
That’s what I just said.

V-man: When you call SwapBuffers, the driver pushes all GL commands to the GPU because it is imperative that the current frame be completed. This is a “glFlush”, meaning empty the command queue.
Yes, that’s what I believe happens with all modern NVIDIA drivers, but I have seen various posts, such as on www.experts-exchange.com or www.gamedev.net, which suggest that ATI drivers may work differently.
Tamlin’s results certainly suggest thread blocking during a glClear.

TAMLIN - Are you using ATI or NVIDIA?

The only way to prove how various drivers handle SwapBuffers is to do some profiling to see where the thread spends its time.
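Something along these lines would do as a first pass - a crude sketch that just times each call on the CPU; timer_ms() is a hypothetical helper around QueryPerformanceCounter, and DrawScene()/hDC stand in for your own draw code and device context:

static double timer_ms(void)        /* hypothetical helper, needs <windows.h> */
{
    LARGE_INTEGER f, t;
    QueryPerformanceFrequency(&f);
    QueryPerformanceCounter(&t);
    return 1000.0 * (double)t.QuadPart / (double)f.QuadPart;
}

/* ... inside the render loop ... */
double t0 = timer_ms();
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
double tClear = timer_ms() - t0;

t0 = timer_ms();
DrawScene();                        /* application draw calls */
double tDraw = timer_ms() - t0;

t0 = timer_ms();
SwapBuffers(hDC);
double tSwap = timer_ms() - t0;

/* Whichever of tClear / tDraw / tSwap dominates shows where this driver
   chooses to block (pending swap, vsync, or a full command queue). */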

Just for the record - for my personal, immediate purposes, the following setup seems to work fine :slight_smile:

if (dt > 24) {            // frame took too long: drop vsync so we don't fall to half the refresh rate
  if (vsync) {
    wglSwapIntervalEXT(0);
    vsync = false;
  }
}
else if (dt < 12) {       // comfortably fast again: re-enable vsync
  if (!vsync) {
    wglSwapIntervalEXT(1);
    vsync = true;
  }
}
SwapBuffers(hDC);         // hDC = the window's device context; the gap between
                          // the two thresholds gives hysteresis so vsync doesn't flip-flop

dt being the time between the start of the last render and the start of the current one - i.e. one frame of lag.

If you want to get 60Hz just as smooth as 120Hz then reduce transport latency from input through to swap. All the rest about smoothness is B.S. By all means run your physics (collision, really) at a higher rate if you insist (that might help for other reasons), but you will find that once you reach 60Hz (or whatever your monitor refresh rate is), input quality is more a function of latency than of any other factor. Driving graphics at an even higher frame rate is the least efficient way of reducing transport latency, but it gives good benchmark numbers.

There’s an awful lot of ignorance promulgated about this subject. It seems to have percolated up from benchmark-running gamers who live with a full frame of buffered draw data, implemented by everyone in the industry to keep throughput up. Throughput is not everything, and that should be obvious when people advocate running at a frame rate higher than the display can drive to get quality interaction.

One day graphics professionals may actually start implementing the lessons from a quarter of a century ago.

For now the tail still wags the dog.

@Dorbie: but how do you implement such low latency with all this queuing and multi-threading in the graphics drivers?

You can time your input and runtime loop to minimize transport delay through your simulation (game code), particularly the POV; implement vsync while eliminating the buffering of a frame (which will make things worse if you don’t), and that means blocking to drain the fifo with a glFinish.

Exactly where depends on the implementation details, and you have choices, like syncing post-clear, at differing costs.

You can actually get really sophisticated about this and time input and draw kickoff relative to vsync; if you do this you may even avoid the need to block to drain the FIFO (something the card makers hate, but too bad).

Using fences would be a useful way of managing this, but ultimately I think you really do need to key off the vsync timer.
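To make the shape of that concrete, something like the loop below is roughly what I mean - only a sketch; PollInput/UpdatePOV/DrawScene/RunPhysics and hDC are placeholders for your own code, and where exactly you block (post-swap, post-clear, or on a fence) is the tuning I’m talking about. The glFinish is precisely the throughput-for-latency trade being argued here:

/* Rough sketch of a latency-oriented frame loop; all names are placeholders. */
for (;;)
{
    PollInput();              /* sample input as late as possible...        */
    UpdatePOV();              /* ...and apply it straight to the viewpoint  */
    DrawScene();

    SwapBuffers(hDC);         /* queue the swap, vsync enabled              */
    glFinish();               /* drain the FIFO so an extra frame of        */
                              /* commands can't buffer up ahead of us;      */
                              /* throughput traded for transport latency    */

    RunPhysics();             /* other work still fits here, between the    */
                              /* drain completing and the next input sample */
}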

Note to driver writers who might chime in. I’m sure you have something more valuable to contribute than the usual “don’t block and run balls to the wall” so stretch yourself before posting benchmarketing advice. Too many games are written and run as if they were benchmarks.

Mikkel Gjoel: you are right, your snippet works very well!
This is really nice to the eye, even with black+white vertical stripes in variable high-speed horizontal scroll, my worst case as far as display refresh is concerned.
I don’t get why your first threshold value works so well, but I could not get better results with a smaller value. However, an 85 Hz monitor would mean around 11-12 milliseconds, not 24, right?

I am not that familiar with the details of D3D, but in many games you can select a resolution bundled with a refresh rate (e.g. 60 Hz or 85 Hz). Does this have anything to do with vsync? I mean, why would I want to select 60 Hz for my monitor refresh rate, when the game could run faster than that? And why would a game want to set a different refresh rate than what is already set by the OS?

Jan.

Why?
Because the refresh rate depends on the monitor, not on the game.
LCDs are typically 60 or 75 Hz only.
CRTs can usually do better, but it depends on the resolution, so it is nice to be able to choose it.
(the bundled list presentation is not the most convenient, I agree on that)
Having more than 60Hz is only useful:

  • to reduce flicker (on CRTs only)
  • when you need a particular refresh rate, such as a multiple of 24 Hz to mimic the original movie frame rate.
  • to have even smoother animations in ultra-high-speed cases (the Quake 1-3 games come to mind), but as Dorbie pointed out, other latencies come into play, such as the mouse event rate. In Quake 3 you can select among 3 different mouse filters (roughly: no filter, interpolation filter, or extrapolation filter).

and that means blocking to drain the fifo with a glFinish
Of course, glFinish is not generally bad. But calling glFinish between rendering and SwapBuffers, with no simulation code in between, is about as bad as it can get in terms of performance :wink:

to get smoother frame rates, you should really be interpolating between the last frame and the current one using r2t and alpha blending.

@Jan: Does this have anything to do with vsync? I mean, why would I want to select 60 Hz for my monitor refresh rate, when the game could run faster than that? And why would a game want to set a different refresh rate than what is already set by the OS?

There are 3 different effects that need to be accounted for: flicker, jerkiness, and strobing.
Flicker is caused by the physical characteristics of the monitor: as the image is re-drawn on the screen, it is brightest at the most recently drawn lines and has faded in intensity everywhere else.
This mostly occurs with a CRT, and if the field rate is too low the screen will seem to flash on and off.
The same effect occurs with movie film, where the shutter blocks the light while the film is advanced to the next frame.
For most people a field rate of 50Hz is enough to prevent flicker (this is the minimum acceptable refresh rate for these types of displays).

Jerkiness is governed by the rate at which the human eye perceives a series of still images as continuous movement.
TV has a frame rate of 25 or 30Hz, while movie film runs at only 24 frames per second.
This is enough to prevent the jerkiness that can be seen in very old movies (which had a 16Hz frame rate) or in games run on hardware that isn’t fast enough.

TV displays 2 interlaced fields per frame, so it meets both the minimum 25Hz frame rate and the 50Hz field rate needed to prevent these two effects, while movie film shows each frame several times.
Computer monitors are usually run somewhat faster than this because it reduces eyestrain (and they are non-interlaced, so field rate = frame rate).

Unfortunately there is another effect (strobing) that affects computer games more than video.
A video camera captures an image of a moving object over most of a frame period, so the object is MOTION BLURRED.
It is this blurring of moving objects that prevents strobing.
With computer animation, however, we are generating a series of perfectly sharp still images with no blurring, and even at 60Hz a moving object will look like it’s strobing (similar to a disco with a strobe light flashing).
If you have a camcorder with a variable ‘shutter speed’ feature, try filming a fast-moving object with both the minimum and maximum settings to see what I mean.

For most OS programs this won’t happen, so the monitor can be set to 60Hz, but when a game starts up it may need to change this to 100Hz or more to prevent the strobing effect.
The only other way around this is to add your own artificial motion blur when rendering.
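For what it’s worth, the brute-force way to do that on old fixed-function hardware was the accumulation buffer - just a sketch (DrawSceneAt, frameTime and frameDuration are placeholders), and glAccum is slow enough on consumer cards that render-to-texture blending or a shader-based velocity blur is the more practical route:

/* Sketch: brute-force motion blur via the accumulation buffer.
   Renders N sub-frames per displayed frame and averages them. */
const int N = 4;
glClear(GL_ACCUM_BUFFER_BIT);
for (int i = 0; i < N; ++i)
{
    float t = frameTime + (i / (float)N) * frameDuration;   /* sub-frame time */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    DrawSceneAt(t);                                          /* placeholder: draw scene at time t */
    glAccum(GL_ACCUM, 1.0f / N);                             /* accumulate a weighted copy */
}
glAccum(GL_RETURN, 1.0f);                                    /* write the average back */
SwapBuffers(hDC);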

A game should be run as fast as possible, but still locked to VSYNC, so if your maximum monitor frame rate is 100Hz (and the GPU can keep up) then that is what the game should run at.
If it finds itself skipping frames then it should switch to a slower frame rate that the GPU can keep up with.
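One way to do that switch, sketched with the same WGL extension used in the earlier snippet: if the measured frame time keeps exceeding the refresh period, ask for a larger swap interval (an integer divisor of the refresh rate) instead of letting frames miss vsync erratically. frameMs, refreshMs and currentDivisor are assumed to be tracked elsewhere, and frameMs should really be averaged over a few frames so the divisor doesn’t flip-flop:

/* Sketch: fall back to an integer divisor of the refresh rate when the GPU
   can't keep up.  ceil() comes from <math.h>; WGL_EXT_swap_control accepts
   swap intervals greater than 1. */
int divisor = (int)ceil(frameMs / refreshMs);   /* 1 = every vsync, 2 = every other, ... */
if (divisor < 1) divisor = 1;
if (divisor != currentDivisor)
{
    wglSwapIntervalEXT(divisor);
    currentDivisor = divisor;
}
SwapBuffers(hDC);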

Thanks for the detailed post, an interesting read that clears many things up.

While this is veering off-topic, do you know of any papers that describe a (semi-) physically correct simulation of a camera suitable for realtime rendering?

Originally posted by Overmind:
and that means blocking to drain the fifo with a glFinish
Of course, glFinish is not generally bad. But calling glFinish between rendering and SwapBuffers, with no simulation code in between, is about as bad as it can get in terms of performance :wink:

The problem with your observation is that it focuses on a single performance metric. When you consider transport delay you might arrive at a very different conclusion.

That said, there are ways to be smart about this. You just have to apply yourself to the problem. You’ll notice I mentioned fences; you could use one to trigger input and POV update, but run all sorts of physics etc. in the meantime. It really depends a lot on the details of your scenario.

Originally posted by knackered:
to get smoother frame rates, you should really be interpolating between the last frame and the current one using r2t and alpha blending.
Unless you’re talking about motion blur for intra-frame accumulation, I disagree.

When you have the next frame, show it; anything else will hurt interaction, and you can already see ghosting if you drive swaps at less than the refresh rate, so making that artifact even more persistent won’t help.

FYI, in the past when swap rates were below the refresh rate, I implemented a dynamic intra-frame video pan to smooth pitch and heading rates on an SGI InfiniteReality. It is not without its own drawbacks if you have moving targets in the scene.

To be honest I was just going off my experience with a movie player I wrote. I blended the previous frame with the current one, with the alpha derived from the fractional part of movieTime * fps. I hadn’t thought too much about how it would work in an interactive setup. Ever so sorry.
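If anyone wants to try it anyway, the blend itself is simple enough to set up with constant-alpha blending (glBlendColor, GL 1.4 or the imaging subset) - just a sketch, and prevFrameTex/currFrameTex/DrawFullscreenQuad/movieTime/fps are placeholders for however you keep the two frames around (render-to-texture, say):

/* Sketch: displayed image = (1 - a) * previous frame + a * current frame,
   with a taken from the fractional frame position.  Placeholder names throughout. */
double framePos = movieTime * fps;
float  a        = (float)(framePos - floor(framePos));   /* fractional part */

glDisable(GL_BLEND);
DrawFullscreenQuad(currFrameTex);                         /* current frame, final weight a */

glEnable(GL_BLEND);
glBlendColor(0.0f, 0.0f, 0.0f, 1.0f - a);
glBlendFunc(GL_CONSTANT_ALPHA, GL_ONE_MINUS_CONSTANT_ALPHA);
DrawFullscreenQuad(prevFrameTex);                         /* previous frame, weight 1 - a */
glDisable(GL_BLEND);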

From the Unreal Tournament 3 Demo .ini file, default settings:

[Engine.GameEngine]
bSmoothFrameRate=TRUE
MinSmoothedFrameRate=22
MaxSmoothedFrameRate=62

I wonder what they actually do, because the game action really seems smoother when it’s enabled, and I was positively surprised by the responsiveness.

Benchmarkers recommend disabling that particular setting because it clamps the FPS.
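Purely a guess at what such a “smoothed frame rate” setting might amount to (none of this is Unreal’s actual code): clamp the frame rate into the [min, max] window by sleeping whenever a frame finishes early, so frame pacing stays even and the input-to-display delay stays predictable:

/* Guesswork sketch, not Unreal's code: if a frame finishes faster than
   1 / MaxSmoothedFrameRate, sleep off the remainder.  frameStartMs and
   timer_ms() are as in the earlier timing sketch. */
const double minFrameMs = 1000.0 / 62.0;            /* MaxSmoothedFrameRate = 62 */
double elapsedMs = timer_ms() - frameStartMs;
if (elapsedMs < minFrameMs)
    Sleep((DWORD)(minFrameMs - elapsedMs));         /* coarse; a waitable timer is finer-grained */
SwapBuffers(hDC);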

It seems to be the topic du jour for games; ETQW has something similar:
http://community.enemyterritory.com/forums/showpost.php?p=44596&postcount=1

And well, I must admit the default settings are smoother and input delay seems really reduced, compared to “render as much as possible + vsync”.
Even the tearing is not so noticeable.