OpenGL Swap Buffers Blocking.

I have a dilemma. When vsync is on, the SwapBuffers call stalls, correct? Does this mean it is impossible to update the world faster than the refresh rate with vsync on, without using multithreading?

My goal is to update the world (physics, AI, and so on) faster than the refresh rate while rendering at the refresh rate. Is this possible without multithreading? I understand that this is easily achievable with threads, but I would rather avoid them if at all possible.

You can only display one frame per vblank interval if you use vsync.

However, you can run multiple ticks of physics without presenting any frames.

lastTime = now();
forever() {
  renderFrame();
  presentFrame();                 // may block on vsync
  curTime = now();
  while( lastTime < curTime ) {   // catch up in fixed-size steps
    userInputAiAndPhysics();
    lastTime += StepSize;
  }
}

Originally posted by MrShoe:
I have a dilemma. When vsync is on, the SwapBuffers call stalls, correct? Does this mean it is impossible to update the world faster than the refresh rate with vsync on, without using multithreading?

Regarding whether SwapBuffers blocks or not, it’s implementation dependent (I know of implementations that do not block). Nevertheless, drivers try not to queue too many frames, so if all your frames always take less than 1/60 secs, it will eventually block (after n frames have been queued up or when the DMA command buffer fills).

The easy solution is to do the calculations for the next frame (as many update steps as you want) before calling SwapBuffers, so the graphics card has time to consume the current frame's rendering while you calculate AI and so on.

Even with vsync on, SwapBuffers still stores one frame in the FIFO. SwapBuffers will only block if an earlier-issued swap is still in the queue. This is important to understand because it explains timing differences after a swap, and it's good to know where your extra frame of latency came from.

I like to call glFinish at some point after a swap call to keep latency low, but it's up to you. Others hold strong views that this can defeat parallelism between the app and the GPU; it might, if you don't know what you're doing.

Some cards I've seen (it's been a while since I noticed this) buffer many frames, especially if the frames are small and fit in the FIFO, and this can introduce a LOT of latency; if you see it, it's a bug.

Your problem doesn't depend on whether vsync is on or off, because even when it's disabled you still won't be able to update the game faster than the frame rate.

The solution proposed by jwatte has a bug: if the update takes longer than StepSize, you get an infinite loop. What I do is add a new condition to the loop:


lastTime = now();
forever() {
  renderFrame();
  presentFrame();
  curTime = now();
  int loopCount = 0;
  while( lastTime < curTime ) {
    userInputAiAndPhysics();
    lastTime += StepSize;
    if( ++loopCount > maxLoops ) {  // give up catching up: avoids the
      lastTime = now();             // infinite "spiral of death" loop
      break;
    }
  }
}


Quote dorbie:
“Even with vsync on SwapBuffers still stores one frame in the FIFO. Swapbuffers will only block if the earlier issued swapbuffers is still in the queue. It is important to understand this because it can explain timing differences post swap and it’s good to know where your extra frame of latency came from.”

This would imply triple-buffering. I seem to recall sometime ago someone was saying that the spec did not actually permit triple-buffering (something having to do with FRONT and BACK buffer contents after SwapBuffers, I think). Was the spec restriction lifted, or did it turn out not to be an issue, or is this one of the cases where nobody follows the spec? :slight_smile:

@martinho_: if the update takes longer than the step size, you have a bug worse than an infinite loop – the hardware is not capable of running your program.

When we detect that, we tell the user “hopeless CPU overload” and give up. We also profile the CPU on install to make sure it looks reasonable. But that problem is mostly orthogonal to the problem of dealing with frame-to-frame timing.

Originally posted by jwatte:
[b]Quote dorbie:
“Even with vsync on SwapBuffers still stores one frame in the FIFO. Swapbuffers will only block if the earlier issued swapbuffers is still in the queue. It is important to understand this because it can explain timing differences post swap and it’s good to know where your extra frame of latency came from.”

This would imply triple-buffering. I seem to recall sometime ago someone was saying that the spec did not actually permit triple-buffering (something having to do with FRONT and BACK buffer contents after SwapBuffers, I think). Was the spec restriction lifted, or did it turn out not to be an issue, or is this one of the cases where nobody follows the spec? :-)[/b]
That's interesting. AFAIK OpenGL doesn't say anything about swapping buffers, and for WGL you have the PFD_SWAP_EXCHANGE and PFD_SWAP_COPY pixelformat flags, which tell you whether the contents of the backbuffer remain the same or become the previous contents of the frontbuffer.

In any case both are hints, and off the top of my head I can't see the WGL/OpenGL specs forcing the implementation not to queue up frames (the only reason not to do so is the "mouse lag" effect on the user).

Another thing to note is that n-buffering, in the broad sense, doesn't need to interact with SwapBuffers at all, because you don't really need n color backbuffers, just space for n full frames' worth of commands in the command FIFO.

jwatte: I say that because it happened to me when running the debug build of an app that used particle systems and ran at 70 fps in the release build.

Another interesting thing about SwapBuffers that disappointed me:

  • set SwapBuffers to wait for vsync (blocking)
  • call SwapBuffers in a loop, without drawing anything

It executes, say, just 80 times per second and burns the CPU somewhere in GDI's ExtEscape() function?!

I can't believe there are no CPU-friendly ways for OpenGL to sync userspace and the kernel/drivers… or am I missing something here? :rolleyes:

There’s always threads. Leave one thread the task of drawing (and waiting on SwapBuffers), and the other can do the logic updates. As long as the update loop doesn’t touch OpenGL you should be able to sync them easily enough.

Originally posted by hh10k:
There’s always threads. Leave one thread the task of drawing (and waiting on SwapBuffers), and the other can do the logic updates. As long as the update loop doesn’t touch OpenGL you should be able to sync them easily enough.
Yeah, but that still doesn't solve the CPU burning… I think SwapBuffers() internally does some kind of busy-wait on vsync instead of putting the calling thread to sleep…

jwatte, it doesn't imply triple buffering. There are no more than two buffers, the front and the back; the fact that data can be stored in a FIFO does not require a framebuffer. Triple buffering would allow that data to be rasterized from the FIFO despite a blocked swap, so it would clearly be advantageous and eliminate stalls even further, but it is not mandated; the driver is just filling the FIFO until the next swap is issued.

In fact, if you don't block at all, have big FIFOs, and don't do anything that might block (like a readback), you could potentially store hundreds of frames in the FIFO (and I've seen this happen), but those frames don't need buffers. (This is out of spec.)

W.r.t. the spec, I don't know. I suspect any issue would be related to not blocking on the second swap, but AFAIK the spec wouldn't prevent you from rasterizing from the FIFO; once again, though, it's not mandated.

You can have a full frame in the FIFO ready to go when the swap clears; then you can start rasterizing and start dispatching your next frame. With triple buffering, however, you could rasterize that FIFO contents even while waiting on vblank; otherwise the backbuffer is taken (I don't think that would be out of spec).

Now, whether or not you are rasterizing that full FIFO, I expect the spec says something like: you can't start issuing the next frame until the swap. Any relaxation would be related to this, I think, so you could rasterize the FIFO but not issue the next frame; that doesn't seem too onerous and is still triple buffering. You start getting behind on your rendering by more than a frame, and introducing latency, if you allow another frame into the FIFO while rendering to the third buffer. The frames-per-second fanatics might like it, but the latency sucks; it's getting ridiculous to start issuing a frame while you have almost another full frame in the FIFO. I don't even like a full frame in there, but I'm weird :)

Originally posted by speedy:
I think SwapBuffers() is internally doing some kind of busy-loop waiting on vsync instead of putting the calling thread to sleep…
That depends on your graphics card. Some graphics cards will do a busy loop, some others will put the thread to sleep and wake it up with an interrupt. The resources to put the thread to sleep are there, now it only depends on your graphics card capabilities and your driver’s goodwill.

In any case, getting 80 fps just by calling SwapBuffers is completely normal when vsync'ed: the first few frames may go above your vsync rate because frames can be queued up, but eventually you will be bottlenecked by the vsync frequency (the CPU burning is a different matter, though).

dorbie: all you need is a big FIFO

Yes, that's true; you triple-buffer commands instead of pixels. Letting that go too far, of course, leads to other problems, like excessive presentation latency. And no matter how big your FIFO, at some point it fills up, and you start running at the speed of the display anyway, assuming each frame uses about the same amount of FIFO.

evanGLizr: While the graphics card can wait on an interrupt and swap out of a DPC and whatnot, getting the application thread scheduled once the DPC has run is not guaranteed. Thus, for highest Quake benchmarks, you have to busy-wait. Aaah, good old benchmarks driving the implementations!

Originally posted by jwatte:

evanGLizr: While the graphics card can wait on an interrupt and swap out of a DPC and whatnot, getting the application thread scheduled once the DPC has run is not guaranteed. Thus, for highest Quake benchmarks, you have to busy-wait. Aaah, good old benchmarks driving the implementations!

I thought so! A busy loop is only good for chasing high frame rates.

For waiting on vsync, I'd prefer the driver to put the thread to sleep (and let the other threads do their jobs).