It sounds like you might be.
Should I keep glFinish even now that I’m finished, though? I’ve seen differing opinions on this, including that it’s just a debugging tool. I’m on a desktop NVIDIA GTX 660M, but I want to spread my application across multiple desktop environments.
Your call. Here are some reasons why you might want to leave the glFinish after SwapBuffers in. All of them derive from the fact that (on a desktop or Tegra mobile GPU) glFinish after SwapBuffers synchronizes the CPU’s draw thread with the video scan-out clock. That’s a good thing! Note: the video scan-out clock is also called the vertical sync clock, aka the VSync clock. This is the clock that times when SwapBuffers actually happens, which in turn determines when the user actually gets to see that cool frame you just rendered.
-
Applications often sample the state of the simulation world and the user input controls at the beginning of the frame to determine what to render (what the new camera position/orientation is, where the entities are, what state the effects are in, etc.). If you leave the glFinish() after SwapBuffers in place, AND you tune your rendering so that you always make the VSync period, THEN you can be relatively sure that this “beginning of frame” processing happens at very regular and consistent intervals: every 16.66ms, if you are running at a standard 60Hz scan-out rate with SwapInterval 1. Time it and see! The end-to-end latency of the system (from user input to frame displayed) is regular like clockwork. This has the effect of making what is rendered very smooth (not jumpy) and consistent in terms of frame-to-frame differences. This feels good to a user.
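To make that concrete, here’s a minimal sketch of the kind of loop I’m describing (GLFW and C, since that’s what you’re using; sampleInput, updateSimulation, and renderScene are hypothetical placeholders for your own code):

```c
/* Minimal sketch: frame loop with glFinish() after the swap, so the
   "beginning of frame" work is locked to the VSync clock. */
#include <GLFW/glfw3.h>
#include <stdio.h>

extern void sampleInput(void);      /* placeholders for your own code */
extern void updateSimulation(void);
extern void renderScene(void);

void runFrameLoop(GLFWwindow *window)
{
    double prev = glfwGetTime();
    while (!glfwWindowShouldClose(window)) {
        /* With the Finish below in place, this interval should read a
           rock-steady ~16.66ms at 60Hz with SwapInterval 1. */
        double now = glfwGetTime();
        printf("beginning-of-frame interval: %.2f ms\n", (now - prev) * 1000.0);
        prev = now;

        glfwPollEvents();
        sampleInput();        /* sample user input state          */
        updateSimulation();   /* camera, entities, effects, etc.  */
        renderScene();        /* submit the GL draw calls         */

        glfwSwapBuffers(window);
        glFinish();           /* block the CPU draw thread until the swap
                                 has actually completed on the GPU */
    }
}
```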
However, if you do not leave the glFinish() after SwapBuffers(), then your CPU draw thread is NOT synchronized with the GPU’s video scan-out clock, so your “beginning of frame” processing happens at seemingly pseudo-random intervals. For example, you might hit beginning-of-frame 3ms after the last time, then 5ms after that for the next, then 22ms after that, then 13ms, etc. In other words, your CPU draw thread is running out-of-sync with the rendering. This makes it difficult-to-impossible to generate a result that looks and feels smooth to the user (no stuttering, lag, stepping, popping, etc.).
Why would it be so erratic? Without the glFinish() after SwapBuffers(), the GL driver will often queue the SwapBuffers request for later and return immediately, letting you go ahead with beginning-of-frame processing and GL call submission for “the next” frame, before the previous frame has completed rendering, much less been displayed to the user (i.e. well before the SwapBuffers has actually been performed!). It might even read a full frame or more ahead of “reality” (what’s been displayed to the user thus far). The GL driver will then block on seemingly random calls depending on driver/GPU-specific and driver-internal criteria you can’t control (e.g. the command queue fills up, the GPU pipeline is backlogged, etc.). Move the camera to a simpler scene, and the driver may be able to read much further ahead into subsequent frames. Move the camera to view more complex scenes, and the driver might only be able to read half a frame ahead. So you just end up blocking in random places. This results in your frame inputs being sampled at random intervals in time, giving your system an erratic end-to-end latency.
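If you want to watch this happen, one crude approach (just a sketch; the 1ms threshold is an arbitrary choice of mine) is to time individual GL calls and log the outliers. Without the Finish, the stall tends to show up in whatever call happens to trip the driver’s internal limits, not reliably in SwapBuffers:

```c
/* Crude per-call timing to spot where the driver actually blocks. */
#include <GLFW/glfw3.h>
#include <stdio.h>

#define TIMED(label, call)                                        \
    do {                                                          \
        double t0_ = glfwGetTime();                               \
        call;                                                     \
        double dt_ = (glfwGetTime() - t0_) * 1000.0;              \
        if (dt_ > 1.0) /* arbitrary "this call stalled" cutoff */ \
            printf("%-20s blocked for %.2f ms\n", label, dt_);    \
    } while (0)

/* Usage, inside the draw loop:
   TIMED("glDrawElements", glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, 0));
   TIMED("SwapBuffers",    glfwSwapBuffers(window)); */
```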
-
Another reason to leave the Finish after SwapBuffers: it’s very useful to have per-frame CPU timing statistics in place to diagnose performance problems (e.g. frame overruns, where it took the CPU+GPU more than 16.66ms to render a frame). With SwapBuffers+Finish, it’s easy to see when this happens and, from there, to track down the offending bottlenecks in that specific frame. If your “beginning of frame” is completely uncorrelated with the scan-out (VSync) clock, then your CPU frame timing is not nearly as useful. Yes, you can use GPU timers (timer queries), but there are a number of problems with that, the main one being that GPU time is only half the story. What matters is the aggregate CPU+GPU time, and synchronizing the CPU with the GPU at end-of-frame is an easy way to get that.
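For example, the overrun check can be as simple as this (a sketch; the function name is mine, and the 17ms threshold, 16.66ms plus a little slack, assumes 60Hz with SwapInterval 1):

```c
/* With SwapBuffers+Finish, a plain CPU timer measures the aggregate CPU+GPU
   frame cost, so catching frame overruns is trivial. */
#include <GLFW/glfw3.h>
#include <stdio.h>

void endFrame(GLFWwindow *window, long frameIndex)
{
    static double lastSwapTime = -1.0;

    glfwSwapBuffers(window);
    glFinish();                 /* sync the CPU draw thread with scan-out */

    double now = glfwGetTime();
    if (lastSwapTime >= 0.0) {
        double ms = (now - lastSwapTime) * 1000.0;
        if (ms > 17.0)          /* 16.66ms + a little slack */
            printf("frame %ld overran the VSync period: %.2f ms\n",
                   frameIndex, ms);
    }
    lastSwapTime = now;
}
```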
Embarrassingly enough, I can’t seem to reproduce the lag with VSync anymore. I’ve been fiddling with my code, and maybe I fixed it; maybe it will come back some day and haunt me. The only candidate I can think of is the input system. I used GLFW’s callbacks to process each input event as soon as it occurred. Now I’ve changed that so that the events queue up and I process them all in one function call, in a controlled fashion. It could be that, before, an input event came in between some critical CPU -> GPU operation.
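Roughly, the callbacks now just push events into a queue that I drain once per frame, something like this (a simplified sketch of what I did):

```c
/* GLFW callbacks only enqueue events; the frame loop drains the queue at
   one controlled point instead of doing work in the middle of a callback. */
#include <GLFW/glfw3.h>

#define MAX_EVENTS 256

typedef struct { int key, action, mods; } KeyEvent;

static KeyEvent g_queue[MAX_EVENTS];
static int g_count = 0;

/* registered with glfwSetKeyCallback(window, keyCallback) */
static void keyCallback(GLFWwindow *w, int key, int scancode, int action, int mods)
{
    (void)w; (void)scancode;
    if (g_count < MAX_EVENTS)
        g_queue[g_count++] = (KeyEvent){ key, action, mods };
}

void processQueuedInput(void)   /* called once, at beginning of frame */
{
    for (int i = 0; i < g_count; ++i) {
        /* handle g_queue[i]: update input/game state, no GL calls here */
    }
    g_count = 0;
}
```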
Try removing the Finish after SwapBuffers briefly for testing. Does your lag gremlin come back? 
But before, what I would see was that SwapBuffers took two full frame periods (200%) minus the rest of what went on in the loop. And it wasn’t occasional; it either happened for the entire run or not at all. So Mr. VSync clearly sometimes had the notion that the GPU couldn’t keep up with 60Hz and decided to go with 30Hz instead. But our experiments show that Mr. VSync is clearly dead wrong.
Possibly. But I wouldn’t be so quick to pin the blame on Mr. VSync. My bet is Mr. GL driver and the internal “read-ahead” buffering I described above. Without the Finish after Swap, I’d completely expect the behavior you’re seeing.
CAVEAT: Again, let me caveat that the glFinish after SwapBuffers we’re discussing (to synchronize the CPU draw thread with the GPU output) is only a reasonable approach on desktop or Tegra GPUs (sort-last architecture). A Finish after SwapBuffers is a really bad idea on other mobile GPUs (sort-middle architecture, sometimes called tile-based GPUs), which have a completely different design with much longer GPU draw latencies. On those GPUs, a Finish after Swap can easily double your frame times!