Taking full advantage of hardware T&L

After setting up your scene to render, I assume that glFlush() or flipping the buffers is a blocking call that does not return until the scene is rendered… is this true?

If so, it would make sense that I should implement a dual-threaded system - one thread that gets signalled to do a render, and another that does per-frame computations. The rendering thread would then wait until the processing thread signals that a scene is ready, set up the render, then signal the processing thread to start computations on the next frame just before calling glFlush() or flipping the buffers.

Is there anyone out there who has already done this? If so, have you done any performance tests to see if it has any advantage?

glFinish is the blocking function. I think people use it for benchmarks only. I would use glFlush instead and let the card return to me as soon as possible even though it might not be done rendering.


Using two or more threads might speed up your application, especially on future hardware (e.g. Intel’s new processor). However, I think it is very hard to do this properly, because you have to sync the threads, which will slow you down if done wrong.
Another very important thing to keep in mind is that OpenGL is not thread-safe in the way you might hope: a rendering context can only be current in one thread at a time. This means that you won’t be able to call OpenGL functions from both threads, only from one of them. (Does anyone know if OpenGL 2.0 is thread-safe? If not, one could use its fence system to synchronize threads.)


I was thinking of only having one thread execute at a time - the only point of using two threads would be to be able to start work on the next frame while the hardware is rendering the current one. Using glFinish() is a possibility, but then the graphics hardware could hold things up - it would be possible to process and try to start rendering the next frame before the current frame is complete. Using two threads, you can use a mutex (or the fast critical section on Win32) and a signal object in this method:

Processing thread steps:
-1 Process Frame
-2 Check RenderThreadDone signal object
-3 if RenderThreadDone signalled, signal RenderThreadStart
-4 otherwise go back to step 1

Rendering thread steps:
-1 wait for RenderThreadStart signal
-2 render scene
-3 flip buffers/glFinish()
-4 signal RenderThreadDone
-5 go back to step 1
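
For anyone who wants to try the two step lists above, here is a rough C++ sketch of the handoff only, not a tested engine design. The names FrameHandoff, render_loop, process_loop and run_demo are made up for illustration, the two flags stand in for the RenderThreadStart and RenderThreadDone signal objects, and the render callback is a placeholder where the real draw calls and the buffer flip or glFinish() would go. It also assumes the renderer works from its own snapshot of the scene, which the sketch does not show.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical handoff state shared by the two threads.
struct FrameHandoff {
    std::mutex m;
    std::condition_variable cv;
    bool start_requested = false;  // plays the role of "RenderThreadStart"
    bool render_done = true;       // plays the role of "RenderThreadDone" (idle at startup)
    bool quit = false;             // not in the original steps; added so the demo can exit
};

// Rendering thread: wait for start, render, flip, signal done, repeat.
// All OpenGL calls would have to live inside render_fn, on this thread only.
template <class RenderFn>
void render_loop(FrameHandoff& h, RenderFn render_fn) {
    std::unique_lock<std::mutex> lock(h.m);
    for (;;) {
        h.cv.wait(lock, [&] { return h.start_requested || h.quit; });
        if (!h.start_requested) return;  // quit requested and nothing pending
        h.start_requested = false;
        lock.unlock();
        render_fn();                     // draw + SwapBuffers/glFinish would go here
        lock.lock();
        h.render_done = true;
    }
}

// Processing thread: process a frame, then hand it off only if the renderer
// is idle; otherwise just keep processing (step 4 in the list).
template <class ProcessFn>
int process_loop(FrameHandoff& h, ProcessFn process_fn, int steps) {
    assert(steps >= 0);
    int handed_off = 0;
    for (int i = 0; i < steps; ++i) {
        process_fn();
        std::lock_guard<std::mutex> lock(h.m);
        if (h.render_done) {
            h.render_done = false;
            h.start_requested = true;
            h.cv.notify_one();
            ++handed_off;
        }
    }
    {
        std::lock_guard<std::mutex> lock(h.m);
        h.quit = true;
    }
    h.cv.notify_one();
    return handed_off;
}

struct DemoResult { int handed_off; int rendered; };

// Runs both loops with do-nothing callbacks and counts frames on each side.
DemoResult run_demo(int steps) {
    FrameHandoff h;
    std::atomic<int> rendered{0};
    std::thread renderer([&] { render_loop(h, [&] { ++rendered; }); });
    int handed_off = process_loop(h, [] {}, steps);
    renderer.join();
    return {handed_off, rendered.load()};
}
```

Calling run_demo(1000) should report the same number of handed-off and rendered frames, usually far fewer than 1000, since the processing side never waits for the renderer. The mutex is only held for the flag checks, which matches the "mutexes are used very little" goal.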

Pretty simple and lightweight. Mutexes are used very little. Unfortunately, this would be single-threaded most of the time, so the P4s with Hyper-Threading probably won’t get much performance out of it.

SwapBuffers doesn’t block on the Mac; I highly doubt it does on the PC.

This is a question to your question.
How do you take advantage of hardware T&L in OpenGL in the first place? I’ve posted a question in the newbie forums but no one has answered.


Originally posted by atimes:
This is a question to your question.
How do you take advantage of hardware T&L in OpenGL in the first place? I’ve posted a question in the newbie forums but no one has answered.


You don’t have to do anything special to get hw T&L to work in OpenGL. If there is hw support, it will just work. If there is no hw support, the driver will do it on the CPU.

Thanks -

SwapBuffers seldom blocks, although it might if the right combination of user preferences and drivers is installed.

OpenGL automatically uses hardware T&L. To make the best use of it in vanilla OpenGL, you should submit geometry using glDrawRangeElements(), or by submitting display lists.
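
To make the glDrawRangeElements() suggestion concrete, here is a small C++ sketch of the bookkeeping it needs. Besides the index count, the call takes the lowest and highest vertex index the batch touches, so the driver knows which slice of the vertex array is in play. The helpers index_range and quad_indices are made up for illustration, and the GL call itself only appears in a comment because it needs a current context and an enabled vertex array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the start/end/count arguments for one batch.
struct IndexRange {
    std::uint32_t start;  // smallest vertex index referenced
    std::uint32_t end;    // largest vertex index referenced
    std::size_t count;    // number of indices to draw
};

// Scan the index list once to find the range it covers.
IndexRange index_range(const std::vector<std::uint16_t>& indices) {
    assert(!indices.empty());
    auto mm = std::minmax_element(indices.begin(), indices.end());
    return {*mm.first, *mm.second, indices.size()};
}

// Two triangles forming a quad whose four corners start at vertex `base`.
std::vector<std::uint16_t> quad_indices(std::uint16_t base) {
    auto at = [base](int offset) {
        return static_cast<std::uint16_t>(base + offset);
    };
    return {at(0), at(1), at(2), at(0), at(2), at(3)};
}

// With a current context and a vertex array set up via glVertexPointer(),
// the draw would then look roughly like:
//
//   std::vector<std::uint16_t> idx = quad_indices(4);
//   IndexRange r = index_range(idx);
//   glDrawRangeElements(GL_TRIANGLES, r.start, r.end,
//                       static_cast<GLsizei>(r.count),
//                       GL_UNSIGNED_SHORT, idx.data());
```

In practice you would compute the range once when the mesh is built rather than every frame.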

If you want to add extensions, the VertexArrayRange (NV_vertex_array_range) and VertexArrayObject (ATI_vertex_array_object) extensions will allow you to make even better use of hardware T&L, by increasing the geometry throughput between your program and the card.

Originally posted by OneSadCookie:
SwapBuffers doesn’t block on the Mac; I highly doubt it does on the PC.

Something is blocking - when I enable VSync, I notice my CPU usage can drop to near zero while I’m still rendering 60 FPS. CPU usage only gets higher (but still not 100%) when I disable VSync.

If you’re submitting geometry much faster than the graphics card can render it, then something (probably swapbuffers) will block. You’re graphics-bound, though, so extra CPU time is not going to benefit you in any way…


I think you’re mistaken. If you submit geometry, and the graphics card can render it FASTER than it takes to refresh the screen, THEN you will block in swapbuffers, assuming you have vertical sync on.

Actually, even if you submit, and it takes longer than one frame, swapbuffers will block if vsync is on.

If vsync is off, then I would expect the place to block will not be swapbuffers, but instead whatever OpenGL call realizes that all the FIFOs are full, and it needs to wait until it can issue the command you’re queuing.

Perhaps to prevent too much queuing, they’d block you in swapbuffers, too…

There are two cases – one where the amount of time it takes you to process a frame is less than the amount of time the graphics card needs to render it, and one where it is more.

If you are submitting faster than the graphics card can render, you will eventually block somewhere; either when all submission queues become full, or when you get a sufficient number of frames ahead of the GPU (probably only 1 or 2).

If you are submitting slower than the graphics card can render, you will never block*.

Usually turning vsync on will put you into the first camp.

* Depending on which GL calls you make. Most calls will not cause you to block; some may.
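
The two cases can be illustrated with a toy timing model in C++. This is only arithmetic under stated assumptions, not a description of how any real driver behaves: the CPU takes a fixed time to submit each frame, the GPU takes a fixed time to render each one, and the driver is assumed to stop the CPU from getting more than max_ahead frames in front of the GPU (the "probably only 1 or 2" guess above). The function name cpu_blocked_ms is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Total time (ms) the CPU spends blocked while submitting `frames` frames,
// under the toy model described above.
long cpu_blocked_ms(long cpu_ms, long gpu_ms, int max_ahead, int frames) {
    assert(cpu_ms >= 0 && gpu_ms >= 0 && max_ahead >= 1 && frames >= 0);
    std::vector<long> gpu_done(frames);  // time at which the GPU finishes frame i
    long cpu_time = 0;       // current time on the CPU side
    long prev_gpu_done = 0;  // when the GPU finished the previous frame
    long blocked = 0;
    for (int i = 0; i < frames; ++i) {
        // Assumed rule: frame i cannot be submitted until frame
        // i - max_ahead has finished rendering.
        if (i >= max_ahead) {
            long wait_until = gpu_done[i - max_ahead];
            if (wait_until > cpu_time) {
                blocked += wait_until - cpu_time;
                cpu_time = wait_until;
            }
        }
        cpu_time += cpu_ms;  // CPU builds and submits frame i
        // GPU starts frame i once it is submitted and the previous one is done.
        gpu_done[i] = std::max(prev_gpu_done, cpu_time) + gpu_ms;
        prev_gpu_done = gpu_done[i];
    }
    return blocked;
}
```

With 5 ms of CPU work and 10 ms of GPU work per frame, cpu_blocked_ms(5, 10, 2, 100) comes out to 490: after the first two frames the CPU waits 5 ms on every frame. Swap the numbers to 20 ms CPU and 10 ms GPU and the result is 0, which is the "never block" case.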