How to render multiple views with multiple threads

I am interested in rendering multiple views of the same 3D scene using multiple threads. My intent is to have a 3D geometry that is seen from several different camera angles simultaneously.

How can I do this as efficiently as possible? I would like to avoid a loop that sends each duplicate OpenGL call sequentially to each thread.

Can someone point me to a guide or example code? Thanks in advance!


You have only one graphics card, so at the end of the line that card will render all views one by one or piece by piece. If you want to do multi-threaded rendering, create one main rendering context and one context in each rendering thread, and share lists and objects between the main context and the thread contexts.
Create OpenGL objects (vertex buffers, shaders, textures, …) only in the main context.
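
To make the thread layout concrete, here is a minimal sketch of the one-thread-per-view structure in standard C++. It contains no real OpenGL: `ViewContext`, `renderLoop`, and `renderAllViews` are made-up names for illustration, and the comments only mark where calls such as `wglMakeCurrent` and `wglShareLists` would go in a Windows program. Treat it as an outline under those assumptions, not as working rendering code.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread OpenGL context. A real program
// would keep the HDC and the HGLRC for this view here.
struct ViewContext {
    int viewIndex = 0;
    int framesRendered = 0;
};

// Body of one rendering thread. Comments mark where the real calls go.
void renderLoop(ViewContext& ctx, int numFrames) {
    // Real code: wglMakeCurrent(hdc, hglrc) once, on this thread.
    for (int frame = 0; frame < numFrames; ++frame) {
        // Real code: set this view's camera, issue the draw calls,
        // then SwapBuffers(hdc).
        ++ctx.framesRendered;
    }
    // Real code: wglMakeCurrent(nullptr, nullptr) before the thread exits.
}

// Starts one thread per view and waits for all of them to finish.
std::vector<ViewContext> renderAllViews(int numViews, int numFrames) {
    assert(numViews > 0 && numFrames >= 0);
    std::vector<ViewContext> contexts(numViews);
    for (int i = 0; i < numViews; ++i) contexts[i].viewIndex = i;

    // Real code: create the contexts here and call
    // wglShareLists(mainContext, threadContext) for each one, before the
    // thread context has created any objects of its own.
    std::vector<std::thread> threads;
    threads.reserve(numViews);
    for (int i = 0; i < numViews; ++i)
        threads.emplace_back(renderLoop, std::ref(contexts[i]), numFrames);
    for (auto& t : threads) t.join();
    return contexts;
}
```

Calling `renderAllViews(6, 3)` would run six threads for three frames each. Every thread touches only its own `ViewContext`, which mirrors the rule that an OpenGL context is current on one thread at a time.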

My system is a dual quad core with three PCIe slots. I’m using three ATI cards with dual DVI outs to drive six displays.

As I understand it, WinXP sends all OpenGL calls to all cards regardless of context. So, in essence I guess it is a “single” card at the moment.

However, I have read that some nVidia cards will allow a way to program around this “all OpenGL calls go to all cards” situation.

My present thinking is that I will move to three nVidia cards that will support this behavior.

However, for now, I would like to be able to spread six rendering threads across the eight available cores (dual quad-core). I want to get that part of my implementation down. If it works, I’ll invest in the nVidia cards. (Long story as to why I’m using ATI currently.)

These six threads will render the same 3D scene with six different view frustums (and possibly six different vertex and fragment shaders).
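
One possible way to get six frustums for the same scene is sketched below. It assumes the six displays form a single flat row, side by side, which the setup above does not state, so the layout is purely an assumption. Under that assumption, one wide frustum can be cut into equal horizontal slices, one per display. `Frustum` and `splitFrustumHorizontally` are made-up names; the fields follow the parameter order of `glFrustum`.

```cpp
#include <cassert>
#include <vector>

// Frustum bounds in the order glFrustum takes them:
// left, right, bottom, top, zNear, zFar.
struct Frustum {
    double left, right, bottom, top, zNear, zFar;
};

// Cuts one wide frustum into numTiles equal horizontal slices.
// Assumes the displays sit side by side in one flat row.
std::vector<Frustum> splitFrustumHorizontally(const Frustum& full, int numTiles) {
    assert(numTiles > 0);
    std::vector<Frustum> tiles;
    tiles.reserve(numTiles);
    const double tileWidth = (full.right - full.left) / numTiles;
    for (int i = 0; i < numTiles; ++i) {
        Frustum tile = full;  // bottom, top, zNear, zFar stay the same
        tile.left = full.left + i * tileWidth;
        tile.right = full.left + (i + 1) * tileWidth;
        tiles.push_back(tile);
    }
    return tiles;
}
```

Each thread would then pass its own slice to `glFrustum` while using the same modelview matrix as the others. A different display arrangement (angled screens, two rows of three) would need different math.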

What would really be helpful is a sample code or tutorial that would lead me through how to do multiple rendering threads for the same 3D scene.

Thanks in advance.


Hi David,

you can find sample code in the Equalizer parallel rendering framework, or simply use Equalizer to build your application. This is imo the fastest way to get a parallel OpenGL application, and you get a lot of free boons with Equalizer, e.g., run-time configuration and scalability.

More information is on
If you want to invest the time to learn the pitfalls and roll your own multithreaded rendering code, this is a good starting point:



Both Ati and Nvidia drivers support the GPU affinity extension (WGL), which allows you to specify which card is referenced by an OpenGL context. This will likely give you the best performance.

Moreover, Ati supports 2- and 3-way crossfire, which allows the cards to share the workload and improve performance. Crossfire should be enabled by default - check your driver options.

Stefan, very helpful link. Thank you.

Stephen, I will look into the WGL extension. Thanks.

Stefan, I have taken some time to read through Equalizer. Very interesting.

However, I will ultimately use Chromium to run unmodified existing apps, and I’m planning to write an SPU that allows a single box to render across multiple graphics cards within the box. The idea is to avoid the interprocess communication within a box that seems to clog Chromium up on WinXP.

As a starting point, I am not writing a Chromium SPU currently. I just want to write a simple OpenGL app that renders multiple viewports (one per graphics card) without having to loop through the geometry N times (where N is the number of cards or viewports).

In an ideal world, I would like to have a draw code that sets N viewports up on each card, then throws out the geometry calls for a single scene. The cards render this scene relative to their viewport.

In my head it seems simple enough. I am just unable to figure out how to get OpenGL and WinXP to achieve it.

If I use GPU affinity for the N viewport calls, the masks are immutable, so I would have to loop over the geometry N times, which defeats the intent. Conversely, if I just set up the viewports sequentially without GPU affinity, the last one set will be the one the geometry calls render into. Again, to no avail.

I am thinking that multiple threads with multiple contexts and using the GPU affinity mask is the way to go. However, two concerns:

(1) If the calls all go over the PCIe bus anyway, is there an advantage to the GPU affinity?

(2) Assuming an advantage, how can I efficiently have the threads access a common set of geometry calls, keeping in mind that I ultimately want to write a Chromium SPU that would simply pass down a sequence of OpenGL geometry calls to each thread?

I hope I’m making sense here. I would really like to use Equalizer, but I just don’t think that’s possible. For better or for worse, I think we are married to Chromium.
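
To illustrate what I mean in (2), here is a rough sketch in standard C++ of a command list that is recorded once and replayed by every thread. Nothing here is Chromium or OpenGL API: `ViewState`, `Command`, and `replayOnAllViews` are hypothetical names, and the commands are plain callables standing in for geometry calls. The list is still walked once per view, but the walks run in parallel and all threads read the same data.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread state. In a real program each thread would also
// own an OpenGL context that is current on that thread.
struct ViewState {
    int viewIndex = 0;
    int commandsExecuted = 0;
};

// One recorded call. Real code would wrap a geometry call such as
// glDrawArrays in here.
using Command = std::function<void(ViewState&)>;

// Replays the shared, read-only command list once in every view's thread.
std::vector<ViewState> replayOnAllViews(const std::vector<Command>& commands,
                                        int numViews) {
    assert(numViews > 0);
    std::vector<ViewState> views(numViews);
    std::vector<std::thread> threads;
    threads.reserve(numViews);
    for (int i = 0; i < numViews; ++i) {
        views[i].viewIndex = i;
        threads.emplace_back([&commands, &views, i] {
            // Real code: make this view's context current and set its camera.
            for (const Command& cmd : commands) cmd(views[i]);
        });
    }
    for (auto& t : threads) t.join();
    return views;
}
```

The list is never modified while the threads run, so it needs no locking. Each thread writes only to its own `ViewState`.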