Native Multiple GPU Support

Hello everybody

Since version 3, OpenGL seems to have the same power as DirectX. But there is one area where OpenGL can be considered “deprecated”: multiple GPU support.
OpenGL contexts are too basic, and it would be very nice to have multi-GPU support like DirectX 10 or 11 has. OpenCL’s context creation is very nice too.

The idea is to have full control of GPUs and bridges (SLI and CrossFire) without using vendor-specific APIs or extensions, and without OS-specific functions (as with OpenCL’s context creation).

The easiest solution could be to consider a virtual device: an array of GPUs interconnected by the same bridge.

Example of a computer that has a 3-way SLI and another GPU:

Virtual device 0
     GPU 0
     GPU 1
     GPU 2

Virtual device 1
     GPU 3

An OpenGL context would be created on one virtual device.
For example, on our hypothetical computer, we create OpenGL context 0 on virtual device 0.

Within a context, we would create one or more memory devices. A memory device is where textures, FBOs, render buffers, VBOs, shaders […] would be stored, and each GPU can host one or more memory devices.
For example, on our hypothetical computer, we create two memory devices:

Memory device 0 on GPUs 0 and 1 (so data will be copied into both GPUs’ memories)
Memory device 1 on GPU 2

And within a context, we would create one or more OpenGL pipelines: one pipeline per CPU thread.
For example, on our hypothetical computer, we create four pipelines:

pipeline 0 on GPU 0 on the thread N1
pipeline 1 on GPU 1 on the thread N2
pipeline 2 on GPU 0 and 1 on the thread N3
pipeline 3 on GPU 2 on the thread N4

So on threads N1 and N2 we can render with dynamic 2D load-balancing, on thread N3 we can do on-the-fly loading into memory device 0, and on thread N4 we can render shadow maps.

We would also need commands to copy objects from one memory device to another, possibly on another GPU. The copy would use the bridge connection directly instead of passing through the motherboard’s memory.

I think such an idea could be very difficult to implement, but it could bring a lot of convenience and performance to multi-GPU rendering.

I think it’s a nice idea, but I can’t see that happening in reality.
The biggest trouble is that you’re asking to abstract the GPUs completely and treat them all as multiple equivalent devices. And that’s where reality will start to bite first.
Not all GPUs are the same; they vary widely in capability, performance and rendering quality.
I could imagine cases where a Sandy Bridge GPU was ‘coupled’ with a real GPU and rendering performance would actually drop rather than increase. What if one of the GPUs did not support a feature? What if one of the GPUs did not natively support a texture format, or lacked the memory speed or size to handle the task properly? Not to mention bad drivers for one of the GPUs.
How would an application developer work around those kinds of issues and avoid using parts of the abstracted GPUs? How would they even identify which virtual GPUs were which in order to begin coding around them? That kinda defeats the whole purpose of what you’re asking for.

FYI, Apple’s latest OpenGL Contexts have a switch to make them very similar to what you propose - it makes all screens share textures (at least) seamlessly.

I was quite surprised to see it in WWDC talks from, maybe 2010?


Where “latest” = “for the last ten years” : Virtual Screens

But this mechanism doesn’t guarantee anything about performance, only that resources will be migrated between GPUs as needed. And as BionicBytes (and Apple’s documentation) point out, it’s up to the application to check for mismatched feature sets between GPUs and respond appropriately.

Well, GPU bridges (SLI or CrossFire) are only active across a set of the same GPU model, so in my example GPUs 0, 1 and 2 are the same model from the same manufacturer (NVIDIA or AMD), so their drivers are the same.
If GPU 3 is different, it may lack some features. But that is not a problem, because it sits in virtual device 1, which will have a second OpenGL context separate from virtual device 0’s context.

EDIT: I think such a structure can gain performance, because in benchmarks of triple or quad SLI or CrossFire, the speedup is not 3x or 4x. But if each GPU has independent work, the result should be better!

Actually, AMD allows CrossFire to work between different GPU models in some cases.

One major problem with proper multi-GPU support is that it is too dependent on window-system-specific stuff, and on Windows the GL vendors have a very hard time working around the Microsoft machinery.
And of course Microsoft themselves are not too eager to help.

But vendors can implement it for DirectX; why not for OpenGL?

It would be nice to have a vendor-neutral way of creating a context assigned to a specific GPU, and as an added bonus to provide an API to copy data between GPUs. Direct3D has supported this for a while, and it “just works” there.

SLI is only appropriate for a small subset of use cases, and isn’t always enabled or available. Forcing multiple GPUs to process the exact same command stream is only good for speeding up some games - having independent control would allow for many more use cases (stereo rendering, offscreen rendering and GPGPU, using an integrated GPU to offload some of the primary GPU’s work, etc.).

AMD and NVIDIA both have their own (very similar) vendor-specific extensions to create contexts for a particular GPU. AMD did the right thing and exposed it on their consumer hardware, while NVIDIA chose to artificially restrict it to Quadros (guaranteeing it won’t see widespread use). AMD’s implementation seems to scale decently, aside from a few bugs, but NVIDIA’s appears to serialize all access to the GPUs, meaning that in practice 2 GPUs will each run at half speed, resulting in a wash performance-wise. Direct3D doesn’t seem to have this problem when running multiple GPUs.

It looks like there’s work to be done both inside the OpenGL drivers (NVIDIA) and at the OS level (Windows needs to expose GPUs that aren’t connected to an active display). Maybe an extension would help push things in the right direction.

Vendors don’t implement it in D3D; Microsoft does.
D3D is Microsoft’s own API, and they do their best for its welfare.

In contrast, with OpenGL the HW vendors are on their own. It enjoys virtually no support from Microsoft.
They dislike it because it is not Windows-only but cross-platform. It is barely tolerated, and only because of marketplace pressure.

It’s actually quite simple on Linux. You can set up each GPU to be its own X screen.

Then you just set DISPLAY=:0.# (where # is the screen number) before you create your GL context. Simple. And nothing vendor-specific about that.

So MSWin doesn’t have a similar “screen” abstraction?

Yes, OpenGL context creation on Linux is very nice!

On Windows it depends on the driver. If we create a context on a screen with a GeForce, all OpenGL commands are sent to every GPU anyway (which is stupid, y’know ^^). If we want to choose the GPU, we have to use WGL_NV_gpu_affinity, which is only available on Quadro cards. I don’t have an ATI card, and I’m not very sure, but I’ve heard of something similar there. I consider this commercial decision very stupid, because Direct3D 11, which also runs on Windows (not news, I know =D), supports assigning a GPU.

Since version 3.0, OpenGL is very modern. But context creation is too “old school”: only good enough for beginner libraries like SDL, GLUT and others. Honestly, how old is OpenGL context creation, particularly on Windows? It is 2011; I think it’s time to have a universal, modern context creation like DirectX 11’s.
Such a standard would have to be imposed on vendors and Microsoft, like OpenCL was: no exceptions. When manufacturers who don’t want such a technology start losing market share because the latest professional applications use the new standard, they will be obliged to implement it in their drivers. Surely there would be a few differences between operating systems, but nothing like today, where it is the developer’s problem to modernize their program. OpenCL is cross-platform and doesn’t have any such differences!

I’m pretty sure that if OpenGL had such a technology, it would see a “golden age” of great cross-platform games (like Crysis…) and professional applications, even ones developed by a freelance programmer.

Context creation under Windows is exactly the same as it was in Windows NT 3 or so, about 20 years ago. There is absolutely nothing new. This is not the fault of OpenGL, however.
Context creation is actually part of the Windows API, and Microsoft has exclusive control over it.
But Microsoft would be very pleased to see OpenGL dead, so you can imagine how much they care about the outdated context creation.

There is absolutely nothing new.

Absolutely nothing new? What about WGL_ARB_create_context? Or the extended pixel format selection of WGL_ARB_pixel_format?

Yes, there are things that we can’t do due to old restrictions placed by Microsoft. But it’s wrong to say that nothing has changed, or that Microsoft has “exclusive control” over it.

It is possible to create a WGL_ARB_multi_gpu extension, entirely without Microsoft’s consent or approval. Nothing is stopping that. The primary limitations on WGL are the need for a device context (so creating HGLRCs that don’t have a default framebuffer is difficult), the inability to use any WGL extension without first getting an OpenGL context, and the fact that only one ICD can be active on a system at a time.

WGL_ARB_create_context and such are attempts by the HW vendors to bypass Microsoft, with varying degrees of success.

For example, it appears to be impossible, or at least very hard, to create a windowless context under Windows. See this extract from the WGL_ARB_create_context spec:

4) Should there be a way to make a context current without binding
    it to a window system drawable at the same time?

    RESOLVED: Yes, but only in OpenGL 3.0 and later. This results in a
    context with an invalid default framebuffer, the meaning of which is
    defined in the OpenGL 3.0 specification.

    NOTE: Apparently on Windows, opengl32.dll makes use of the drawable
    argument to identify the namespace of the driver, so we may not be
    able to work around it.

Under Windows, WGL_ARB_create_context is quite awkward:
to get a pointer to the new context-creation function, you have to use wglGetProcAddress, and to use wglGetProcAddress you must have a current context. So you must first create a dummy context the old way (which includes creating a dummy window too, don’t forget), make it current, get some pointers via wglGetProcAddress, destroy the dummy stuff, and only then can you use the shiny new WGL_ARB_create_context.
Which is not so shiny after all:

  • you still can’t have a windowless context.
  • you still can’t choose the GPU.
  • the context pixel format and the window pixel format still have to be the same. This is not so under GLX, where you can e.g. use the same context to draw in different windows with different visuals.
  • there are more problems I can’t think of at the moment.
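
For readers who haven’t done the dance, it looks roughly like this. This is a Windows-only fragment, not a runnable program: error handling and dummy-window creation are omitted, and `dummy_dc` / `real_dc` are assumed to be HDCs with pixel formats already set:

```c
/* 1. Create a legacy context first; it is the only kind we can get
      without wglGetProcAddress. */
HGLRC old = wglCreateContext(dummy_dc);
wglMakeCurrent(dummy_dc, old);

/* 2. Only with a current context may we fetch the modern entry point. */
PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB =
    (PFNWGLCREATECONTEXTATTRIBSARBPROC)
        wglGetProcAddress("wglCreateContextAttribsARB");

/* 3. Tear down the dummy context (and the dummy window with it). */
wglMakeCurrent(NULL, NULL);
wglDeleteContext(old);

/* 4. Finally create the real context with attributes. */
const int attribs[] = {
    WGL_CONTEXT_MAJOR_VERSION_ARB, 3,
    WGL_CONTEXT_MINOR_VERSION_ARB, 3,
    0
};
HGLRC ctx = wglCreateContextAttribsARB(real_dc, NULL, attribs);
wglMakeCurrent(real_dc, ctx);
```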

WGL_ARB_create_context and such are attempts by the HW vendors to bypass Microsoft, with varying degrees of success.

I’m not sure I understand your point. Your statement was “context creation under windows is exactly the same as have been in windows NT 3 or so about 20 years ago. there is absolutely nothing new.” This statement is inaccurate, which I called out. And you apparently agree, as your above comment is an admission that context creation has changed.

You can haggle over whether it has changed enough for your liking, but that wasn’t what you said. You said that there was “absolutely nothing new” when there clearly was something new.

In any case, the simple fact is this: WGL isn’t going away. It will still be supported by Microsoft, in the sense that the WGL API will exist. But it isn’t going to be replaced by something else. So if you want to improve context creation, it has to be done within those limitations. Complaining about them will not make them go away.

Multi-GPU support can be done without creating a new context-creation function. AMD’s wglCreateAssociatedContextAMD function took an integer value as a parameter, but you could just as easily shove that integer value into wglCreateContextAttribsARB’s attribute list to get the same effect.

As to the note you quoted, perhaps that is the reason why AMD exposed wglMakeAssociatedContextCurrentAMD() in their extension. If so, then whatever multi-GPU extension we might be talking about could have a similar special make-current function to work around deficiencies in Microsoft’s version.

OK, I meant there is absolutely nothing new from the Microsoft side. And this is what makes things such as proper multi-GPU support hard.

Yet I do think the HW vendors could do better if they really had the will. But maybe proper multi-GPU support would require somewhat too tight a cooperation between the rivals AMD, NVIDIA and Intel, which is hard to attain.

Is it legal for Microsoft not to provide updated support for OpenGL because they don’t want competition on their Windows? What about free competition? I think that if they don’t want to do it themselves, they should be obliged to let the OpenGL community develop what it wants: such as native multiple-GPU support. With Linux this is not a problem, and Apple might be pleased to do the same on Mac OS. And if the idea is already available on Mac OS and Linux, maybe Microsoft will be obliged to follow. A real argument to sell professional versions of Windows. They profit from Maya, 3DS (and other) big professional applications, but give them no support at all.

However, I find it very ridiculous that political decisions hold technology back like that.

Ha ha ha. Be realistic: politics both makes and unmakes technology.

It’s actually quite simple on Linux. You can set up each GPU to be its own X screen.

Then you just set DISPLAY=:0.# (where # is the screen number) before you create your GL context. Simple. And nothing vendor-specific about that.

So MSWin doesn’t have a similar “screen” abstraction?

Yes, things are a lot better under Linux for multi-GPU, especially if you have the ability to control your display configuration (not so good for taking advantage of multi-GPU in random end-user machines). When I tried it, though, I found that the NVIDIA driver was still serializing much of the GPU work, and didn’t get a performance increase in my use case.