SLI do it yourself

Hi,

is there any way to control/program SLI manually in Windows, using nVidia cards?

There once seemed to be an extension named WGL_NV_gpu_affinity, but I haven't seen any driver expose it yet. Apart from that, it wouldn't help me very much, because I need a setup where I can create two contexts which share their lists, but where I'm able to render different stuff with two graphics cards at the same time.

The reason for my question is that I want to exploit two graphics cards for rendering one frame, but the driver-provided automatic SLI happens too late in the pipeline. If I could, for instance, render two parts of the image in two separate threads, I would be able to aid the rendering already at the frustum-culling level.

So, to sum it up: How can I exploit two graphics cards, when SLI does not work?

I'm not interested in graphics algorithms in general, but rather in how to access the two graphics cards (which API to use) and how to make them work on one image.

thanks in advance!

Apparently this extension is only exposed on the Quadro line …
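For what it's worth, on hardware that does expose it, usage would look roughly like this. This is only a minimal sketch under assumptions: it presumes the wglext.h typedefs, that wglEnumGpusNV and wglCreateAffinityDCNV have been fetched through wglGetProcAddress, and error handling is omitted.

```cpp
#include <windows.h>
#include <GL/gl.h>
#include "wglext.h"   // HGPUNV and the NV_gpu_affinity typedefs

// Create one OpenGL context per GPU via WGL_NV_gpu_affinity.
// Assumes the extension entry points were loaded via wglGetProcAddress.
void createPerGpuContexts(HDC affinityDC[2], HGLRC context[2])
{
    HGPUNV gpu;
    for (UINT i = 0; i < 2 && wglEnumGpusNV(i, &gpu); ++i) {
        HGPUNV gpuList[2] = { gpu, NULL };        // NULL-terminated list
        affinityDC[i] = wglCreateAffinityDCNV(gpuList);

        // An affinity DC needs a pixel format before context creation.
        PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd), 1 };
        pfd.dwFlags    = PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
        pfd.iPixelType = PFD_TYPE_RGBA;
        pfd.cColorBits = 32;
        SetPixelFormat(affinityDC[i], ChoosePixelFormat(affinityDC[i], &pfd), &pfd);

        context[i] = wglCreateContext(affinityDC[i]);
    }
}
```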

The workaround proposed here seems promising:
http://www.gpgpu.org/forums/viewtopic.php?t=4140&highlight=

So you would manually render two halves of the screen, one with each card, and composite them to display the final result.
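The composite step would then be a plain readback plus redraw. A minimal sketch, where offscreenDC/offscreenRC (the second card), windowDC/windowRC (the visible context) and the 1024×768 resolution are all assumptions:

```cpp
// Copy the second GPU's half of the frame through system memory into
// the visible context. Slow, but the only path without driver support
// for cross-GPU sharing.
const int w = 1024, halfH = 384;                  // bottom half of 1024x768
static GLubyte pixels[1024 * 384 * 4];

wglMakeCurrent(offscreenDC, offscreenRC);         // context on the second GPU
glReadPixels(0, 0, w, halfH, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

wglMakeCurrent(windowDC, windowRC);               // visible context
glWindowPos2i(0, 0);                              // GL 1.4; fetch via wglGetProcAddress
glDrawPixels(w, halfH, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
SwapBuffers(windowDC);
```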

But how do you create a window on a display (GPU?) which is not connected to any monitor? (I'm using a single monitor only.)
Also, I expect that I'm not able to have both rendering contexts share their lists, am I? Not sharing lists means complicated resource management and an unnecessary waste of memory to me.
(Having unwanted shadow copies of 400 MB of geometry is hard enough… but raising that to 800 MB is a real pain.)
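Whether sharing works across two contexts sitting on different GPUs is exactly the open question. The call to try is wglShareLists; at least it fails cleanly if the driver refuses (context[0] and context[1] as in the sketch above):

```cpp
// If the driver refuses cross-GPU sharing, this returns FALSE and
// every resource has to be duplicated into both contexts by hand.
if (!wglShareLists(context[0], context[1])) {
    // no sharing: textures, display lists and VBOs must be
    // created twice, once per context
}
```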

Is there anything known about how Longs Peak (and beyond) will address multiple GPUs?

I need a setup where I can create two contexts which share their lists, but where I'm able to render different stuff with two graphics cards at the same time.
That is not the purpose of SLI.

SLI exists to let you use multiple graphics cards to render 1 image.

The reason for my question is that I want to exploit two graphics cards for rendering one frame, but the driver-provided automatic SLI happens too late in the pipeline.
If your ultimate goal is really just to render one image, why don't you just let the driver do its job? The point of having a hardware abstraction is to abstract the hardware. SLI lets you use 2 graphics cards as one; the API allows it to work without specialized coding.

In short, if you have 2 GPUs in an SLI configuration, just code as you normally would and the driver will take care of it to the extent possible.

I hope it’s only a matter of (short) time before we see compilers and runtime environments themselves sophisticated enough to seamlessly and automatically leverage multicore CPUs and GPUs, across all application domains.

This looks like a step in that direction:
http://www.rapidmind.net/product.php

That is not the purpose of SLI.

SLI exists to let you use multiple graphics cards to render 1 image.
Then my request was badly phrased. I want to use multiple GPUs where “standard” SLI can’t help me.

In short, if you have 2 GPUs in an SLI configuration, just code as you normally would and the driver will take care of it to the extent possible.
Well, that's the problem :) For instance, I'm using occlusion queries that need to be evaluated in the current frame. This couples the CPU and GPU to a certain extent, so the CPU can't run far ahead of the GPU (which is where SLI could help). That breaks AFR. SFR mode won't help either: I'm mainly vertex-transform limited, and assuming that SFR splits the rendering work at the rasterization level, that is already too late. Also, how would the driver handle a case where occlusion-query geometry straddles both parts of the image? It would have to wait for both GPUs to return a value, add the results, and then return that to the application.

There’s an argument like “why don’t you examine the occlusion queries in the next frame?”. Since I’m doing hierarchical occlusion culling, I cannot do this: I’m creating queries that depend on the outcome of previous ones within one frame.
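To illustrate the dependency (Node, renderBoundingBox and renderGeometry are placeholders, not code from my actual engine): the parent's query result decides within the same frame whether the children are queried at all, so evaluation cannot be deferred to the next frame.

```cpp
// Hierarchical occlusion culling: a node's visibility result gates
// the traversal of its children in the same frame.
void traverse(Node *node)
{
    GLuint query, visible = 0;
    glGenQueries(1, &query);

    glBeginQuery(GL_SAMPLES_PASSED, query);
    renderBoundingBox(node);                  // cheap proxy geometry
    glEndQuery(GL_SAMPLES_PASSED);

    // The CPU stalls here until the GPU has rasterized the proxy --
    // this is the CPU/GPU coupling described above.
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &visible);
    glDeleteQueries(1, &query);

    if (visible) {
        renderGeometry(node);
        for (int i = 0; i < node->childCount; ++i)
            traverse(node->children[i]);      // issues further queries this frame
    }
}
```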

The easiest solution would be to have two rendering threads working completely independently, one on each half of the screen. This should already be possible somehow, but the two rendering contexts I would have to create couldn't share their lists (textures, VBOs etc.). This is a major PITA, since it wastes memory and makes resource management a nightmare (both contexts basically render the same stuff anyway).

So, I need two rendering contexts (one for each GPU) which are able to share lists. I really think this could easily be offered, since SLI boils down to the same thing.
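The thread side of that setup is straightforward with plain Win32 threads. A minimal sketch, assuming the affinityDC/context pairs from the earlier sketch and a fixed 1024×768 split (error handling omitted):

```cpp
// One render thread per GPU; each binds its own context, renders its
// half of the frame, then the main thread composites the result.
struct HalfScreenJob { HDC dc; HGLRC rc; int yOffset, height; };

DWORD WINAPI renderHalf(LPVOID param)
{
    HalfScreenJob *job = (HalfScreenJob *)param;
    wglMakeCurrent(job->dc, job->rc);         // bind this thread's GPU
    glViewport(0, job->yOffset, 1024, job->height);
    // ... cull against this half's frustum and render ...
    glFinish();                               // make results readable
    wglMakeCurrent(NULL, NULL);
    return 0;
}

HalfScreenJob jobs[2] = {
    { affinityDC[0], context[0],   0, 384 }, // bottom half on GPU 0
    { affinityDC[1], context[1], 384, 384 }, // top half on GPU 1
};
HANDLE threads[2];
for (int i = 0; i < 2; ++i)
    threads[i] = CreateThread(NULL, 0, renderHalf, &jobs[i], 0, NULL);
WaitForMultipleObjects(2, threads, TRUE, INFINITE);
```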

Since I’m doing hierarchical occlusion culling, I cannot do this.
“Doctor, when I raise my arm like this, it hurts.”

“Then don’t raise your arm like that.”

If you’re doing something that impedes performance on SLI cards, you have two options: accept the performance loss, or stop using that algorithm. It’s just an algorithm; you’re not married to it. Use a different occlusion test.