Driving 6-12 displays with 1 GPU card

Question about the:

  • ATI Radeon HD5970 Eyefinity 12
  • ATI Radeon HD5870 Eyefinity 6

Nearly all the material I’ve found is marketing fluff, but best I can tell they allocate one ginormous framebuffer and then just partition the output to separate display outputs (basically, you just crank up your FOV and resolution, and pretend that it’s all-the-same to the GPU). AFAICT, there’s just one clump of “GPU memory” shared by all displays, one clump of GPU cores processing the vertex data going to all displays, one clump of GPU cores doing fragment processing and downsample for all displays, and one GPU command stream, not 6 (or 12) separate streams that could be driven by different CPU cores in parallel.

Is this correct?

If so, then it seems this setup is interesting for simple WoW-style games. But if your app really pushes the GPU (lots of verts, lots of fill, lots of view-dependent texture, high AA, different projections per display, etc.) then it seems it would bring this setup to its knees.

That said, this is what I’m inferring. I’d much appreciate some hard facts from some developer that knows details.

http://hothardware.com/News/Computex-201…th-Lucid-Hydra/
http://www.engadget.com/2010/04/30/powercolor-hd5970-eyefinity-12-makes-six-screens-yesterdays-new/
http://www.theinquirer.net/inquirer/news/1652779/powercolor-hd5970-display-outputs

Yes, a program I’ve done using a standard single backbuffer (D3D, not OpenGL, but the overall principle is the same) has been tested and works fine on the ATI Eyefinity stuff.

It even ran very well, but then again it wasn’t pushing the GPU too hard.

(basically, you just crank up your FOV and resolution, and pretend that it’s all-the-same to the GPU)

“pretend”? It is “all-the-same to the GPU”. That’s the whole point of Eyefinity; that an application rendering to many monitors works almost identically to an application rendering to one. This makes it better than multi-GPU configurations because the application doesn’t have to do any work.

The only Eyefinity-specific code you need is querying the size of the desktop and setting up an appropriate aspect ratio. Which you should be doing anyway, simply due to the various different monitor resolutions out there.
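For the record, that really is about all the application-side code there is. A minimal sketch, assuming GLFW 3 for the desktop query (the thread doesn’t name any particular windowing API, so treat the calls as an example, not a prescription):

```cpp
// Sketch only: the Eyefinity group shows up to the application as one big
// desktop, so the usual "query the mode, derive the aspect ratio" path is
// all that's needed.  (glfwInit() is assumed to have been called already.)
#include <GLFW/glfw3.h>
#include <cmath>

void buildProjection(float fovYDegrees, float zNear, float zFar, float out[16])
{
    const GLFWvidmode* mode = glfwGetVideoMode(glfwGetPrimaryMonitor());
    const float aspect = float(mode->width) / float(mode->height);
    const float f      = 1.0f / std::tan(fovYDegrees * 3.14159265f / 360.0f);

    // Column-major, OpenGL-style perspective matrix (same math as gluPerspective).
    for (int i = 0; i < 16; ++i) out[i] = 0.0f;
    out[0]  = f / aspect;
    out[5]  = f;
    out[10] = (zFar + zNear) / (zNear - zFar);
    out[11] = -1.0f;
    out[14] = (2.0f * zFar * zNear) / (zNear - zFar);
}
```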

If so, then it seems this setup is interesting for simple WoW-style games. But if your app really pushes the GPU (lots of verts, lots of fill, lots of view-dependent texture, high AA, different projections per display, etc.) then it seems it would bring this setup to its knees.

Well, yes it would… today. What about later, on the HD 6970, or next year on the HD 7970? It seems rather closed-minded to say that, just because <insert program here> pushes current hardware too hard to work well, Eyefinity is ultimately useless.

Really, when you get down to it, hardware is running out of stuff to do. Tessellation is the main feature of DX11/GL 4.0, and it’s not even that useful overall. All of the easy stuff is taken, and even that isn’t going to be used by the majority of applications. Developers simply don’t have the resources to use much more stuff. For developers, implementing the hard stuff requires a lot of artist input (i.e. more money) and really starts approaching the Uncanny Valley unless you do it exactly right. Even if hardware started supporting more features, they wouldn’t be widely used, simply due to lack of need.

Eyefinity is an effective way to use hardware that the developers aren’t/can’t use.

Also, not everyone cares about pushing hardware. Not everyone is trying to make the longest, most complex fragment shaders imaginable. Some of them just get to the point where what they have is “good enough” for their needs. Would you deny Eyefinity to these developers and their users?

Bumping up the resolution dramatically can also improve the immersive experience of playing the game or walking through a simulated environment. Do not discount the impact of being able to turn your head and see more of the world.

Also, the number of vertices doesn’t scale with resolution, unless you’re somehow creating more vertices for higher resolution. So vertex shader costs only increase in the sense that you’re rendering more stuff with a wider FOV.

And if you’re rendering at a high resolution, you don’t need as much (or any) anti-aliasing. You also get more efficiency out of your textures (assuming regular texture access patterns), though you’ll be accessing the higher mipmaps more often.

All this means that performance does not scale linearly with resolution. It only does so if you’re fragment program execution bound.

12x the FOV and 12x the batch/vert count is all-the-same to the GPU? If performance is no object, sure! :wink:

But yes, if you just want to stretch your 40 deg FOV over 120 deg of gamer FOV to get super-high-res (which is kinda odd), then yes, same batch/vert count – and it’s all about fill. But really, as gamer FOV increases you’d like to present more geometry detail. GPU tessellation helps, but isn’t necessarily best for everything, meaning more batches/verts.

It seems rather closed-minded to say that, just because <insert program here> pushes current hardware too hard to work well, Eyefinity is ultimately useless.

Didn’t say that. You did.

I just want to know bare-bones how this works and how flexible it is.

What I was hoping for is 2-12 display outputs each driven by their own GPU (command processor + stream processors + memory), each with their own separate command stream (e.g. X screen in X11), so they can be driven in parallel by separate CPU cores, ALL on one GPU card. Guess I’ll wait a little while longer for that…

What’d be really cool is a variant on this with a large portion of global memory GPU-shared (textures/VBOs/etc.) and another portion which is GPU-specific (system framebuffer, FBOs/attachment textures/renderbuffers, display-local textures, etc.) to support parallel draw of multiple frustums with GPUs on the same board, without eating double the GPU memory/PCIe bandwidth for shared objects (e.g. most textures/VBOs/etc.).

I wonder if you can get that by creating 6 or 12 windows and contexts, running in 6 or 12 threads, with 6 or 12 culled batch lists?

Yeah, though that leads to these questions:

  1. Can you safely submit GL calls to the ATI Eyefinity GL driver simultaneously from different threads, rendering to different windows/contexts, without a lot of “synchronization” overhead … as is typical of this many-writers-one-reader model? Context swapping is often touted to be a killer to this model.
  2. Can the GPU command processor dedicate a subset of the stream processors to one display and a subset to another display at the same time – i.e. multiple simultaneous but distinct kernels? (The other camp calls this Concurrent Kernel Execution.)
  3. Is the number of stream processors scaled up 6x or 12x so the same perf per display is possible on GPU-limited portions? (Answer: no; same number of stream processors as a 1x board, and same clock rate.)

Caveat: IANAGDE (I am not a graphics driver engineer :whistle: )
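To make question 1 concrete, here’s a bare-bones sketch of the multi-window/multi-thread approach, assuming GLFW 3 and C++11 threads (neither of which is specified anywhere above). Whether the driver actually executes these per-context command streams in parallel, or serializes them behind one big lock, is exactly what’s being asked:

```cpp
// Sketch only: one window + context per display, each fed by its own CPU
// thread.  GLFW requires window creation and event polling on the main
// thread, but a context may be made current on (and fed from) any thread.
#include <GLFW/glfw3.h>
#include <thread>
#include <vector>

static void renderLoop(GLFWwindow* win, int displayIndex)
{
    glfwMakeContextCurrent(win);        // bind this display's context to this thread
    while (!glfwWindowShouldClose(win)) {
        glClearColor(displayIndex / 12.0f, 0.0f, 0.0f, 1.0f);   // placeholder per-display work
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        // ... cull and submit this display's batch list here ...
        glfwSwapBuffers(win);
    }
}

int main()
{
    glfwInit();
    const int displayCount = 6;         // or 12

    std::vector<GLFWwindow*> windows;   // windows/contexts must be created on the main thread
    for (int i = 0; i < displayCount; ++i)
        windows.push_back(glfwCreateWindow(1920, 1080, "display", nullptr, nullptr));

    std::vector<std::thread> threads;   // one submission thread per window/context
    for (int i = 0; i < displayCount; ++i)
        threads.emplace_back(renderLoop, windows[i], i);

    while (!glfwWindowShouldClose(windows[0]))
        glfwPollEvents();               // event handling stays on the main thread

    for (GLFWwindow* w : windows)       // ask the other threads to exit too
        glfwSetWindowShouldClose(w, GLFW_TRUE);
    for (std::thread& t : threads)
        t.join();
    glfwTerminate();
}
```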

glViewport, glScissor, keep the same context, change the viewing matrix and possibly the projection depending on the screen geometry and draw.

This should be the easiest, best-supported route. Yeah, sure, you want to render to multiple screens etc., but re-dispatching per screen only costs you the dispatch itself: the geometry-shader uniforms change across channels and the fragment shading is different anyway, so at worst you hose the fragment and FB caches. Anything fancier would mean implementing (in the driver layer) a complete abortion of an intermediate vertex-shader-results buffer.

glViewport is your friend.

P.S. For a video wall, of course, you can use a single shared projection, viewport & dispatch across multiple video outs; it’s only when you rotate the display plane that you have to alter the viewing matrix transformation.

I’m sure many people will be blissfully ignorant when they do this on a triple-display wraparound, but almost nobody does projection right these days.
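For reference, the single-context version of that looks roughly like the following. A sketch only: the display-region table, the per-panel yaw, and the setViewProjection/drawScene hooks are hypothetical stand-ins for whatever the engine actually provides.

```cpp
// Sketch only: one context, one big (Eyefinity) framebuffer, one
// viewport/scissor rectangle and one view/projection per physical display.
#include <GL/gl.h>

// Hypothetical application hooks -- stand-ins for the real engine code.
void setViewProjection(float yawDegrees, float fovYDegrees, float aspect);
void drawScene();

struct DisplayRegion {
    int   x, y, width, height;   // this panel's rectangle inside the big framebuffer
    float yawDegrees;            // 0 for a flat wall; nonzero when panels wrap around the viewer
};

void drawFrame(const DisplayRegion* panels, int count, float fovYDegrees)
{
    glEnable(GL_SCISSOR_TEST);
    for (int i = 0; i < count; ++i) {
        const DisplayRegion& d = panels[i];

        // Restrict rendering (including the clear) to this panel's slice.
        glViewport(d.x, d.y, d.width, d.height);
        glScissor (d.x, d.y, d.width, d.height);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        // Flat video wall: same projection everywhere (yaw stays 0).
        // Wraparound rig: rotate the view so it looks out through this panel.
        setViewProjection(d.yawDegrees, fovYDegrees,
                          float(d.width) / float(d.height));
        drawScene();             // batches are re-dispatched once per panel
    }
    glDisable(GL_SCISSOR_TEST);
}
```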

But really, as gamer FOV increases you’d like to present more geometry detail.

As a gamer, I’d like a lot of things. That doesn’t mean they’re going to happen. The game would have to be written to support that. Meaning more art assets, more modeler time, etc. I’ll take what I can reasonably get.

Ultimately, Eyefinity is a bonus, something nice for people with way too much money (we are talking about 6+ monitors here). It is not a design goal, and it certainly isn’t the baseline you would write a game for.

The most you might do is adjust the LOD scheme based on the aspect ratio and resolution of the game. But that’s a good idea in general, not something you specifically do for Eyefinity.
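As a concrete (and entirely hypothetical) example of that kind of adjustment, a distance-based LOD scheme can be rescaled by pixels per radian of FOV; the baseline numbers below are made up for illustration:

```cpp
// Sketch only: scale the LOD switch distances that were tuned at a baseline
// resolution/FOV, so a much taller/wider desktop pushes detailed LODs out
// further.  What matters is pixels per radian of vertical FOV.
#include <cmath>

float lodDistanceScale(int screenHeightPx, float fovYDegrees)
{
    const float baselineHeightPx = 1080.0f;   // assumed tuning baseline
    const float baselineFovYDeg  = 60.0f;     // assumed tuning baseline

    const float pi       = 3.14159265f;
    const float ppr      = screenHeightPx   / (2.0f * std::tan(fovYDegrees     * pi / 360.0f));
    const float baseline = baselineHeightPx / (2.0f * std::tan(baselineFovYDeg * pi / 360.0f));

    return ppr / baseline;   // >1 means a given geometric error now covers more pixels,
                             // so keep the high-detail LODs out to larger distances
}
```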

What I was hoping for is 2-12 display outputs each driven by their own GPU (command processor + stream processors + memory), each with their own separate command stream (e.g. X screen in X11), so they can be driven in parallel by separate CPU cores, ALL on one GPU card. Guess I’ll wait a little while longer for that…

Why would you want that? All it does is make an already complicated system (rendering) that much more so. You have to do bezel correction (correcting for the size of the gaps between screens) manually, while the Eyefinity drivers do it for you. You have to transfer data multiple times (for objects that are on multiple monitors), which also means transforming the objects multiple times.

The only reason I could see wanting this is if you’re wanting to render arbitrary images, rather than rendering one large scene and having the system break it up into multiple physical locations for display.
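For anyone who hasn’t dealt with it, the “manual bezel correction” mentioned above amounts to carving a per-monitor off-axis slice out of the full frustum so that the image hidden behind each bezel is simply never drawn. A sketch with made-up panel dimensions; the driver-side compensation does the equivalent of this for you:

```cpp
// Sketch only: build a per-monitor off-axis frustum for a horizontal row of
// panels, skipping the part of the image that falls behind the bezels.
struct Frustum { float left, right, bottom, top, zNear, zFar; };

Frustum bezelCorrectedFrustum(int monitorIndex, int monitorCount,
                              float panelWidthMm, float bezelWidthMm,
                              const Frustum& full)   // frustum of the whole wall
{
    // Physical extent of the whole wall, including the gaps between panels.
    const float wallWidthMm = monitorCount * panelWidthMm
                            + (monitorCount - 1) * bezelWidthMm;

    // This panel's visible slice of the wall, as fractions [0,1] of its width.
    const float x0 = monitorIndex * (panelWidthMm + bezelWidthMm);
    const float f0 = x0 / wallWidthMm;
    const float f1 = (x0 + panelWidthMm) / wallWidthMm;

    // Carve the matching horizontal slice out of the full frustum; whatever
    // lands in the gap between this panel's slice and the next panel's slice
    // is never rendered, which is exactly what bezel correction means.
    Frustum f = full;
    f.left  = full.left + f0 * (full.right - full.left);
    f.right = full.left + f1 * (full.right - full.left);
    return f;
}
```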

Yeah, though that leads to these questions:

You’re really over-thinking this. The reason why ATi doesn’t have detailed developer info on Eyefinity is that it isn’t something a developer does. It’s something the user does. All a developer needs to do is make sure that the program can display at an arbitrary aspect ratio and provide FOV settings in the program.

Eyefinity is not a generalized developer tool from which you can create arbitrary effects; it is a generalized user tool.

I should clarify the latter. The HD5870 Eyefinity 6 (6 monitors) has the same number of stream processors (1x), though the HD5970 Eyefinity 12 (12 monitors) has (I gather) 2x the stream processors.

Yeah, well our app renders to multiple screens and happens to be CPU-dispatch- and vertex-heavy for the usual reasons (multipass, etc.), and has a hard 60 Hz requirement. So it’s very useful to have draw parallelized. If you have a GPU (command stream + command processor + stream processors) per display, this dispatch is embarrassingly parallel.

…so at worst you hose the fragment and FB caches. Anything fancier would mean implementing (in the driver layer) a complete abortion of an intermediate vertex-shader-results buffer.

Not if each display has its own window, context, and GPU, right? All I’m thinking of is a form-factor reduction of that, where you wouldn’t need multiple physical GPU cards in separate PCIe x16 slots to get this embarrassingly parallel multi-frustum draw capability.

glViewport is your friend.

Yeah, except our goal isn’t the “let’s make this flat glass window pane bigger” model, but instead the “let’s wrap displays around the user for an immersive experience” model.

And even with the “flat glass window pane” model, if you crank up your frustum FOV 6x-12x to match the 6x-12x increase in gamer FOV with those extra monitors, glViewport doesn’t do away with all the extra batch setup and vertex shader overhead that the same number of stream processors (HD5870 Eyefinity 6) or 2x stream processors (HD5970 Eyefinity 12) will now have to bear. Not to mention the extra fragment shader load if you’re fill-limited. Right?

So yeah, Eyefinity is definitely very cool for some applications (dialing 6-12 PCs down to 1 is great when you can get it). But I’m just thinking about how it could be extended to be even more general, to support a broader class of applications and higher performance.