Consider a dual-GPU graphics card like the Nvidia GTX 590 or the Radeon 6990.
Does anyone know whether such a card will show up as a single device or as two devices? If it appears as just one device, does it automatically have access to all compute units on the card, or only to one of the internal GPUs?
Does anyone know if this is also platform-dependent (differences between Nvidia and ATI/AMD, perhaps something to do with CrossFire/SLI)?
So assume you want to keep both GPUs of the card busy. I suppose the most straightforward way is to set up two command queues (one for each device) and then split the data, i.e. the first half goes to the first device and the second half to the second.
OK, let’s generalize and say we have multiple devices, but for simplicity assume they are all identical (e.g. one dual-GPU card or several identical single-GPU cards). Is there some sort of data-splitting facility across devices (built into OpenCL, provided by platform vendors, available in a code base somewhere)? Splitting the data proportionally is easy, but not necessarily compute-time efficient if operations on different data items can have quite different computational complexity.
I have also read multiple times that SLI/CrossFire is not supported / should be disabled for GPGPU. For multiple discrete cards that’s just a matter of turning it off, but I don’t know how dual-GPU cards are handled, or whether there is anything to turn on/off there?
Is there any major benefit or disadvantage to choosing two Radeon 6970s / GTX 580s over a single 6990 / GTX 590? Out of the box the PCIe bus might play a role, but on the other hand my naive thinking is that a single x16 slot shared by two GPUs unified on a single card (6990/590) boils down to pretty much the same thing as two x8 slots for two individual discrete cards?
Depending on the algorithm, the easy way may be the best way (or at least not significantly worse than any other). However, if things branch a lot or go through significantly different numbers of iterations within the kernel, you may want to write your own work scheduler to feed each command queue. Sorry, but there are no built-in OpenCL APIs for this that I’m aware of. I do know of one framework that unifies multiple devices into a single virtual device and splits data across them automatically: the SNU-Samsung OpenCL Framework.
I think you can turn off CrossFire on AMD dual-GPU cards via the Catalyst Control Center under Windows or the aticonfig command under Linux.

The major benefit of choosing dual-GPU cards over two single-GPU cards is, for some, a higher GPU density per machine. The disadvantage is that the clocks of dual-GPU cards are usually lower than those of two single-GPU cards because of power/thermal constraints.

The PCIe bus is a serial communication channel (the total bandwidth is shared), so I think a single x16 slot with a dual-GPU card is roughly equivalent to two x16 slots with two single-GPU cards. Some motherboards only have one x16 slot alongside some x8 or x4 slots. Also, some power supplies don’t have the pair of 8-pin 12V PCIe power connectors that dual-GPU cards require; single-GPU cards typically take the more common 6-pin connectors.
I would caution that driver support for dual-GPU cards seems to lag behind that of single-GPU cards (sometimes by more than several months), at least in AMD’s case. More complicated hardware takes more time to get right.