Beginner’s question: new ATI or nVidia boards?

Hello there,

I need to buy some new video cards for my lab. The Radeon HD series seems to offer quite an advantage in terms of price and number of processors.

I checked a review on Phoronix comparing nVidia’s and ATI’s boards, and it seems that the new nVidia cards are considerably better than ATI’s. As that review is not brand new, I’m not sure which ones to buy. The review can be found here:

Do you guys have any experience or recommendations on this subject, or any newer benchmarks? I want to use my boards to do some MD simulations.

Personally, for Windows or Linux I prefer AMD cards over NVIDIA cards because they give me the option to use one platform and context for both CPU (both AMD and Intel CPUs are supported) and GPU. If you’re developing for Apple then you can use one platform and context as well.

If you want to use the OpenCL 1.1 specification, which I strongly prefer over 1.0, then NVIDIA doesn’t support that for the general public. You would have to be a registered developer to download and install their nearly year-old driver. If you browse NVIDIA’s forum you may notice a lack of feedback from them on their future support of OpenCL.

Of course NVIDIA hardware is nice and their CUDA support is great, but in my mind AMD wins on performance per dollar and on regularly providing updated OpenCL drivers and SDKs. I think portability at a slight performance hit is worth it in the long run.

I’m not really interested in running my programs on the CPU, although OpenCL’s ability to run on CPUs does matter. Intel recently released its own OpenCL implementation; the Linux version is still in “Preview support”. For now, though, I only need things to run fast. For a few months I have been using CUDA on a GTX 465, but my group got more money to buy new boards, and it’s up to me to decide which ones.

What really matters here is the flops-per-dollar ratio. I like OpenCL’s idea of “running everywhere”, but at the end of the day, this ratio is what counts.
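For a rough first cut at that ratio, the theoretical peak can be estimated from the spec sheet. Here is a minimal sketch in C, assuming each ALU retires one multiply-add (2 flops) per cycle. The function names peak_gflops and gflops_per_dollar are made up for illustration, and the result says nothing about memory bandwidth or how well a real kernel keeps the ALUs busy.

```c
#include <assert.h>

/* Theoretical single-precision peak in GFLOP/s.
 *   alus      - number of 32-bit ALUs (stream processors / CUDA cores)
 *   clock_mhz - clock the ALUs actually run at, in MHz
 * Assumption: one multiply-add (2 flops) per ALU per cycle. */
double peak_gflops(int alus, double clock_mhz)
{
    assert(alus > 0 && clock_mhz > 0.0);
    return alus * clock_mhz * 2.0 / 1000.0;
}

/* Peak GFLOP/s bought per dollar; price_usd is whatever you are quoted. */
double gflops_per_dollar(int alus, double clock_mhz, double price_usd)
{
    assert(price_usd > 0.0);
    return peak_gflops(alus, clock_mhz) / price_usd;
}
```

With made-up round numbers, a board with 1000 ALUs at 1000 MHz comes out at 2000 GFLOP/s peak, and at a price of 400 dollars that is 5 GFLOP/s per dollar. Plug in the real ALU counts, clocks, and local prices for the boards you are comparing.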

The benchmarks in the review you linked to don’t give enough information about how they’re implemented. If an algorithm is memory bound, then NVIDIA’s hardware will generally provide better performance. If it’s computationally bound and can implicitly or explicitly be performed using float4 or double4 (or double2), then AMD’s hardware will generally provide better performance; however, in the case of float3 and double3 it may take a slight performance hit. In this last case NVIDIA may or may not win in performance, because they treat everything as scalars anyway, so vector types don’t really matter much.

Depending on your sample you may also need to consider the amount of memory. Wikipedia has nice summaries of the hardware specs: AMD Radeon HD 6xxx Series and NVIDIA GeForce 5xx Series.

Also, for OpenCL I highly recommend not getting a dual-GPU board. While I can’t speak for NVIDIA, AMD hasn’t ironed out using both GPUs on a single PCB. There are some advanced things you could do to make it work, but it’s a lot of hassle and not officially supported.

Thank you for your answers, sean.seatlle. I’m quite sure my problem is computationally bound, as we use ‘just’ some 5k 2D particles. I’m really leaning towards ATI because of the price. Are Stream Processors just like CUDA processors, or is there a subtle difference?

The ATI Stream Processors are basically vector processors, as compared to NVIDIA’s scalar processors. The ATI Stream Processors execute very long instruction word (VLIW) instructions: the Radeon HD 5xxx (and I believe also the low-end 6xxx) use VLIW5 and the higher-end 6xxx use VLIW4.

Basically this means that one VLIW4 instruction executes on four 32-bit ALUs which together also form two 32/64-bit SFUs (VLIW5 executes on four 32-bit ALUs and one 32/64-bit SFU). This is the hardware reasoning behind using float4 and double4. In fact double4 is likely broken down into two double2 executions, using two 32-bit processing elements for each double calculation. This grouping and ungrouping of vector types can be done explicitly, or the compiler may implicitly do it for you. I suggest programming whichever way is more expressive for your problem, but remember that float3 may starve one ALU. Depending on your program and the hardware scheduling you may never notice this happening.
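To put a number on the float3 point, here is a small worst-case sketch in C. It assumes the compiler finds nothing else to pack into the unused slots, which in practice it often does, so treat the result as a lower bound rather than a prediction. The name lane_utilization is hypothetical.

```c
#include <assert.h>

/* Worst-case fraction of ALU lanes doing useful work when every operation
 * acts on a vector of `components` 32-bit values and nothing else is packed
 * into the leftover lanes.
 *   lanes - ALUs per VLIW instruction (4 for VLIW4). */
double lane_utilization(int components, int lanes)
{
    assert(components > 0 && lanes > 0);
    /* Number of VLIW instructions needed to cover all components. */
    int bundles = (components + lanes - 1) / lanes;
    return (double)components / (double)(bundles * lanes);
}
```

Under that assumption lane_utilization(3, 4) gives 0.75 for float3 on VLIW4, lane_utilization(4, 4) gives 1.0 for float4, and purely scalar code with no independent operations to pair up would sit at 0.25.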

So NVIDIA cores are clocked faster, while AMD cores are clocked slower but there are many more of them. Generally lower clocks are better for power consumption and heat dissipation.

Don’t forget that the wavefront size on AMD is 64 work-items, as opposed to a warp of 32 on NVIDIA. In OpenCL 1.1 you can query this per kernel with clGetKernelWorkGroupInfo and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. That way you can easily change your hardware later if needed.
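Once you have that multiple, the usual trick is to pad the global work size up to it, so you never hard-code 32 or 64. A minimal host-side sketch in plain C follows; round_up_global_size is a made-up helper name, and the multiple is assumed to come from the runtime query rather than being fixed in the source.

```c
#include <assert.h>
#include <stddef.h>

/* Round the number of work-items up to the next multiple of the preferred
 * work-group size multiple reported by the OpenCL runtime. */
size_t round_up_global_size(size_t n_items, size_t multiple)
{
    assert(multiple > 0);
    return ((n_items + multiple - 1) / multiple) * multiple;
}
```

For 5000 particles this gives 5056 work-items with a multiple of 64 and 5024 with a multiple of 32. The kernel then needs a bounds check such as `if (get_global_id(0) >= n) return;` so the padding work-items do nothing.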