So far I have been maintaining some code that uses CUDA, alongside a vanilla C++ version.
Now I would like to move everything to OpenCL. Please correct me if I am wrong:
I should be able to maintain a single OpenCL code base. My project is open source, and as long as a user can get an OpenCL compiler for their system, they should be able to build the code for the underlying hardware. For example, if the user only has a quad-core CPU, the code should be able to take advantage of that. Otherwise, if they have an OpenCL-compatible graphics card, it should be able to take advantage of that.
Also, I am very confused as to what I need in order to start development. I have an AMD CPU and an NVIDIA GPU. Do I need to install both the AMD SDK and the NVIDIA SDK, or is there one SDK that can take care of all of this? With multiple devices supporting OpenCL, how is all of this managed?
I would be grateful if someone could point out a good starting point and how to solve this SDK issue. I am going to use OpenCL for pure computation (image processing algorithms primarily).
The SDK doesn’t provide much apart from some profiling tools and some headers. Everything at run-time is provided by libOpenCL, which is maintained by and comes from Khronos (I believe), so every vendor ships the same one as part of their graphics driver/run-time. It finds the backends (platforms) at run-time. So for development you should be able to use any SDK from any vendor, unless you want vendor-specific extensions (I use JOCL personally, and for that I don’t even need an SDK to write code). You only need an SDK if you don’t otherwise have a run-time, e.g. for Intel, for AMD before they included it in the GPU driver, or if you don’t have a compatible card.
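To make the run-time discovery concrete, here is a rough sketch that lists the installed platforms without any SDK headers at all. It assumes Linux/POSIX and that the loader is installed as "libOpenCL.so.1" (an assumption, check your system). The type aliases and the constant are hand-copied to mirror the Khronos headers, and list_opencl_platforms is just a name made up for this example. In a real project you would normally include <CL/cl.h> and link with -lOpenCL instead.

```cpp
#include <dlfcn.h>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hand-written stand-ins for declarations normally provided by <CL/cl.h>.
struct PlatformHandle;                       // opaque, only used via pointer
using cl_int = std::int32_t;
using cl_uint = std::uint32_t;
using cl_platform_id = PlatformHandle*;
constexpr cl_uint kPlatformName = 0x0902;    // value of CL_PLATFORM_NAME

using GetPlatformIDsFn = cl_int (*)(cl_uint, cl_platform_id*, cl_uint*);
using GetPlatformInfoFn =
    cl_int (*)(cl_platform_id, cl_uint, std::size_t, void*, std::size_t*);

// Returns the names of all OpenCL platforms the loader can find.
// Returns an empty list if no OpenCL run-time is installed.
std::vector<std::string> list_opencl_platforms() {
    std::vector<std::string> names;

    // Assumed library name; the loader is left loaded for the process lifetime.
    void* lib = dlopen("libOpenCL.so.1", RTLD_NOW);
    if (!lib) return names;

    auto get_ids =
        reinterpret_cast<GetPlatformIDsFn>(dlsym(lib, "clGetPlatformIDs"));
    auto get_info =
        reinterpret_cast<GetPlatformInfoFn>(dlsym(lib, "clGetPlatformInfo"));
    if (!get_ids || !get_info) return names;

    // First call asks only for the number of platforms (0 means CL_SUCCESS).
    cl_uint count = 0;
    if (get_ids(0, nullptr, &count) != 0 || count == 0) return names;

    // Second call fetches the platform handles themselves.
    std::vector<cl_platform_id> ids(count);
    if (get_ids(count, ids.data(), nullptr) != 0) return names;

    for (cl_platform_id id : ids) {
        char buf[256] = {0};
        // Leave one byte spare so the buffer is always NUL-terminated.
        if (get_info(id, kPlatformName, sizeof(buf) - 1, buf, nullptr) == 0)
            names.emplace_back(buf);
    }
    return names;
}
```

On a machine like the one in the question, I would expect this to return one entry per installed vendor run-time (for example one from the NVIDIA driver and one from a CPU run-time, if both are installed), but that depends entirely on what is on the system.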
A single code base isn’t that simple: some cards have more features than others, although modern ones have a pretty common baseline. Also, given that the hardware is so different, different algorithms may be required to attain peak performance on a given machine, and the difference can be an order of magnitude. But in general, unless you use specific vendor features or require a specific topology, a single set of code should at least run.
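One way to keep a single code base while still coping with feature differences is to check the device's extension list before picking a code path. The sketch below assumes you already have the space-separated string that clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS. The kernel names convolve_f64 and convolve_f32 are hypothetical, made up for illustration.

```cpp
#include <sstream>
#include <string>

// True if `wanted` appears as a whole token in a space-separated
// extension list (the format used by CL_DEVICE_EXTENSIONS).
bool has_extension(const std::string& extension_list,
                   const std::string& wanted) {
    std::istringstream tokens(extension_list);
    std::string token;
    while (tokens >> token) {
        if (token == wanted) return true;  // exact match, not a substring
    }
    return false;
}

// Hypothetical kernel names: use the double-precision variant only when
// the device advertises cl_khr_fp64, otherwise fall back to single precision.
std::string pick_convolution_kernel(const std::string& extension_list) {
    return has_extension(extension_list, "cl_khr_fp64") ? "convolve_f64"
                                                        : "convolve_f32";
}
```

Matching whole tokens rather than searching for a substring avoids false positives when one extension name is a prefix of another.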
Thanks! The project is about optimizing certain parameters to reconstruct some optical images, and there is a lot of image post-processing involved, so there are a lot of operations like resampling, convolutions, etc.
I will start with the NVIDIA kit and get my hands dirty for now, and hopefully things will start to clear up as I go along.