Hey there,
I am working on a relatively big OpenCL (1.0) project (our OpenCL code totals tens of thousands of lines).
Our code has to run on the OpenCL implementations from AMD, Intel and nVidia, and nowadays this is nearly impossible, since all of those implementations (which I think are certified, but to be honest I haven’t checked) behave differently in some cases …
We keep finding differences between the implementations that make the code crash or fail to compile, and I wanted to ask people more familiar with the standard whether this behavior is to be expected. So here is a list to begin with …
-
Passing an uninitialized variable to an OpenCL kernel causes a crash, even though this variable is not used at all (no reads, no writes) anywhere in the kernel. This situation can arise relatively easily (for example, if you have a lot of #ifdef’s because you want to support all the other GPGPU platforms too). This happens on Intel OpenCL for CPU only (the latest version, and all the older versions I have tried, too).
-
Converting __global const T * restrict to a bool does not compile. Example :
__global const int * restrict ptr = NULL;
if (!ptr) {} //this does not compile
if (ptr != NULL) {} //this is okay
This happens on nVidia OpenCL (latest version).
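A minimal sketch of the form that compiles everywhere we tried, with the explicit comparison wrapped in a helper. The name has_data is hypothetical, and the #ifndef block is only an assumption that lets the same snippet also build as plain C99 on the host (an OpenCL C compiler defines __OPENCL_VERSION__ and skips it):

```c
#ifndef __OPENCL_VERSION__
/* Host-side stand-ins (assumption): plain C has no __global qualifier
   and needs a header for NULL; OpenCL C provides both already. */
#include <stddef.h>
#define __global
#endif

/* Hypothetical helper: returns 1 if ptr is non-null, 0 otherwise.
   Uses the explicit comparison instead of relying on !ptr. */
int has_data(__global const int * restrict ptr)
{
    return ptr != NULL;
}
```

With this, `if (!has_data(ptr)) {}` replaces `if (!ptr) {}`.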
-
Macro redefinition causes a compilation error: “Macro redefinition is not allowed in OpenCL”. This happens only with Intel OpenCL.
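A sketch of how the redefinition can be avoided, assuming the second definition is meant to win. C99, on which OpenCL C is based, only permits redefining a macro with an identical body, so an explicit #undef first should be acceptable to every compiler. WORK_GROUP_SIZE and work_group_size are hypothetical names used only for illustration:

```c
/* Hypothetical setting defined by one code path. */
#define WORK_GROUP_SIZE 64

/* Another code path wants a different value: drop the earlier
   definition first instead of redefining the macro in place. */
#ifdef WORK_GROUP_SIZE
#undef WORK_GROUP_SIZE
#endif
#define WORK_GROUP_SIZE 128

/* Hypothetical accessor, only here to show which value is in effect. */
int work_group_size(void)
{
    return WORK_GROUP_SIZE;
}
```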
-
The -> operator cannot be used on a pointer to float4. Example:
float4* foo = &bar;
foo->x = 0.f; //does not compile
(*foo).w = 0.f; //works
I am not sure which implementation did that, but I can check (most likely nVidia). It works on all other implementations.
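A sketch of the dereference form, which is the one that worked on every implementation we tried. The helper name clear_w is hypothetical, and the struct is only a stand-in (an assumption) so that the snippet also builds as plain C99 on the host, where float4 is not built in:

```c
#ifndef __OPENCL_VERSION__
/* Plain-C stand-in (assumption); OpenCL C has float4 built in. */
typedef struct { float x, y, z, w; } float4;
#endif

/* Hypothetical helper: zeroes the w component through a pointer,
   using (*v).w rather than v->w. */
void clear_w(float4 *v)
{
    (*v).w = 0.f;
}
```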
There are more, but right now these are the first that come to my mind …
Please note that each of the examples above fails on only one of the three OpenCL implementations we use, so we can never be sure that our OpenCL source code is valid after testing on just one of them.
Clearly, either all of these are undefined behaviours, or these OpenCL implementations do not stick very closely to the standard (and if it is the latter and they are OpenCL certified, how did they get certified?).
The problem is that code works in one place (we haven’t enabled any extensions), either because somebody has decided to “extend” the language or because somebody has a crappy OpenCL implementation, so it does not behave as “a standard”. We can never be sure that everything is okay on our side, and this has been causing a lot of headaches for quite some time now … It has to work either on all implementations or on none. For example, we don’t care whether we can use float4->w, as long as it is consistent across the implementations.
Thanks,
Blagovest Taskov.