Constant pointer dereference issue, possible compiler bug?

Essentially I’m cross-posting this issue from the NVIDIA forums (I posted there first, but there seems to be little interest, so I figured I’d ask here as well).

Anyway, on to the matter at hand: when iterating over constant memory in kernels, the GPU has a tendency to simply return 0s rather than fetch the value at the corresponding address. I’m using a MacBook 15’ 4.1 with Snow Leopard, the NVIDIA CUDA SDK 3.0 installed, and a GeForce 8600M GT. The following code example shows when the aforementioned issue arises:

__kernel void add(__constant float *a, __global float *answer)
{
    const int id_x = (int)get_global_id(0);
    int i, j;
    float sum = 0;
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            // This will not work:
            sum += a[i + j];
            // This will work:
            // sum += a[j * i];
            // So will this:
            // sum += a[j];
            // And this:
            // sum += a[i];
        }
    }
    // Write result
    answer[id_x] = sum;
}

In this instance if I changed the iteration to the following:

int k = 0;
for(i = 0; i < 3; i++)
    for(j = 0; (j < 3); j++,k++)
        //This works
        sum +=  a[k];

It worked just fine. Until today, that is. Now this code fails and returns 0s as well. Oddly enough, I’m using this workaround in one of my kernels (it’s a simple naive convolution filter used for benchmarking) and there it still works (code via pastebin). I haven’t read the specification cover to cover, so if there’s something about pointer arithmetic I’m missing, please tell me. As it stands, though, this rather simple bug leaves me unable to use constant memory. A single-loop workaround with modulo arithmetic is not an option, as it nearly doubled the running time of the kernel (god knows why). It’s important to point out that this code works just fine on the CPU; it’s only when run on the GPU that it starts returning 0s.
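For anyone who wants to check the numbers, here is a minimal CPU-side sketch in plain C of what the two loop variants above should add up to. The function names `ref_sum_i_plus_j` and `ref_sum_running_k` are made up for illustration and are not part of the original kernel or host code. It assumes the buffer behind `a` holds at least 9 floats, since the running index `k` reaches 8 (the `a[i + j]` variant only touches indices 0 to 4).

```c
/* Hypothetical CPU reference for the kernel's inner loops.
 * These helpers are illustrative only and not from the original post. */

/* Mirrors the failing variant: sum of a[i + j] over the 3x3 iteration.
 * Reads indices 0..4, some of them more than once. */
static float ref_sum_i_plus_j(const float *a)
{
    float sum = 0.0f;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            sum += a[i + j];
    return sum;
}

/* Mirrors the running-index workaround: sum of a[k] for k = 0..8.
 * Assumes a points to at least 9 floats. */
static float ref_sum_running_k(const float *a)
{
    float sum = 0.0f;
    int k = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++, k++)
            sum += a[k];
    return sum;
}
```

With `a = {1, 2, 3, 4, 5, 6, 7, 8, 9}`, the first function returns 27 and the second returns 45, so every element of `answer` should hold that value rather than 0 when the corresponding kernel variant runs correctly.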

Sounds like a bug in Apple’s implementation. Apple’s implementation (even on NVIDIA hardware) is quite different from NVIDIA’s own. I suggest you file a bug report with them. Apple also has their own OpenCL forum, located in the developer portion of their website.

Okay, I’ll get on over there and post it asap. Thank you! I just wanted to make sure I hadn’t missed something about what is and isn’t allowed in terms of pointer arithmetic.

Can you also provide the test source where you call the CL APIs to initialize CL, create program & kernels and execute the kernel? This along with the kernel you posted can then be used to debug what is going on.

Right, here’s the complete source for the above mentioned example. The main.c file is an adapted version of the opencl demo posted in conjunction with the podcast on OpenCL on