Converting CUDA grid to OpenCL

Hello. I’m porting a fairly simple CUDA kernel to OpenCL, but I’m struggling to get the indexing right. I have a 4D array, call it A(isize,jsize,ksize,msize). In CUDA I can launch a ksize x msize grid of isize x jsize blocks, which gives:

i = threadIdx.x
j = threadIdx.y
k = blockIdx.x
m = blockIdx.y

I can then calculate the index from that. I’d like to convert the same thing to OpenCL. Since OpenCL defines the global and local work sizes, I expected that I could do something like this:

globalsize[0] = ksize * isize
globalsize[1] = msize * jsize
localsize[0] = isize
localsize[1] = jsize

and pass 2 as the number of dimensions (work_dim in clEnqueueNDRangeKernel) for globalsize and localsize, then in the kernel:

i = get_local_id(0)
j = get_local_id(1)
k = get_group_id(0)
m = get_group_id(1)

If I then calculate my index in the same way and just do something simple, like set every element (i,j,k,m) = i, the CUDA code and OpenCL code don’t match. Am I doing something really dumb here? Is there a simpler way to migrate from the CUDA grid/block to OpenCL global/local work size?
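For anyone checking the same mapping, here is a minimal host-side sketch in plain C that emulates the 2D NDRange above. It derives the local and group IDs the way get_local_id and get_group_id would report them with a zero global offset, and writes i into each element. The names linear_index and emulate_ndrange are made up for illustration, and the memory layout (i fastest, m slowest) is an assumption, since the post does not say how the index is calculated.

```c
#include <stddef.h>

/* Assumed layout (not stated in the post): i varies fastest, m slowest. */
static size_t linear_index(size_t i, size_t j, size_t k, size_t m,
                           size_t isize, size_t jsize, size_t ksize)
{
    return i + isize * (j + jsize * (k + ksize * m));
}

/* Hypothetical helper: walks every global ID of the 2D NDRange on the host
 * and splits it into local and group IDs, mirroring what get_local_id() and
 * get_group_id() return when the global offset is zero.
 * Sets A(i,j,k,m) = i and returns the number of elements written. */
static size_t emulate_ndrange(int *a, size_t isize, size_t jsize,
                              size_t ksize, size_t msize)
{
    const size_t globalsize[2] = { ksize * isize, msize * jsize };
    const size_t localsize[2]  = { isize, jsize };
    size_t written = 0;

    for (size_t gy = 0; gy < globalsize[1]; ++gy) {
        for (size_t gx = 0; gx < globalsize[0]; ++gx) {
            size_t i = gx % localsize[0];  /* get_local_id(0), CUDA threadIdx.x */
            size_t j = gy % localsize[1];  /* get_local_id(1), CUDA threadIdx.y */
            size_t k = gx / localsize[0];  /* get_group_id(0), CUDA blockIdx.x  */
            size_t m = gy / localsize[1];  /* get_group_id(1), CUDA blockIdx.y  */

            a[linear_index(i, j, k, m, isize, jsize, ksize)] = (int)i;
            ++written;
        }
    }
    return written;
}
```

Under that layout every one of the isize * jsize * ksize * msize elements is written exactly once, so the global/local sizes above do correspond to the CUDA grid/block launch. One thing to keep in mind is that isize * jsize has to fit within the device's maximum work-group size, just as it has to fit within the CUDA block size limit.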


Oops, after spending way too much time staring at this, it turns out that I had my copy-back code wrong and wasn’t copying the whole array. The moderators should feel free to delete this thread.
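In case it helps someone who lands here with the same symptom, this is a small sketch of the size calculation for the copy-back. The helper name array_bytes is hypothetical; the idea is just that the byte count passed as the size argument of clEnqueueReadBuffer has to cover all four dimensions, not only the 2D slice handled by one work-group.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to read back the whole
 * 4D array A(isize,jsize,ksize,msize). The result would be passed as the
 * size argument of clEnqueueReadBuffer (with offset 0). */
static size_t array_bytes(size_t isize, size_t jsize, size_t ksize,
                          size_t msize, size_t elem_size)
{
    return isize * jsize * ksize * msize * elem_size;
}
```

For a float array this would be called as array_bytes(isize, jsize, ksize, msize, sizeof(float)).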