Naive problem with Images and Kernels


I’m starting with OpenCL and I’m having some trouble understanding work-items and work-groups (and probably image formats as well).
I’m not new to OpenGL, so I’m a little confused because I don’t really know what I’m doing wrong.

I’m trying to write the most basic image-based kernel: read an RGBA image and write it out to an output image buffer.
I set up both images (input and output) with the same format, but the kernel execution fails with CL_INVALID_COMMAND_QUEUE. I tracked this error down and it depends only on the coordinates that I pass to write_imageui. I’m using the same coordinates that I use to read the source image (both images have the same size and pixel components, and their dimensions are not powers of two).
Here is the setup I’m using for the images (both are 614x515):

cl_image_format imageFormatCL;
imageFormatCL.image_channel_data_type = CL_UNSIGNED_INT8;
imageFormatCL.image_channel_order = CL_RGBA;
inputMem = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR , &imageFormatCL, 614, 515, 0, inputImage, &error);

outputMem = clCreateImage2D(context, CL_MEM_WRITE_ONLY, &imageFormatCL, 614, 515, 0, NULL, &error);

(I’m using a lot of clFinish just for debugging)

localWorkSize[0] = 16;
localWorkSize[1] = 16;
globalWorkSize[0] = 624; // ((int)(614 / 16) * 16) + 16
globalWorkSize[1] = 528; // ((int)(515 / 16) * 16) + 16

CLCHECK(clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL));

// Read back the results
size_t sizeImage = 614 * 515 * 4;
unsigned char* outputImage = (unsigned char*)malloc(sizeImage);
memset(outputImage, 0, sizeImage);
CLCHECK(clEnqueueReadImage(commandQueue, outputMem, CL_TRUE, origin, region, 0, 0, outputImage, 0, NULL, NULL));
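The global work size above is the image size rounded up to a multiple of the 16x16 local size. A minimal sketch of that rounding follows; the helper name `round_up` is hypothetical and not part of OpenCL. Unlike the formula in the comments above, it does not add an extra work-group when the size is already an exact multiple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): round `value` up to the
   nearest multiple of `multiple`. A value that is already a multiple
   is returned unchanged. */
size_t round_up(size_t value, size_t multiple)
{
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}
```

With this, `globalWorkSize[0] = round_up(614, 16);` and `globalWorkSize[1] = round_up(515, 16);` give the same 624 and 528 used above. Because the global size is then larger than the image, some work-items fall outside it, so the kernel needs a bounds check.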

And this is my kernel:

__kernel void gray(__read_only image2d_t imageIn, __write_only image2d_t imageOut)
{
  int id0 = get_global_id(0);
  int id1 = get_global_id(1);
  if (id0 > get_image_width(imageIn) || id1 > get_image_height(imageIn))
    return;

  int2 pos = {id0, id1};

  uint4 inputPixel = read_imageui(imageIn, sampler, pos);
  write_imageui(imageOut, pos, inputPixel);
}

After executing this kernel with the above setup, I always get a CL_INVALID_COMMAND_QUEUE.
If, for example, I write out the image in the kernel as “write_imageui (imageOut, pos/4, inputPixel);”, the CL_INVALID_COMMAND_QUEUE doesn’t happen, but obviously the output image is wrongly constructed.

Any idea about this problem would be appreciated.
Thank you very much.

As I said, it was a naive problem. I was checking the bounds of the image incorrectly. Sorry for bothering you with tons of code.
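The follow-up does not say exactly which check was wrong. A likely reading is the `>` in the kernel guard: valid pixel coordinates run from 0 to width-1 and 0 to height-1, so a work-item with id0 == width or id1 == height passes a `>` test and writes outside the image. The plain C sketch below models both guards on the host under that assumption; all function names are hypothetical and nothing here calls OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Guard as written in the kernel: a work-item with id == size passes. */
int passes_guard_gt(int id0, int id1, int width, int height)
{
    return !(id0 > width || id1 > height);
}

/* Guard using >=: only ids 0..width-1 and 0..height-1 pass. */
int passes_guard_ge(int id0, int id1, int width, int height)
{
    return !(id0 >= width || id1 >= height);
}

/* Count the work-items of a global_w x global_h range that pass the
   given guard but address a pixel outside the width x height image. */
size_t count_out_of_bounds(int (*guard)(int, int, int, int),
                           int global_w, int global_h,
                           int width, int height)
{
    size_t count = 0;
    assert(guard != NULL);
    for (int id1 = 0; id1 < global_h; ++id1) {
        for (int id0 = 0; id0 < global_w; ++id0) {
            int outside = (id0 >= width || id1 >= height);
            if (guard(id0, id1, width, height) && outside)
                ++count;
        }
    }
    return count;
}
```

For the 624x528 range and 614x515 image from the question, `count_out_of_bounds(passes_guard_gt, 624, 528, 614, 515)` counts the work-items in column 614 and row 515 that slip past the `>` guard, while the `>=` guard lets none through. In the kernel, the corresponding change would be `if (id0 >= get_image_width(imageIn) || id1 >= get_image_height(imageIn)) return;`.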