Kernel produces vertical bars instead of solid color. Why?

I’m trying to write an image-processing OpenCL application, but my problem is that any attempt to alter the input image produces artifacts that look like vertical bars. This does not happen if I copy the image pixels without altering them. So for example this line produces artifacts:

pixel = (uint4)(image1_pixel.x, image1_pixel.y, image1_pixel.z, 255);

…but this one works as expected:

pixel = (uint4)(image1_pixel.x, image1_pixel.y, image1_pixel.z, image1_pixel.w);

The input is an opaque 32-bit PNG image, so I expect both code lines to produce the same result (an exact copy of the input in this case). In reality, however, only the second line works as expected; the first line gives output with artifacts.

Here is my kernel:

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP |
                               CLK_FILTER_NEAREST;
__kernel void test(__read_only image2d_t image1,
                   __write_only image2d_t out) {
  const int2 pos = (int2)(get_global_id(0), get_global_id(1));
  uint4 image1_pixel = read_imageui(image1, sampler, pos);
  uint4 pixel = (uint4)(image1_pixel.x, image1_pixel.y, image1_pixel.z, 255);
  write_imageui(out, pos, pixel);
}

Here is the relevant portion of my main.cpp:

  CImg<unsigned char> image1("../input.png");
  Image2D clImage1 = Image2D(context,
    CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
    ImageFormat(CL_RGBA, CL_UNSIGNED_INT8),
    image1.width(), image1.height(), 0, image1.data());
  Image2D clResult = Image2D(context, CL_MEM_WRITE_ONLY,
    ImageFormat(CL_RGBA, CL_UNSIGNED_INT8),
    image1.width(), image1.height(), 0, NULL);
  Kernel test = Kernel(program, "test");
  test.setArg(0, clImage1); test.setArg(1, clResult);
  Event kernel_event, read_event;
  queue.enqueueNDRangeKernel(test, NullRange,
    NDRange(image1.width(), image1.height() ),
    NullRange, NULL, &kernel_event);
  cl::size_t<3> origin;
  origin.push_back(0); origin.push_back(0); origin.push_back(0);
  cl::size_t<3> region;
  region.push_back(image1.width() );
  region.push_back(image1.height() ); region.push_back(1);
  queue.enqueueReadImage(clResult, CL_TRUE,
                         origin, region, 0, 0,
                         image1.data(), NULL, NULL);

The full source code for my test application can be downloaded here (it contains a short main.cpp under 30 lines, CMakeLists.txt, a readme.txt explaining how to compile and run it, the input image, and the kernel). I use the CImg library to load and save images. I double-checked that the input opens as a 32-bit RGBA image. I tried to run the kernel with both the AMD and NVidia SDKs and got the same result.

Any idea why I get unexpected result?

You should define the local work size when enqueuing your kernel.
Your image is 200x200 (not a power of 2), so OpenCL is probably unable to find a valid local work size by itself.

I get a correct result from your kernel with a local work size of 20x20.
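If you want to pick such a size programmatically rather than hard-coding 20x20, one option is to take the largest divisor of each global dimension below some cap. A minimal sketch (the helper below is my own, not an OpenCL API):

```cpp
#include <cstddef>

// Hypothetical helper (my own, not part of OpenCL): the largest divisor of
// `global` not exceeding `cap`, so that global % local == 0 always holds.
inline std::size_t largestDivisorAtMost(std::size_t global, std::size_t cap) {
    for (std::size_t d = cap; d > 1; --d)
        if (global % d == 0) return d;
    return 1;  // prime global sizes fall back to 1, which is always valid
}

// For the question’s 200x200 image, cap 16 yields 10x10, and a work-group of
// 10*10 = 100 items stays under a typical 256-item device limit:
//   queue.enqueueNDRangeKernel(test, NullRange,
//       NDRange(image1.width(), image1.height()),
//       NDRange(largestDivisorAtMost(image1.width(), 16),
//               largestDivisorAtMost(image1.height(), 16)),
//       NULL, &kernel_event);
```

The cap should be checked against the device’s CL_DEVICE_MAX_WORK_GROUP_SIZE in real code; the constant 16 above is just an illustrative assumption.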

OpenCL implementations probably use a GCD calculation to choose good values, so only prime dimensions are problematic; and even if the implementation chose 1x1, that kernel would still function fine. Multiples of 32 or 64 are ideal, not powers of 2. Besides, the code is running fine, otherwise he wouldn’t get the output he is getting!

Any idea why I get unexpected result?

The only obvious one, given that you have a white pixel every 4 columns, is that the data is actually being processed as greyscale (one can pass greyscale data to a colour image and process it 4 pixels at a time this way). This is exactly the result you would expect in that case, but I can’t see how that is happening, unless the image library is doing something strange with .data().
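That hypothesis is easy to check in plain C++ (no OpenCL needed; the buffer values below are made up for illustration): if greyscale bytes are misread as RGBA and the kernel forces each alpha to 255, exactly every 4th byte changes, which reads back as a white pixel every 4 columns.

```cpp
#include <cstdint>
#include <vector>

// Sketch of the hypothesis above: take a buffer that is really greyscale,
// misinterpret it as RGBA pixels, and force the alpha channel to 255, as
// the altered kernel line does.
std::vector<std::uint8_t> forceAlphaOnMisreadGreyscale(std::vector<std::uint8_t> buf) {
    for (std::size_t px = 0; px + 3 < buf.size(); px += 4)
        buf[px + 3] = 255;  // the "alpha" byte of each bogus RGBA pixel
    return buf;
}

// Viewed as a greyscale row again, every 4th column becomes white:
// {100,100,100,100,100,100,100,100} -> {100,100,100,255,100,100,100,255}
```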

Actually, the kernel works fine for me even with the local work size set to null (with both NVIDIA OpenCL and Intel OpenCL).
However, I used GDI+ to load and save bitmaps, so the problem could indeed come from your bitmap library.
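If the library does hand back planar data (all R bytes, then all G, then all B; I believe this is how CImg stores pixels internally), it would need to be interleaved before upload, since CL_RGBA expects RGBARGBA... byte order. A minimal sketch, assuming 8-bit channels and an opaque alpha (my own helper, not a CImg or OpenCL call):

```cpp
#include <cstdint>
#include <vector>

// Interleave 3 channel planes (R plane, G plane, B plane, each `pixels`
// bytes long, stored back to back) into an opaque RGBA buffer.
std::vector<std::uint8_t> planarToRGBA(const std::vector<std::uint8_t>& planar,
                                       std::size_t pixels) {
    std::vector<std::uint8_t> rgba(pixels * 4);
    for (std::size_t i = 0; i < pixels; ++i) {
        rgba[i * 4 + 0] = planar[i];               // from the R plane
        rgba[i * 4 + 1] = planar[pixels + i];      // from the G plane
        rgba[i * 4 + 2] = planar[2 * pixels + i];  // from the B plane
        rgba[i * 4 + 3] = 255;                     // opaque alpha
    }
    return rgba;
}
```

If I remember the CImg API correctly, get_permute_axes("cxyz") should produce the same interleaved layout directly.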

Thanks for your help; the problem was in CImg. I still do not understand what was wrong with it, but with Magick++ everything works as expected. If anybody is interested, I posted the fixed source code at stackoverflow.