Here’s what I need to achieve: I have an input image, and for every pixel I calculate a new position and write that same pixel to that position in the destination image2d_t.

```
// x, y: coordinates of the source pixel handled by this work-item
float3 rgb = read_imagef(src, smp, (int2)(x, y)).xyz;
int dst_x = calc_x(rgb);
int dst_y = calc_y(rgb);
write_imagef(dst, (int2)(dst_x, dst_y), (float4)(rgb, intensity));
```

The problem is that several source pixels can map to the same (dst_x, dst_y), and I would like to add each RGB value to what is already in dst instead of overwriting it.

In other words I would like to be able to do something like this:

```
float3 rgb = read_imagef(src, smp, (int2)(x, y)).xyz;
int dst_x = calc_x(rgb);
int dst_y = calc_y(rgb);
// Not valid as written: dst is both read and written here, and the update is not atomic
float3 old_rgb = read_imagef(dst, smp, (int2)(dst_x, dst_y)).xyz;
write_imagef(dst, (int2)(dst_x, dst_y), (float4)(old_rgb + intensity * rgb, intensity));
```
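To make the intended result concrete, here is a minimal serial reference in plain C (CPU-side, not OpenCL). It is only a sketch of the semantics the snippet above is after. `Pixel`, `ref_calc_x`, `ref_calc_y` and `scatter_add_reference` are hypothetical names, and the two mapping functions are placeholders for the real calc_x/calc_y.

```c
#include <assert.h>
#include <stddef.h>

/* One RGBA pixel, mirroring the float4 used by read_imagef/write_imagef. */
typedef struct { float r, g, b, a; } Pixel;

/* Hypothetical stand-ins for calc_x/calc_y (assumption): red picks the
 * column and green picks the row, with both channels expected in [0, 1]. */
static int ref_calc_x(Pixel p, int width)  { return (int)(p.r * (float)(width  - 1)); }
static int ref_calc_y(Pixel p, int height) { return (int)(p.g * (float)(height - 1)); }

/* Serial scatter-add. The caller must zero-initialise dst beforehand. */
static void scatter_add_reference(const Pixel *src, Pixel *dst,
                                  int width, int height, float intensity)
{
    assert(src != NULL && dst != NULL);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Pixel p = src[(size_t)y * (size_t)width + (size_t)x];
            int dx = ref_calc_x(p, width);
            int dy = ref_calc_y(p, height);
            if (dx < 0 || dx >= width || dy < 0 || dy >= height)
                continue; /* skip targets outside the destination */
            Pixel *out = &dst[(size_t)dy * (size_t)width + (size_t)dx];
            out->r += intensity * p.r; /* accumulate instead of overwrite */
            out->g += intensity * p.g;
            out->b += intensity * p.b;
            out->a  = intensity;
        }
    }
}
```

On a single thread the read-modify-write is trivially safe; the difficulty is reproducing the same sums when many work-items hit the same destination pixel at once.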

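I know this isn’t possible as written (at least in OpenCL 1.2, where a kernel can’t both read from and write to the same image, and there are no atomic operations on images), but how would you go about implementing it?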

Are there any algorithms that accumulate the values in local memory and then merge across work-groups before rendering everything to the destination image? Something like histogram generation, except that here the result is as large as the source image, which can be pretty big (so local memory limits could be a problem…).
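A possible fallback, sketched here under assumptions rather than as a known-good answer, is to skip local memory and accumulate into a full-size integer buffer in fixed point, then convert to float in a second pass that writes the image. OpenCL 1.2 does provide 32-bit integer atomic_add on __global buffers; the plain C11 code below only imitates that pattern on the CPU with `<stdatomic.h>`. The 16.16 fixed-point format and the names `FIXED_ONE`, `Rgb`, `accum_clear`, `accum_add` and `accum_resolve` are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed 16.16 fixed point: each channel sum must stay below 65536.0. */
#define FIXED_ONE 65536.0f

typedef struct { float r, g, b; } Rgb;

/* Pass 0: zero the accumulator (three unsigned counters per pixel). */
static void accum_clear(atomic_uint *acc, size_t n_pixels)
{
    assert(acc != NULL);
    for (size_t i = 0; i < 3 * n_pixels; ++i)
        atomic_store(&acc[i], 0u);
}

/* Pass 1: called once per source pixel; safe when targets collide.
 * Assumes non-negative contributions (a negative value would not
 * convert to unsigned meaningfully). */
static void accum_add(atomic_uint *acc, int width, int dx, int dy,
                      Rgb c, float intensity)
{
    size_t base = 3 * ((size_t)dy * (size_t)width + (size_t)dx);
    /* Round to the nearest fixed-point step before adding. */
    atomic_fetch_add(&acc[base + 0], (unsigned)(intensity * c.r * FIXED_ONE + 0.5f));
    atomic_fetch_add(&acc[base + 1], (unsigned)(intensity * c.g * FIXED_ONE + 0.5f));
    atomic_fetch_add(&acc[base + 2], (unsigned)(intensity * c.b * FIXED_ONE + 0.5f));
}

/* Pass 2: convert one accumulated pixel back to float for the image write. */
static Rgb accum_resolve(atomic_uint *acc, int width, int x, int y)
{
    size_t base = 3 * ((size_t)y * (size_t)width + (size_t)x);
    Rgb out;
    out.r = (float)atomic_load(&acc[base + 0]) / FIXED_ONE;
    out.g = (float)atomic_load(&acc[base + 1]) / FIXED_ONE;
    out.b = (float)atomic_load(&acc[base + 2]) / FIXED_ONE;
    return out;
}
```

The trade-offs are quantisation to 1/65536 steps, a hard ceiling on each accumulated sum, and contention when many source pixels land on the same target, so whether this beats a local-memory scheme would need measuring.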

Any suggestions would be greatly appreciated.