I’m an OpenCL neophyte, so take what I say with a big grain of salt. The obvious point first: this type of processing would probably run better on a multi-core CPU, and I’m not sure GPUs were designed for it. You could enqueue your kernel with a global work size of 1, but I think you will have problems with the local work size. As far as I know a local work size of 1 is legal, but to keep the hardware busy I think it needs to at least match the number of stream processors in each multiprocessor (this might only apply to Nvidia hardware, which schedules work-items in warps of 32); with a single work-item, the rest of the multiprocessor sits idle. You might be able to run one instance of the kernel per multiprocessor, as one work-group each, but then the number of instances running at once is limited to the number of multiprocessors your card has. My card (a GeForce 9400 GT) only has two multiprocessors, so on it you would be limited to just two processes. That is my understanding anyway; it could be wrong. If it is, someone please let me know.
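
To put rough numbers on that, here is a minimal sketch in plain C (no OpenCL calls) of the launch arithmetic I have in mind. The function names are made up for illustration, and the SIMD width of 32 is an assumption based on Nvidia's warp size; other hardware will differ. In real host code the number of multiprocessors would come from `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS` rather than being hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups a launch creates.
   OpenCL 1.x requires the global size to be a multiple of the local size. */
static size_t work_group_count(size_t global, size_t local) {
    assert(local > 0 && global % local == 0);
    return global / local;
}

/* Hypothetical helper: rough percentage of SIMD lanes doing useful work
   inside one work-group. Assumes the hardware pads each work-group up to
   a whole number of SIMD groups (e.g. warps of 32 on Nvidia). */
static size_t lane_utilization_percent(size_t local, size_t simd_width) {
    assert(local > 0 && simd_width > 0);
    /* Round the local size up to the next multiple of the SIMD width. */
    size_t padded = ((local + simd_width - 1) / simd_width) * simd_width;
    return (100 * local) / padded;
}
```

For example, `work_group_count(2, 1)` is 2, which would be one single-work-item group per multiprocessor on a two-multiprocessor card like mine, and `lane_utilization_percent(1, 32)` is 3, so under these assumptions only about 3% of each warp's lanes would be doing anything.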