global_work_size seems to have a limit


My OpenCL kernel seems to have issues when I execute it with a large global_work_size.
On a MacBook Pro with an ‘Intel Iris 1536 MB’ GPU it returns no results (all ints in the out buffer are 0).
200_000_000 works fine but 300_000_000 does not.
I searched for a documented limit on global_work_size, but as far as I understand there is none.
Does anyone have a clue about why this is happening?

I stripped down my kernel to an example that is as simple as possible:

__kernel void process_moves_with_local(__global int* out)
{
	int global_id = get_global_id(0);

	// Private array; it only survives optimisation because test[1] is read below.
	int test[128];
	for (int i=0; i < 128; i++) {
		test[i] = i;
	}

	if (global_id < 5) {
		out[1] = test[1];
	}

	out[0] = 2;
}

This is of course a silly example, but it’s enough to demonstrate the issue.
I write test[1] to the output buffer because otherwise the issue does not occur.
I guess the compiler optimises the code and removes the array initialisation when the array is never read.

I run the kernel with this host code:

	int nrOfMoves = 10;

	final int dstArray[] = new int[nrOfMoves];
	final Pointer dst = Pointer.to(dstArray);
	final cl_mem memObjects[] = new cl_mem[1];
	memObjects[0] = clCreateBuffer(context.context, CL_MEM_READ_WRITE, Sizeof.cl_int * nrOfMoves, null, null);
	clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memObjects[0]));

	final long global_work_size[] = new long[] { 200000000 };
	final long local_work_size[] = new long[] { 64 };

	clEnqueueNDRangeKernel(commandQueue, kernel, 1, null, global_work_size, local_work_size, 0, null, null);

	clEnqueueReadBuffer(commandQueue, memObjects[0], CL_TRUE, 0, nrOfMoves * Sizeof.cl_int, dst, 0, null, null);
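
One way to narrow this down would be to submit the same range as several smaller launches, using the global_work_offset argument (the null passed to clEnqueueNDRangeKernel above) so that get_global_id(0) still covers the full range. This is only a diagnostic sketch, not a known fix. It assumes the device supports OpenCL 1.1 or later, where a non-null global_work_offset is allowed. The class RangeChunks, the method split and the chunk size of 100,000,000 are hypothetical names and values chosen for illustration; the helper only computes the offset/size pairs and does not call OpenCL itself.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeChunks {

    /**
     * Splits totalWorkItems into consecutive launches of at most maxChunk work-items.
     * Each returned entry is { global_work_offset, global_work_size }.
     * Chunk sizes are kept a multiple of localSize, so totalWorkItems must be one too.
     */
    static List<long[]> split(long totalWorkItems, long maxChunk, long localSize) {
        if (localSize <= 0 || maxChunk < localSize || totalWorkItems % localSize != 0) {
            throw new IllegalArgumentException("sizes must be positive multiples of localSize");
        }
        // Round the chunk size down to a multiple of the work-group size.
        long chunk = (maxChunk / localSize) * localSize;

        List<long[]> chunks = new ArrayList<>();
        for (long offset = 0; offset < totalWorkItems; offset += chunk) {
            long size = Math.min(chunk, totalWorkItems - offset);
            chunks.add(new long[] { offset, size });
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (long[] c : split(300_000_000L, 100_000_000L, 64L)) {
            // With JOCL, each pair would be passed roughly as:
            // clEnqueueNDRangeKernel(commandQueue, kernel, 1,
            //     new long[] { c[0] }, new long[] { c[1] }, local_work_size, 0, null, null);
            System.out.println("offset=" + c[0] + " size=" + c[1]);
        }
    }
}
```

If the chunked launches produce the expected output while the single 300,000,000 launch does not, that would suggest the problem is tied to the size of one launch rather than to the kernel logic.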

Thanks in advance,