I have Mac OS X 10.6.2 (Snow Leopard) and an ATI Radeon HD 4870. For this card, CL_DEVICE_MAX_WORK_GROUP_SIZE = 1024. However, for some reason I cannot use any work-group size larger than 256. For example, if I try 512x1x1, clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE. Any ideas why this is happening? Could it be something inherent to ATI Stream?

Here’s my code (error handling stripped, since no errors are generated in the middle):

clGetPlatformIDs(max_num_platforms, platforms, &num_platforms);
clGetDeviceIDs( platforms[0], CL_DEVICE_TYPE_GPU, num_entries, devices, &num_devices);
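device = devices[0];
context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);  /* not shown in the original listing; context must exist before it is used below */
cmd_queue = clCreateCommandQueue(context, device, 0, NULL);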
program = clCreateProgramWithSource( context, 1, &kernel_str, NULL, NULL);
clBuildProgram(program, 1, &device, NULL, NULL, NULL);
kernel = clCreateKernel(program, "inc", NULL);
cl_mem memobj = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(float)*n, srcA, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobj );

size_t gws = 512, lws = 512;
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &gws, &lws, 0, NULL, NULL);

The kernel is very simple:

__kernel void inc (__global const double *a) {
      int k = get_global_id(0);

You should use the work-group size value returned by clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, …). Can you check what this value is?

Note that CL_DEVICE_MAX_WORK_GROUP_SIZE is the maximum work-group size the device can support at all, but the maximum work-group size for a particular kernel can be smaller, depending on the resources (such as registers and local memory) that the kernel uses. You should always stay within the value returned by clGetKernelWorkGroupInfo for the kernel you are launching.
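As a rough sketch of how that could look in host code: query the per-kernel limit, then pick a local size that fits under it. The helper pick_local_size below is hypothetical (it is not part of OpenCL); it also makes the local size divide the global size evenly, since OpenCL 1.x returns CL_INVALID_WORK_GROUP_SIZE otherwise. The usage comment assumes the kernel, device and cmd_queue variables from the code posted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL API: returns the largest local work
 * size that is <= kernel_max and divides global_size evenly. */
size_t pick_local_size(size_t global_size, size_t kernel_max)
{
    size_t lws;

    assert(global_size > 0 && kernel_max > 0);

    /* Start from the smaller of the two limits. */
    lws = kernel_max < global_size ? kernel_max : global_size;

    /* Walk down until the size divides the global size; stops at 1 at the latest. */
    while (global_size % lws != 0)
        --lws;

    return lws;
}

/* Usage with OpenCL (assumes kernel, device, cmd_queue already exist):
 *
 *   size_t kernel_max = 0;
 *   clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof(kernel_max), &kernel_max, NULL);
 *   size_t gws = 512;
 *   size_t lws = pick_local_size(gws, kernel_max);
 *   err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL,
 *                                &gws, &lws, 0, NULL, NULL);
 */
```

With a global size of 512 and a per-kernel limit of 256, this picks a local size of 256. Passing NULL as the local size to clEnqueueNDRangeKernel and letting the implementation choose is another option if the exact size does not matter to the kernel.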

Thank you, this explains everything. It reports max available work group size = 256 for my kernel.

With an ATI Radeon 5870 I have the same problem (on NT).
CL_DEVICE_MAX_WORK_GROUP_SIZE = 1024, but it only works with a maximum size of 256.
I think it's just a bug in the AMD driver for the moment.