I wrote a simple kernel that calls get_global_id(0) and prints the result on the screen. The host application runs this kernel in a loop just to see what happens, and I get a strange result: I launch only 256 work-items in clEnqueueNDRangeKernel(…) with global_work_size = 256 and local_work_size = 256, but sometimes it also prints the numbers from 256 to 511. I don't understand why; it looks as if more than 256 threads are running. Any idea why?
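You have probably passed 256 as the offset parameter too. The offset is added to every work-item's global ID, so with an offset of 256 your 256 work-items report the IDs 256 to 511 instead of 0 to 255. Unless you are splitting one range across multiple GPUs, you probably do not need that value to be anything other than zero.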
cl_int clEnqueueNDRangeKernel ( cl_command_queue command_queue,
cl_kernel kernel,
cl_uint work_dim,
const size_t *global_work_offset, <------ the value this points to should be zero (or pass NULL)
const size_t *global_work_size,
const size_t *local_work_size,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list,
cl_event *event )
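To see how the offset shifts the IDs, here is a small sketch in plain C. It is not OpenCL code, only a model of the arithmetic for a one-dimensional range: global ID = offset + group ID * local size + local ID. The function names model_global_id and model_ndrange are made up for illustration, and the sketch assumes global_size is a multiple of local_size.

```c
#include <stddef.h>

/* Hypothetical helper: models the value get_global_id(0) would return
   for one work-item of a 1-D range. */
size_t model_global_id(size_t offset, size_t group_id,
                       size_t local_size, size_t local_id)
{
    return offset + group_id * local_size + local_id;
}

/* Hypothetical helper: writes into ids[0 .. global_size-1] the global ID
   each work-item would see. Assumes global_size is a multiple of
   local_size. */
void model_ndrange(size_t offset, size_t global_size,
                   size_t local_size, size_t *ids)
{
    size_t n = 0;
    size_t num_groups = global_size / local_size;

    for (size_t g = 0; g < num_groups; ++g)
        for (size_t l = 0; l < local_size; ++l)
            ids[n++] = model_global_id(offset, g, local_size, l);
}
```

Calling model_ndrange(0, 256, 256, ids) fills ids with 0 to 255, while model_ndrange(256, 256, 256, ids) fills it with 256 to 511. In both cases there are exactly 256 work-items; only the numbering changes.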