What is global_work_size in clEnqueueNDRangeKernel?


I’m following this tutorial:

I was doing fine until I got to this line, where I hit a snag: I don’t understand what global_work_size means when I’m telling my GPU to go and run those computations.

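    // global_item_size is set earlier in the tutorial (not shown here)
    size_t local_item_size = 64; // Divide work items into groups of 64
    ret = clEnqueueNDRangeKernel(command_queue, kernel,
            1, NULL,                              // work_dim = 1, no global offset
            &global_item_size, &local_item_size,  // global and local work sizes
            0, NULL, NULL);                       // no wait list, no returned event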

Does this mean that global_item_size is the value that i gets set to when you get to the *.cl kernel code?

    int i = get_global_id(0);

Yeah, I’m not getting it…

Also, for some reason I can’t post links, so I can’t share the complete context with you.

OpenCL and clEnqueueNDRangeKernel are all about parallel execution.

The global work size is “how many work items do I want to compute?”

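Then, inside your parallel kernel execution, “get_global_id(0)” answers “which work item is this thread of execution responsible for?” With a one-dimensional range and no global offset (the NULL you pass), it returns a different value in each work item, from 0 up to global_work_size - 1. So no, i is not set to global_item_size; each work item gets its own i.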

As an example problem, let’s say you want to check every character of a string and replace “A” with “B”. If your string is 1000 characters long, your global work size would be 1000. Inside the kernel you’d call “get_global_id(0)” to get the index of one character, read just that character, and if it is “A”, write back a “B”. The OpenCL runtime will make sure your kernel runs for all 1000 characters in the string (perhaps all at once, perhaps in groups; that’s up to the runtime, with some control from you if you set the work group size, but ignore that until you fully understand this first part).

Make sense?

There are some fantastic OpenCL tutorials out there; please seek them out and soak them up. You’ll learn more from them than from one question at a time here.