basic question regarding get_global_id

prince · November 25, 2012, 7:14am

In kernel code, if I use

size_t id = get_global_id(0);
output[id] = input[id]*input[id];

then what will the value of id be? And if I use

size_t id = get_global_id(1);
output[id] = input[id]*input[id];

then what will the value of id be? What does dimension 0 mean?

As I understand it, if the argument is 1, that means one-dimensional, so it becomes
output[1] = input[1]*input[1];
which means the first element of the input[] array.

If the input array has length 10, what will the value of id be for the 2nd core, the 3rd core, and so on?

I am confused, please help.
Thanks

Dithermaster · November 25, 2012, 7:48am

Hopefully this has been cleared up in the other thread.

get_global_id returns the index of a given work item within the overall work. Each work item gets a different ID, and every unit of work gets executed. So if you have a 1D array to process, pass its size as the global size and write your kernel to use get_global_id to figure out which item to process. Study the example code; these are critical concepts to understand fully.
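To make the indexing concrete, here is a small host-side C simulation (an illustration only, not OpenCL itself). The names square_work_item and run_1d are hypothetical, and the ID that a real kernel would get from get_global_id(0) is simply passed in as a parameter. The sequential loop stands in for the runtime, which actually runs work items in parallel and in no guaranteed order.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel body. In real OpenCL the runtime
 * supplies the ID through get_global_id(0); here it is passed in. */
static void square_work_item(size_t global_id, const float *input, float *output)
{
    size_t id = global_id;            /* plays the role of get_global_id(0) */
    output[id] = input[id] * input[id];
}

/* Sequential sketch of enqueuing a 1D kernel with the given global size:
 * every ID from 0 to global_size - 1 is handed out exactly once. A real
 * device runs these work items in parallel and in no guaranteed order. */
static void run_1d(size_t global_size, const float *input, float *output)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        square_work_item(gid, input, output);
}
```

Calling run_1d(10, input, output) fills output[0] through output[9], no matter how many cores the device has.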

You only need get_global_id(1) for 2D kernels (and likewise get_global_id(2) for 3D kernels). You can’t do more than 3D kernels; higher dimensional work has to be subdivided down into 1D, 2D, or 3D.
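As one possible way to do that subdivision (an assumption, not the only approach), a 4D problem of size n0 x n1 x n2 x n3 can be launched as a 3D range of n0 x n1 x (n2 * n3) work items, with the kernel splitting the third ID back into two indices. The sketch below simulates that split in plain C; index4d and unfold_3d_to_4d are hypothetical names, and the three IDs are passed in where a kernel would call get_global_id(0), get_global_id(1), and get_global_id(2).

```c
#include <stddef.h>

/* Hypothetical 4D index, e.g. (a, b, c, d) into a 4D array. */
typedef struct {
    size_t a, b, c, d;
} index4d;

/* A 4D problem of size n0 x n1 x n2 x n3 launched as a 3D range of size
 * n0 x n1 x (n2 * n3): the first two IDs map straight through, and the
 * third (folded) ID is split back into c and d. Requires n3 > 0. */
static index4d unfold_3d_to_4d(size_t id0, size_t id1, size_t id2, size_t n3)
{
    index4d idx;
    idx.a = id0;        /* from get_global_id(0) */
    idx.b = id1;        /* from get_global_id(1) */
    idx.c = id2 / n3;   /* folded dimension: quotient ... */
    idx.d = id2 % n3;   /* ... and remainder */
    return idx;
}
```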

prince · November 25, 2012, 7:54pm

size_t id = get_global_id(0);
Does this mean id = 0 for the 1st core of the GPU, so that it becomes
output[0] = input[0]*input[0];
meaning the first element of the input array is squared and stored in the output array?

Now say the first 5 cores are busy, and the statement
size_t id = get_global_id(0);
gives id = 6, so it becomes
output[6] = input[6]*input[6];
Then what about the first 5 elements of the array I declared?

Thanks

Dithermaster · November 26, 2012, 5:17am

How is it that the initial 5 cores are “busy”?

It doesn’t matter how many cores there are or what else they are doing. If you specify a certain global work size, all units of work will get done: all at the same time when there are enough cores (for example, when the global work size is no larger than the core count), and otherwise in groups.

Example:
A 4-core machine.
Global size = 10.
First work group will process 0 to 3.
Second work group will process 4 to 7.
Third work group will process 8 and 9.
Now all work items have been processed.
The OpenCL runtime handles breaking up the global work size into work groups and assigning each work item a unique ID.
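The 4-core example can be modeled in a few lines of C. This is a deliberately simplified sketch (real devices and drivers schedule work in their own way), and run_in_batches is a hypothetical helper: it assigns consecutive global IDs to batches of `cores` items and shows that every ID from 0 to global_size - 1 lands in exactly one batch.

```c
#include <stddef.h>

/* Simplified model of the example above: with `cores` processing elements
 * (cores > 0), work items run in consecutive batches until all global_size
 * items are done. Records which batch each global ID ran in and returns
 * the number of batches. */
static size_t run_in_batches(size_t global_size, size_t cores, size_t *batch_of_item)
{
    size_t batches = (global_size + cores - 1) / cores;   /* round up */
    for (size_t batch = 0; batch < batches; ++batch) {
        size_t first = batch * cores;
        size_t end = first + cores;                        /* one past the last */
        if (end > global_size)
            end = global_size;                             /* final batch may be partial */
        for (size_t gid = first; gid < end; ++gid)
            batch_of_item[gid] = batch;                    /* this ID runs in this batch */
    }
    return batches;
}
```

With global_size = 10 and cores = 4 it returns 3 batches: IDs 0 to 3, 4 to 7, and 8 to 9.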

With a GPU, work groups are larger: often 32 or 64 work items, but they can be bigger. To get the best performance, design your work so the global work size is in the thousands or tens of thousands. If your work is smaller than that, the device can be underutilized and performance will suffer.

chippies · November 27, 2012, 1:44am

get_global_id returns the ID of the current thread (work item). The parameter selects which dimension of the thread grid you want the index for. When you enqueue a kernel with clEnqueueNDRangeKernel, you pass work_dim (the number of dimensions) and global_work_size, an array of size_t values with one entry per dimension. If global_work_size has 2 entries (a 2-dimensional array of threads), then each thread has a 2-dimensional identifier. Let's say global_work_size[0] = 3 and global_work_size[1] = 3; then there will be 3 * 3 = 9 threads in total, in a grid something like this:

| 0,0 | 0,1 | 0,2 |

| 1,0 | 1,1 | 1,2 |

| 2,0 | 2,1 | 2,2 |

So the thread in the top-left corner will have get_global_id(0) = 0 and get_global_id(1) = 0; the one just below it will have get_global_id(0) = 1 and get_global_id(1) = 0.
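A short C sketch of the same 3 x 3 case may help. It only simulates what a 2D launch hands out; ids2d and enumerate_2d are hypothetical names, and id0/id1 stand for the values get_global_id(0) and get_global_id(1) would return. Items are stored row by row so that id0 selects the row and id1 the column, as in the grid drawn above.

```c
#include <stddef.h>

/* Hypothetical record of the two IDs a work item would see. */
typedef struct {
    size_t id0;   /* value of get_global_id(0) */
    size_t id1;   /* value of get_global_id(1) */
} ids2d;

/* Sequential sketch of a 2D range with global_work_size = {size0, size1}.
 * `out` must hold size0 * size1 entries; entry i * size1 + j receives the
 * IDs (i, j). Returns the total number of work items. */
static size_t enumerate_2d(size_t size0, size_t size1, ids2d *out)
{
    size_t count = 0;
    for (size_t i = 0; i < size0; ++i) {
        for (size_t j = 0; j < size1; ++j) {
            out[i * size1 + j].id0 = i;   /* row */
            out[i * size1 + j].id1 = j;   /* column */
            ++count;
        }
    }
    return count;
}
```

After enumerate_2d(3, 3, grid), grid[3] (the item just below the top-left corner) holds id0 = 1 and id1 = 0.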

Edit: have a look at this link for another picture that might help: http://www.khronos.org/message_boards/viewtopic.php?f=28&t=5375.

prince · November 27, 2012, 4:39am

So if one kernel uses a global work size of 10 and another kernel also uses a global work size of 10, will get_global_id return 0 in both? I don't understand, because the link you provided says the global ID is unique across the GPU. If two kernels are using the same GPU, will both return get_global_id = 0, or something else? The image you linked shows the global ID as unique across the GPU.

__kernel void array_sum(__global const float* A, __global const float* B, __global float* C)
{
    size_t idx = get_global_id(0); /* Is this unique across the GPU if two kernels are running? You said the global ID is unique; please clarify, thanks. */

    C[idx] = A[idx] + B[idx];
}

Dithermaster · December 1, 2012, 1:03pm

But if you have two kernels, it’s OK for get_global_id to return the same value, just like it’s OK for two houses to have the same house number because they are on different streets. It’s just like having two C arrays: they each have an index 0, but they refer to different elements because there are two arrays.

In your array_sum example, get_global_id(0) will return 0 to the global size minus one. If you have another kernel (perhaps called array_difference) then get_global_id(0) will again return 0 to global size minus 1.
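The two-kernel case can be sketched in plain C. This is only a sequential simulation: the kernels become ordinary functions, `id` stands in for get_global_id(0), array_difference is the hypothetical second kernel named above, and launch_both is a hypothetical helper that plays the role of two separate enqueues with the same global size.

```c
#include <stddef.h>

/* Kernel bodies written as plain C functions; `id` stands in for
 * get_global_id(0). */
static void array_sum(size_t id, const float *A, const float *B, float *C)
{
    C[id] = A[id] + B[id];
}

static void array_difference(size_t id, const float *A, const float *B, float *D)
{
    D[id] = A[id] - B[id];
}

/* Two separate launches with the same global size. Both see IDs
 * 0 .. global_size - 1, but each writes to its own output array, so the
 * repeated ID values never collide. */
static void launch_both(size_t global_size, const float *A, const float *B,
                        float *C, float *D)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        array_sum(gid, A, B, C);
    for (size_t gid = 0; gid < global_size; ++gid)
        array_difference(gid, A, B, D);
}
```

Both loops see the IDs 0 to global_size - 1, yet each kernel touches every element exactly once, because C and D are different arrays.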

I wonder if you are confusing “kernel” with “work item”? A kernel is a piece of code that gets executed across a number of work items in parallel. Inside each work item, you can find out which item of work you are supposed to do by using get_global_id.

An analogy would be if I had a team of interns to grade papers. I say “start” and the first thing they each need to do is grab a paper, but which one? So they each call get_global_id and I give them each a unique paper number (index) and they grab that paper and grade it. In your example kernel (array_sum) each work item calls get_global_id in order to figure out which array element to sum.
