Beginner question: understanding NDRange

Rui · March 9, 2010, 1:35pm

Hello,

I’m learning OpenCL and I’m having some difficulty understanding the NDRange concept. All I need to understand is that it is related to the positions of work-items, etc., right? Where can I find some simple examples of 1-, 2- and 3-dimensional problems?

Thanks in advance

dominik · March 9, 2010, 2:44pm

A very basic example for a 1-dimensional problem is vector addition. Work-item X is responsible for adding the elements at position X of the input vectors.
An example for a 2-dimensional problem is matrix multiplication. Work-item (x,y) computes the result of element (x,y) in the output matrix.
I can’t think of a nice 3D problem right now, but I hope that you get the idea…
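
The 1-dimensional case can be sketched in plain C, without the OpenCL runtime. This is only an illustration under assumptions: vec_add_item and vec_add_ndrange are hypothetical names, and the loop merely enumerates the global IDs that a real device would hand out through get_global_id(0).

```c
#include <stddef.h>

/* Body of a hypothetical 1-D kernel: the work-item with global ID gid
   adds one pair of elements. In OpenCL C, gid would come from
   get_global_id(0). */
static void vec_add_item(size_t gid, const float *a, const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Host-side stand-in for a 1-D NDRange of n work-items. A real device may
   run these in parallel and in any order; the loop only enumerates the IDs. */
void vec_add_ndrange(size_t n, const float *a, const float *b, float *out)
{
    for (size_t gid = 0; gid < n; ++gid)
        vec_add_item(gid, a, b, out);
}
```

Each call to vec_add_item touches only position gid, which is why the work can be split across work-items without them interfering with each other.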

Rui · March 9, 2010, 8:24pm

I see, and from what I’ve read a 3 dimensional problem could refer to a volume for example.
Apart from that, what I could conclude from the examples I saw was, and please correct me if I’m wrong: imagining a 1024x1024 image where we want to apply some filter to each pixel.

The operation to apply would be defined within a kernel, which would be executed 1024*1024 times
Those 1024*1024 kernel instances (work-items) could be grouped in 128x128 work-groups (number defined by the user, according to some criteria).
NDRange means ‘simply’ the dimensional index space from where all those kernel instances could be identified

dominik · March 9, 2010, 11:54pm

That’s right. NDRange describes the space of work-items, which can be 1-, 2- or 3-dimensional. Each work-item then executes the kernel code (usually on different data depending on its position in the work-item space).
As you said, in your example you would create 1024x1024 work-items that each apply the filter to one pixel in your image, namely work-item (x,y) applies the filter to pixel (x,y).
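
A rough plain-C sketch of the "one work-item per pixel" idea follows. It assumes an 8-bit greyscale image stored row by row and uses a simple inversion as a stand-in filter; invert_item and invert_image are hypothetical names, and the nested loops only imitate the 2-dimensional space of IDs.

```c
#include <stddef.h>

/* Stand-in for a 2-D kernel body: work-item (x, y) inverts pixel (x, y).
   Assumes an 8-bit greyscale image stored row by row. In OpenCL C, x and y
   would come from get_global_id(0) and get_global_id(1). */
static void invert_item(size_t x, size_t y, size_t width, unsigned char *pixels)
{
    pixels[y * width + x] = (unsigned char)(255 - pixels[y * width + x]);
}

/* Sequential stand-in for a width x height NDRange, one work-item per pixel. */
void invert_image(size_t width, size_t height, unsigned char *pixels)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            invert_item(x, y, width, pixels);
}
```

For the 1024x1024 image discussed here, the outer and inner loops together would make 1024*1024 calls, one per work-item.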

Rui · March 10, 2010, 6:29am

One last thing: the OpenCL specification says that “Work-groups are assigned a unique work-group ID with the same dimensionality as the index space used for the work-items”. I don’t quite understand what an “ID with the same dimensionality as the index space” means.

Thanks a lot!

Rui · March 10, 2010, 7:36am

Another question came up: you said a 2D problem could be, for example, a matrix multiplication, but I also read that buffer objects in OpenCL are one-dimensional, and that for 2D or 3D there are image objects. Does that mean that a problem like matrix multiplication (“apparently” 2D) would in fact have to be mapped to a 1-dimensional problem?

dominik · March 10, 2010, 7:37am

It means that if your work-item space is n-dimensional, your work-group space is also n-dimensional.
To come back to your example: if you have 1024x1024 work-items (in 2 dimensions), then your workgroups are also arranged in two dimensions, i.e. they are identified by two IDs. For example you can have 64x64 workgroups of size 16x16.
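
The arithmetic behind the 64x64 figure can be checked with a small plain-C sketch. num_groups and group_id are hypothetical helper names, and the functions work on one dimension at a time, so a 2-dimensional range simply applies them once for x and once for y.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension. Assumes the global size is a
   multiple of the local size, as OpenCL 1.0 requires. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Work-group ID, along one dimension, of the work-item with this global ID. */
size_t group_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    return global_id / local_size;
}
```

With 1024 work-items and a local size of 16 in each dimension, num_groups gives 64 per dimension, and the work-item at global position (17, 1023) falls into work-group (1, 63).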

dominik · March 10, 2010, 7:40am

You would have to map the matrix (2-dimensional) to a buffer (1-dimensional), but it’s still possible (and sensible) to use a 2-dimensional space of work-items. You just have to make sure you get the indexing right when accessing the matrices.
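
As an illustration of "getting the indexing right", here is a plain-C sketch of matrix multiplication over flat buffers. It assumes square n x n matrices stored in row-major order, so element (row, col) sits at index row * n + col; matmul_item and matmul_ndrange are hypothetical names, and the loops stand in for a 2-dimensional space of work-items.

```c
#include <stddef.h>

/* Stand-in for a 2-D kernel body: work-item (x, y) computes the element in
   row y, column x of C = A * B. All three n x n matrices live in flat,
   row-major buffers, so element (row, col) is at index row * n + col. */
static void matmul_item(size_t x, size_t y, size_t n,
                        const float *a, const float *b, float *c)
{
    float sum = 0.0f;
    for (size_t k = 0; k < n; ++k)
        sum += a[y * n + k] * b[k * n + x];
    c[y * n + x] = sum;
}

/* Sequential stand-in for an n x n NDRange. */
void matmul_ndrange(size_t n, const float *a, const float *b, float *c)
{
    for (size_t y = 0; y < n; ++y)
        for (size_t x = 0; x < n; ++x)
            matmul_item(x, y, n, a, b, c);
}
```

The buffers are 1-dimensional, but the work-item space stays 2-dimensional; only the index expression has to translate between the two.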

wilson · March 10, 2010, 8:03am

So, if I call the OpenCL function size_t get_local_id (uint dimindx), it returns the one-dimensional ID of a work-item within its work-group along dimension dimindx. Right?

And if I want two-dimensional coordinates, I would have to use the equation on page 20 of the spec,
(gx , gy) = (wx * Sx + sx , wy * Sy + sy)

right?

Thanks

dbs2 · March 10, 2010, 9:32am

get_local_id() will return your ID within your current work-group.
E.g., if your work-group size is 100, and you have 1000 global work-items, work-item 201 will return local_id of 1 (since it’s item 1 in work-group 2).

get_global_id(0) will return the 0th dimension global ID. (x)
get_global_id(1) will return the 1st dimension global ID. (y)
etc.
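
The relationship between the three kinds of ID can be written out in plain C. This is a sketch for a single dimension, assuming no global offset; ItemIds, split_global_id and join_global_id are hypothetical names. join_global_id is the per-dimension form of g = w * S + s from the spec equation quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* IDs of one work-item along a single dimension. */
typedef struct {
    size_t group;  /* what get_group_id(d) would return */
    size_t local;  /* what get_local_id(d) would return */
} ItemIds;

/* Split a global ID into (work-group ID, local ID). */
ItemIds split_global_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    ItemIds ids = { global_id / local_size, global_id % local_size };
    return ids;
}

/* Rebuild the global ID: g = w * S + s, applied once per dimension. */
size_t join_global_id(ItemIds ids, size_t local_size)
{
    return ids.group * local_size + ids.local;
}
```

For the example above, split_global_id(201, 100) yields group 2 and local ID 1, and joining them again gives 201. In two dimensions the same split and join are applied separately to x and y.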

Douglas125 · March 30, 2010, 11:08am

Hi;

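Not all arguments of clEnqueueNDRangeKernel are used right now; some are reserved for future versions (in OpenCL 1.0, global_work_offset must be NULL). It is also OK to leave some values for OpenCL to decide: passing NULL for local_work_size lets the implementation choose the work-group size, and with it the number of work-groups.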

As for the work-items and work-dimension question, here is a simple answer. Keep in mind that in OpenCL the work-items can all run at the same time and in no guaranteed order, whereas in regular code the iterations run in sequence, which is a big difference.

work_dim = 1, global work size = {n}
Regular Code:

for (int i=0; i < n; i++)
{
     //your code
}

OpenCL code:

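__kernel void yourkernel( /*your arguments*/ )
{
    int i = get_global_id(0);
    //your code
}
// host side: queue and global_size (a size_t array holding {n}) are placeholder names
clEnqueueNDRangeKernel(queue, yourkernel, 1, NULL, global_size, NULL, 0, NULL, NULL);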

work_dim = 2, global work size = {n, p}
Regular Code:

for (int i=0; i < n; i++)
{
   for (int j=0; j < p; j++)
   {
     //your code
   }
}

OpenCL code:

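__kernel void yourkernel( /*your arguments*/ )
{
    int i = get_global_id(0);
    int j = get_global_id(1);
    //your code
}
// host side: queue and global_size (a size_t array holding {n, p}) are placeholder names
clEnqueueNDRangeKernel(queue, yourkernel, 2, NULL, global_size, NULL, 0, NULL, NULL);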
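
To make the loop comparison concrete, here is a plain-C sketch that runs the same per-item body over an n x p index space either forwards or backwards. It is only a model: kernel_item and run_2d_range are hypothetical names, and the value written (i * 10 + j) is arbitrary. The point is that the result does not depend on the order in which the (i, j) pairs are visited, which is what allows a device to run them in parallel.

```c
#include <stddef.h>

/* Stand-in for a 2-D kernel body: work-item (i, j) writes only its own cell.
   In OpenCL C, i and j would come from get_global_id(0) and
   get_global_id(1). */
static void kernel_item(size_t i, size_t j, size_t p, int *out)
{
    out[i * p + j] = (int)(i * 10 + j);
}

/* Sequential stand-in for an n x p NDRange. When reversed is non-zero the
   work-items are visited in the opposite order. */
void run_2d_range(size_t n, size_t p, int reversed, int *out)
{
    for (size_t k = 0; k < n * p; ++k) {
        size_t flat = reversed ? (n * p - 1 - k) : k;
        kernel_item(flat / p, flat % p, p, out);
    }
}
```

A loop body that depends on the previous iteration (a running sum, for instance) would not survive this reordering, and would need a different formulation before it could become a kernel.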

I hope I made things clearer. I am currently working on an OpenCL tutorial that may help you. I have made it available at
http://www.cmsoft.com.br/ , developers section. I have not reached advanced topics yet but it might help.

grimm · April 13, 2010, 6:54pm

This is a very useful thread for me. It has helped to settle my thinking on the NDRange values, thanks. I hope that this does not hijack the thread too much, but I don’t quite understand the relationship between local work size and the get_local_id function. In my code (that appears to be working) I have set the local work size to 128, but in my OpenCL code I’m assuming that the value returned by get_local_id is limited to the actual number of stream processors. Is this true? I’m worried now because I have tied the local ID to a buffer and I might be accessing memory that is beyond the bounds of the buffer. In this case the buffer is an array 16 items long, because my card (GeForce 9400) only has 16 stream processors. Should the buffer be 128 items long? As I said before, the program runs fine and I haven’t had a crash because of this in all the times I have run it. I just want to make sure that I haven’t just been lucky.

Thanks,

Grimm

ibbles · April 13, 2010, 11:37pm

get_local_id has no relation to the number of stream processors. If, as you say, the local work size is 128, then get_local_id will return a value in the range from 0 to 127.
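
The bounds question can be reduced to a small check in plain C. This is only a sketch: max_local_id and buffer_fits_local_ids are hypothetical helper names, and they model a single dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Largest value get_local_id(0) can return for a given local work size. */
size_t max_local_id(size_t local_size)
{
    assert(local_size > 0);
    return local_size - 1;
}

/* Returns 1 if a buffer of buffer_len elements is safe to index with any
   local ID from a work-group of local_size items, 0 otherwise. */
int buffer_fits_local_ids(size_t buffer_len, size_t local_size)
{
    return buffer_len >= local_size;
}
```

With a local work size of 128, a 16-element buffer indexed by the local ID fails this check, while a 128-element buffer passes. An out-of-bounds access on a device does not necessarily crash, so a program that appears to run fine is not evidence that the indexing is safe.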

grimm · April 14, 2010, 3:11pm

Thanks Ibbles, now to go fix my code. I must have been lucky then. 8)

Grimm

GSS_Mahadevan · April 21, 2010, 12:19am

Douglas/ibbles,

Thanks for your helpful replies. I have written a small Java program using nativelibs4java that illustrates the global-ids, local-ids and group-ids. I ran the kernel on my HD5750.

To get global-ids, local-ids and group-ids for a global-work-size of 256 and local-size=4, run the following command (with proper OpenCL for Java setup and CLASSPATH).

java -DGLOBAL=256 -DLOCAL=4 com.nativelibs4java.opencl.demos.NDRange1

The same kernel can be run from C/C++ to get the same result.
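
For readers without an OpenCL setup, the ID listing can be approximated in plain C. This is not the Java program or its kernel, and it does not call OpenCL at all; it is a sketch that computes the IDs a 1-dimensional range would produce, assuming no global offset. print_ndrange_ids is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print the global, group and local ID of every work-item in a 1-D range.
   Returns the number of work-groups. Assumes the global size is a multiple
   of the local size. */
size_t print_ndrange_ids(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    for (size_t gid = 0; gid < global_size; ++gid) {
        printf("global=%zu group=%zu local=%zu\n",
               gid, gid / local_size, gid % local_size);
    }
    return global_size / local_size;
}
```

Calling print_ndrange_ids(256, 4) lists 256 work-items in 64 work-groups, with the local ID cycling through 0 to 3 inside each group.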