work item adjacency for 2D and 3D work ranges

Does the OpenCL spec address the issue of row-major versus column-major ordering for work items in 2D and 3D work ranges? Or is this up to the implementation?

Since OpenCL C is a C-based language, I’ve assumed that work items are assigned to compute units such that the highest dimension index varies fastest, i.e. for a 2x2 NDRange, the compute units process work items in the order [0][0], [0][1], [1][0], [1][1].

This is important because if I want to access elements of a 2D or 3D array in global memory based on a work item’s global ID, I want the array elements to follow the same ordering scheme.

A little experimentation and some code in the NVIDIA OpenCL best practices guide suggest the opposite ordering is natural.

int row = get_global_id(1);
int col = get_global_id(0);
c[row*N+col] = sum;

But could another vendor make a different choice?

This is a very interesting question. The OpenCL specification doesn’t give any guarantees about the scheduling order of work-items inside a work-group (see section 3.2), in part because all work-items inside a work-group could be executing in parallel and the specification intentionally avoids making performance claims.

As for what your hardware vendor recommends, isn’t it what you would expect? They indicate a row-major order, like the one the C99 standard specifies for multidimensional arrays. It’s certainly the order I would have expected :slight_smile:
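To make the row-major point concrete, here is a minimal host-side C sketch that emulates the indexing from the guide's snippet. It is not kernel code: gid0 and gid1 stand in for get_global_id(0) and get_global_id(1), and the helper names flat_index and offset_for_work_item are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flat offset of element (row, col) in a row-major
   array with n_cols columns, i.e. the c[row*N+col] form from the guide. */
size_t flat_index(size_t row, size_t col, size_t n_cols)
{
    assert(col < n_cols);          /* a column index must stay inside the row */
    return row * n_cols + col;
}

/* Emulates the kernel-side mapping from the guide:
   gid0 plays the role of get_global_id(0), gid1 of get_global_id(1). */
size_t offset_for_work_item(size_t gid0, size_t gid1, size_t n_cols)
{
    size_t row = gid1;
    size_t col = gid0;
    return flat_index(row, col, n_cols);
}
```

With this mapping, incrementing the dimension-0 ID moves to the next element of the same row, while incrementing the dimension-1 ID jumps a whole row (n_cols elements) ahead.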

Thanks. This ordering issue plagues me every time I encounter it. :slight_smile:

I wouldn’t naturally use the semantics row = get_global_id(1);
I would choose row = get_global_id(0);
But I’m usually wrong about such things. :stuck_out_tongue:
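For what it's worth, either assignment addresses every element of a square N x N array correctly, as long as the indexing is used consistently. What changes is how far apart in memory two work items with neighbouring dimension-0 IDs land. The rough C sketch below compares the two mappings on the host; the function names are hypothetical, and whether a unit stride in dimension 0 is actually faster depends on the device. As far as I know, NVIDIA GPUs group consecutive dimension-0 IDs together, which is presumably why the guide maps get_global_id(0) to the column.

```c
#include <stddef.h>

/* Mapping from the guide: col = get_global_id(0), row = get_global_id(1). */
size_t offset_col_from_dim0(size_t gid0, size_t gid1, size_t n)
{
    return gid1 * n + gid0;
}

/* Alternative mapping: row = get_global_id(0), col = get_global_id(1). */
size_t offset_row_from_dim0(size_t gid0, size_t gid1, size_t n)
{
    return gid0 * n + gid1;
}

/* Distance, in elements, between the memory locations touched by two
   work items that differ by one in dimension 0 (gid1 held at 0). */
size_t dim0_stride(size_t (*map)(size_t, size_t, size_t), size_t n)
{
    return map(1, 0, n) - map(0, 0, n);
}
```

For n = 8, dim0_stride gives 1 for the guide's mapping and 8 for the alternative, so the second choice makes dimension-0 neighbours touch elements a full row apart.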