For the code I’m trying to write, I plan on having each workgroup manipulate a set of 2D arrays. What is generally considered the easiest way to code this?
Right now, what I’m trying is to flatten a 3D matrix and then address it with a scheme like: matrix[(numGroups*localSize*wgroup)+localSize*x+y]
I’d like to copy each matrix into a local matrix, one per work-group, and work on it from there.
Am I headed in the right direction? Are there any good examples of how to do this?
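To make the layout I have in mind concrete, here is the indexing written as plain C so it can be checked on the host. It’s only a sketch: flat_index is a name I made up (it is not an OpenCL function), and it assumes the matrices are stored one after another in the buffer, each in row-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL): offset of element (row, col)
 * of matrix number `group` in a flat buffer that stores n x n matrices
 * back to back, each in row-major order (an assumed layout). */
static size_t flat_index(size_t n, size_t group, size_t row, size_t col)
{
    assert(row < n && col < n);   /* stay inside one matrix */
    return group * n * n + row * n + col;
}
```

With N=4, for example, flat_index(4, 1, 0, 0) is 16, the first element of the second matrix.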
Here’s some example host code:
size_t global_worksize=N*N*computeUnits;
size_t local_worksize=N*N;
error=clEnqueueNDRangeKernel(cq, k_matTest, 1, NULL, &global_worksize, &local_worksize, 0, NULL, NULL);
error=clFinish(cq);
error=clEnqueueReadBuffer(cq, mem, CL_FALSE, 0, global_worksize*sizeof(int), matrix, 0, NULL, NULL);
N is the dimension of each matrix (they’re square, N x N), and computeUnits is the number of work-groups I want (for now, it’s the value of CL_DEVICE_MAX_COMPUTE_UNITS).
My kernel code is:
__kernel void matrixTest(__global int *matrix) {
    const size_t numGroups = get_num_groups(0);
    const size_t localSize = get_local_size(0);
    size_t wgroup = get_group_id(0);
    size_t x = get_local_id(0);
    size_t y = 0;
    matrix[(numGroups*localSize*wgroup)+localSize*x+y]++;
}
I initialize the matrix to all zeros. I then expect each matrix to have a “1” in the first column of each row… but I don’t get that. In my kernel, I’m expecting localSize=N - is that correct?
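To pin down the result I’m expecting, here is a plain C emulation of the launch above. It is not OpenCL code, emulate_first_column is a made-up name, and it assumes the matrices are stored back to back in row-major order. One thing it makes visible: since local_worksize is N*N, each work-group has N*N work-items, so get_local_size(0) would return N*N rather than N, and a 1D local id has to be split into a row and a column.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C emulation of a 1D launch with num_groups work-groups of
 * n * n work-items each. Every work-item whose column is 0 increments
 * its element, so each row of each matrix gets a 1 in its first column. */
static void emulate_first_column(int *matrix, size_t n, size_t num_groups)
{
    const size_t local_size = n * n;  /* what get_local_size(0) would return */
    assert(matrix != NULL && n > 0);

    for (size_t group = 0; group < num_groups; ++group) {   /* get_group_id(0) */
        for (size_t lid = 0; lid < local_size; ++lid) {     /* get_local_id(0) */
            size_t row = lid / n;   /* split the 1D local id into 2D */
            size_t col = lid % n;
            if (col == 0)
                matrix[group * local_size + row * n + col]++;
        }
    }
}
```

Starting from an all-zero buffer, this leaves a 1 at every index that is a multiple of N and 0 everywhere else, which is the pattern I want the kernel to produce.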
I’m really new to this, so I apologize if it’s totally wrong.
Thanks!