How do I pass 7 arrays to the device for kernel processing?

biokhar · December 17, 2011, 5:03am

Warning: I’m an OpenCL newbie and an intermediate C++ programmer. Feel free to assume that I need to be spoken to like a child.

I’m writing an OpenCL program to look at a set of atoms (with x,y,z coordinates) and a set of rays. In C++ I used two nested for-loops: for every ray, I look at every atom to see whether it touches the ray and, if so, get the distance and save the smallest one. I’ve tried to convert it to OpenCL, but I’m certain I’ve missed some important points.

Previous version:
for(i=0;i<RAY_ARRAY_SIZE;i++)
{
    smallestDistance[i] = 999.;
    for(j=0;j<ATOMS_ARRAY_SIZE;j++)
    {
        if(ray i touches atom j)
        {
            a = distance to atom;
            if(a<smallestDistance[i])
                smallestDistance[i] = a;
        }
    }
}

So, I’m assuming I can do this in OpenCL by using 2 dimensions.

One dimension has rays indexed by i. Second dimension has atoms indexed by j.

I’ll make arrays for rays:
cl_mem double phi[RAY_ARRAY_SIZE], psi[RAY_ARRAY_SIZE], and rho[RAY_ARRAY_SIZE];

and arrays for atoms:
cl_mem double atomX[ATOMS_ARRAY_SIZE], atomY[ATOMS_ARRAY_SIZE], atomZ[ATOMS_ARRAY_SIZE], atomRadii[ATOMS_ARRAY_SIZE];

Define kernel arguments:
errNum = clSetKernelArg(kernel, 0, RAY_ARRAY_SIZE*sizeof(double), phi);
errNum |= clSetKernelArg(kernel, 1, RAY_ARRAY_SIZE*sizeof(double), psi);
errNum |= clSetKernelArg(kernel, 2, RAY_ARRAY_SIZE*sizeof(double), rho);
errNum |= clSetKernelArg(kernel, 3, ATOMS_ARRAY_SIZE*sizeof(double), atomX);
errNum |= clSetKernelArg(kernel, 4, ATOMS_ARRAY_SIZE*sizeof(double), atomY);
errNum |= clSetKernelArg(kernel, 5, ATOMS_ARRAY_SIZE*sizeof(double), atomZ);
errNum |= clSetKernelArg(kernel, 6, ATOMS_ARRAY_SIZE*sizeof(double), *atomRadii);
if (errNum != CL_SUCCESS)
{
    cerr << "Error setting kernel arguments." << endl;
    return 1;
}

Define memory passed to and from the GPU:
err = clEnqueueWriteBuffer(queue, phi, CL_TRUE, 0, RAY_ARRAY_SIZE*sizeof(double), phi, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, psi, CL_TRUE, 0, RAY_ARRAY_SIZE*sizeof(double), psi, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, rho, CL_TRUE, 0, RAY_ARRAY_SIZE*sizeof(double), rho, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomX, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomX, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomY, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomY, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomZ, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomZ, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, atomRadii, CL_TRUE, 0, ATOMS_ARRAY_SIZE*sizeof(double), atomRadii, 0, NULL, NULL);
if (err != CL_SUCCESS)
{
    cerr << "Error passing gpu memory." << endl;
    return 1;
}

Is this the appropriate way to do this? Any advice would be appreciated. I know I’m doing clEnqueueWriteBuffer wrong because buffer and pointer should be different things, but I wasn’t sure which is which.

noah_r · December 18, 2011, 9:19am

Looking at your algorithm, I recommend a one-dimensional work range, probably along your ray array. If you implemented this as a 2D work range with a work item for every i,j pair, the comparison “a<smallestDistance[i]” would become a global ‘reduce’ operation, because many work items would be updating the same smallestDistance[i]. That would take some global synchronization that is not supported in OpenCL, not that it would be efficient even if it were.

Instead, each ray work item could check every atom, retaining the inner for loop in your OpenCL kernel.
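
To make the one-work-item-per-ray idea concrete, here is a minimal sketch in plain C++ that emulates it on the host. It is not OpenCL code: in a real kernel the index i would come from get_global_id(0) and the outer loop would disappear. The names hitDistance, smallestForRay and runAllRays are hypothetical, and the hit test is a deliberately simplified stand-in (a ray is reduced to a single number on a line), since the thread does not spell out the actual ray/atom geometry.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "ray i touches atom j" test. It returns the
// hit distance, or a negative value on a miss. A ray is just one number here
// and an atom is a centre plus a radius on the same line; replace this with
// the real geometry.
double hitDistance(double ray, double atomCentre, double atomRadius) {
    double gap = atomCentre - ray;
    if (gap < 0.0) gap = -gap;
    return (gap <= atomRadius) ? gap : -1.0;
}

// Work done by a single work item: one ray, all atoms. Each work item only
// writes its own result, so no synchronization between work items is needed.
double smallestForRay(std::size_t i,
                      const std::vector<double>& rays,
                      const std::vector<double>& atomCentre,
                      const std::vector<double>& atomRadii) {
    double smallest = 999.0;  // same sentinel as the original loop
    for (std::size_t j = 0; j < atomCentre.size(); ++j) {
        double a = hitDistance(rays[i], atomCentre[j], atomRadii[j]);
        if (a >= 0.0 && a < smallest) smallest = a;
    }
    return smallest;
}

// Host-side emulation of a 1D launch with one work item per ray.
std::vector<double> runAllRays(const std::vector<double>& rays,
                               const std::vector<double>& atomCentre,
                               const std::vector<double>& atomRadii) {
    std::vector<double> smallestDistance(rays.size());
    for (std::size_t i = 0; i < rays.size(); ++i)
        smallestDistance[i] = smallestForRay(i, rays, atomCentre, atomRadii);
    return smallestDistance;
}
```

The running minimum lives in a local variable and is written out once per ray, which is what removes the need for the global reduce that a 2D range would require.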