OpenCL data transfer

I just started writing code for parallel computation with OpenCL.

As far as I understand, data generated on the CPU side (host) is transferred to the device through buffers (clCreateBuffer, clEnqueueWriteBuffer, clSetKernelArg) and then processed by the device.

I mainly have to deal with large double-precision arrays (or matrices).

However, I found that the code always fails with errors for arrays larger than 8000 entries. (This seems to make sense, because 64 KB is roughly equivalent to 8000 double-precision numbers.)
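
To make the size arithmetic concrete, here is a minimal C sketch that converts between element counts and bytes. The helper names bytes_for_doubles and doubles_in_bytes are hypothetical, chosen only for illustration, and the sketch assumes an 8-byte double, which holds on common platforms.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold n double-precision values. */
static size_t bytes_for_doubles(size_t n) {
    return n * sizeof(double);
}

/* Hypothetical helper: how many doubles fit in a byte budget. */
static size_t doubles_in_bytes(size_t budget) {
    return budget / sizeof(double);
}
```

Under that assumption, bytes_for_doubles(8000) gives 64000 bytes, while a full 64 KiB (64 * 1024 bytes) holds 8192 doubles, so the observed limit of about 8000 entries sits just under that boundary.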

The error codes were either -6 (CL_OUT_OF_HOST_MEMORY) or -31 (CL_INVALID_VALUE).
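
When reading numeric codes like these, it may help to print the symbolic name next to the number. The sketch below is a small lookup in C; cl_error_name and cl_error_is are hypothetical helpers, and the values are copied from the Khronos cl.h header rather than taken from my program. In that header CL_INVALID_VALUE is -30 and -31 is CL_INVALID_DEVICE_TYPE, so it is worth double-checking which of the two was actually returned.

```c
#include <string.h>

/* Hypothetical helper: map a few OpenCL status codes to their names.
   Values mirror the Khronos cl.h header; a real program would include
   <CL/cl.h> and compare against the CL_* macros instead. */
static const char *cl_error_name(int code) {
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -31: return "CL_INVALID_DEVICE_TYPE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -49: return "CL_INVALID_ARG_INDEX";
    case -50: return "CL_INVALID_ARG_VALUE";
    case -51: return "CL_INVALID_ARG_SIZE";
    case -61: return "CL_INVALID_BUFFER_SIZE";
    default:  return "unknown";  /* code not covered by this short table */
    }
}

/* Hypothetical helper: true if the code maps to the given name. */
static int cl_error_is(int code, const char *name) {
    return strcmp(cl_error_name(code), name) == 0;
}
```

The table covers only a handful of codes that tend to come up around buffer creation and kernel arguments; anything else falls through to "unknown".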

One more thing: when I set the argument to a 2-dimensional array, I could set the size up to 8000 x 8000.

So far, my guess is that the maximum data size for double precision is 8000 entries (64 KB) for 1D arrays, but I have no idea what happens for 2D or 3D arrays.
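
For the 2D and 3D cases, one thing that seems relevant is that clCreateBuffer takes a size in bytes, so a buffer is a flat range of memory and a matrix is usually stored in it as one contiguous 1D array. The C sketch below shows the row-major index arithmetic that goes with that layout. The names idx2, idx3 and matrix_bytes are hypothetical, and row-major order is an assumption, not something OpenCL requires.

```c
#include <stddef.h>

/* Hypothetical helper: row-major offset of element (i, j) in a matrix
   with `cols` columns, stored as one contiguous 1D array. */
static size_t idx2(size_t i, size_t j, size_t cols) {
    return i * cols + j;
}

/* Hypothetical helper: row-major offset of element (i, j, k) in an
   nx x ny x nz array; only ny and nz are needed for the offset. */
static size_t idx3(size_t i, size_t j, size_t k, size_t ny, size_t nz) {
    return (i * ny + j) * nz + k;
}

/* Hypothetical helper: total bytes for a rows x cols matrix of doubles. */
static size_t matrix_bytes(size_t rows, size_t cols) {
    return rows * cols * sizeof(double);
}
```

With 8-byte doubles, matrix_bytes(8000, 8000) is 512000000 bytes, far more than 64 KB, which suggests the 1D and 2D cases were hitting different limits.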

QUESTIONS

  1. Is there any other way to transfer data larger than 64 KB?

  2. If I did something wrong in the OpenCL setup for data transfer, what would you recommend?