clEnqueueReadBufferRect works with int but not float/double?


#1

Hi,

I’m trying to use clEnqueueBufferRect to copy data out of a buffer with some stride. It works just fine when the buffer is filled with int's, but if I change it to float or double it does not work anymore. By that I mean that it doesn’t seem to copy anything, the receiving buffer remains unchanged with no values written into it. Here is a small compilable test code that demonstrates the issue (here I use the C++ API but the same happens with the C API):

#include <iostream>
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>

int main(int argc, char** argv) {

    // Set up OpenCL environment

    cl::Context context;
    cl::Device device;
    cl::CommandQueue queue;

    try {

        std::vector<cl::Platform> all_platforms;
        cl::Platform::get(&all_platforms);
        cl::Platform tauschcl_platform = all_platforms[0];

        std::vector<cl::Device> all_devices;
        tauschcl_platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
        device = all_devices[0];

        std::cout << "Using OpenCL device " << device.getInfo<CL_DEVICE_NAME>() << std::endl;

        // Create context and queue
        context = cl::Context({device});
        queue = cl::CommandQueue(context,device);

    } catch(cl::Error &error) {
        std::cout << "OpenCL exception caught: " << error.what() << " (" << error.err() << ")" << std::endl;
        return 1;
    }


    /*********************/
    // Thus works with int
    // but not float nor double
    typedef int buf_t;
    /*********************/

    // Start buffer, length 10, filled with integers from 1 to 10
    buf_t *buf1 = new buf_t[10]{};
    for(int i = 0; i < 10; ++i)
        buf1[i] = i+1;

    // create an opencl buffer with same content
    cl::Buffer clbuf(queue, &buf1[0], &buf1[10], true);

    // receiving buffer of length 5, initialised to zero
    buf_t *buf2 = new buf_t[5]{};

    // buffer/host offsets are both (0,0,0)
    cl::size_t<3> buffer_offset;
    buffer_offset[0] = 0; buffer_offset[1] = 0; buffer_offset[2] = 0;
    cl::size_t<3> host_offset;
    host_offset[0] = 0; host_offset[1] = 0; host_offset[2] = 0;

    // We copy 5 values (with stride of 2)
    cl::size_t<3> region;
    region[0] = 1; region[1] = 5; region[2] = 1;

    try {
        queue.enqueueReadBufferRect(clbuf,
                                    CL_TRUE,
                                    buffer_offset,
                                    host_offset,
                                    region,
                                    2*sizeof(buf_t),    // buffer stride of 2
                                    0,
                                    1*sizeof(buf_t),    // host stride of 1
                                    0,
                                    buf2);

    } catch(cl::Error &error) {
        std::cout << "OpenCL exception caught: " << error.what() << " (" << error.err() << ")" << std::endl;
        return 1;
    }

    // print result
    for(int i = 0; i < 5; ++i)
        std::cout << "#" << i << " = " << buf2[i] << " --> should be " << 2*i+1 << std::endl;

    return 0;

}

The above code works just fine as is (with int), but it can be observed to fail when changing int in line 38 (the typedef) to float or double, suddenly nothing seems to be copied. But why?


#2

Thanks to Quimby on Stack Overflow for a solution to this:

The OpenCL 1.2 html reference is a little misleading (same for the OpenCL 1.2 C++ Wrapper API reference), the 2.0 and later references use clearer language as to how to specify the region parameter:

region defines the (width in bytes, height in rows, depth in slices) of the 2D or 3D rectangle being read or written. For a 2D rectangle copy, the depth value given by region[2] should be 1. The values in region cannot be 0.

Thus, this line

region[0] = 1; region[1] = 5; region[2] = 1;

needs to be changed to

region[0] = 1*sizeof(buf_t); region[1] = 5; region[2] = 1;

and now it works.

Maybe the language in the OpenCL 1.2 reference should be updated (corrected) to be a little clearer here.