Can somebody show me an example of how OpenCL works with sub-buffers?

I have this buffer:
cl_mem buffer_salida_sumsi = clCreateBuffer(contextCPU, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, length * sizeof(float), salida_sumsi, NULL);
These are the kernel arguments:
clSetKernelArg(kernel_rama_inferior, 0, sizeof(int), &M);

clSetKernelArg(kernel_rama_inferior, 25, sizeof(float), &buffer_salida_sumli);
And this is the queue.
Please, can somebody tell me how to do this with a sub-buffer?

Your clSetKernelArg call for index 25 is wrong. The size argument should be sizeof(cl_mem), not sizeof(float), because the argument is a buffer object.

A subbuffer is just a buffer that uses a parent buffer for storage. It can have an offset within the larger buffer and a smaller size. So after you create your first buffer you could create a subbuffer within it. Typical applications are for distributing work across multiple GPUs. What did you have in mind?
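One constraint worth knowing before you pick an offset: the OpenCL specification requires the sub-buffer origin to be aligned to the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN value, which is reported in bits, and clCreateSubBuffer fails with CL_MISALIGNED_SUB_BUFFER_OFFSET otherwise. Here is a minimal sketch of that check in plain C, so it compiles without the OpenCL headers. The helper name origin_is_aligned is hypothetical, and the alignment value in the note below is an assumption for illustration, not something read from your device.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if origin_bytes is a multiple of the
   device alignment, 0 otherwise. align_bits is the value reported for
   CL_DEVICE_MEM_BASE_ADDR_ALIGN, which is expressed in bits. */
int origin_is_aligned(size_t origin_bytes, unsigned int align_bits)
{
    size_t align_bytes = align_bits / 8;   /* convert bits to bytes */
    if (align_bytes == 0)
        return 1;                          /* no alignment to enforce */
    return (origin_bytes % align_bytes) == 0;
}
```

For example, with an assumed alignment of 1024 bits (128 bytes) and 4-byte floats, an origin of 10000 floats (40000 bytes) would not pass, because 40000 is not a multiple of 128, while an origin of 10240 floats (40960 bytes) would.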

Look, I have this:
cl_mem buffer_entrada = clCreateBuffer(contextCPU, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,20000* sizeof(float), entrada, NULL );
then I create the subbuffer:

cl_uint baseAddrAlign = 0; /* reported in bits */
clGetDeviceInfo(info[deviceCPU].device_id, CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(baseAddrAlign), &baseAddrAlign, NULL);
cl_buffer_region cpuBufferRegion1 = { 0, cpuPortion };
cl_buffer_region cpuBufferRegion2 = { cpuPortion, theRest };
cl_mem subbufferCPU1 = clCreateSubBuffer(buffer_entrada, 0, CL_BUFFER_CREATE_TYPE_REGION, &cpuBufferRegion1, &errores);
cl_mem subbufferCPU2 = clCreateSubBuffer(buffer_entrada, 0, CL_BUFFER_CREATE_TYPE_REGION, &cpuBufferRegion2, &errores);

The variables cpuPortion and theRest are in bytes, but my original buffer holds floats. How do I take the first half of my array? It seems I need to take the size of one float in bytes and then multiply it by 10000.
Do you see what I mean?

My next question: which of these do I pass as the kernel argument?
clSetKernelArg(kernel_rama_inferior, 0, sizeof(cl_mem), &M_subbufferCPU1);
clSetKernelArg(kernel_rama_inferior, 0, sizeof(cl_mem), &buffer_entrada);

Thank you for everything.

Right, the origin and size in cl_buffer_region are in bytes. To split your 20000-float buffer in half, both cpuPortion and theRest should be set to 10000 * sizeof(float).
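To make the byte arithmetic concrete, here is a rough sketch in plain C. The region_bytes struct is only a stand-in with the same two fields as cl_buffer_region (origin and size), so the code compiles without the OpenCL headers, and split_float_buffer is a hypothetical helper name, not part of OpenCL.

```c
#include <stddef.h>

/* Stand-in with the same two fields as cl_buffer_region, so this sketch
   compiles without the OpenCL headers. */
typedef struct {
    size_t origin;  /* offset into the parent buffer, in bytes */
    size_t size;    /* length of the region, in bytes */
} region_bytes;

/* Hypothetical helper: split a buffer of total_floats elements into two
   regions, the first holding first_floats elements. Returns 0 on success,
   -1 if first_floats is larger than total_floats. */
int split_float_buffer(size_t total_floats, size_t first_floats,
                       region_bytes *first, region_bytes *second)
{
    if (first_floats > total_floats)
        return -1;
    first->origin  = 0;
    first->size    = first_floats * sizeof(float);
    second->origin = first->size;          /* second region starts where the first ends */
    second->size   = (total_floats - first_floats) * sizeof(float);
    return 0;
}
```

Calling split_float_buffer(20000, 10000, &a, &b) gives two regions of 10000 * sizeof(float) bytes each, with the second one starting at byte offset 10000 * sizeof(float). Those are the values you would copy into cpuBufferRegion1 and cpuBufferRegion2.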

For your kernel argument you'd pass whichever buffer you want the kernel to run on. Since both buffer_entrada and M_subbufferCPU1 start at the same place, either could be used if you're only accessing the first 10000 elements.

Thank you for the help! I will try it and then let you know.