clGetKernelWorkGroupInfo does not return the correct local memory size

I am trying to query the local memory usage of a kernel.
The kernel takes a local memory argument.
I am setting the size of the local memory with clSetKernelArg.
Then I simply call clGetKernelWorkGroupInfo with CL_KERNEL_LOCAL_MEM_SIZE to get the local memory usage.
It gives me an incorrect number.
I am using CUDA OpenCL.
Any ideas?
Sunil Sathe

Can you show us the source code you are using to query the local memory size? Please include the calls to clSetKernelArg() and the source code of the kernel (at least we need to see the kernel arguments and all variable declarations).

What value did you expect? What value was returned? Did you check whether the call to clGetKernelWorkGroupInfo() returned an error code?

Here is the kernel function

__kernel void jacobi_local(int rows, __global int *ia, __global int *ja,
__global float *diag, __global float *off_diag,
__global float *x, __global float *b,
__global float *y, __local int *lja,
__local float *loff_diag)

Here is the kernel argument setting and local mem query

  /* CL_SUCCESS is 0, so OR-ing the results keeps any failure visible in err. */
  err  = clSetKernelArg(*smooth_kr, 8, sizeof(int)*maxnz, NULL);
  err |= clSetKernelArg(*smooth_kr, 9, sizeof(float)*maxnz, NULL);
  err |= clGetKernelWorkGroupInfo(*smooth_kr, *device_id, CL_KERNEL_LOCAL_MEM_SIZE, sizeof(cl_ulong), &local_mem, NULL);

maxnz is about 1500, and the err returned is CL_SUCCESS.
The value of local_mem is always 48, no matter what maxnz is.



Sounds like a bug in the implementation. I would report it to your vendor.

Before checking the spec, I didn’t realize that CL_KERNEL_LOCAL_MEM_SIZE was supposed to include the dynamically set arg sizes in the total. It may be that the driver implementor didn’t either.
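To make the expectation concrete, here is a minimal sketch of the arithmetic, assuming the two __local arguments from the kernel above are the only dynamically sized ones. The helper names local_arg_bytes and looks_underreported are hypothetical and not part of OpenCL; the sketch only computes a lower bound, since the implementation may add its own local memory usage on top.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes requested for the two __local arguments
 * (indices 8 and 9). OpenCL's int and float are both 32 bits wide, so
 * int32_t and a 4-byte host float stand in for cl_int and cl_float
 * (assumption: the host float is 4 bytes). */
static uint64_t local_arg_bytes(size_t maxnz)
{
    return (uint64_t)maxnz * sizeof(int32_t)   /* __local int   *lja       */
         + (uint64_t)maxnz * sizeof(float);    /* __local float *loff_diag */
}

/* Hypothetical check: returns 1 when the value reported for
 * CL_KERNEL_LOCAL_MEM_SIZE is smaller than the __local argument sizes
 * alone, which would suggest the dynamic sizes were not counted. */
static int looks_underreported(uint64_t reported, size_t maxnz)
{
    return reported < local_arg_bytes(maxnz);
}
```

For maxnz = 1500 the two arguments alone account for 12000 bytes, so a reported value of 48 falls well short of what the spec's wording would lead you to expect.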

I had a similar problem with an earlier release of the Intel OpenCL implementation and posted about it earlier. It turned out to be an incorrect return value for CL_DEVICE_MAX_MEM_ALLOC_SIZE.

I guess we can only hope these problems get fixed in later updates…