I’ve read somewhere (on some forum I cannot recall right now) that statically allocating local memory (“shared” memory in NVIDIA CUDA nomenclature) like below should be avoided, since it’s implementation-dependent:
__local float s_elData;
Dynamic allocation via a kernel argument and clSetKernelArg should be used instead:
__kernel void kernelP1(
__local float* s_elData,
and (in host code):
clSetKernelArg(kernel, 1, 32 * sizeof(float), NULL);
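For context, here is a sketch of how the two halves of that dynamic pattern fit together; the kernel name kernelP1 and the 32-float size come from the snippets above, while the other argument names are made up for illustration:

```c
/* Device side: the __local pointer is a kernel parameter with no
   initializer; the host decides how much local memory backs it. */
__kernel void kernelP1(__global const float* in,   /* hypothetical input arg */
                       __local float* s_elData)    /* backed by host-side size */
{
    /* ... stage data from 'in' into s_elData, compute, write back ... */
}

/* Host side: index 1 matches s_elData's position above; a NULL value
   pointer with a nonzero size requests local-memory allocation. */
clSetKernelArg(kernel, 1, 32 * sizeof(float), NULL);
```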
Unfortunately, when I use the latter method my register usage increases from 14 to 19, with no other change in the code, just the way of allocation. So I’d rather stick with the former, static, method of allocation. Is it safe, or should it really be avoided?
Where do you define this variable?
At program scope I guess?
You mean when allocating statically? In kernel. Like this:
__kernel void K(
    // ... kernel args
    // definition of s_el
    __local float s_el;
    // ... copy data from global to s_el, compute in parallel, store results from s_el back to global
So, kernel scope. I only need it to copy some data from global memory into it, perform a lot of computations in the kernel, and store the results back to global memory.
It works (on the NVIDIA OpenCL implementation), and the register consumption is lower than if I allocated dynamically with kernel arguments and clSetKernelArg. (The s_el array is always a constant size, so I don’t need dynamic allocation.) Is this way of defining variables in local memory all right?
I guess it’s right.
Perhaps it’s a bug in the NVIDIA implementation.
You are doing things fine. Declaring a __local variable at kernel scope is perfectly legal; see section 6.5.2 of the OpenCL 1.1 spec, which even includes an example. Don’t worry about it.
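To make the legal static pattern concrete, a minimal sketch along the lines of the spec’s example might look like this; the 32-element size mirrors the host snippet earlier in the thread, and the staging logic is only an assumed illustration:

```c
__kernel void K(__global const float* in, __global float* out)
{
    // Statically sized local array declared at kernel scope,
    // as permitted by section 6.5.2 of the OpenCL 1.1 spec.
    __local float s_el[32];

    int lid = get_local_id(0);
    int gid = get_global_id(0);

    s_el[lid] = in[gid];              // stage data from global into local
    barrier(CLK_LOCAL_MEM_FENCE);     // make it visible to the work-group

    // ... compute on s_el in parallel ...

    out[gid] = s_el[lid];             // store results back to global
}
```

Since the size is a compile-time constant here, the compiler knows the local-memory footprint up front, which is consistent with the lower register usage you observed compared to the clSetKernelArg route.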