Local memory allocation

pplaszew · June 13, 2010, 3:41am

Hi everyone,
I’ve read somewhere (on a forum I cannot recall right now) that allocating local memory (“shared” in NVIDIA CUDA nomenclature) statically, as below, should be avoided because it’s implementation-dependent:

__local float  s_elData[32];

The dynamic allocation using kernel args and clSetKernelArg should be used instead:

__kernel void kernelP1(
    __local float* s_elData,
    //...

and (in host code):

clSetKernelArg(kernel, 1, 32 * sizeof(float), NULL);

Unfortunately, when I use the latter method my register usage increases from 14 to 19, with no other change in the code, just the way of allocating. So I’d rather stick with the former (static) method of allocation. Is it safe, or should it really be avoided?
Thanks!

matrem · June 14, 2010, 1:30am

Where do you define this variable?
At program scope I guess?

pplaszew · June 14, 2010, 4:32am

You mean when allocating statically? In the kernel, like this:


__kernel void K(
    //.. kernel args
){
    //definition of s_el
    __local float s_el[32];
    //.. download data from global to s_el, make computations in parallel, store results from s_el back to global
}

So, kernel scope. I only need it to load some data from global memory into it, perform a lot of computations in the kernel, and store the results back to global memory.
It works (on NVIDIA’s OpenCL implementation) and the register consumption is lower than when I allocate dynamically with kernel arguments and clSetKernelArg. (The s_el array always has a constant size, so I don’t need dynamic allocation.) Is this way of defining variables in local memory all right?

matrem · June 14, 2010, 7:04am

I guess it’s right.
Perhaps it’s a bug in the NVIDIA implementation.

david.garcia · July 13, 2010, 3:47pm

You are doing things fine. Declaring a __local variable at kernel scope is perfectly legal. See section 6.5.2 of the OpenCL 1.1 spec; there’s even an example. Don’t worry about that.
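For comparison, here is a minimal sketch of the two allocation styles discussed in this thread, written side by side. The kernel names (scale_static, scale_dynamic), the N_EL constant, the "reverse and double" computation, and the host-side shim are all hypothetical and chosen only for illustration; they are not the original poster's code. The shim is an assumption that lets the same file build as plain C by emulating a work-group of one work-item, and an OpenCL compiler skips it.

```c
/* Sketch only: names, the computation and the shim are illustrative assumptions. */
#ifndef __OPENCL_VERSION__
/* Plain-C shim emulating one work-group that holds a single work-item.
   An OpenCL compiler defines __OPENCL_VERSION__ and never sees this part. */
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 0
static size_t get_local_id(unsigned dim)   { (void)dim; return 0; }
static size_t get_local_size(unsigned dim) { (void)dim; return 1; }
static void barrier(int flags)             { (void)flags; }
#endif

#define N_EL 32

/* Static form: the array is declared at kernel function scope and its size
   is fixed at compile time. No host-side argument is needed for it. */
__kernel void scale_static(__global const float *in, __global float *out)
{
    __local float s_el[N_EL];
    size_t i;

    for (i = get_local_id(0); i < N_EL; i += get_local_size(0))
        s_el[i] = in[i];                    /* global -> local */
    barrier(CLK_LOCAL_MEM_FENCE);           /* wait until the whole array is loaded */

    for (i = get_local_id(0); i < N_EL; i += get_local_size(0))
        out[i] = 2.0f * s_el[N_EL - 1 - i]; /* may read another work-item's element */
}

/* Dynamic form: the kernel receives a __local pointer; the host picks the size
   with clSetKernelArg(kernel, 0, N_EL * sizeof(float), NULL). */
__kernel void scale_dynamic(__local float *s_el,
                            __global const float *in, __global float *out)
{
    size_t i;

    for (i = get_local_id(0); i < N_EL; i += get_local_size(0))
        s_el[i] = in[i];
    barrier(CLK_LOCAL_MEM_FENCE);

    for (i = get_local_id(0); i < N_EL; i += get_local_size(0))
        out[i] = 2.0f * s_el[N_EL - 1 - i];
}
```

Both kernels should produce the same results; they differ only in where the size of the local buffer is decided. Any difference in register usage between them depends on the implementation's compiler. If you want to check how much local memory a built kernel uses, clGetKernelWorkGroupInfo with CL_KERNEL_LOCAL_MEM_SIZE reports it.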