I cannot use all 16K of shared memory on a GTX9800


It seems that I can only use about 15K on this compute capability 1.0 architecture.

Is this normal, or is something wrong in my code?


Don't forget, shared memory is the level-2 (L2) cache, so other values, like variables or values from global
arrays, are cached there. Avoid getting close to the memory limit!

Oh really?!

Is it true that when my shared memory usage is close to the limit, execution will be slower because the L2 cache has only a little memory left?

Usually, what percentage of shared memory usage is safe enough?


True, because if there is insufficient memory, it starts to put variables in global memory. At least that is how it works in CUDA, and I guess OpenCL is the same. With memory limits, people usually optimize wildly,
and if they want to run the code on a different GPU they have to rewrite much of it :)
Safe enough… I think half should be safe enough everywhere.

Thanks a lot!

I think local memory is also used for kernel parameter passing, and possibly for other housekeeping by the runtime.

Local memory has to be dedicated on-chip memory (to achieve the performance it does), and it has no need for a path outside the local processor. That is, it has nothing to do with the L2 or even the L1 cache, which are only concerned with global memory accesses.

So use as much as you need. The main issue is that using too much in one work-group limits how many work-groups can run concurrently on a given processing core, which may reduce performance.

Oh, I see. So that is why I cannot use all 16K for my local memory?