I cannot use 16K shared memory in GTX9800

linyufly · August 28, 2012, 1:38pm

Hi,

It seems that I can only use 15K in such a 1.0 architecture.

Is this normal, or is something wrong in my code?

Thanks!

o4kareg2 · August 28, 2012, 2:36pm

Don't forget, shared memory is the level-2 cache, so other values, like variables or values from global arrays, are cached there too. Avoid getting close to the memory limit!

linyufly · August 28, 2012, 3:28pm

Oh really?!

Is it true that when my shared memory usage is close to the limit, the execution will be slower because the L2 cache has only a little memory left?

What percentage of shared memory usage is usually safe enough?

Thanks!

o4kareg2 · August 28, 2012, 6:14pm

True, because if there is insufficient memory, it starts to put variables in global memory. At least that is how it works in CUDA, and I guess OpenCL is the same. With memory limits, people usually optimize things heavily, and if they want to run the code on a different GPU they must rewrite much of it.
Safe enough… I think half should be safe enough everywhere.

linyufly · August 28, 2012, 6:46pm

Thanks a lot!

notzed · August 29, 2012, 3:21am

I think local memory is also used for kernel parameter passing, and possibly for other housekeeping by the runtime.

Local memory has to be implemented as dedicated memory units (in order to achieve the performance it does), and these have no need for a path outside of the local processor. I.e. it has nothing to do with L2 or even L1, which are only concerned with global memory accesses.

So use as much as you need. The main issue is that using too much in one work-group limits how many work-groups can run concurrently on a given processing core, which may reduce performance.
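To make the concurrency point concrete, here is a rough sketch in plain C of the arithmetic involved. The helper name `max_resident_groups` is made up for illustration, and the 16 KB figure is just the per-core local memory size discussed in this thread. It only accounts for local memory; real hardware also limits residency by registers, work-group size and other factors, so treat the result as an upper bound, not a prediction.

```c
#include <limits.h>

/* Hypothetical helper (not part of any OpenCL or CUDA API):
 * upper bound on how many work-groups can be resident on one
 * processing core at the same time, considering local memory only.
 *
 * local_mem_per_core  - local memory available on one core, in bytes
 * local_mem_per_group - local memory one work-group needs, in bytes
 */
static unsigned max_resident_groups(unsigned local_mem_per_core,
                                    unsigned local_mem_per_group)
{
    if (local_mem_per_group == 0) {
        /* The kernel uses no local memory, so local memory
         * imposes no limit at all. */
        return UINT_MAX;
    }
    /* Integer division: a work-group either fits completely or not
     * at all. The result is 0 when a single group does not fit. */
    return local_mem_per_core / local_mem_per_group;
}
```

With 16384 bytes per core, a work-group that needs 15 KB (15360 bytes) leaves room for only one resident work-group, while 8 KB allows two and 4 KB allows four. So going near the limit is not unsafe in itself; it just trades away concurrency.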

linyufly · August 29, 2012, 7:29am

Oh I see. So that is why I cannot use all of the 16K memory for my local memory?

Thanks!