Running kernel on GPU causes host (CPU-side) memory leak

Hi. I have a kernel that works fine when run on the CPU.

However, when I run the kernel on a GPU, my system profiler shows system memory (RAM) leaking at about 1 MB/sec. This still happens after I modify the kernel to return immediately.

Do any of you folks have an idea what might be going on? My kernel has many arguments, but I checked and my hardware allows up to 9 __constant arguments. I’ve included some specifics at the end of this post. Cheers. Dave.

This looks like a lot of args, but the total memory required for all the __constant and __local arguments is probably around 1 KB.

// Execute one complete step of hier-leap.
// This kernel should run on any number of threads: 1 to MAX_LOCAL_THREADS
kernel void hierLeap(global int *globalState, int stateSize,
                     global float *time,
                     constant int *cellStartIdx,
                     constant float *rates,
                     constant int2 *inpSpec,
                     constant int2 *outSpec,
                     constant int *highDeltaS, constant int *lowDeltaS,
                     constant struct DeltaX *scDeltaHighCell, int nScdHighCell,
                     constant struct DeltaX *scDeltaLowCell, int nScdLowCell,
                     global unsigned int *randomInts, int randIdx,
                     int leap,
                     local int *highBoundS, local int *highBoundCellS, local int *state, local int *lowBoundCellS, local int *lowBoundS,
                     local float *DboundHigh, local float *DboundCellHigh, local float *D, local float *DboundCellLow, local float *DboundLow,
                     local int *rxnEvents, local int *orderedRxnEvents, local float *taus)

The call below runs many times per second; omitting it makes the memory leak go away.

		size_t localSize = 1; // different values still result in memory leak
		err |= clEnqueueNDRangeKernel(_cmd_queue, _hierLeap, 1, NULL, 
									  &localSize, &localSize, 0, NULL, NULL);	
		assert(err == CL_SUCCESS);
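
One way to confirm from inside the program that the enqueue call is what grows host memory is to compare the process's peak resident set size before and after a batch of calls. The following is only a rough sketch, assuming a POSIX system with getrusage; the names peakRssRaw, peakRssGrowth, StepFn and idleStep are made up for illustration and are not part of OpenCL or of the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Peak resident set size of this process, in the platform's raw unit:
   bytes on OS X, kilobytes on Linux. */
static long peakRssRaw(void)
{
    struct rusage usage;
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
    return usage.ru_maxrss;
}

/* Hypothetical callback type: one unit of work to repeat. */
typedef void (*StepFn)(void *ctx);

/* Baseline step that does nothing; useful for comparison. */
static void idleStep(void *ctx)
{
    (void)ctx;
}

/* Run `step` `iterations` times and return how much the peak RSS grew.
   The peak never decreases, so the result is always >= 0. */
static long peakRssGrowth(StepFn step, void *ctx, int iterations)
{
    long before = peakRssRaw();
    for (int i = 0; i < iterations; ++i)
        step(ctx);
    return peakRssRaw() - before;
}
```

In the program above, the step would be a small wrapper around the clEnqueueNDRangeKernel call followed by clFinish on the same queue, so that queued-but-unfinished work is not mistaken for a leak. If the growth keeps rising batch after batch while idleStep stays flat, that is consistent with the leak coming from the enqueue path.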

MacBook Pro running OS X 10.6 with an NVIDIA GeForce GT 330M.
A few key values queried from GPU: MaxParamSize(4352), MaxConstArgs(9), LocalMemSize(16384).

Thanks to everyone who has looked into this.

I reduced the kernel to take only one __constant argument, and the memory leak persists. This thread suggests that there is a bug in the drivers: … D=11451666

I guess for now I’ll use __global memory in place of __constant.
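
For anyone wanting to try the same workaround, here is a minimal sketch of how the address space could be switched with a macro so it is easy to flip back once the driver is fixed. Everything in it is an assumption for illustration: scaleByRate is a made-up one-argument kernel (not hierLeap), and READONLY_ARG, USE_CONSTANT_ARGS and hostGlobalId are hypothetical names. The shims at the top only exist so the same file also compiles as ordinary C on the host.

```c
/* Host-side shims so this file also compiles as plain C for testing.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips them. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant
static size_t hostGlobalId = 0; /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned int dim) { (void)dim; return hostGlobalId; }
#endif

/* Set to 1 to go back to __constant once the driver issue is resolved. */
#ifndef USE_CONSTANT_ARGS
#define USE_CONSTANT_ARGS 0
#endif

#if USE_CONSTANT_ARGS
#define READONLY_ARG __constant
#else
#define READONLY_ARG __global const
#endif

/* Hypothetical reduced kernel: each work-item scales one value by its rate. */
__kernel void scaleByRate(__global float *values,
                          READONLY_ARG float *rates,
                          int count)
{
    size_t i = get_global_id(0);
    if (i < (size_t)count)
        values[i] *= rates[i];
}
```

The host code does not need to change when flipping the macro, since both variants take a cl_mem buffer through clSetKernelArg. Reads from __global may be slower than from __constant on some hardware, so this is a stopgap rather than a fix.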