OpenCL weird error from clBuildProgram


I call

cl_int err;
clProgram=clCreateProgramWithSource(clContext, numSources, progSources, NULL, &err);


and immediately after that I call 
err = clBuildProgram(clProgram, 0, NULL, buildArgs, NULL, NULL);

err is now -42 (CL_INVALID_BINARY).

Any idea why this is happening? There were no previous calls to clCreateProgramWithBinary();
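For logging, a small lookup like the one below can turn the numeric status into a readable name. It is only a sketch: the function name cl_build_error_name is hypothetical, and it covers just a handful of the codes that clBuildProgram and its neighbours can return, with values as defined in CL/cl.h.

```c
#include <stddef.h>

/* Hypothetical helper: maps a few OpenCL status codes to their names.
   The numeric values match the constants defined in CL/cl.h. */
const char *cl_build_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -33: return "CL_INVALID_DEVICE";
    case -42: return "CL_INVALID_BINARY";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    case -44: return "CL_INVALID_PROGRAM";
    default:  return "unknown OpenCL error";
    }
}
```

Calling cl_build_error_name(err) right after clBuildProgram makes it easier to tell a compiler failure (-11) apart from an argument problem such as -42 or -43.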

Videocard: GTX 280; driver version: 260.99

Correction: the video card is a GTX 480.


I don’t know what calls you make before clBuildProgram, but here is an example of how I usually compile my OpenCL code:

ciErrNum = clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_GPU, 1, &cdDevice, NULL);
cxGPUContext = clCreateContext(0, 1, &cdDevice, NULL, NULL, &ciErrNum);
cqCommandQueue = clCreateCommandQueue(cxGPUContext, cdDevice, 0, &ciErrNum);
cPathAndName = shrFindFilePath(cSourceFile, file_path);
cSourceCL = oclLoadProgSource(cPathAndName, "", &szKernelLength);
cpProgram = clCreateProgramWithSource(cxGPUContext, 1, (const char **)&cSourceCL, &szKernelLength, &ciErrNum);
ciErrNum = clBuildProgram(cpProgram, 0, NULL, NULL, NULL, NULL);

Check the calls before your clCreateProgramWithSource and the error codes they return; they might give you a clue.

I hope it helps.

There is a build log, which says:

ptxas fatal : Memory allocation failure
error : Ptx compilation failed: gpu='sm_20', device code='cuModuleLoadDataEx_4'
: Considering profile 'compute_20' for gpu='sm_20' in 'cuModuleLoadDataEx_4'
: Retrieving binary for 'cuModuleLoadDataEx_4', for gpu='sm_20', usage mode=' '
: Considering profile 'compute_20' for gpu='sm_20' in 'cuModuleLoadDataEx_4'
: Control flags for 'cuModuleLoadDataEx_4' disable search path
: Ptx binary found for 'cuModuleLoadDataEx_4', architecture='compute_20'
: Ptx compilation for 'cuModuleLoadDataEx_4', for gpu='sm_20', ocg options=' '
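For anyone who wants to see the same kind of output: a log like the one above can be fetched with clGetProgramBuildInfo after clBuildProgram fails. This is a minimal sketch, assuming the OpenCL headers are installed and that the program and device handles are valid; the helper name print_build_log is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

/* Hypothetical helper: prints the build log of `program` for `device`. */
void print_build_log(cl_program program, cl_device_id device)
{
    size_t size = 0;

    /* First call: ask only for the size of the log. */
    if (clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              0, NULL, &size) != CL_SUCCESS || size == 0)
        return;

    char *log = (char *)malloc(size + 1);
    if (!log)
        return;

    /* Second call: fetch the log text itself. */
    if (clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                              size, log, NULL) == CL_SUCCESS) {
        log[size] = '\0';   /* defensive terminator */
        fprintf(stderr, "%s\n", log);
    }
    free(log);
}
```

Call it whenever clBuildProgram returns anything other than CL_SUCCESS; the log is usually more informative than the status code alone.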

Are you reading your .cl file and loading the prog source correctly?

By the way, ptxas is a true memory hog: it eats nearly 1.5 GB of RAM. Geez.
Well, what do you mean by “correctly”? I use standard file reading operations. And yes, the file is big: 5000+ lines.
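On what "correctly" usually means here: when the lengths argument of clCreateProgramWithSource is NULL, as in the call at the top of the thread, each source string must be null-terminated. A loader along these lines covers that. It is a sketch only; load_source is a hypothetical name, not part of any SDK.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a NUL-terminated buffer.
   Returns NULL on failure. On success, stores the byte count (without
   the terminator) in *length. The caller frees the buffer. */
char *load_source(const char *path, size_t *length)
{
    FILE *f = fopen(path, "rb");   /* binary mode: no newline translation */
    if (!f)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = (char *)malloc((size_t)size + 1);
    if (!buf) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    buf[got] = '\0';               /* needed when lengths is passed as NULL */
    if (length)
        *length = got;
    return buf;
}
```

The returned buffer can be passed as the single entry of the strings array, either with NULL for lengths or with the stored length.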

Sorry for pointing out the obvious: it looks like the compiler is running out of memory. Can you split your program into several parts?

Inside a loop in the kernel, I call several different tree traversals plus several large sub-kernels.
I have tried using

#pragma unroll 1

before the main loop, but I still get the same error.
I have had problems with very large loops being unrolled before, so if there is a way to prevent this, I would be very glad to know it. The loop limit is not a constant; it is a formal parameter of the kernel. The kernel also has 60+ parameters. That still does not exceed the GF2xx limit of 256 bytes for formal parameters.

So the only suggestion is to extract the main loop and run it on the host side. Wouldn’t inlining by hand (for example, the tree traversal code) reduce the compiler’s struggle to inline everything?