Unrecognized Function Error on Macbook Pro, but not on Windows Desktop

Hey there!

Relatively new to OpenCL (and pyOpenCL, since I use Python for all the non-parallelized computation I’m doing), but I’ve already had a lot of fun (and a fair bit of success!) using it to brute-force some research problems that haven’t yielded to traditional optimization algorithms so far.

I’ve recently run into a problem that leaves me a little perplexed, though. I’m running identical pyopencl scripts and OpenCL kernel code on two separate boxes, a Macbook Pro and a Windows box. The Macbook has an ATI Radeon HD 6750M and the Windows box has an AMD Radeon HD 6670. I’ve run a number of pyopencl/OpenCL programs successfully on both laptop and desktop so far, but with my most recent coding attempt on the laptop I get the following error:

[i]pyopencl.RuntimeError: clBuildProgram failed: build program failure -

Build on <pyopencl.Device ‘ATI Radeon HD 6750M’ on ‘Apple’ at 0x1021b00>:

Undeclared function ‘_Z8__cl_powdd’ called by function ‘find_equil’[/i]

(where ‘find_equil’ is the name of my kernel)

On the desktop, the very same code runs perfectly (and is running as I write this on the laptop, in fact!), as best I can tell; certainly it compiles and runs, anyway.

From what I can tell (I think), this suggests the desktop’s AMD card has some function that I’m using whereas the laptop doesn’t; however, I’m not sure what function ‘_Z8__cl_powdd’ is, and googling for it doesn’t turn up much, no matter which underscores I leave/delete or portions of the function name I use as a partial search.

Judging by the name: I have no idea what Z8 refers to, but cl is probably just a tag dubbing this an OpenCL function; powdd seems to be the meat of the function’s name, and makes me think this is a mathematical power function (and I do use OpenCL’s built-in pow function at several points; see the code sample below, for example), and dd maybe designates it as of double type? I’m not sure what the extra d would be for, though, and I’ve already fastidiously cleansed my code of all doubles (converting them, instead, to floats), as I discovered early on that my desktop’s AMD card doesn’t support the double OpenCL extension.
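
A note on reading the name: it follows the pattern of Itanium C++ ABI name mangling, where `_Z` marks a mangled symbol, the number gives the length of the function name that follows (`__cl_pow` is 8 characters), and each trailing letter is a parameter type code (`d` for double, `f` for float). Under that reading, `dd` means two double parameters, so the symbol would be the double-precision overload of pow. The sketch below decodes only this simple shape of name; the helper `demangle_simple` and its partial type table are illustrative, not part of OpenCL or pyopencl.

```python
import re

# Single-letter builtin type codes from the Itanium C++ ABI (partial list).
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "h": "unsigned char",
    "s": "short", "t": "unsigned short", "i": "int", "j": "unsigned int",
    "l": "long", "m": "unsigned long", "f": "float", "d": "double",
}


def demangle_simple(symbol):
    """Decode names shaped like _Z<length><name><builtin type codes>.

    Hypothetical helper: it ignores namespaces, pointers, vectors and
    every other part of the mangling scheme.
    """
    match = re.match(r"_Z(\d+)", symbol)
    if match is None:
        raise ValueError("not a simple mangled name: %r" % symbol)
    length = int(match.group(1))
    start = match.end()
    name = symbol[start:start + length]
    if len(name) != length:
        raise ValueError("name is shorter than its length prefix: %r" % symbol)
    # Everything after the name is read as one type code per parameter.
    params = [BUILTIN_TYPES[code] for code in symbol[start + length:]]
    return "%s(%s)" % (name, ", ".join(params))


print(demangle_simple("_Z8__cl_powdd"))  # __cl_pow(double, double)
```

A call like `pow(A1 - 1.0, 2.0 * K_dwn + 1.0)` could plausibly resolve to that overload if the unsuffixed literals `1.0` and `2.0` are treated as doubles, even when every variable involved is a float.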

Sorry for all the words! Could anyone help me in breaking down this error message?

I would be happy to post the code as well, but didn’t want to assume anyone would want to wade through it, especially as the compiler error message has no line reference, and as I’m not a programmer by trade, so I’m certain I’ve bungled proper coding convention in a number of places. Here’s an excerpt where I use the OpenCL pow function, though, in case that seems to be at the core of this problem:

float w1(float p, float A1, float K1)
{
        float K_dwn = floor(K1);
        float w_dwn = 1.0 - dec(K1);
        float K_up = K_dwn + 1.0;
        float w_up = 1.0 - w_dwn;
        float w_tot = 0.0;

        if(K_dwn >= 0.0)
        {
                float A_fac = 1.0 / (1.0 + pow(A1 - 1.0, 2.0 * K_dwn + 1.0) );
                float p_fac = pow(A1 * p + 1.0 - A1, 2.0 * K_dwn + 1.0) + pow(A1 - 1.0, 2.0 * K_dwn + 1.0);
                w_tot += w_dwn * A_fac * p_fac;
        }
        else

Update: running with the laptop’s 8 CPU cores as the device works just fine, too. It seems that I’m using some function available to the CPUs and to the desktop’s AMD GPU but not to the laptop’s ATI GPU. But what function is the mysterious _Z8__cl_powdd? Hm…

Update 2: a number of my constants weren’t explicitly tagged as floats with the f suffix, so I added those. This seems to have corrected the _Z8__cl_powdd error, so maybe the compiler for the ATI GPU was promoting those floating-point operations to doubles?
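
Hunting for unsuffixed constants by eye is easy to get wrong, so a rough check can be scripted on the Python side before the kernel source is handed to pyopencl. The sketch below is only a heuristic: `find_unsuffixed_literals` is a made-up helper name, the regular expression covers plain decimal literals only, and it will also flag numbers that appear inside comments.

```python
import re

# Decimal floating-point literals with no f/F suffix, e.g. 1.0, 2., .5, 1e-3.
UNSUFFIXED_FLOAT = re.compile(
    r"(?<![\w.])"                      # not in the middle of an identifier or number
    r"(?:\d+\.\d*|\.\d+|\d+(?=[eE]))"  # 1.0, 2., .5, or digits followed by an exponent
    r"(?:[eE][+-]?\d+)?"               # optional exponent
    r"(?![\w.])"                       # nothing such as an f suffix follows
)


def find_unsuffixed_literals(kernel_source):
    """Return (line_number, literal) pairs for float literals lacking an f suffix."""
    hits = []
    for line_number, line in enumerate(kernel_source.splitlines(), start=1):
        for match in UNSUFFIXED_FLOAT.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


# Sample lines modelled on the w1 excerpt, with some suffixes already added.
sample = """float K_up = K_dwn + 1.0;
float w_up = 1.0f - w_dwn;
float A_fac = 1.0f / (1.0f + pow(A1 - 1.0f, 2.0 * K_dwn + 1.0f));"""

for line_number, literal in find_unsuffixed_literals(sample):
    print(line_number, literal)
```

On the sample this reports `1.0` on line 1 and `2.0` on line 3. The standard OpenCL build option `-cl-single-precision-constant`, which asks the compiler to treat double-precision constants as single precision, may be another way to get the same effect without touching every literal, though I have not confirmed how Apple's compiler handles it.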

But! A new error message, which I’ve also run across in the past and been confounded by:

[i] raise err
pyopencl.RuntimeError: clBuildProgram failed: build program failure -

Build on <pyopencl.Device ‘ATI Radeon HD 6750M’ on ‘Apple’ at 0x1021b00>:

Error getting function data from server
(options: -I /Library/Python/2.7/site-packages/pyopencl-2014.1-py2.7-macosx-10.7-intel.egg/pyopencl/cl)
(source saved as /var/folders/89/f4jshzbx5553fjj7jl4g8ymw0000gn/T/tmpr4MHpV.cl)[/i]

Update 3: the primary work in my code is done by two functions, BigV1 and BigV2; commenting out the code in either of them separately (and setting it to return 0.0f instead of doing the work it’s meant to do), while leaving the other one intact, causes the program to run error-free, but when they’re both left intact, the error persists.

The odd bit is that these two functions do not interact at all; they do not call one another. And they’re highly symmetric: they’re doing the same things, effectively, but working with separate values. I could even rewrite them as a single function with an additional argument, and may have to try that as a workaround now, I suppose. At most they call the same helper functions a few times (e.g. to get the fractional part of a float, or get the sign of a float).

Why in the world would the laptop’s GPU be willing to run the code with one of these functions operational, or the other, but not both?

Update 4: I’ve located a very simple bit of code that seems to be the problem, though I’m not sure why. My kernel (successfully) computes 4 floats: p0_diff, p1_diff, q0_diff, q1_diff. I would like to store them in the __global float4 *tru_pq (the variable in which I’m storing the data retrieved from my kernel by pyOpenCL) like so:

tru_pq[idx-1] = (float4)(p0_diff, q0_diff, p1_diff, q1_diff);

But this results in the error I noted above,

[i]pyopencl.RuntimeError: clBuildProgram failed: build program failure -

Build on <pyopencl.Device ‘ATI Radeon HD 6750M’ on ‘Apple’ at 0x1021b00>:

Error getting function data from server
(options: -I /Library/Python/2.7/site-packages/pyopencl-2014.1-py2.7-macosx-10.7-intel.egg/pyopencl/cl)
(source saved as /var/folders/89/f4jshzbx5553fjj7jl4g8ymw0000gn/T/tmpyzAIo_.cl)
[/i]

However, if I turn just one of those values in tru_pq into a constant 0.0f, like so:

tru_pq[idx-1] = (float4)(0.0f, q0_diff, p1_diff, q1_diff);

then the code compiles and runs error-free.

Oddly, it doesn’t seem to matter which of the 4 values I convert to 0.0f; so long as only 3 of them are computed values, they’re retrieved successfully. But if all 4 are computed values, the error results.

The only thing I can think is that I’m not allocating enough memory in PyOpenCL to store the 4 float values, but everything looks (to me, anyway!) correct on the PyOpenCL end, the relevant excerpt being:

tru_pq = numpy.zeros((bigN*bigN*bigN*bigN,4), numpy.float32)
destination_buf = cl.Buffer(context, mem_flags.WRITE_ONLY, tru_pq.nbytes)

I’ve also tried changing numpy.float32 here to numpy.float64, but the error persists.
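
A quick arithmetic check on the allocation: the error is raised by clBuildProgram, which compiles the kernel source before any buffer is handed to the kernel, so the buffer size should not be able to cause it. The sketch below only confirms that a (bigN^4, 4) float32 array has exactly the byte count of bigN^4 float4 values. `host_buffer_bytes` is an illustrative helper and `big_n = 10` is a made-up value, not the one from my script.

```python
import struct

# Four 32-bit floats, the layout of one OpenCL float4 (16 bytes).
FLOAT4_BYTES = struct.calcsize("=4f")


def host_buffer_bytes(big_n, components=4, bytes_per_component=4):
    """Byte count of an array shaped (big_n**4, components)."""
    rows = big_n ** 4
    return rows * components * bytes_per_component


big_n = 10  # hypothetical grid size, for illustration only
work_items = big_n ** 4

# float32 host array: exactly one float4 slot per work item.
print(host_buffer_bytes(big_n) == work_items * FLOAT4_BYTES)

# float64 host array: twice the bytes, but the kernel still writes 16-byte float4s.
print(host_buffer_bytes(big_n, bytes_per_component=8) == 2 * work_items * FLOAT4_BYTES)
```

Both comparisons hold, so the float32 allocation matches what the kernel writes, and switching to float64 only changes how the host interprets the bytes afterwards. Neither choice is visible to the compiler.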

Update 5: I really only need the sum of the absolute values of the four values I mentioned above (q0_diff, q1_diff, p0_diff, p1_diff), so I tried changing

tru_pq[idx-1] = (float4)(0.0f, q0_diff, p1_diff, q1_diff);

to

tru_pq[idx-1] = (float4)(0.0f, fabs(q0_diff)+fabs(q1_diff), p1_diff, q1_diff);

and to many variations on this. None of them work. It’s maddening to be able to get only 3 of the 4 values I need! I could just run the code twice, I suppose, but that would begin to defeat the purpose of using OpenCL in the first place, oy.