File-level var, touched by passed-in param, becomes unusable

Hi folks!

I pass a struct full of data to my kernel, and I run into the following difficulty using it (very stripped down example which compiles and demonstrates the failure):
[Xcode 3.2 on a MacBook Pro; this compile is obviously for the CPU]

typedef struct
    {
    float xoom;
    int   sizex;
    } varholder;

float zX,  xd;

__kernel void Harlan( __global varholder * vh )
    {
    int X = get_global_id(0), Y = get_global_id(1);

    zX = ( ( X - vh->sizex/2 ) / vh->xoom + vh->sizex/2 );  // (a)

    xd = zX;                                                // (b)  (Boom!)
    }

With line (a) in place, the line marked (b), a simple assignment, gives “LLVM compiler failed to compile a function”. If, however, line (a) is removed, then line (b) is fine. Together they cause the failure.

So, through fiddling around a LOT with this, it seems that what messes up line (b) [or any future access of that variable] is the fact that the assignment statement (a) uses, in its calculation, an input parameter – an element of the structure pointed to by my kernel arg 0.

I do need to be able to make calculations and then use those results later! The reason that zX and xd etc. are declared at the file level is because my helper functions need them. Could this be part of the problem? There must be a way around it… I have tried __global and __local qualifiers in their declaration; no help.

Any thoughts?


I am sincerely hoping that this is not a “correct” answer to my own question. I found on another forum (though not the same question asked!) the following, and I am afraid that it refers to what I’m trying to do:


You’re doing something the standard prohibits. Section 6.5 says:

'All program scope variables must be declared in the __constant address space.'

In other words, program scope variables cannot be mutable. (*)

(end quote)

… well, tcha!!! What an astoundingly inconvenient restriction! I’m sure there’s reasoning behind it.

Someone feel free, please, to correct me and say that’s not what it’s referring to!

You guys & dolls all knew this, right, and didn’t have the heart to tell me?..

(*) … then they’re not VARIables, at all, then, are they???

From my understanding, you are correct. The reason is that many copies of that kernel (work-items) can be running in parallel, and each one could be trying to assign a value to that variable. By preventing you from doing that, the spec avoids those conflicting writes.

What I find interesting is that you are crashing on (b), not (a).

There is a workaround that might work…you could add an extra argument to your kernel (say an array of those two values), and use that in your functions and whatnot. I’ve not done anything with helper functions in OpenCL, but you could probably make it work (can’t see any big reason why not).
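A minimal sketch of that extra-argument idea, in case it helps. The scratch buffer, its two-floats-per-work-item layout, the helper half_zx, and the assumption that the global size in x equals sizex are all mine, not from the original kernel. Each work-item gets its own slot so parallel copies do not write to the same place. The block at the top is only a shim so the sketch also builds as plain C for testing.

```c
/* Test shim (my addition): lets the sketch build as plain C on the host.
   A real OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t fake_gid[3];
static size_t get_global_id(unsigned int dim) { return fake_gid[dim]; }
#endif

typedef struct {
    float xoom;
    int   sizex;
} varholder;

/* Hypothetical helper: reads the zX value stored in this work-item's slot. */
float half_zx(__global const float *slot)
{
    return slot[0] * 0.5f;
}

/* scratch is an assumed extra argument holding 2 floats per work-item. */
__kernel void Harlan(__global const varholder *vh, __global float *scratch)
{
    int X = get_global_id(0), Y = get_global_id(1);

    /* Assumes the global size in x equals vh->sizex. */
    __global float *slot = scratch + 2 * (Y * vh->sizex + X);

    slot[0] = (X - vh->sizex / 2) / vh->xoom + vh->sizex / 2;  /* zX */
    slot[1] = half_zx(slot);                                   /* xd */
}
```

On the host side the scratch buffer would need room for 2 floats times the total number of work-items, and since the kernel writes to it, it has to live in __global memory.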

Thanks, HolyGeneralK.

Understood. I guess I had just hoped (without understanding memory spaces at the time) that I could make them __private, like the variables declared within a function, so they’d only be visible to that one kernel.

Yes, that is interesting! Of course it shouldn’t have worked, but it only gave the error on the combo of both lines. I’ve seen similar squirrelly behavior – today I had one function where I couldn’t say “while ( Hangle > k_Hval[i] ) i++;” – it gave error -36, Invalid Command Queue, when I enqueued the readbuffer later. But I could say “if ( Hangle > k_Hval[i] ) i++;”. Turns out that k_Hval is a constant array at the file level, but I’d misunderstood the spec I quoted above and was thinking that just putting my float array there would make it __constant. The fix was to explicitly put the __constant qualifier on the declaration, but figuring out what was wrong was “fun”. Maybe we’re lucky to be here in the beginning; a few years from now, detailed error messages might take away this special kind of “fun”.
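For anyone hitting the same thing, here is roughly what the explicit qualifier looks like. This is only a sketch: the table values, K_HVAL_COUNT, the helper find_band, and the kernel Bands are made up for illustration, and the bounds check in the loop is my addition. The block at the top is a shim so the sketch also builds as plain C for testing.

```c
/* Test shim (my addition): lets the sketch build as plain C on the host.
   A real OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant const
static size_t fake_gid[3];
static size_t get_global_id(unsigned int dim) { return fake_gid[dim]; }
#endif

#define K_HVAL_COUNT 4

/* File-level data must carry __constant explicitly; it is not implied.
   The values here are placeholders. */
__constant float k_Hval[K_HVAL_COUNT] = { 10.0f, 20.0f, 30.0f, 40.0f };

/* Hypothetical helper: index of the first table entry that is >= Hangle,
   or K_HVAL_COUNT if Hangle is larger than every entry. */
int find_band(float Hangle)
{
    int i = 0;
    while (i < K_HVAL_COUNT && Hangle > k_Hval[i])
        i++;
    return i;
}

/* Hypothetical kernel: one angle in, one band index out per work-item. */
__kernel void Bands(__global const float *angles, __global int *bands)
{
    int X = get_global_id(0);
    bands[X] = find_band(angles[X]);
}
```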

Thanks! Actually the next day I found it ridiculously easy to completely work around it. Like sitting on the porch with a cigarette in less than an hour’s time. I’d thought I had lots of static variables, as I do in my whole project, but the kernel only had about 8, and most of those were just used in the __kernel func itself, so just moving them in there was most of it. I think I only had to pass like 2 vars into 1 or 2 functions. (Having no alcohol in my system whatsoever may have affected the ease with which I approached the problem freshly the next morning!)
Another comment on your suggestion is that … I hadn’t thought of an outside argument to allocate the space, since these vars don’t come from outside, but I’ve seen other people recommend allocating some things like that. I guess they’d end up in the __global space if I wanted to be able to write to them? … anyway, I got it going by declaring them in the __kernel function, so they’re __private, (I think,) so they should be nice & fast … I think!
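Roughly what the reworked version looks like, as a sketch rather than my actual code: the helpers rescale and nudge and the out argument are invented for illustration. The variables live inside the kernel, and a helper gets them either by value or through a pointer. As far as I understand it, in OpenCL 1.x an unqualified pointer parameter refers to __private memory, which is what makes the second helper work. The block at the top is a shim so the sketch also builds as plain C for testing.

```c
/* Test shim (my addition): lets the sketch build as plain C on the host.
   A real OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t fake_gid[3];
static size_t get_global_id(unsigned int dim) { return fake_gid[dim]; }
#endif

typedef struct {
    float xoom;
    int   sizex;
} varholder;

/* Hypothetical helper: takes what it needs by value instead of reading
   a file-level variable. */
float rescale(float value, float xoom)
{
    return value * xoom;
}

/* Hypothetical helper that has to modify a kernel-local variable:
   it receives a pointer to the work-item's private copy. */
void nudge(float *xd, float amount)
{
    *xd += amount;
}

/* out is an assumed extra argument, one float per work-item in x. */
__kernel void Harlan(__global const varholder *vh, __global float *out)
{
    int X = get_global_id(0);
    float zX, xd;  /* function scope, so private to each work-item */

    zX = (X - vh->sizex / 2) / vh->xoom + vh->sizex / 2;
    xd = zX;

    nudge(&xd, 1.0f);
    out[X] = rescale(xd, vh->xoom);
}
```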

AND, a p.s. – I thought I was further away from getting this thing to work. BUT, it now runs correctly on the CPU, and on the GPU it’s great too while it lasts, but it crashes the card anywhere from a few seconds in to a minute and a half. And, it’s not much faster than OpenCL on the CPU. But this is proof-of-concept, and I think it’s time to get a Mac Pro and a heavy-duty graphics card. I’m cooking this poor MacBook Pro and draining the battery twice a day!