Sorry for taking so long to reply. The short answer is: look at the OpenCL spec for select() and the ternary operator (?:). Both work element-wise on vector types, so when you get to that stage of optimization (if your GPU needs it), they will still work properly.
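To make the element-wise rule concrete, here is a minimal plain-C emulation. It is a sketch, not OpenCL code: `float4_t`, `int4_t`, `less4`, and `select4` are hypothetical stand-ins for OpenCL's `float4`, `int4`, a vector `<`, and the built-in `select()`. As far as I can tell from the OpenCL C spec, `select(a, b, c)` on vectors takes `b[i]` wherever the most significant bit of `c[i]` is set and `a[i]` otherwise, and `c ? b : a` with a vector condition is defined to behave the same way.

```c
#include <stdint.h>

/* Hypothetical stand-ins for OpenCL C's float4 and int4 vector types. */
typedef struct { float s[4]; } float4_t;
typedef struct { int32_t s[4]; } int4_t;

/* Emulates an OpenCL vector comparison x < y:
   each true lane becomes -1 (all bits set), each false lane 0. */
static int4_t less4(float4_t x, float4_t y) {
    int4_t m;
    for (int i = 0; i < 4; ++i)
        m.s[i] = (x.s[i] < y.s[i]) ? -1 : 0;
    return m;
}

/* Emulates OpenCL C's select(a, b, c) for vector arguments:
   lane i takes b.s[i] if the most significant bit of c.s[i] is set,
   otherwise a.s[i]. For int32_t, "MSB set" is the same as "negative". */
static float4_t select4(float4_t a, float4_t b, int4_t c) {
    float4_t r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = (c.s[i] < 0) ? b.s[i] : a.s[i];
    return r;
}
```

For example, comparing a = {0.1, 0.5, 0.2, 0.9} against 0.25 gives the mask {-1, 0, -1, 0}, so `select4(hi, lo, mask)` takes `lo` in lanes 0 and 2 and `hi` in lanes 1 and 3. A lane holding 1 (the scalar-style "true") would not pick `b`, because its top bit is clear.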

What I did was something like the following (just typed here, not copied from my real code, so it may contain errors):

If the original logic structure was:

    if (a < 0.25f)      z = q*q;
    else if (a < 0.75f) z = q*q*q / 2.0f;
    else                z = q*q*q*q;

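then the new structure is:

    z = (a <  0.25f)              * (q*q)
      + (a >= 0.25f && a < 0.75f) * (q*q*q / 2.0f)
      + (a >= 0.75f)              * (q*q*q*q);

(Note that the middle clause needs both bounds; with only (a<.75) it would also be true when a<.25, and you would get the sum of two terms.)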
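A runnable plain-C sketch of the branching and multiplier versions, handy for checking that they agree before porting the logic into a kernel. `piecewise_branch` and `piecewise_mul` are hypothetical names; the code assumes scalar floats, where a comparison yields 1 or 0 (as in OpenCL C scalar code), and uses `q*q` etc. for the powers of q, with the middle condition bounded on both sides.

```c
/* Branching reference: the original if / else-if / else structure. */
static float piecewise_branch(float a, float q) {
    if (a < 0.25f)      return q * q;
    else if (a < 0.75f) return q * q * q / 2.0f;
    else                return q * q * q * q;
}

/* Branchless version: every term is computed, and each scalar comparison
   (1 for true, 0 for false) zeroes out the terms that do not apply.
   The middle condition needs both bounds, otherwise it would also be
   true for a < 0.25 and add a second term. */
static float piecewise_mul(float a, float q) {
    return (a < 0.25f)                * (q * q)
         + (a >= 0.25f && a < 0.75f)  * (q * q * q / 2.0f)
         + (a >= 0.75f)               * (q * q * q * q);
}
```

For instance, `piecewise_mul(0.5f, 2.0f)` and `piecewise_branch(0.5f, 2.0f)` both give 4 (that is, 8/2).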

. . . that’s the basic initial change I made (using copious #defines). Later I had to allow for the fact that “true” for a comparison such as (a<.25) is -1 (all bits set) instead of 1 when the operands are vector types. That actually makes sense, since an all-ones result can be used directly as a bit mask, but it won’t matter for you because . . . you’re not going to do what I did, or not in that order.
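With vector types the per-lane “true” is -1, so the plain multiplier version would flip the sign of the selected term. Below is a plain-C sketch of two ways to cope, emulating a single vector lane: negate the mask before multiplying, or AND the mask with the value’s bit pattern. `lane_mask`, `piecewise_negmul`, and `piecewise_bitmask` are hypothetical names, and `float_bits`/`bits_float` are hypothetical helpers playing the role of OpenCL’s `as_uint()`/`as_float()` reinterpretation.

```c
#include <stdint.h>
#include <string.h>

/* Emulates the per-lane result of an OpenCL vector comparison:
   -1 (all 32 bits set) for true, 0 for false. */
static int32_t lane_mask(int cond) { return cond ? -1 : 0; }

/* Reinterpret a float's bits as an unsigned integer and back. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Option 1: keep multiply-and-add, but negate each -1/0 mask to 1/0 first. */
static float piecewise_negmul(float a, float q) {
    int32_t m1 = lane_mask(a < 0.25f);
    int32_t m2 = lane_mask(a >= 0.25f && a < 0.75f);
    int32_t m3 = lane_mask(a >= 0.75f);
    return (-m1) * (q * q) + (-m2) * (q * q * q / 2.0f) + (-m3) * (q * q * q * q);
}

/* Option 2: use each mask as a bit mask. Exactly one mask is all ones,
   so OR-ing the masked bit patterns leaves only the selected value. */
static float piecewise_bitmask(float a, float q) {
    uint32_t m1 = (uint32_t)lane_mask(a < 0.25f);
    uint32_t m2 = (uint32_t)lane_mask(a >= 0.25f && a < 0.75f);
    uint32_t m3 = (uint32_t)lane_mask(a >= 0.75f);
    uint32_t r = (float_bits(q * q) & m1)
               | (float_bits(q * q * q / 2.0f) & m2)
               | (float_bits(q * q * q * q) & m3);
    return bits_float(r);
}
```

One difference worth knowing: in the multiply version, if an unused term overflows to infinity or is NaN, 0 * inf is NaN and poisons the whole sum, while the bit-mask version (like select()) simply discards that term.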

Someone (quite possibly notzed) suggested I try select(), which does *almost* exactly what I was trying to do above. I did try it, but my much, much uglier version, which I already had working by then, turned out to be 1 ms faster than select() for me and was slightly better suited to my purposes. BUT you should use select() first, because it’s much, much prettier (easier to read, and therefore less error-prone in the first place). Once that works, if you still need that extra 1 ms, you can dig further. The OpenCL spec *has* thought about this, and it *does* take pretty good care of you.
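A plain-C sketch of the same example written with select() and with the ternary operator. `select_f` is a hypothetical scalar emulation of OpenCL’s `select(a, b, c)`, which returns `b` when `c` is true and `a` otherwise; note that the “true” value is the second argument, which is easy to get backwards. Testing the upper threshold first and letting the `a < 0.25` test override it removes the need for a two-sided middle condition.

```c
/* Emulates OpenCL C's scalar select(a, b, c): b if c is nonzero, else a.
   As with any function call, both a and b are evaluated. */
static float select_f(float a, float b, int c) { return c ? b : a; }

/* Nested selects: pick between the upper two cases first, then let the
   a < 0.25 case override that result. */
static float piecewise_select(float a, float q) {
    float upper = select_f(q * q * q * q, q * q * q / 2.0f, a < 0.75f);
    return select_f(upper, q * q, a < 0.25f);
}

/* The same structure written with the ternary operator. */
static float piecewise_ternary(float a, float q) {
    return (a < 0.25f) ? q * q
         : (a < 0.75f) ? q * q * q / 2.0f
         :               q * q * q * q;
}
```

This only checks that the logic matches the branching version; which form is fastest has to be measured on the target GPU.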

Good luck!

Later!

David

edit: It might seem that my restructured version of the example does far too much work: it calculates all three values and uses “logic multipliers” to zero out the irrelevant ones. How could that be more efficient than branching and calculating only the value that is needed? (I have similar structures that reduce 6 or 7 options to a single result, with **much** more complex subclauses!) All I can tell you is that, in my code, it *is* faster to calculate 3 (or 6, 7, or 8) results this way than to branch and compute only the one that is needed. Apparently arithmetic is cheap and branching is very expensive, probably because work-items that run in lockstep on a GPU and take different branches end up executing each path in turn anyway. When you get your code running, if you figure this bit out, please come back and tell me about it!