goto statements in OpenCL

njbenann · March 16, 2017, 11:43am

Are goto statements supported in OpenCL?

I saw a few posts about this, but they seem to be pretty old, so I am hoping to find out whether there are any updates. To provide some context, I am thinking of running finite state machine C code on the GPU side. The code is generated at runtime (on the CPU side) from a given regex, and it contains goto statements.

Please let me know.

Salabar · March 16, 2017, 1:25pm

I cannot find anything in the standard that restricts goto relative to normal C. It might perform very poorly on GPUs, though, but I assume you’re aware of that.

njbenann · April 28, 2017, 6:25pm

Thank you for the reply. Yes, I am aware that goto statements perform poorly on GPUs, but I am trying to understand the reason behind the poor performance.

Can you please tell me what makes goto statements perform poorly in GPUs?

Salabar · April 29, 2017, 12:44pm

Goto by itself is fine and efficient, but the “finite state machine” part is what concerns me. Each OpenCL work-item shares an instruction pointer with the other work-items in its hardware group of 32 or 64 (a warp or wavefront). This means that even for a simple if-else in which half of the threads take one route and the other half take the other, the execution time effectively doubles, because the two routes run one after the other. Somewhat like this:

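bool x = condition();
turnOffThreadsNotFittingCriteria(x);   // only threads with x == true stay active
// code for "if"
…
// code
turnOffThreadsNotFittingCriteria(!x);  // now only threads with x == false stay active
// code for "else"
…
//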

This will be much worse in the case of a state machine, unless you can guarantee that most of the time every thread in a 64-thread cluster is in the same state. In that case the GPU can simply jump over the branches nobody takes.
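A rough way to picture the cost described above is to count how many serialized passes one lockstep step needs. The sketch below is plain host-side C, not OpenCL, and it is a toy model rather than a measurement of any real GPU. The names `LANES`, `MAX_STATES` and `passes_for_step` are made up for illustration, and the assumption is that the hardware runs one pass per distinct state present in the group while the other lanes are masked off.

```c
#include <assert.h>

#define LANES 64       /* assumed group width (64 as in the post; 32 on some GPUs) */
#define MAX_STATES 16  /* hypothetical upper bound on the number of FSM states */

/* Toy cost model: returns how many serialized passes one lockstep step needs,
   assuming one pass per distinct state found among the lanes. */
static int passes_for_step(const int state[LANES]) {
    int seen[MAX_STATES] = {0};
    int passes = 0;
    for (int lane = 0; lane < LANES; ++lane) {
        int s = state[lane];
        assert(s >= 0 && s < MAX_STATES);  /* states must fit the table */
        if (!seen[s]) {                    /* first lane found in this state */
            seen[s] = 1;
            ++passes;
        }
    }
    return passes;
}
```

Under this model, a group whose lanes are all in the same state needs 1 pass, lanes alternating between two states need 2 (the if-else case), and lanes spread over eight states need 8.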

Then again, it may turn out to be uncharted territory: rarely used and therefore poorly tested. Perfectly valid code may refuse to work due to bugs in the kernel compiler (cough, AMD, cough).

njbenann · April 29, 2017, 9:26pm

Thank you for the explanation!

I am getting the gist of what’s happening in this case. Would you agree that it is a similar case when using switch-case statements?

Also, can you please share any reading material related to this case?

Thanks!

Salabar · April 30, 2017, 9:33am

Would you agree that it is a similar case when using switch-case statements?

Switch-case is essentially goto, so yeah.
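To make the equivalence concrete, here is a minimal sketch of the same tiny state machine written both ways. It is ordinary host-side C so it can be compiled anywhere, and the regex `ab*c` and the function names `match_goto` and `match_switch` are assumptions chosen for illustration, not the generated code from the original question. Both functions accept a string only if the whole string matches.

```c
#include <stdbool.h>

/* goto form: each label is one state, each goto is one transition. */
static bool match_goto(const char *s) {
    char c;
    /* state 0: expect 'a' */
    c = *s++;
    if (c == 'a') goto state1;
    return false;
state1: /* loop on 'b', move on with 'c' */
    c = *s++;
    if (c == 'b') goto state1;
    if (c == 'c') goto state2;
    return false;
state2: /* accept only at end of input */
    return *s == '\0';
}

/* switch form: the state lives in a variable and the loop re-dispatches on it. */
static bool match_switch(const char *s) {
    int state = 0;
    for (;;) {
        char c = *s++;
        switch (state) {
        case 0:
            if (c != 'a') return false;
            state = 1;
            break;
        case 1:
            if (c == 'b') state = 1;
            else if (c == 'c') state = 2;
            else return false;
            break;
        default: /* state 2: accept only at end of input */
            return c == '\0';
        }
    }
}
```

Either way, every step ends in a data-dependent jump to one of several states, so the divergence concern is the same for both forms.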

You should find plenty of info on this topic and more by googling “CUDA/AMD GPU/Intel GPU optimization guide”. Any of the three, I mean.