FIFO queue implementation in OpenCL?


How am I supposed to implement an ever growing FIFO queue in OpenCL?

I have a queue that originally has a predefined size, but my kernel is supposed to:

  • dequeue the first item,
  • see if this item needs “subdivision”.
    -> if it does not, stop processing the item, and move on to the next one in a FIFO fashion
    -> if it does, then subdivide, eg. generate 2 new items and enqueue them.

Would somebody have a reference for me for the implementation of a FIFO queue in OpenCL?


Sounds like a job for dynamic parallelism in CUDA.

In OpenCL you could have an atomic queue head pointer and an atomic queue tail pointer and use traditional techniques to dequeue and enqueue items. It’s not going to be efficient though.

If possible, let the CPU figure out what to enqueue first, run a kernel that does those items, and then take the results, which define new items to run, and run the kernel again on them. In other words, do a bunch of work on known queue items for each kernel invocation rather than have one kernel that somehow keeps eating items from a queue (that’s not how OpenCL was designed to work).

If you overlap memory transfers and compute you could keep the GPU pretty busy doing it this way.