Parallel data transfer and GPU processing


I currently have a GPU program with the following work-flow:

  1. Upload data
  2. Process uploaded data into something more useful
  3. Do more GPU-intensive stuff
  4. Repeat

This is all done within a single command queue on a single GPU. It occurred to me that, in theory, nothing prevents me from doing steps 1 and 3 in parallel: the calculation-intensive stuff only requires the processed data from step 2, so I could already load the next slice of data from disk and upload it while step 3 finishes.

So much for theory. How would I best go about this in practice?

Do the simplest thing which suits your problem.

If each data packet is independent, just use 2 sets of queues and buffers and alternate which you use.

The only care needed is to make sure the device has finished with an input buffer before the CPU writes to it again (events can be used for this), and that output data is read back in a way that does not stall the other queue. One approach is to use one thread per queue and simply use synchronous (blocking) reads. Another is to treat the output buffers like the input buffers: wait on an event, then process the result before running the next iteration. This can be extended further by double-buffering the input/output buffers as well.
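To make the alternation concrete, here is a minimal sketch of the scheduling pattern only. It is plain Python, not GPU API code: a single-worker `ThreadPoolExecutor` stands in for an in-order command queue, a `Future` stands in for an event, and `process_slice` and `run_pipeline` are hypothetical names made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_slice(buf):
    # Stand-in for steps 2 and 3 (preprocess + heavy work) on one slot's buffer.
    return sum(x * x for x in buf)


def run_pipeline(slices, n_slots=2):
    # One single-worker executor per slot mimics an in-order command queue.
    queues = [ThreadPoolExecutor(max_workers=1) for _ in range(n_slots)]
    buffers = [[] for _ in range(n_slots)]  # reusable input buffers, one per slot
    pending = [None] * n_slots              # (slice index, "event") last submitted per slot
    results = [None] * len(slices)
    try:
        for i, data in enumerate(slices):
            slot = i % n_slots              # alternate between the slots
            if pending[slot] is not None:
                j, event = pending[slot]
                # The buffer may only be overwritten once the work using it is done.
                results[j] = event.result()
            buffers[slot][:] = data         # "upload": refill the now-free buffer in place
            pending[slot] = (i, queues[slot].submit(process_slice, buffers[slot]))
        # Drain whatever is still in flight.
        for entry in pending:
            if entry is not None:
                j, event = entry
                results[j] = event.result()
    finally:
        for q in queues:
            q.shutdown()
    return results


print(run_pipeline([[1, 2], [3], [4, 5, 6]]))
```

While one slot is busy processing, the main thread is free to fill the other slot's buffer, which is the overlap of steps 1 and 3 the question asks for. In real host code the `event.result()` call would be whatever event wait your API provides.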

If there are data dependencies between iterations, you need to use events to synchronise between the queues.
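A rough sketch of that dependent case, again simulated in plain Python rather than written against a real GPU API. It assumes each iteration's heavy step needs the previous iteration's result (a running total here, purely for illustration). The "event" of the previous iteration is handed to the next one, which waits on it even though it runs on the other queue. `preprocess`, `compute` and `run_dependent` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(data):
    # Stand-in for step 2: independent of other iterations, so it can overlap.
    return [x * 2 for x in data]


def compute(prep_event, prev_event):
    # Stand-in for step 3: needs this slice's prepared data and the previous result.
    prepared = prep_event.result()      # same queue, already finished (in-order)
    carry = prev_event.result() if prev_event is not None else 0  # cross-queue wait
    return carry + sum(prepared)


def run_dependent(slices):
    # Two single-worker executors mimic two in-order command queues.
    queues = [ThreadPoolExecutor(max_workers=1) for _ in range(2)]
    events = []
    prev = None
    try:
        for i, data in enumerate(slices):
            q = queues[i % 2]
            prep = q.submit(preprocess, list(data))
            prev = q.submit(compute, prep, prev)  # chained on the other queue's event
            events.append(prev)
        return [e.result() for e in events]
    finally:
        for q in queues:
            q.shutdown()


print(run_dependent([[1, 2], [3], [4]]))
```

Only the dependent step is serialised. The upload and preprocessing of the next slice can still run on the other queue while the current heavy step finishes.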