Do OpenGL and CUDA/OpenCL run in parallel?


I have a situation in which I really need OpenGL rendering and CUDA calculations to run in parallel, completely decoupled. They need to run in parallel from an application standpoint and even from a theoretical one, e.g. I need to launch both tasks and do something when the first one has completed. Having the GPU internally handle OpenGL and then OpenCL sequentially would be a showstopper in my scenario.

Do OpenGL and CUDA/OpenCL run in parallel on the GPU? In other words, if I launch, say, a CUDA kernel and immediately afterwards start rendering an OpenGL scene, will the GPU wait until the CUDA kernel has completed before processing the OpenGL draw calls?


Multiple different CUDA kernels can run in parallel if and only if they belong to the same application context, are launched into different streams, and run on a GPU of the Fermi or Kepler architecture with multiple Streaming Multiprocessors (see page 18, Concurrent Kernel Execution).
But I would guess that OpenGL ‘kernels’ (i.e. shaders) and CUDA kernels, even from the same application, might not run truly in parallel, as they don’t use the same context. Even if that is possible, I don’t see a way of enforcing this behaviour…

But context switching is quite fast, so unless your individual kernels take very long, I don’t see a problem with switching. Could you tell us why you think switching would be a showstopper for you? Maybe there is a different solution.

Well, they are two different contexts. They can be bound at the same time, so this could work, couldn’t it?

My scenario would be the following:

At application start:

  • create and bind a CUDA context
  • create and bind a CUDA kernel
  • create and bind an OpenGL context

The CUDA context and OpenGL context would always remain bound, at all times, until the application terminates.

Main program execution:

launchCUDAKernel(); // calls the kernel. This call returns immediately.

// The purpose of the following loop is to continuously render the OpenGL scene,
// and update some OpenGL buffers with the contents of a CUDA buffer
// *as soon as* the CUDA buffer has been filled by the CUDA program/kernel

while (true) {
   renderOpenGLScene(); // issues many draw calls, but does not block until they finish (e.g. we do not call glFinish())
   // at this point, OpenGL commands / draw calls are still being executed
   if (hasCUDAKernelFinishedItsJob()) {
      byte[] bufferContents = readSomeCUDABuffer();
   }
}
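
For reference, the non-blocking check in the loop above can be sketched with a CUDA event that is recorded after the kernel and polled with `cudaEventQuery`. This is only a minimal sketch under assumptions: `fillBuffer`, `pattern` and the buffer size are hypothetical stand-ins, the OpenGL rendering is reduced to a comment, and nothing here guarantees that the GPU actually overlaps the kernel with OpenGL work.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical value written at index i; stands in for the real computation.
__host__ __device__ inline unsigned char pattern(int i)
{
    return static_cast<unsigned char>(i % 256);
}

// Hypothetical kernel that fills the CUDA buffer.
__global__ void fillBuffer(unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = pattern(i);
}

int main()
{
    const int n = 1 << 20;  // assumed buffer size in bytes
    unsigned char* dBuf = nullptr;
    unsigned char* hBuf = nullptr;
    cudaMalloc((void**)&dBuf, n);
    cudaMallocHost((void**)&hBuf, n);  // pinned host memory

    cudaStream_t stream;
    cudaEvent_t done;
    cudaStreamCreate(&stream);
    cudaEventCreateWithFlags(&done, cudaEventDisableTiming);

    // The launch returns immediately; the event marks the kernel's completion.
    fillBuffer<<<(n + 255) / 256, 256, 0, stream>>>(dBuf, n);
    cudaEventRecord(done, stream);

    long frames = 0;
    bool copied = false;
    while (!copied) {
        // renderOpenGLScene() would be called here.
        ++frames;

        // Non-blocking poll: cudaErrorNotReady means the kernel is still running.
        cudaError_t status = cudaEventQuery(done);
        if (status == cudaSuccess) {
            cudaMemcpy(hBuf, dBuf, n, cudaMemcpyDeviceToHost);
            copied = true;
        } else if (status != cudaErrorNotReady) {
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
            return 1;
        }
    }
    printf("buffer read back after %ld loop iterations, hBuf[1] = %d\n",
           frames, static_cast<int>(hBuf[1]));

    cudaEventDestroy(done);
    cudaStreamDestroy(stream);
    cudaFreeHost(hBuf);
    cudaFree(dBuf);
    return 0;
}
```

The poll itself never blocks the CPU thread, so the render loop keeps issuing draw calls; whether the GPU interleaves or serializes the two workloads is still up to the driver and hardware.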

It is important that I know whether things can run in parallel before I start coding. If I know they won’t run in parallel, I need to target the CPU instead (and write my program in C++).


It doesn’t matter that both contexts are bound at the same time: because they are different contexts, the GPU might not be able to run them in parallel (see the documentation).
Do I understand correctly that the CUDA kernel will take a very long time? The driver might do context switching here (I haven’t worked with long-running CUDA/OpenCL kernels). If not, you might be able to split the kernel up into multiple shorter ones.
Getting a ‘real-time view’ into the memory of a CUDA kernel while it is still running might not work correctly unless you use some kind of synchronization mechanism, which will slow down the code.
I would split the CUDA part into smaller kernels or feed it less data (depending on the problem) so that each launch runs for only ~16 ms; this way you can visualize the progress at 60 FPS.
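
A rough sketch of that chunking idea, assuming the work is a flat array that can be processed in independent slices: one short kernel is launched per slice, and the render loop polls an event to see when the slice is done. `processChunk`, `chunkCount` and the sizes are hypothetical, and the chunk size would need tuning on real hardware to stay within a frame.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-element work; stands in for the real computation.
__global__ void processChunk(float* data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[offset + i] = 2.0f * data[offset + i] + 1.0f;
}

// Number of chunks needed to cover `total` elements (ceiling division).
inline int chunkCount(int total, int chunkSize)
{
    return (total + chunkSize - 1) / chunkSize;
}

int main()
{
    const int total = 1 << 22;      // assumed problem size
    const int chunkSize = 1 << 18;  // assumption: tune so one chunk fits in a frame
    float* dData = nullptr;
    cudaMalloc((void**)&dData, total * sizeof(float));
    cudaMemset(dData, 0, total * sizeof(float));

    cudaEvent_t chunkDone;
    cudaEventCreateWithFlags(&chunkDone, cudaEventDisableTiming);

    const int chunks = chunkCount(total, chunkSize);
    int next = 0;        // index of the next chunk to launch
    int completed = 0;   // chunks known to have finished
    bool inFlight = false;

    while (completed < chunks) {
        // renderOpenGLScene() would be called here, once per frame.

        if (inFlight) {
            cudaError_t status = cudaEventQuery(chunkDone);  // non-blocking
            if (status == cudaSuccess) {
                inFlight = false;
                ++completed;  // finished chunks could be visualized at this point
            } else if (status != cudaErrorNotReady) {
                fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
                return 1;
            }
        }
        if (!inFlight && next < chunks) {
            int offset = next * chunkSize;
            int count = std::min(chunkSize, total - offset);
            processChunk<<<(count + 255) / 256, 256>>>(dData, offset, count);
            cudaEventRecord(chunkDone, 0);  // marks the end of this chunk
            inFlight = true;
            ++next;
        }
    }
    printf("processed %d chunks\n", completed);

    cudaEventDestroy(chunkDone);
    cudaFree(dData);
    return 0;
}
```

Because each launch is short, the driver gets frequent opportunities to schedule OpenGL work between chunks, even if it cannot run the two workloads truly concurrently.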

ArrayFire has a simple, free Graphics Library that plugs into any CUDA code. That might be a useful plugin for you. I’ve been using it to visualize a bunch of particles in some cell dynamics simulations.