Catching up to CUDA?

Does anyone know whether OpenGL will eventually be able to harness more of the unified GPU core computing capability on NVIDIA GPUs, the way CUDA claims to?

I’ve been writing imaging applications with OpenGL for 3 years. I’m getting a lot of crap from development partners about how great CUDA is!

I’ve reviewed the CUDA 2.2 user guide, and it looks great if you want to calculate the answer to LIFE, the UNIVERSE, and EVERYTHING in a few microseconds, but it doesn’t seem to be designed for graphics at all. Where is all the texture formatting for file format input/output? Where is the drawing code and geometry scaling? You have to do it all yourself!!!


If you read the CUDA specification, you will find that it is a GPGPU API. Its purpose is not to draw graphical primitives but to use the GPU for general-purpose computation. There is also OpenCL, which should be friendlier for working alongside OpenGL. Data is shared through buffers, and the new API should make that data sharing easier.

So is it a reasonable solution to do 90% of my coding with OpenGL and share buffer data with CUDA or OpenCL, as a more efficient alternative to shaders for getting the most out of unified multi-core GPUs? Do you think I’d be better off waiting for OpenCL to become more defined?

Thanks, Rennie.

I’m not sure I have understood the question correctly. OpenCL is not a replacement for OpenGL. If you can do your calculations in shaders, there is no reason to switch to a GPGPU API. But if there is some calculation that currently has to be done on the CPU, and that task can be divided into tens or hundreds of parallel tasks, then a GPGPU API is something that can speed it up. In any case, I suggest you look for answers from programmers with more GPGPU experience. There are many interesting forums. For example: - for CUDA - for OpenCL

The main difference between CUDA and OpenGL is that in OpenGL GPGPU the number of shader invocations your quad triggers depends on its resolution, while in CUDA it depends on the hardware: threads are grouped into warps, warps into blocks, and blocks into grids, and it is the programmer who defines the number of working threads in each block.
Another important thing in CUDA is the synchronization mechanism: in your application you can define a barrier that all threads in a block wait at.
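
To make the grid/block/thread idea concrete, here is a minimal sketch rather than a definitive implementation. The kernel name `scalePixels`, the host helper `scaleOnGpu`, and the default of 256 threads per block are all assumptions made up for illustration; only the CUDA runtime calls are real API.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread scales one pixel. The global index is built from the block
// index, the block size, and the thread index inside the block.
__global__ void scalePixels(float* pixels, int count, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {                 // the last block may be only partly used
        pixels[i] *= gain;
    }
}

// Hypothetical host helper: copy to the GPU, run the kernel, copy back.
// Returns false if any CUDA call fails.
bool scaleOnGpu(std::vector<float>& pixels, float gain, int threadsPerBlock = 256)
{
    assert(threadsPerBlock > 0);
    if (pixels.empty()) return true;

    int count = static_cast<int>(pixels.size());
    size_t bytes = pixels.size() * sizeof(float);

    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d, pixels.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // The programmer picks the block size; the block count follows from the data size.
        int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;
        scalePixels<<<blocks, threadsPerBlock>>>(d, count, gain);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(pixels.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}
```

Compare this with the quad approach: here the number of threads comes from `blocks * threadsPerBlock`, not from the resolution of a render target.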

Another important thing is that in CUDA, threads in a block can share data via on-chip shared memory. So with the synchronization mechanism and memory shared between the threads of a block, fewer passes are needed in CUDA, whereas in OpenGL you need to write to an intermediate texture and then ping-pong to share data …
The last important thing in CUDA is that shared memory is organized into banks, and your application should avoid bank conflicts. A conflict happens when two threads in a half-warp (warp = 32 threads) access different addresses in the same bank, so the accesses get serialized (you should know that a shared memory request for a warp is split into one request for the first half of the warp and one for the second).
CUDA comes with a profiler and a debugger. The profiler shows you the occupancy of each SM and the shared memory usage, so you can maximize your application's efficiency, and with the debugger you can launch your application in emudebug mode, where all thread accesses are serialized, so you can debug your app (be aware that your application may not crash in emudebug mode and still crash in parallel mode!).
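
The shared memory, barrier, and bank-conflict points above can be sketched together with a tiled matrix transpose. This is only an illustration under stated assumptions: the names `transposeTiled` and `transposeOnGpu` are made up, and the 16-wide tile matches the half-warp/16-bank hardware described in this post (on later GPUs with 32 banks a 32-wide tile is the more usual choice).

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE 16   // assumption: one tile row per half-warp

__global__ void transposeTiled(const float* in, float* out, int width, int height)
{
    // The extra padding column shifts each row by one bank, so threads that
    // read a tile column do not all hit the same bank.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    // Barrier: every thread in the block waits here until the tile is filled.
    __syncthreads();

    int tx = blockIdx.y * TILE + threadIdx.x;   // column in the output
    int ty = blockIdx.x * TILE + threadIdx.y;   // row in the output
    if (tx < height && ty < width)
        out[ty * height + tx] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: transposes a width x height row-major matrix.
// Returns false if any CUDA call fails.
bool transposeOnGpu(const std::vector<float>& in, int width, int height,
                    std::vector<float>& out)
{
    size_t bytes = in.size() * sizeof(float);
    out.assign(in.size(), 0.0f);
    if (in.empty()) return true;

    float* dIn = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    bool ok = cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(TILE, TILE);
        dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
        transposeTiled<<<grid, block>>>(dIn, dOut, width, height);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}
```

The whole exchange between threads happens in one kernel launch. Doing the same with shaders would mean writing to an intermediate texture and reading it back in a second pass.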

If you plan to use OpenGL with CUDA, you should know that you can't pass an OpenGL texture directly to CUDA. You need to store it in a PBO and then map that to a CUDA buffer!
So in conclusion, it depends on the application type: in some cases using GLSL/FBO is more efficient than passing data to CUDA!
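
For what it's worth, the PBO route mentioned above looks roughly like the sketch below. Take it as an assumption-laden outline, not the one true way: it uses the `cudaGraphics*` interop calls from later CUDA releases (in the CUDA 2.2 era the equivalent calls were `cudaGLRegisterBufferObject` and `cudaGLMapBufferObject`), and the names `invertPixels` and `processPbo` are made up. It also assumes a current OpenGL context and an existing PBO holding `count` RGBA8 pixels.

```cuda
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Example work on the mapped buffer: invert the colour of every pixel.
__global__ void invertPixels(uchar4* pixels, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        uchar4 p = pixels[i];
        pixels[i] = make_uchar4(255 - p.x, 255 - p.y, 255 - p.z, p.w);
    }
}

// Hypothetical helper. Assumes an OpenGL context is current on this thread
// and that `pbo` is a buffer object with room for `count` RGBA8 pixels.
bool processPbo(GLuint pbo, int count)
{
    cudaGraphicsResource_t resource = nullptr;
    if (cudaGraphicsGLRegisterBuffer(&resource, pbo,
                                     cudaGraphicsRegisterFlagsNone) != cudaSuccess)
        return false;

    bool ok = cudaGraphicsMapResources(1, &resource, 0) == cudaSuccess;
    if (ok) {
        uchar4* d = nullptr;
        size_t bytes = 0;
        ok = cudaGraphicsResourceGetMappedPointer(
                 reinterpret_cast<void**>(&d), &bytes, resource) == cudaSuccess
          && bytes >= static_cast<size_t>(count) * sizeof(uchar4);
        if (ok) {
            int threads = 256;
            int blocks = (count + threads - 1) / threads;
            invertPixels<<<blocks, threads>>>(d, count);
            ok = cudaGetLastError() == cudaSuccess;
        }
        // OpenGL must not touch the buffer until it is unmapped again.
        ok = (cudaGraphicsUnmapResources(1, &resource, 0) == cudaSuccess) && ok;
    }
    cudaGraphicsUnregisterResource(resource);
    return ok;
}
```

In a real application you would probably register the buffer once at startup and only map/unmap it per frame, since registration is the expensive part. The map/unmap round trip is also the overhead that can make a plain GLSL/FBO pass the better choice for simple per-pixel work.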