The main difference between CUDA and OpenGL is that in OpenGL GPGPU the number of shader invocations your quad triggers depends on its resolution, while in CUDA it is the programmer who defines it: threads are grouped into warps, warps into blocks, and blocks into a grid, and you choose the number of working threads in each block (within the limits of the hardware).
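To make the thread/block/grid idea concrete, here is a minimal sketch of a launch configuration. The kernel `scale`, the helper `scaleOnGpu` and the block size of 256 are just names and values I picked for illustration, not anything CUDA prescribes.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread scales one element; its index comes from the block/thread hierarchy.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // the last block may be only partly used
        data[i] *= factor;
}

// Hypothetical host helper: copy to the GPU, scale in place, copy back.
void scaleOnGpu(std::vector<float> &host, float factor)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return;
    size_t bytes = n * sizeof(float);

    float *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    // The programmer picks the threads per block (assumed 256 here) and
    // derives how many blocks are needed to cover all n elements.
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocksPerGrid, threadsPerBlock>>>(dev, n, factor);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```

Calling `scaleOnGpu(v, 2.0f)` on a `std::vector<float>` doubles every element. Notice that the amount of parallel work comes from the `<<<blocks, threads>>>` launch, not from the size of a render target.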
Another important thing in CUDA is that there is a synchronization mechanism: you can define a barrier (__syncthreads()) at which all threads in a block wait for each other.
Another important thing is that in CUDA, threads in a block can share data via on-chip shared memory. So with the synchronization mechanism and memory shared between the threads of a block, fewer passes are needed, whereas in OpenGL you need to write to an intermediate texture and then ping-pong to share data.
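As a rough sketch of how the barrier and shared memory work together, here is a per-block sum done in a single kernel launch. The names `blockSum`, `sumOnGpu` and `BLOCK_SIZE` are my own, and adding up the partial sums on the CPU is just a simplification to keep the example short.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256   // assumed block size; must be a power of two here

// Each block sums BLOCK_SIZE inputs in shared memory and writes one partial sum.
__global__ void blockSum(const float *in, float *partial, int n)
{
    __shared__ float tile[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                     // all loads are visible before reducing

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();                 // outside the if, so every thread reaches it
    }
    if (tid == 0)
        partial[blockIdx.x] = tile[0];
}

// Hypothetical host helper: returns the sum of all elements of `host`.
float sumOnGpu(const std::vector<float> &host)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return 0.0f;
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;

    float *dIn = nullptr, *dPartial = nullptr;
    cudaError_t err = cudaMalloc(&dIn, n * sizeof(float));
    assert(err == cudaSuccess);
    err = cudaMalloc(&dPartial, blocks * sizeof(float));
    assert(err == cudaSuccess);
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK_SIZE>>>(dIn, dPartial, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);

    float total = 0.0f;                  // finish the few partial sums on the CPU
    for (int b = 0; b < blocks; ++b)
        total += partial[b];
    return total;
}
```

In OpenGL the same reduction would typically take several render passes with ping-pong textures, because fragments cannot exchange intermediate values within one pass.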
One last important thing in CUDA is that shared memory is organized into banks. Your application should avoid bank conflicts, which happen when two threads of a half-warp (a warp is 32 threads) access different addresses in the same bank, so that the accesses are serialized. (You should know that on compute capability 1.x hardware a shared memory request for a warp is split into one request for the first half of the warp and one for the second; on later hardware with 32 banks the whole warp is handled together.)
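A common way to dodge bank conflicts is to pad the shared array by one column. The sketch below is a tiled matrix transpose, assuming 32 banks of 4-byte words (compute capability 2.x or later) and 32x32 thread blocks; `transpose`, `transposeOnGpu` and `TILE` are illustrative names of mine.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define TILE 32   // assumed tile size, matching 32 banks

// Transposes a height x width row-major matrix into a width x height one.
__global__ void transpose(const float *in, float *out, int width, int height)
{
    // Without the +1, tile[0][c], tile[1][c], ... all fall in the same bank,
    // so the column read below would be serialized.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();                      // tile fully loaded before anyone reads it

    // Swap the block indices so that global writes stay row-contiguous.
    x = blockIdx.y * TILE + threadIdx.x;  // column in the output
    y = blockIdx.x * TILE + threadIdx.y;  // row in the output
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: returns the transposed matrix.
std::vector<float> transposeOnGpu(const std::vector<float> &in, int width, int height)
{
    std::vector<float> out(in.size());
    if (in.empty()) return out;
    size_t bytes = in.size() * sizeof(float);

    float *dIn = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dIn, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&dOut, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    transpose<<<grid, block>>>(dIn, dOut, width, height);

    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

With the padding, a row of the tile is 33 words long, so the threads of a warp reading down a column hit 32 different banks instead of one.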
CUDA comes with a profiler and a debugger. The profiler shows you the occupancy of each SM and the shared memory usage, which helps you maximize your application's efficiency. With the debugger you can launch your application in emudebug mode, where all thread accesses are serialized, so you can debug your app. (Be aware that your application may not crash in emudebug mode even though it crashes in parallel mode!)
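If you plan to use OpenGL with CUDA, you should know that in older CUDA versions you can't pass an OpenGL texture directly to CUDA. You need to store it in a PBO and then map that to a CUDA buffer! (Later CUDA releases added cudaGraphicsGLRegisterImage, which lets you register a texture directly.)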
So in conclusion, it depends on the type of application: in some cases using GLSL/FBO is more efficient than passing data to CUDA!