Process level GPU pixel data sharing

The application I am working on is a time-sensitive Windows XP device driver that uses OpenGL for graphics rendering.
The driver processes HD-size dynamic textures produced on the GPU by third party OpenGL code.

There are two options:
A. Exchange the textures through system memory:
   1. Do the processing using the third party code in one process.
   2. Read back the dynamic textures from the GPU to system memory.
   3. Map the memory into our driver process and send the dynamic textures back to the GPU for the next step of the OpenGL processing.
B. Link against the third party code and use shared OpenGL contexts to exchange resources.

Option A is too slow.
Option B exposes our driver to crashes in the third party code, which is unacceptable.

Do you know of any method or trick that allows sharing GPU resources between different processes?

Thank you
– Jan

Is that third party stuff an application or a library? Can you link your code with theirs (statically or dynamically)? Do you have the source code of that third party stuff?

You can try some tricks from the glIntercept project. The idea is to put a fake OpenGL32.dll in the app folder. On start the app will load your fake OpenGL32.dll, and you wrap the calls to the real OpenGL32.dll. Then you can find a sweet spot to add your draw code.

I do not own the source.
They can build a DLL, EXE or LIB for me if I wish.
The problem is that their code works as a plug-in for the broadcast driver. The broadcast driver uses OpenGL to post-process the video streams that the plug-in code creates with OpenGL.
The broadcast driver has to be very robust (nobody likes glitches on TV).
To make the broadcast driver very robust we have to separate it from the plug-ins by running them in a separate process space.
I cannot change this restriction.
For plug-ins that create their output in system memory, we use shared buffers via memory mapping.
When OpenGL is used to produce the output, we have no analogous technology that avoids a forced readback of the data from the GPU. Because we are dealing with multiple HD video streams, this is a serious bottleneck.
Any advice on how to overcome this bottleneck will be appreciated.
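
For illustration, the shared-buffer exchange used for system-memory plug-ins could look roughly like the sketch below. It is written in Python for brevity and is not the actual driver code: the file-backed mapping, the tiny 64x36 frame size, and the names PLUGIN_CODE and exchange_frame are all assumptions. On Windows the native equivalent would be CreateFileMapping and MapViewOfFile.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Assumed frame layout, kept tiny for illustration (not from the thread).
WIDTH, HEIGHT, BYTES_PER_PIXEL = 64, 36, 4
FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL

# Code run by the hypothetical plug-in process: map the same file and fill it.
PLUGIN_CODE = """
import mmap, sys
path, size = sys.argv[1], int(sys.argv[2])
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), size) as view:
        view[:] = b"\\x7f" * size   # stand-in for a rendered frame
        view.flush()
"""

def exchange_frame():
    """Create a shared buffer, let a separate process fill it, return its bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(FRAME_BYTES))            # size the backing file
        with mmap.mmap(fd, FRAME_BYTES) as view:    # driver-side mapping
            subprocess.run(
                [sys.executable, "-c", PLUGIN_CODE, path, str(FRAME_BYTES)],
                check=True,
            )
            # The plug-in wrote through its own mapping; read it through ours.
            return bytes(view[:FRAME_BYTES])
    finally:
        os.close(fd)
        os.remove(path)

if __name__ == "__main__":
    frame = exchange_frame()
    print(len(frame), "bytes received from the plug-in process")
```

This avoids a copy between the two processes, but a frame rendered on the GPU would still have to be read back into such a buffer first, which is exactly the bottleneck described above.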
– Jan

Maybe it is possible to share contexts across processes, so you could access their textures directly and avoid the download/upload in option A.

I remember something about stealing a context from a window on these boards.

Moving the broadcast driver into a separate process doesn't make things more or less stable. The whole pipeline is only as stable as its weakest component. From your first message, you are trying to add processing after the third party OpenGL plug-in has finished, and then feed it again after your code finishes processing. So…

ThirdParty OGL effect -> your effect -> ThirdParty OGL…

Is your processing code OpenGL based too?

I still can't figure out what you are trying to do. Are you trying to grab the result from OpenGL and send it to some other hardware (a different video/encoder/whatever card)? Please explain more if it doesn't break your NDA.

Today's hardware is powerful… Using PBO techniques you can achieve over 2GB/sec when grabbing data from the GPU, or even more when sending data to the GPU.

>> Moving broadcast driver in separate process doesn’t make things more or less stable.
Yes, it does. A plug-in crash does not interrupt the driver.
Plug-ins are used only as video stream providers.

>> I still can't figure out what you are trying to do.
The architecture is as follows:
GPU(plug in)->|->GPU(driver)->Broadcast hardware

If a plug-in crashes, its video stream “disappears” (and reappears after the plug-in restarts), but the driver keeps working.
Imagine looking at a multi-window infomercial TV channel. It makes a big difference whether one section of the screen is blank for a second or the whole program is interrupted because the computer is rebooting.
Another example: a plug-in leaks a very small amount of memory. The driver can run uninterrupted for months if the plug-ins are periodically restarted and live in a separate memory space. If they shared the memory space with the driver, it would be a disaster.
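
The restart behaviour described above can be sketched as a small supervisor loop. This is a Python illustration under stated assumptions, not the real driver: PLUGIN_CODE and run_plugin_with_restart are hypothetical names, and the plug-in is rigged to crash on its first run only.

```python
import subprocess
import sys

# Hypothetical plug-in: aborts on its first run, delivers a frame afterwards.
PLUGIN_CODE = """
import os, sys
attempt = int(sys.argv[1])
if attempt == 0:
    os.abort()            # simulate a hard CPU-side crash in plug-in code
print("frame from plug-in")
"""

def run_plugin_with_restart(max_restarts=3):
    """Run the plug-in in its own process and restart it if it dies.

    Returns (number_of_crashes, plug-in output or None).
    """
    crashes = 0
    for attempt in range(max_restarts + 1):
        result = subprocess.run(
            [sys.executable, "-c", PLUGIN_CODE, str(attempt)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return crashes, result.stdout.strip()
        # The crash took down only the child; the supervisor keeps running.
        crashes += 1
    return crashes, None

if __name__ == "__main__":
    crashes, output = run_plugin_with_restart()
    print("crashes survived:", crashes, "| output:", output)
```

The same loop could also restart a healthy plug-in on a timer, which is how a slow memory leak in the plug-in stays out of the long-running process.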

I am not talking about plug-in crashes inside the OpenGL driver, but about CPU-based crashes.
Running a plug-in on a separate thread is not safe enough because the memory is shared.
The limitation (separate processes for plug-ins) is not my idea but a design requirement.

>> Using PBO techniques you can achieve over 2GB/sec grabbing data from GPU
I am doing exactly this, with this performance (and better on PCIe 2.0 based hardware). In the driver I am also sending the data back to the GPU and reading it back again, and it is not fast enough for a few HD streams, so I am looking for alternative solutions.
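
A rough back-of-the-envelope estimate shows why this adds up. The numbers below are assumptions, not measurements: 1920x1080 frames, 4 bytes per pixel, 30 frames per second, three bus transfers per frame (plug-in readback, driver upload, driver readback), and a flat 2 GB/sec budget shared by all transfers.

```python
# Assumed stream parameters (not taken from measurements in this thread).
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4          # e.g. 8-bit RGBA
FPS = 30
TRANSFERS_PER_FRAME = 3      # plug-in readback, driver upload, driver readback

def frame_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size of one uncompressed frame in bytes."""
    return width * height * bpp

def stream_bandwidth(fps=FPS, transfers=TRANSFERS_PER_FRAME):
    """Bus traffic in bytes per second caused by one stream."""
    return frame_bytes() * fps * transfers

def max_streams(bus_bytes_per_sec, fps=FPS, transfers=TRANSFERS_PER_FRAME):
    """Streams that fit if the whole budget went to these transfers."""
    return bus_bytes_per_sec // stream_bandwidth(fps, transfers)

if __name__ == "__main__":
    print("bytes per frame:      ", frame_bytes())
    print("bytes/sec per stream: ", stream_bandwidth())
    print("streams at 2 GB/sec:  ", max_streams(2 * 10**9))
```

This deliberately ignores overlap between uploads and readbacks as well as synchronization stalls, so it is only an order-of-magnitude check.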

– Jan

OK… you have more than one source (GPU plug-in), each in a separate process, and your GPU driver (a “composer”) which accepts all those streams and prepares one final stream for the broadcast hardware.
Because all those units run in their own process and memory space, you can't share GPU objects between them. MSDN clearly states that wglShareLists can only share between rendering contexts within the same process. So… copy the data from the plug-ins to system memory, share those memory buffers with your driver, upload them to the GPU again (but from the driver), do the compositing, read back again and send to the broadcast hardware. Seems pretty straightforward… and SLOW!!!

Better to run all those GPU plug-ins on different machines and send their streams over the LAN. Make a server which accepts multiple LAN streams and does the rest.

Thanks a lot for all the advice.
I proposed the network option but it was criticized as not cost effective.

– Jan

cheap, fast, robust : pick two