I would like to know if there is a better way to stream a video file (e.g. AVI) to an OpenGL texture than what one of the NeHe tutorials suggests (i.e. using VFW and AVIStreamGetFrame).
My problem with this approach is that many codecs are not compatible with VFW, and it is quite slow.
Of course, I could do this with DirectX: stream the video to a DirectDraw surface, then copy every frame from there to an OpenGL texture. But is there a way to perform this copy inside GPU memory ('blit'), without a much more costly download from the GPU followed by an upload back again?
The ARB_pixel_buffer_object extension seems to suit your problem. With it, you can benefit from DMA transfers and speed up your texture updates. See the extension spec for more information.
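For reference, the usual alternating-PBO streaming update looks roughly like this. This is a sketch only: it assumes a current GL context, GLEW-style entry points, and pre-created texture and PBO handles; all names are placeholders, and the frame format (BGRA here) is an assumption.

```cpp
#include <cstring>  // memcpy
// Assumes <GL/glew.h> (or equivalent loader) has been included and initialized.

// Hypothetical helper: upload one decoded frame using two alternating PBOs,
// so the memcpy into one PBO overlaps the DMA transfer from the other.
void uploadFrame(const void* frame, size_t size, int frameIndex,
                 GLuint tex, GLuint pbo[2], int w, int h)
{
    int cur  = frameIndex % 2;        // PBO being filled this frame
    int prev = (frameIndex + 1) % 2;  // PBO filled last frame, uploaded now

    // Kick off the texture update from the previously filled PBO; with a
    // bound unpack PBO the data argument is an offset, and the driver can DMA.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, pbo[prev]);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_BGRA, GL_UNSIGNED_BYTE, 0);

    // Meanwhile fill the other PBO with the new frame. Re-specifying the
    // store ("orphaning") avoids stalling on the previous transfer.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, pbo[cur]);
    glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, size, 0, GL_STREAM_DRAW);
    void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
    if (dst) {
        memcpy(dst, frame, size);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
    }
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
}
```

This cannot run without a window/context, so treat it as an outline of the call sequence rather than a drop-in function.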
I am already using PBOs extensively; the problem is decompressing the video file frame by frame. The VFW library the NeHe tutorial suggests is straightforward, but most of the newer codecs are not compatible with it, and it is quite slow.
DirectX is capable of streaming video to a DirectDraw surface, but if this requires an extra copy to get the frame into OpenGL, I would like to avoid reading it back from the GPU (the DirectDraw surface) into PC memory and then transferring it back to GPU memory via a PBO. Are you sure this copy would happen inside the GPU?
Look into the NVIDIA OpenGL SDK; they have a sample that uses DirectX to decompress video into a PBO.
You may want to read this if you have not already, and then check your pixel size and pixel format.
Although I could not yet compile the NVIDIA sample, at first glance the contents of a video frame are passed to a DirectX surface (AFAIK the equivalent of an OpenGL texture), which is a - hopefully - asynchronous upload to the GPU, and then read back to PC RAM. Then, using PBOs, it is uploaded again asynchronously to the GPU, this time to an OpenGL texture.
This looks like two extra steps compared to using DirectX alone instead of OpenGL. Even if all of this is asynchronous on the CPU side, it still requires extra work from the GPU and puts an extra load on the bus.
Am I mistaken in my assumptions? If not, is this still the best way to stream video to an OpenGL texture?
The NVIDIA sample doesn't read back to RAM; it reads into a PBO directly, then to a texture. Yes, there is one more "action" than using DirectX directly, but there is no readback to regular RAM.
dsurface -> PBO (two alternating PBOs used) -> texture
dsurface -> dtexture
NVIDIA also has a sample for the latter method in their SDK. Be aware of the notes about compiling those demos; you will need some content from older DirectX SDKs to get the DX9/DirectShow renderer.
Thanks guys. I will investigate this. If only I didn't hate DirectShow…
Check the Texture3D and Texture3D9 examples in the DirectShow SDK (part of older DXSDKs). Remove the DX stuff, add the PBO stuff, and that's it.
If you still have problems, send me a PM and I'll give you my classes for video-to-texture.
Thanks, luckily I already had DirectShow installed on my PC. After suffering for half an hour with Unicode, ATL, and VS2005 problems, I successfully recompiled my DShow base classes, and the Texture3D9 sample now works.
Now I can play with it a little.
Sorry, another quick question:
I read somewhere that if I have a PBO and a mapped buffer, I can copy to it in another thread. Is this correct?
Yes… create a PBO pool and map all the PBO buffers. From the decoder thread, select one of the pointers, copy the decoded frame into it, and notify the PBO pool. In the render loop, on the next frame, check whether any PBO is full; if so, unmap it and upload the texture data. After that, mark the PBO as free and map it again on the next frame.
Using this technique, you can stream several video feeds at the same time.
Yes, mapping a buffer gives you a memory pointer, and you can use it in whatever thread you like. The only restriction is that you should only map/unmap it in the GL context's thread, and use plain old mutexes to ensure your two threads don't interfere with each other (once you unmap it in the GL thread, that pointer becomes invalid).
I have not yet implemented the texture renderer, but I thought moving the upload to the PBO into another thread was too interesting not to try first.
It works very well. I could not measure it exactly, but even watching the Task Manager, it visibly lightened the load on the CPU core that handles the GL thread.
But there is one thing I cannot explain. It could be a GLSL problem and off-topic here, but since you were very nice in giving me excellent advice, I thought I would ask it here:
I recently rewrote my renderer as a shader. All lights, out of a possible 8, can be per-vertex or per-pixel, and any of the usual three types: directional, point, or spot.
It is not surprising that a per-pixel spot light is the slowest to render. However, I cannot explain why, with only one spot light and maybe 80,000 polygons, changing the light from per-vertex to per-pixel adds significantly (~20-25%) to the load on the CPU core handling the GL thread.
What has the CPU got to do with a more complicated fragment shader?