Video decode hardware acceleration

Sorry for posting this question here in the advanced forum, but it’s where the people who might know the answer hang out.
I also know this has come up before, but perhaps someone has solved the problem by now.

As many of you know, the hardware vendors have chosen to only expose their wonderful hardware MPEG decompression capabilities through DirectX (DXVA/DirectShow). That also goes for other quality video processing, like deinterlacing and perhaps scaling. I’m surprised nVidia hasn’t done anything on the OpenGL side, but then they still don’t have YUV texture support on Windows, either…

Is there ANY way to use the hardware to create decoded/processed video frames and access them from the OpenGL side (Pixel Buffer Objects?) without moving the data across the bus twice (by using a download/upload cycle to get the data across the “great wall”)?
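
The best I can see today is the classic PBO streaming path, which still crosses the bus once per frame but at least lets the copy overlap with decoding. A minimal sketch, assuming GL_ARB_pixel_buffer_object and a hypothetical decode_frame_into() standing in for whatever produces the pixels:

```c
/* PBO streaming sketch. Assumes a GL 2.x context with GLEW already
 * initialized. This still crosses the bus once per frame; it just
 * makes the transfer asynchronous. */
#include <GL/glew.h>

#define FRAME_W 1920
#define FRAME_H 1080
#define FRAME_BYTES (FRAME_W * FRAME_H * 4)   /* BGRA */

static GLuint pbo, tex;

/* Hypothetical decoder entry point that fills one frame. */
extern void decode_frame_into(void *dst);

void init_streaming(void)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, FRAME_BYTES, NULL, GL_STREAM_DRAW);

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, FRAME_W, FRAME_H, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, NULL);
}

void upload_frame(void)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    /* Orphan the old storage so mapping never stalls on the GPU. */
    glBufferData(GL_PIXEL_UNPACK_BUFFER, FRAME_BYTES, NULL, GL_STREAM_DRAW);
    void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    if (ptr) {
        decode_frame_into(ptr);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    }
    /* With a PBO bound, the "pointer" is an offset into the buffer. */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, FRAME_W, FRAME_H,
                    GL_BGRA, GL_UNSIGNED_BYTE, (const void *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```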

It’s a crying shame that you can use DirectShow to generate HD video frames in on-card memory (Direct3D surface) in real-time but you can’t somehow map that data into an OpenGL buffer. Has anyone managed to figure out a way to do this, even if you have to copy through specially allocated on-card memory or something?

I had some hope that CUDA would help bridge this gap, but it doesn’t sound like it will.

Thanks for any pointers,

- LoopinFool

The method I found most suitable, and the most widely supported across vendors, is to use the color matrix from the ARB_imaging subset to accelerate the YUV -> RGB conversion.
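
A minimal sketch of that path, assuming packed 4:4:4 YCbCr uploaded as if it were GL_RGB (subsampled chroma would have to be upsampled first) and full-range BT.601 coefficients:

```c
/* Requires the ARB_imaging subset. The color matrix is applied in
 * the pixel-transfer path, so the texture ends up as plain RGB. */
#include <GL/glew.h>

void upload_ycbcr_as_rgb(const void *ycbcr, int w, int h)
{
    /* Column-major 4x4: out = M * (Y, Cb, Cr, 1). The fourth column
     * folds in the -0.5 chroma offset, relying on alpha == 1 for
     * GL_RGB source data. */
    static const GLfloat ycbcr_to_rgb[16] = {
        1.0f,      1.0f,       1.0f,    0.0f,   /* Y  */
        0.0f,     -0.344136f,  1.772f,  0.0f,   /* Cb */
        1.402f,   -0.714136f,  0.0f,    0.0f,   /* Cr */
       -0.701f,    0.529136f, -0.886f,  1.0f    /* constant term */
    };

    glMatrixMode(GL_COLOR);
    glLoadMatrixf(ycbcr_to_rgb);
    glMatrixMode(GL_MODELVIEW);

    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, w, h, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, ycbcr);
}
```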

I have done partial video acceleration using fragment shaders to do the YUV -> RGB conversion on the fly, and some processing, such as deinterlacing, should be fairly simple to do in this manner too.
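
For reference, the core of such a shader (GLSL 1.10-era), assuming planar video uploaded as three GL_LUMINANCE textures bound to hypothetical texY/texCb/texCr uniforms, with the same BT.601 coefficients as the matrix above; chroma upsampling comes for free via bilinear filtering:

```c
/* Fragment shader source, compiled via the usual glCreateShader /
 * glShaderSource / glCompileShader path. */
static const char *yuv_frag_src =
    "uniform sampler2D texY;\n"
    "uniform sampler2D texCb;\n"
    "uniform sampler2D texCr;\n"
    "void main()\n"
    "{\n"
    "    float y  = texture2D(texY,  gl_TexCoord[0].st).r;\n"
    "    float cb = texture2D(texCb, gl_TexCoord[0].st).r - 0.5;\n"
    "    float cr = texture2D(texCr, gl_TexCoord[0].st).r - 0.5;\n"
    "    /* Full-range BT.601 YCbCr -> RGB */\n"
    "    gl_FragColor = vec4(y + 1.402 * cr,\n"
    "                        y - 0.344136 * cb - 0.714136 * cr,\n"
    "                        y + 1.772 * cb,\n"
    "                        1.0);\n"
    "}\n";
```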