I am working on a project where we are discussing using the GPU to do high-definition video processing. The question then becomes: what is the best way to get the edited video stream from the video card back to the hard drive? What would be the implications of using the GPU to do video processing instead of doing it all on the CPU? Is this even feasible?
Well, if you’re going to try this, first, get an nVidia card. ATi cards are notorious for their (comparatively) slow uploads and downloads.
Second, make use of the available streaming facilities, namely ARB_PBO. It doesn’t alleviate all of the CPU burden, but it removes some of it.
Third, you’re going to be streaming, so you’ll need to upload one frame while you process another and download a third, all at the same time. This means you’re not uploading to the framebuffer; you’ll be uploading to and downloading from textures or renderbuffers, which means you need FBO. Fortunately, nVidia’s FBO implementation is fairly good, though still beta. Use FBO.
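To make the overlap concrete, here is a minimal sketch in Python (no OpenGL calls, just the bookkeeping) of which frame sits in which stage at each step. It assumes each stage takes exactly one step, so at step t frame t is being uploaded, frame t-1 processed and frame t-2 downloaded; the function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(n_frames):
    """Return, for each step, the frame in each stage as (upload, process, download).

    None means that stage is idle (pipeline filling up or draining).
    Assumes every stage takes exactly one step per frame.
    """
    steps = []
    # A frame is uploaded at step f, processed at f + 1 and downloaded at f + 2,
    # so n_frames frames need n_frames + 2 steps to drain completely.
    for t in range(n_frames + 2):
        upload = t if t < n_frames else None
        process = t - 1 if 0 <= t - 1 < n_frames else None
        download = t - 2 if 0 <= t - 2 < n_frames else None
        steps.append((upload, process, download))
    return steps


if __name__ == "__main__":
    for t, (u, p, d) in enumerate(pipeline_schedule(4)):
        print(f"step {t}: upload={u} process={p} download={d}")
```

Only the middle steps have all three stages busy; the first two and last two steps are the pipeline filling and draining.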
Here’s what the cycle should look like. Allocate an upload PBO and a download PBO, and create three textures to use as render targets.
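For a rough idea of how much video memory that allocation takes, here is a back-of-envelope sketch. It assumes every PBO and texture holds exactly one full frame and ignores any driver padding or alignment; the helper name `stream_buffer_bytes` and the 1920x1080, 4-bytes-per-pixel figures are illustrative assumptions, not something from the thread.

```python
def stream_buffer_bytes(width, height, bytes_per_pixel, n_pbos=2, n_textures=3):
    """Return (bytes per buffer, total bytes) for the PBOs plus render-target textures.

    Assumes each buffer holds one full frame with no padding or alignment overhead.
    """
    frame = width * height * bytes_per_pixel
    return frame, frame * (n_pbos + n_textures)


if __name__ == "__main__":
    # Hypothetical 1080p RGBA8 frames: 2 PBOs + 3 textures.
    frame, total = stream_buffer_bytes(1920, 1080, 4)
    print(f"per buffer: {frame} bytes, total: {total / 2**20:.1f} MiB")
```

Real drivers may allocate more than this, so treat the result as a lower bound.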
Note to the FBO team: here’s a good reason why renderbuffers should have dedicated upload/download functions: so that they can be used with ARB_PBO. If they had them, we would use a renderbuffer rather than a texture for the destination data here, but they don’t, so we have to use textures for everything. It’s kinda annoying.
So, you’ve got a frame of data on your CPU. Upload it to a texture through ARB_PBO (you can map the buffer object and store the decompressed video directly into it). The first upload should be synchronous, but the rest can be async. Once it’s uploaded, do your image processing, storing the result in the destination texture. While that’s happening, do an async (PBO) upload of the next frame to the third texture.
Once the rendering is finished, do an async (PBO) read from the destination texture. Then start a render from the previously uploaded texture into the currently unused texture (the one that held the first source image). Then do another async upload. Keep doing this until you run out of frames.
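The rotation of the three textures can be sketched as plain index arithmetic. After each step the texture that just received an upload becomes the next source, the old source becomes the next render target, and the old render target (whose result has just been queued for readback) receives the next upload. Reusing that texture right away assumes GL's in-order command execution makes the queued readback see the old contents before the upload overwrites them. The function name `texture_roles` and the numbering 0, 1, 2 are hypothetical; in real code these would index your three texture objects.

```python
def texture_roles(step):
    """Return (source, dest, upload) texture indices (0, 1, 2) for a given step.

    Step 0: texture 0 holds the first (synchronously uploaded) frame,
    texture 1 is the render target and texture 2 receives the next upload.
    Each later step rotates the roles: (source, dest, upload) -> (upload, source, dest).
    """
    source = (0 - step) % 3
    dest = (1 - step) % 3
    upload = (2 - step) % 3
    return source, dest, upload


if __name__ == "__main__":
    for k in range(4):
        s, d, u = texture_roles(k)
        print(f"step {k}: render tex{s} -> tex{d}, read back tex{d}, upload into tex{u}")
```

The pattern repeats every three steps, so the loop only has to track a step counter.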
In theory, this should be the fastest way to do this on the GPU. Whether the bus traffic makes it slower than doing the work on the CPU is unknown.
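One way to reason about that question is a simple pipeline cost model. Done one frame at a time, every frame pays upload + process + download. Fully overlapped, the first frame pays the whole latency and each later frame costs only the slowest stage. This is a sketch under strong assumptions: constant per-frame times, and the three stages truly running concurrently, which depends on the card, the bus and the driver. The function names and the timings in the usage example are hypothetical; you would plug in measured numbers and compare against a measured CPU time per frame.

```python
def sequential_time(n_frames, upload, process, download):
    """Total time if each frame is uploaded, processed and downloaded in turn."""
    return n_frames * (upload + process + download)


def pipelined_time(n_frames, upload, process, download):
    """Total time if the three stages overlap perfectly.

    The first frame pays the full latency; after that, one frame
    finishes every max(upload, process, download) time units.
    """
    if n_frames == 0:
        return 0
    return (upload + process + download) + (n_frames - 1) * max(upload, process, download)


if __name__ == "__main__":
    # Hypothetical timings in ms: the bus transfers dominate the GPU work.
    print(sequential_time(100, 10, 5, 10), pipelined_time(100, 10, 5, 10))
```

When transfers dominate, the steady-state cost per frame is the transfer time, so the GPU only wins if that beats the CPU's per-frame processing time.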
< edit >
I didn’t notice that you have hard drive access going on there too. Given that, unless you’re doing some serious processing, it probably won’t matter whether you do the processing on the CPU or the GPU; disk access is going to take up much of your time.
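A quick calculation shows why the disk matters. Assuming the frames are read and written uncompressed at 1920x1080, 4 bytes per pixel and 30 frames per second (all illustrative assumptions; compressed storage moves the cost into decoding instead), each direction needs about 249 MB/s. The helper name `stream_bandwidth` is hypothetical.

```python
def stream_bandwidth(width, height, bytes_per_pixel, fps):
    """Bytes per second for one uncompressed video stream."""
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    read = stream_bandwidth(1920, 1080, 4, 30)   # source frames read from disk
    write = stream_bandwidth(1920, 1080, 4, 30)  # processed frames written back
    print(f"read {read / 1e6:.1f} MB/s + write {write / 1e6:.1f} MB/s")
```

Reading and writing at the same time doubles the load on the drive, which is why the disk is likely to be the bottleneck regardless of where the processing happens.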
Thanks for your help. I appreciate it.