Creating a movie from an OpenGL application

Hi,

I would like to create a video of my application. Since other ways of capturing it failed, I tried to implement the capture inside the application itself.

So I use two PBOs, read the pixel data back, and store each frame on disk as a simple TGA every 1/fps seconds. (I can run a script afterwards to turn the images into a movie.)
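For reference, the 18-byte header of an uncompressed 24-bit true-color TGA can be built like this (a sketch; the function name is illustrative, not from the code above):

```c
#include <stdint.h>
#include <string.h>

/* Fill in the 18-byte header for an uncompressed 24-bit true-color TGA.
   TGA expects BGR pixel order and (by default) bottom-up rows, which is
   exactly what glReadPixels(..., GL_BGR, GL_UNSIGNED_BYTE, ...) delivers. */
static void tga_header(uint8_t h[18], uint16_t width, uint16_t height)
{
    memset(h, 0, 18);
    h[2]  = 2;                        /* image type 2: uncompressed true-color */
    h[12] = (uint8_t)(width & 0xFF);  /* width, little-endian */
    h[13] = (uint8_t)(width >> 8);
    h[14] = (uint8_t)(height & 0xFF); /* height, little-endian */
    h[15] = (uint8_t)(height >> 8);
    h[16] = 24;                       /* bits per pixel */
}
```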

But right now I am unable to increase the frame rate. Capturing works at only 10 fps; at 15 fps I noticed too much lag.

I suspect the following function is causing the slowdown, but since it is the part that writes to disk, I do not know if there is any other way to do it:


FILE *out = fopen(destFile, "wb");   /* binary mode: TGA is a binary format */
if (!out)
    return false;

//glReadPixels(0, 0, W, H, GL_BGR, GL_UNSIGNED_BYTE, pixel_data);
written_size  = fwrite_unlocked(&TGAhead, sizeof(TGAhead), 1, out);
written_size += fwrite_unlocked(pixel_data, 3 * W * H, 1, out);

fclose(out);

/* release the mapped PBO now that its contents have been written */
glUnmapBuffer(GL_PIXEL_PACK_BUFFER_ARB);
glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0);

return true;


My question is: do you know any way to speed this up and allow capturing 30 fps movies?

Am I forgetting something?

Do you think that putting the disk-bound code inside a separate thread would help?
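A disk-writer thread would typically receive frames through a small bounded queue, so the render loop never blocks on file I/O. A minimal pthreads sketch (names are illustrative; frame ownership and shutdown handling are left out):

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 8

/* Bounded FIFO handing captured frames from the render loop
   to a disk-writer thread. */
typedef struct {
    void           *slots[QUEUE_CAP];
    int             head, tail, count;
    bool            done;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
} frame_queue;

void queue_init(frame_queue *q)
{
    q->head = q->tail = q->count = 0;
    q->done = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Render thread: called after glMapBuffer hands us the pixel data. */
void queue_push(frame_queue *q, void *frame)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)              /* queue full: wait */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail] = frame;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Writer thread: pops the next frame to write; NULL once the
   producer has set q->done and the queue is drained. */
void *queue_pop(frame_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->done)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *frame = NULL;
    if (q->count > 0) {
        frame = q->slots[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return frame;
}
```

Note that if the disk cannot keep up, `queue_push` eventually blocks the render loop anyway; the queue only absorbs short bursts.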

I use the krut tool for simple capture. If you don’t care about exact sampling, it has been great for making nifty YouTube captures…

In my experience, writing to disk becomes the bottleneck very quickly. If you are only going to make short movies, try storing your screenshots in RAM and only writing them to disk afterwards. See how your performance is then.

Later you could use a thread to save the data in parallel, but you will usually generate images faster than you can store them, so your RAM will still fill up over time.

Of course it also depends on your resolution, so try reducing it.

I mostly use such a feature to render out a previously recorded camera path in HD resolutions. For a 30-second clip I can easily wait several minutes for it to complete.

Jan.

Currently I have implemented it by reading pixels back from the framebuffer into a texture (glCopyTexSubImage2D) and storing the pixels in RAM (glGetTexImage). I also fix the frame rate to 30 fps. Having 4 GB of RAM, I can capture about 40 s (1200 frames) at 1280x720 (1280 x 720 x 3 x 1200 / (1024 x 1024 x 1024) ≈ 3.1 GB). After I stop capturing, the application dumps all frames as plain bitmaps to the hard disk. Then I use the Linux command-line tool ffmpeg to create a movie from the frames.

For the result, check:
http://www.youtube.com/watch?v=-IyMqn-86zE
(the fps counter is not entirely correct in that movie)

In the future I might change my approach and store the frames as JPEG or PNG in main memory. This would allow me to capture much longer videos while still keeping each frame in memory. Another option is having a separate thread write the frames to the hard disk, thereby freeing main memory (and thus being able to store even more frames before main memory is full). If you have a fast hard disk (maybe an SSD) and store the images as JPEG or PNG, you can probably capture frames until your hard disk is full ;).

Another thing I thought of (but I’m unsure whether it would make much difference) is rendering to a texture first and then rendering from the texture to the framebuffer (instead of using glCopyTexSubImage2D to create the texture from the framebuffer).

Indeed, if you can send your uncompressed images directly to movie-encoding libraries, you can get back to real time.

Second point: why do you need a texture at all?
You can glReadPixels directly from the framebuffer in ping-pong mode (while rendering the current frame, read back the previous one, then swap roles).

I think that’s what zweifel is doing, hence the PBOs.

Now I’m not sure two PBOs are enough on current hardware: if you map, say, PBO[0] into memory just before starting a new render to be copied into PBO[1], glMapBuffer() will make your CPU wait until PBO[0] has been filled. That is a big synchronisation point between CPU and GPU, basically the same as a direct glReadPixels() into RAM.

You should have started rendering the image for PBO[1] before mapping PBO[0] into memory; otherwise PBOs will provide no performance gain.

So I would suggest a FIFO with more PBOs: you would map PBO[0] only when you are about to start drawing the frame for PBO[3], or something like that. That way there is a better chance that PBO[0] is already filled when your CPU hits the glMapBuffer() call, which means a shorter synchronisation stall (remember that OpenGL is asynchronous).
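The FIFO idea boils down to simple index bookkeeping. A sketch with a four-deep ring (the helper names are illustrative; the GL calls that would use them are shown as comments):

```c
/* Index bookkeeping for an N-deep ring of PBOs: each frame we start an
   asynchronous readback into one PBO and map the one queued N-1 frames
   ago, giving the GPU that many frames to finish the transfer. */
#define NUM_PBOS 4

/* PBO to issue glReadPixels into on frame f (0-based). */
int pbo_write_index(long f) { return (int)(f % NUM_PBOS); }

/* PBO to glMapBuffer on frame f: the oldest outstanding readback,
   issued on frame f - (NUM_PBOS - 1). Valid once f >= NUM_PBOS - 1. */
int pbo_read_index(long f) { return (int)((f + 1) % NUM_PBOS); }

/* Per frame, roughly:
     glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, pbo[pbo_write_index(f)]);
     glReadPixels(0, 0, W, H, GL_BGR, GL_UNSIGNED_BYTE, 0); // async copy
     if (f >= NUM_PBOS - 1) {
         glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, pbo[pbo_read_index(f)]);
         void *p = glMapBuffer(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY);
         // ... hand p off to the disk-writer thread ...
         glUnmapBuffer(GL_PIXEL_PACK_BUFFER_ARB);
     }
*/
```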

You would also likely gain performance by doing the data compression and disk I/O in a separate thread. Modern GPUs are hungry, so it is better to dedicate one core/thread just to feeding them.

Thanks for the answers.

I am already using ping-pong PBOs, so, as suggested, I will instead store the raw data in a data structure and feed it to an encoding library (avcodec).

I just hope avcodec has some non-blocking way of calling its encoding functions; otherwise I will have to use threads.

@shadocko: I tried multiple PBOs, but got no better performance.

I agree that a thread for disk I/O and compression will be necessary.