VBO/PBO Performance

I am trying to get the result of a rendering into a vertex array by reading it into a PBO and then using that buffer as a VBO. The rendering is done in an FBO, so glReadBuffer is set to an FBO color attachment.

The performance is not what I expected. I was expecting a glReadPixels from the FBO into the VBO to be at least somewhat faster than a glReadPixels back into system RAM, but it is not. In fact, it is EXACTLY the same speed, which leads me to believe that the memory involved in the transfer is not on the card.

I am wondering if this is because I am using an FBO?

My code is pretty straightforward. I use a 32-bit RGBA floating-point FBO and read back into a VBO using the following code:

glReadPixels(0, 0, GetTextureWidth(), GetTextureHeight(), GL_RGBA, GL_FLOAT,0);

I create the VBO like this:

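glGenBuffersARB(1, &m_vbo);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, m_vbo); // the buffer must be bound to the pack target before glBufferDataARB and glReadPixels
glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, GetTextureWidth() * GetTextureHeight() * sizeof(float) * 4, NULL, GL_STREAM_COPY_ARB);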
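A minimal C sketch of how large the pack buffer has to be, assuming the default GL_PACK_ROW_LENGTH of 0. pack_buffer_bytes is a hypothetical helper, not a GL function. It applies the OpenGL row-padding rule for GL_PACK_ALIGNMENT: rows are only padded when each component is smaller than the alignment. So GL_FLOAT rows at the default alignment of 4 are never padded, and the size reduces to width * height * components * sizeof(float), as in the glBufferDataARB call above.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold a glReadPixels result of width x height pixels
 * in a pack buffer, assuming GL_PACK_ROW_LENGTH is 0:
 *   components          - 4 for GL_RGBA, 3 for GL_RGB
 *   bytes_per_component - sizeof(float) for GL_FLOAT, 1 for GL_UNSIGNED_BYTE
 *   pack_alignment      - current GL_PACK_ALIGNMENT (1, 2, 4 or 8; default 4)
 * Every row, including the last, is counted at its padded stride, which
 * may over-allocate slightly but is always large enough. */
size_t pack_buffer_bytes(size_t width, size_t height,
                         size_t components, size_t bytes_per_component,
                         size_t pack_alignment)
{
    assert(pack_alignment == 1 || pack_alignment == 2 ||
           pack_alignment == 4 || pack_alignment == 8);

    size_t row_bytes = width * components * bytes_per_component;

    /* Rows are padded only when a component is smaller than the alignment;
     * e.g. GL_FLOAT rows are never padded at the default alignment of 4. */
    if (bytes_per_component < pack_alignment) {
        row_bytes = (row_bytes + pack_alignment - 1) / pack_alignment * pack_alignment;
    }
    return row_bytes * height;
}
```

For the code above this would be pack_buffer_bytes(GetTextureWidth(), GetTextureHeight(), 4, sizeof(float), 4). A GL_RGB readback would pass 3 components instead.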

Does anyone have any idea why I get no speedup at all?

Thank you.

I have actually just implemented an app that uses this feature, and my code is almost the same as yours.

There are only two differences. First, when I read from the FBO into the PBO, I bind the PBO first and then the FBO; this probably won’t matter unless the driver is sensitive to it in some very subtle way. Second, my buffer objects use the DYNAMIC_COPY usage hint instead of STREAM_COPY. Again, a subtle point.

So I suspect the bottleneck is somewhere else in your code. Can you post more of it, or e-mail it to me, so I can take a look?

You might want to look at your render-to-texture code (I’m assuming you have some, since you want to use texture data as vertex data) and make sure you unbind the FBO there. If you don’t, application performance can degrade heavily.

Hope this helps. Sorry I don’t know the exact reason for the lack of speedup.

Thanks for replying.

I tried binding the FBO after the PBO as you suggested, but it made no difference in speed.
What driver/card are you using?
My hardware is a Quadro FX 4000 with the ForceWare 81.67 driver.

To make sure no other part of the code skews the timing, I bracketed the glReadPixels call with high-resolution timing functions and added a glFlush, like this:

glReadPixels(0, 0, GetTextureWidth(), GetTextureHeight(), GL_RGB, GL_FLOAT, 0);
glFlush();

This is what I get:
Read into VBO: 0.0392 milliseconds
Read into RAM: 0.0379 milliseconds

Judging by these results, it seems clear that the driver uses the same code path for both transfers: it is ALWAYS doing a round trip through system RAM.

But why???

Does anyone have any other method to debug this kind of problem?


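First, let’s get the timing right: use glFinish() instead of glFlush(), and put another glFinish() before StartTimer(). glFlush() only forces the queued commands to be submitted, while glFinish() blocks until they have all completed, so the timer measures the transfer itself rather than just the cost of issuing it. The glFinish() before StartTimer() drains any rendering still in flight, so that work isn’t charged to the readback.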
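A minimal C sketch of that measurement pattern, assuming POSIX clock_gettime(CLOCK_MONOTONIC) as the high-resolution timer. The names time_synchronized, timed_fn and monotonic_ms are hypothetical. In the real program, finish would be a small wrapper that ignores its argument and calls glFinish(), and op would be a similar wrapper around the glReadPixels call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical callback type: ctx carries whatever state the call needs. */
typedef void (*timed_fn)(void *ctx);

/* Current time in milliseconds from a monotonic clock. */
static double monotonic_ms(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        abort(); /* no usable timer: nothing sensible to return */
    }
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Times op(ctx) so the result covers its completion, not just its
 * submission:
 *  - finish(ctx) before starting the clock drains work queued earlier,
 *    so that work is not charged to op;
 *  - finish(ctx) after op(ctx) waits until op has actually completed
 *    (glFinish() blocks until completion; glFlush() does not). */
double time_synchronized(timed_fn finish, timed_fn op, void *ctx)
{
    assert(finish != NULL && op != NULL);
    finish(ctx);
    double start = monotonic_ms();
    op(ctx);
    finish(ctx);
    return monotonic_ms() - start;
}
```

Timing both the VBO readback and the RAM readback through the same helper means both numbers include the complete transfer, so they can be compared meaningfully.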