Does PBO + glGetTexImage improve the performance?

I have an OpenGL rendering project that needs to download 16 or more textures to disk. To speed this up, I want to use a PBO with glGetTexImage rather than glGetTexImage alone.

So I ran an experiment to measure the performance of PBO + glGetTexImage, and found that it did not improve performance at all.

I want to know whether I am using them incorrectly, or whether PBO + glGetTexImage simply does not improve performance on today’s graphics cards.

Here is my code:
// Initialization code
#define COUNT 16
GLuint _downloadPBO[COUNT];
GLubyte *savedbyte[COUNT];
#define _width 1024
#define _height 1024
#define BUFFER_OFFSET(i) ((char *)NULL + (i))

for (int i = 0; i < COUNT; i++)
{
    glGenBuffers(1, &_downloadPBO[i]);
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, _downloadPBO[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER_ARB, _width * _height * 4, NULL, GL_STREAM_READ);
}

glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, 0);

// Performance test code
t1.start();
glBindTexture(GL_TEXTURE_2D, texid[GROUND_TEXTURE]);

for (int i = 0; i < COUNT; i++)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER_ARB, _downloadPBO[i]);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, BUFFER_OFFSET(0));

    savedbyte[i] = (GLubyte *)glMapBuffer(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY);

    glUnmapBuffer(GL_PIXEL_PACK_BUFFER_ARB);
}
t1.stop();
copyTime = t1.getElapsedTimeInMilliSec();

///////////////////////////////////////////////////////////////
When COUNT=1, copyTime is about 2.8 ms! When COUNT=16, copyTime is about 46 ms!

I am on Microsoft Windows with OpenGL 3.0, an Intel Pentium Dual-Core 2.5 GHz CPU, 2 GB RAM, and a GeForce 9500 card.

Any suggestions would be appreciated!

You’re sending NULL+i as your image destination for the glGetTexImage call?!

Presumably you do EITHER glGetTexImage OR glMapBuffer not both.

I don’t see where you expect any performance gain to come from.

The glMapBuffer call potentially has to do some machinations for allocation. Wins would come from GART-mapped memory on an AGP bus, for example, but not today. Besides, you don’t show where the other call copies your texture to. NULL and upwards is a poor choice on any platform I work with :)

Again, you do one or the other, not both… right.

This is the wrong PBO usage pattern. Never map a buffer right after posting a read/write operation.
Without a PBO, glGetTexImage forces the app on the CPU to wait until the GPU finishes its job and copies the data to system memory.
With a PBO, glGetTexImage is just queued in the GPU command queue and control returns to the app. When the GPU finishes its work, it copies the result to the selected PBO. The app can later map the PBO and copy the data to system memory.

The big question is: how much later? The answer is not that easy. In practice I follow double-buffering rules (OpenGL uses double buffering, and many async APIs use double buffering).

So… my suggestion is to create two PBOs for each readback task. The usage pattern is simple:

  • bind pbo1
  • post readback (glReadPixels or glGetTexImage)
  • bind pbo2
  • map buffer, copy content, unmap buffer
  • swap pbo1 & pbo2 names

If you do this in a loop, your app will run one frame “late”, but it will achieve better performance.
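
The two-PBO steps above can be sketched roughly as follows. This is only an illustration of the ordering, not tested against a real driver: `PboPair`, `PboHook`, `readbackFrame` and `demoFrames` are made-up names, and the GL calls are replaced by hooks so the sketch runs without a GL context. The `hasPending` guard is my own addition, so that nothing is mapped before the first transfer has been posted.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical hooks standing in for the GL calls. In a real program:
//   post(pbo):  glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
//               then glReadPixels or glGetTexImage with a buffer offset of 0
//   fetch(pbo): glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
//               glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY), copy,
//               glUnmapBuffer(GL_PIXEL_PACK_BUFFER)
using PboHook = std::function<void(unsigned)>;

struct PboPair {
    unsigned writePbo;  // receives the transfer posted this frame
    unsigned readPbo;   // holds the transfer posted last frame
    bool hasPending;    // false until the first transfer has been posted

    PboPair(unsigned first, unsigned second)
        : writePbo(first), readPbo(second), hasPending(false) {}
};

// One frame of the pattern: post into one PBO, read back the other, swap.
inline void readbackFrame(PboPair& pair, const PboHook& post, const PboHook& fetch) {
    assert(post && fetch);
    post(pair.writePbo);                     // bind pbo1, post readback
    if (pair.hasPending) {
        fetch(pair.readPbo);                 // bind pbo2, map, copy, unmap
    }
    std::swap(pair.writePbo, pair.readPbo);  // swap pbo1 & pbo2 names
    pair.hasPending = true;
}

// Demo with fake PBO names 10 and 20: posts are logged as +name,
// fetches as -name, so the one-frame delay is visible in the log.
inline std::vector<int> demoFrames(int frames) {
    std::vector<int> log;
    PboPair pair(10, 20);
    for (int f = 0; f < frames; ++f) {
        readbackFrame(pair,
            [&log](unsigned pbo) { log.push_back(static_cast<int>(pbo)); },
            [&log](unsigned pbo) { log.push_back(-static_cast<int>(pbo)); });
    }
    return log;
}
```

With three frames the log is 10, 20, -10, 10, -20: each fetch touches the buffer that was filled one frame earlier. The transfer posted in the final frame still needs one extra fetch at shutdown.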

If you have multiple reads per frame, you can create a pool of PBOs and write a simple class that picks an available PBO for each transfer.
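
A pool like that could look something like the sketch below. `PboPool` and its methods are hypothetical names, and the buffer names are plain integers here (in a real program they would come from glGenBuffers). Reusing the oldest released buffer first is my own assumption about what is sensible, not something required by OpenGL.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical pool of PBO names. It only tracks which names are free;
// the actual transfers are still posted and mapped by the caller.
class PboPool {
public:
    explicit PboPool(const std::vector<unsigned>& names)
        : free_(names.begin(), names.end()) {}

    // Returns a free PBO name, or 0 if every buffer is still in flight.
    // 0 is never a valid buffer object name, so it can mean "none".
    unsigned acquire() {
        if (free_.empty()) {
            return 0;
        }
        const unsigned name = free_.front();
        free_.pop_front();
        return name;
    }

    // Hand a buffer back once its contents have been mapped and copied.
    void release(unsigned name) {
        assert(name != 0);
        free_.push_back(name);
    }

    std::size_t available() const { return free_.size(); }

private:
    std::deque<unsigned> free_;  // FIFO: the oldest released buffer is reused first
};
```

When acquire() returns 0 the caller has to decide what to do: skip the readback for this frame, or map the oldest in-flight buffer and accept a possible stall.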

Also, last time I checked, glGetTexImage was not accelerated using PBOs, but glReadPixels works like a charm.

Is glGetTexImage fast again when using PBOs vs. glReadPixels? In my past experience I found it faster to draw to a pbuffer and then use glReadPixels than to use glGetTexImage.

Now that I’ve moved to FBOs I just bind an FBO and use glReadPixels still. It would be cleaner to use glGetTexImage, but I don’t want to go there if it’ll be slower in some cases.

PBOs are about making an operation asynchronous, not fast. Performance is obtained if there is a long period of time between the point when you need to start the pixel transfer, and the point when you need to get the results of the transfer. This allows you to start the operation and do something else while it finishes. Then, after it’s done, you can come back and pick up your data.

Unless your code can be arranged like this, PBOs will be of little help, performance-wise.
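
Applied to the 16-texture case from the original post, that arrangement might look roughly like the sketch below: start every transfer, do unrelated work, then come back for the data. `readbackDeferred`, `TransferHook` and `demoDeferred` are invented names, and the GL calls are replaced by hooks so the ordering can be run without a context. How much time is actually saved depends on the driver and on how much other work there is.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical hooks standing in for the GL calls. In a real program:
//   postRead(pbo): glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
//                  glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0);
//   fetch(pbo):    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
//                  glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY), copy,
//                  glUnmapBuffer(GL_PIXEL_PACK_BUFFER)
using TransferHook = std::function<void(unsigned)>;

inline void readbackDeferred(const std::vector<unsigned>& pbos,
                             const TransferHook& postRead,
                             const std::function<void()>& otherWork,
                             const TransferHook& fetch) {
    assert(postRead && otherWork && fetch);
    for (unsigned pbo : pbos) {
        postRead(pbo);  // start every transfer without waiting on any of them
    }
    otherWork();        // do something else while the transfers are in flight
    for (unsigned pbo : pbos) {
        fetch(pbo);     // come back and pick up the data, oldest first
    }
}

// Demo with three fake PBO names: posts are logged as +name,
// the other work as 0, and fetches as -name.
inline std::vector<int> demoDeferred() {
    std::vector<int> log;
    readbackDeferred({1, 2, 3},
        [&log](unsigned pbo) { log.push_back(static_cast<int>(pbo)); },
        [&log]() { log.push_back(0); },
        [&log](unsigned pbo) { log.push_back(-static_cast<int>(pbo)); });
    return log;
}
```

The difference from the loop in the original post is only the order of the calls: there, each buffer is mapped immediately after its own glGetTexImage, so nothing can overlap.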

In real world results without PBOs I’ve found that glGetTexImage is slower than glReadPixels (for the same amount of pixel data). This was a few years ago though.

Since it’s using a different driver code path when PBOs are used, the question is are they the same speed now (or even faster). Although PBOs are about asynchronous transfer, the real transfer speed has a direct impact on when the data becomes available (and avoids stalls when mapping the buffer).

In real world results without PBOs I’ve found that glGetTexImage is slower than glReadPixels (for the same amount of pixel data). This was a few years ago though.

On what drivers and hardware did you determine this?

Since it’s using a different driver code path when PBOs are used, the question is are they the same speed now (or even faster).

There’s no way to know without testing it. The OpenGL specification does not dictate performance; only functionality.
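
For anyone who wants to run that comparison on their own hardware, a small timing helper along these lines may be enough. `averageMilliseconds` is a made-up name and this is only a sketch using std::chrono, not a rigorous benchmark.

```cpp
#include <cassert>
#include <chrono>

// Runs `run` the given number of times and returns the average wall-clock
// time per call in milliseconds.
// Note: GL commands can return before the GPU has finished, so the timed
// callable should end with something that forces completion (for example
// glFinish(), or mapping the PBO); otherwise only the time needed to queue
// the commands is measured.
template <typename Fn>
double averageMilliseconds(Fn&& run, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        run();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / iterations;
}
```

Timing the glGetTexImage path and the FBO + glReadPixels path with the same texture size, each with and without a PBO, gives four numbers that can be compared directly for one driver and card.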

Yes I know, that’s why I asked what results people were getting…