It is easy… Allocate one big chunk with a wglMem = wglAllocateMemoryNV(NumBuffers*ImageSize, 0, 1, 1) call, then split this big chunk into several smaller “transfer” buffers.
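enum { FREE, TRANSFER };
typedef struct tagBuffer
{
    GLubyte *ptr;    // this buffer's slice of the wglAllocateMemoryNV chunk
    GLuint fence;    // NV_fence object that marks the end of its upload
    int status;      // FREE or TRANSFER
    GLuint texture;  // texture this buffer uploads into
} Buffer;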
Set up all these structures as follows:
// init code
glEnableClientState(GL_WRITE_PIXEL_DATA_RANGE_NV);
glPixelDataRangeNV(GL_WRITE_PIXEL_DATA_RANGE_NV, NumBuffers*ImageSize, wglMem);
Buffer buf[NumBuffers];
for (int i = 0; i < NumBuffers; i++)
{
    // arithmetic on void* is not legal C, so step through the chunk as bytes
    buf[i].ptr = (GLubyte *)wglMem + i*ImageSize;
    glGenFencesNV(1, &buf[i].fence);
    glGenTextures(1, &buf[i].texture);
    glBindTexture(GL_TEXTURE_2D, buf[i].texture);
    // setup texenv, filtering...
    // allocate texture storage once (NULL = no data yet); this texture format is accelerated
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, XRES, YRES, 0, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, NULL);
    buf[i].status = FREE;
}
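The carving loop above can be checked in isolation. In this minimal sketch, plain malloc stands in for wglAllocateMemoryNV (which needs an NVIDIA driver and a current GL context), and the Slice type and carve_chunk helper are hypothetical names, not part of any API. The point is only that slice i starts at base + i*ImageSize and that the slices tile the chunk without overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { FREE = 0, TRANSFER = 1 };

/* Hypothetical reduced version of the Buffer struct: GL handles omitted. */
typedef struct {
    unsigned char *ptr;  /* start of this slice inside the big chunk */
    int status;          /* FREE or TRANSFER */
} Slice;

/* Split one contiguous chunk of count * image_size bytes into count
   non-overlapping slices, like buf[i].ptr = wglMem + i*ImageSize.
   chunk would be the wglAllocateMemoryNV pointer in real code; any
   byte buffer of the right size (e.g. from malloc) works for testing. */
static void carve_chunk(void *chunk, size_t image_size, Slice *out, size_t count)
{
    unsigned char *base = (unsigned char *)chunk;  /* byte-wise pointer math */
    assert(chunk != NULL && out != NULL && image_size > 0);
    for (size_t i = 0; i < count; i++) {
        out[i].ptr = base + i * image_size;
        out[i].status = FREE;
    }
}
```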
Somewhere in your code:
// Find a FREE buffer
int index = FindFreeBuffer();
// Copy the data into the buffer
memcpy(buf[index].ptr, srcbuff, ImageSize);
glBindTexture(GL_TEXTURE_2D, buf[index].texture);
// Start the texture transfer. This call is asynchronous: it returns immediately and the upload is NOT finished yet
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, XRES, YRES, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, buf[index].ptr);
// Set the fence
glSetFenceNV(buf[index].fence, GL_ALL_COMPLETED_NV);
buf[index].status = TRANSFER;
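FindFreeBuffer() is not defined in the post. One possible version, assuming you want frames uploaded in submission order, searches round-robin starting from the slot after the last one handed out. The BufState type, the find_free_buffer name and the next cursor are hypothetical; a return value of -1 means every buffer is still in TRANSFER, so the caller should do something else and retry later.

```c
#include <assert.h>
#include <stddef.h>

enum { FREE = 0, TRANSFER = 1 };

typedef struct {
    unsigned fence;  /* would be the GLuint from glGenFencesNV */
    int status;      /* FREE or TRANSFER */
} BufState;

/* Hypothetical FindFreeBuffer: round-robin search so buffers are reused
   in order. *next is a cursor kept by the caller between calls. */
static int find_free_buffer(const BufState *bufs, int count, int *next)
{
    assert(bufs != NULL && next != NULL && count > 0);
    for (int k = 0; k < count; k++) {
        int i = (*next + k) % count;  /* wrap around the ring of buffers */
        if (bufs[i].status == FREE) {
            *next = (i + 1) % count;  /* start after this slot next time */
            return i;
        }
    }
    return -1;  /* all buffers are still uploading */
}
```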
When you want to draw the texture from some buffer, you have to test whether its transfer has finished:
if (glTestFenceNV(buf[index].fence))
{
// upload is finished... you can render the texture
buf[index].status = FREE;
}
else
{
// do something else
}
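The polling pattern above can be simulated without a GPU. In this sketch (an assumption for illustration, not NV_fence code) a fake clock replaces the driver: test_fence plays the role of glTestFenceNV, and the caller counts how much "other work" the CPU gets done while the upload is in flight. SimFence, SimBuffer, gpu_clock and try_render are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { FREE = 0, TRANSFER = 1 };

/* Simulated fence: it "passes" once the fake GPU clock reaches done_at.
   In real code this state lives in the driver behind glTestFenceNV. */
typedef struct { long done_at; } SimFence;

typedef struct {
    SimFence fence;
    int status;  /* FREE or TRANSFER */
} SimBuffer;

static long gpu_clock = 0;  /* advanced by the caller to mimic GPU progress */

static bool test_fence(const SimFence *f) { return gpu_clock >= f->done_at; }

/* Mirrors the if/else above: true (and the slot is freed) when the upload
   has finished and the texture may be drawn; false means "do something else". */
static bool try_render(SimBuffer *b)
{
    if (b->status == TRANSFER && test_fence(&b->fence)) {
        b->status = FREE;
        return true;
    }
    return false;
}
```

In a player, the false branch is where the decoder would run; the loop simply keeps polling once per iteration.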
Depending on your ImageSize, uploading can take 2-50 ms, but your CPU is free to do something else in the meantime.
For example, if you do video playback you will get a delay of 2-3 frames, but your CPU can work on the decoder.
If your app really needs to render the image that is currently uploading, the CPU must wait until the transfer is finished, so you have to call glFinishFenceNV(). If you really have to call this function, then you don't need PDR (it is the same code path as a classic synchronous glTexSubImage2D call).
Note that if a transfer is still pending and you change the data in its buffer, you can expect corrupted texture data. The CPU can copy into wgl memory MUCH faster than the GPU can copy from wgl memory into the texture.
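One way to enforce this rule is to route every copy through a guard that refuses to touch a slice still marked TRANSFER. fill_slice and Slice are hypothetical names, and the sketch assumes status is set back to FREE only after glTestFenceNV (or glFinishFenceNV) reports the fence as passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { FREE = 0, TRANSFER = 1 };

typedef struct {
    unsigned char *ptr;  /* slice of the wglAllocateMemoryNV chunk */
    int status;          /* FREE or TRANSFER */
} Slice;

/* Hypothetical guarded copy: never memcpy into a slice whose upload may
   still be in flight, because the GPU could read half-overwritten data. */
static bool fill_slice(Slice *s, const void *src, size_t n)
{
    assert(s != NULL && src != NULL);
    if (s->status == TRANSFER)
        return false;  /* wait for the fence or pick another slice */
    memcpy(s->ptr, src, n);
    return true;
}
```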
In my player application I use 2-3 720x576x32 buffers and get 2-3 frames of delay, but playback CPU usage is the same in my app as in Media Player (less than 20% for MPEG2). Without PDR my player used more than 60% CPU time.
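The numbers behind this setup are easy to work out. Assuming 4 bytes per pixel (the BGRA 8:8:8:8 format used above) and a 25 fps PAL stream (the frame rate is an assumption; the post does not state it), these hypothetical helpers compute the frame size, the chunk size to request from wglAllocateMemoryNV, the sustained upload rate and the frame delay in milliseconds.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one BGRA 8:8:8:8 frame: 4 bytes per pixel. */
static size_t frame_bytes(size_t width, size_t height)
{
    return width * height * 4;
}

/* Size of the single chunk holding num_buffers frames. */
static size_t chunk_bytes(size_t width, size_t height, size_t num_buffers)
{
    return frame_bytes(width, height) * num_buffers;
}

/* Sustained upload rate in bytes per second at the given frame rate. */
static size_t upload_rate(size_t width, size_t height, size_t fps)
{
    return frame_bytes(width, height) * fps;
}

/* Delay in milliseconds for a pipeline that lags by `frames` frames. */
static size_t latency_ms(size_t frames, size_t fps)
{
    assert(fps > 0);
    return frames * 1000 / fps;
}
```

For 720x576 this gives 1,658,880 bytes per frame, 4,976,640 bytes for three buffers, about 41.5 MB/s at 25 fps, and 80-120 ms for a 2-3 frame delay.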
The code was written straight into this post, so it may have some errors, but the idea is there…
You can find a PBO example in its spec:
PBO spec
yooyo