Has anyone obtained results with EXT_pixel_buffer_object?

Hi,
I’m trying to use PBOs to upload texture data every frame. I’m using an Nvidia GeForce 6800 Ultra (PCIe) and a GeForce 5750 (also PCIe). I’m not looking for amazing transfer rates; I’m more interested in uploading the data asynchronously so it’s ready in video memory when I want to render it. The problem I’m having is that it’s a lot slower than just using glTexSubImage2D without a PBO. I’d hoped it would be at least the same speed.
The code is wrapped in classes so I’ll just give a rundown of what I do here.

One time at the start:
I call glGenBuffers to get a buffer name and then call glBufferData with a NULL pointer to allocate the space. The buffer is created with STREAM_DRAW usage.

Every Frame:
I bind the buffer, call glBufferSubData with a pointer to my data, and then call glTexSubImage2D with a NULL pointer. The format of the source image is BGR.
Then I unbind the buffer.
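
For reference, the sequence described above might look roughly like this in C. This is only a sketch under some assumptions: a current GL context that exposes buffer objects (OpenGL 1.5) and EXT_pixel_buffer_object, and Linux-style headers where GL_GLEXT_PROTOTYPES declares the entry points. The helper names (create_upload_pbo, upload_frame) and their parameters are made up for illustration and are not the wrapper classes mentioned in the post.

```c
/* Sketch only: requires a current OpenGL context with buffer objects
   (OpenGL 1.5) and EXT_pixel_buffer_object. On platforms without
   GL_GLEXT_PROTOTYPES support an extension loader is needed instead. */
#define GL_GLEXT_PROTOTYPES 1
#include <stddef.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* One-time setup: allocate storage for one frame without supplying data.
   (Hypothetical helper name.) */
static GLuint create_upload_pbo(GLsizeiptr frame_bytes)
{
    GLuint pbo = 0;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_EXT, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER_EXT, frame_bytes, NULL, GL_STREAM_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_EXT, 0);
    return pbo;
}

/* Per frame: copy the pixels into the buffer, then source the texture
   update from the buffer. (Hypothetical helper name.) */
static void upload_frame(GLuint pbo, GLuint tex, GLsizei width, GLsizei height,
                         GLenum format, const void *pixels,
                         GLsizeiptr frame_bytes)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_EXT, pbo);
    glBufferSubData(GL_PIXEL_UNPACK_BUFFER_EXT, 0, frame_bytes, pixels);

    glBindTexture(GL_TEXTURE_2D, tex);
    /* While an unpack buffer is bound, the last argument is a byte offset
       into that buffer rather than a client-memory pointer. */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    format, GL_UNSIGNED_BYTE, (const GLvoid *)0);

    /* Unbind so later pixel transfers read from client memory again. */
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_EXT, 0);
}
```

Here format would be GL_BGR for the case described in the post. The sketch says nothing about whether the driver actually performs the transfer asynchronously; that is exactly the open question.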

For some reason the glTexSubImage2D call blocks for a very long time. It is my understanding that this call should return extremely fast when using PBOs… I realize I may need to block somewhere further down the line if the data hasn’t finished uploading to the video card when I want to use it, but I shouldn’t be blocking here, should I?

I’ve also tried the map/unmap method, but that yields the same huge block in the glTexSubImage2D call.

For the 5750 I’m using the latest drivers (66.93), and for the 6800 I’m using the 66.74 drivers.

Is this extension not ready for use yet, or am I missing something?

Thanks

Malcolm

This post might be useful for you. It talks about asynchronous readback using PBOs rather than texture uploads, but the points made in the thread may still be relevant.

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=012645

An update: I tried BGRA and got very good results. This seems to be the only format that works; BGR doesn’t, which surprises me…

Btw, by ‘work’ I mean asynchronous uploads. In all cases the image gets updated correctly, it’s just really slow sometimes…

Most cards only support a small number of internal formats. My guess is that if you don’t hit one of those formats, you don’t get DMA. It’s perhaps a little surprising that the fallback is actually worse than the normal path, but perhaps there are synchronization issues to contend with.
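
If the source data only exists as BGR, one possible workaround in line with the observations above is to pad it to BGRA on the CPU before it goes into the buffer. A minimal sketch, assuming tightly packed rows with no padding; expand_bgr_to_bgra is a hypothetical helper name, not something from the driver or the extension:

```c
#include <stddef.h>
#include <stdint.h>

/* Expand tightly packed 3-byte BGR pixels into 4-byte BGRA pixels.
   src must hold pixel_count * 3 bytes and dst pixel_count * 4 bytes;
   the two ranges must not overlap. */
void expand_bgr_to_bgra(const uint8_t *src, uint8_t *dst, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        dst[4 * i + 0] = src[3 * i + 0]; /* blue */
        dst[4 * i + 1] = src[3 * i + 1]; /* green */
        dst[4 * i + 2] = src[3 * i + 2]; /* red */
        dst[4 * i + 3] = 0xFF;           /* alpha padding, fully opaque */
    }
}
```

dst could be a staging array that is then handed to glBufferSubData, or the pointer obtained by mapping the buffer, with GL_BGRA as the format in the glTexSubImage2D call. Whether the extra CPU pass costs less than the slow BGR path would have to be measured on the cards in question.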