Pixel Buffer Objects are very slow, no?

Is it just me, or are they very slow for 2D rendering of any sort?

I have an image ‘pipeline’ set up that lets me switch between regular glDrawPixels (i.e. from a system memory bank) and glDrawPixels from a bound pixel buffer object.

Firstly, glDrawPixels is abnormally slow on a G5, but from what I’ve read, I never expected it to be fast. But pixel buffer objects not only fail to speed things up, they slow things down by at least 50%.

I really don’t want to go the texture route, as that means wasting video memory to make images conform to texture sizes on older cards.

Are pbuffers helpful for speeding up 2D? It’s the only area I’ve left untouched; after pixel buffer objects I figured it was pointless… is it?

You have to use accelerated formats (BGR or BGRA instead of RGB or RGBA).

See this page: http://developer.nvidia.com/object/General_FAQ.html#p1
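For what it’s worth, a minimal sketch in C of what using an accelerated format looks like (assuming a current GL context, a driver exposing EXT_bgra, and that `pixels` already holds the image in BGRA byte order):

```c
/* Sketch: glDrawPixels with a driver-friendly format.  GL_BGRA_EXT
 * matches the native framebuffer layout on most cards of this era,
 * so the driver can avoid a per-pixel swizzle.  Assumes a current GL
 * context; `pixels` holds w*h 4-byte BGRA texels. */
#include <GL/gl.h>

void draw_image(const unsigned char *pixels, int w, int h, float x, float y)
{
    glRasterPos2f(x, y);                    /* destination in window coords */
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);  /* rows assumed tightly packed */
    glDrawPixels(w, h, GL_BGRA_EXT, GL_UNSIGNED_BYTE, pixels);
}
```

(On GL 1.2 and later, GL_BGRA is core and can be used instead of the EXT token.)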

Hm, thanks for the tip, but it doesn’t make any difference. I can’t render 25 128x128 images per frame without crippling it to around 2 fps using pixel buffer objects. I get around 28 fps using regular system memory.

It’s definitely not my card, as it can throw around thousands of images in languages like Blitz, which use DirectX 7 for their 2D. Is it just a limitation of GL that it’ll never match DirectX for 2D speed, even using pixel buffers?

Can you post some code…


I use this to create the buffer, uploading an image already loaded into system memory.


and then at render time I use
glDrawPixels2 img\w, img\h, GL_BGRA_EXT, GL_UNSIGNED_BYTE, 0

to render the image. This runs at around 2 fps… even system-memory images run at 28 using BGRA (which is still pretty awful).

PBO is not meant to be a 2D rendering system; use textured quads for that. PBO is meant to allow for async pixel transfer operations.

Korval is right. Instead of glDrawPixels, use glTexSubImage2D and upload the frame to a texture.
I suppose you have “non-power-of-two” images, so you can use NV_texture_rectangle or EXT_texture_rectangle to keep memory waste down.
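A rough sketch of that approach in C (the texture name, sizes, and immediate-mode quad here are illustrative; assumes EXT_texture_rectangle, so texture coordinates are in pixels rather than [0,1], and that the texture storage was allocated once with glTexImage2D at load time):

```c
/* Sketch: per-frame update via glTexSubImage2D on a rectangle texture,
 * then drawing a textured quad -- instead of glDrawPixels. */
#include <GL/gl.h>
#include <GL/glext.h>

void draw_frame(GLuint tex, const void *pixels, int w, int h)
{
    glEnable(GL_TEXTURE_RECTANGLE_EXT);
    glBindTexture(GL_TEXTURE_RECTANGLE_EXT, tex);

    /* Re-specify only the pixel data; the storage already exists. */
    glTexSubImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, 0, 0, w, h,
                    GL_BGRA_EXT, GL_UNSIGNED_BYTE, pixels);

    glBegin(GL_QUADS);
        glTexCoord2i(0, 0); glVertex2i(0, 0);
        glTexCoord2i(w, 0); glVertex2i(w, 0);
        glTexCoord2i(w, h); glVertex2i(w, h);
        glTexCoord2i(0, h); glVertex2i(0, h);
    glEnd();

    glDisable(GL_TEXTURE_RECTANGLE_EXT);
}
```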

Remember, when you use PBO or PDR, all image data transfers are async. This means that after the glTexSubImage2D call the driver initiates a DMA transfer and returns immediately.

When you try to copy a new frame into the PBO, the previous frame may not yet be uploaded, and overwriting the PBO data can damage your texture.
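One common way around that hazard is to ping-pong between two PBOs, so the CPU never writes into a buffer the GPU may still be reading from. A sketch, assuming ARB_pixel_buffer_object (buffer names, `frame_size`, and the 2D texture target here are illustrative):

```c
/* Sketch: double-buffered PBO upload.  While the driver DMAs from one
 * PBO into the texture, the CPU fills the other one. */
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

static GLuint pbo[2];   /* created earlier with glGenBuffersARB */
static int    cur = 0;

void upload_frame(GLuint tex, int w, int h,
                  const void *frame, size_t frame_size)
{
    int next = 1 - cur;

    /* Kick off the async texture update from the PBO filled last frame.
     * The last argument is an offset into the bound PBO, not a pointer. */
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo[cur]);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_BGRA_EXT, GL_UNSIGNED_BYTE, 0);

    /* Meanwhile, fill the *other* PBO with the new frame.  Re-specifying
     * the store with NULL first lets the driver hand back fresh memory
     * instead of stalling on any in-flight transfer. */
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo[next]);
    glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, frame_size, NULL,
                    GL_STREAM_DRAW_ARB);
    void *dst = glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    if (dst) {
        memcpy(dst, frame, frame_size);
        glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB);
    }
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

    cur = next;
}
```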


You’re still going to find that glTexSubImage2D() on a textured quad is FAR slower than what you’re accustomed to doing with DirectX. DirectX allows you to blit 2D pixels directly to video memory, which is very nice. For some reason I can’t understand, OpenGL has always had slow implementations of glDrawPixels. So you have to settle for the hack of using a textured quad, which means wasting a lot of time calling glTexSubImage2D just to render 2D images on your display. Using glTexSubImage2D with textured quads will be a lot faster than glDrawPixels, but it will be nothing like the speed you get in DirectX when blitting directly to video memory.

This is the one area where OpenGL is clearly inferior to DirectX and it appears that no one cares to fix it (by making glDrawPixels fast, for example.)

I really don’t care about DX capabilities. Using OpenGL and PDR (NV_pixel_data_range) I can upload up to ~1.8 GB/s on AGP 8x systems with NV hardware.


Originally posted by Claytonious:
You’re still going to find that glTexSubImage2D() on a textured quad is FAR slower than what you’re accustomed to doing with DirectX…
OK, I’ll bite…

Based on information I’ve read here, it’s my understanding that framebuffer reads/writes are slow due to synchronous transport over the AGP bus causing pipeline stalls. Is this correct?

If so, how would the graphics API have any substantial effect on the problem, or is this BS?

Originally posted by Claytonious:
… For some unknown reason, OpenGL has always had slow implementations of glDrawPixels, which I can’t understand. …
glDrawPixels is not slow at all if you carefully set up the renderer state beforehand. The problem is that glDrawPixels is not a plain blit to the framebuffer; it generates fragments which pass through the entire fragment pipeline (texturing, the depth/alpha/etc. tests, and so on) in exactly the same way as the fragments generated by the rasterization of points/lines/polygons. This is according to the OpenGL spec, but most hardware out there isn’t capable of such an operation, so most commonly the drivers do it in software. But if you turn off all the per-fragment operations and texturing (including any other exotic stuff such as fragment programs), then the operation becomes a simple blit, which the hardware can do, and you get really decent speeds.
It was a while ago that I tested this on a GF2. I then worked out the conditions for the operation to be accelerated on the GF2, but I lost the list. As I said, it included no texturing and no tests. When the conditions are met, the speed is no less than DirectX or any other way of doing the same blit.
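A sketch of that “blit-friendly” state setup in C. The exact conditions are driver-specific; this is just the kind of list the post above describes (no texturing, no per-fragment tests, no scaling):

```c
/* Sketch: disable everything that would force glDrawPixels fragments
 * through the slow fragment pipeline, so the driver can treat the call
 * as a plain blit.  Assumes a current GL context. */
#include <GL/gl.h>

void draw_pixels_fast(const void *pixels, int w, int h)
{
    glDisable(GL_TEXTURE_2D);
    glDisable(GL_DEPTH_TEST);
    glDisable(GL_ALPHA_TEST);
    glDisable(GL_STENCIL_TEST);
    glDisable(GL_BLEND);
    glDisable(GL_FOG);
    glPixelZoom(1.0f, 1.0f);   /* 1:1, no scaling */

    glDrawPixels(w, h, GL_BGRA_EXT, GL_UNSIGNED_BYTE, pixels);
}
```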