Fast copy from back buffer to back buffer?

When I programmed on SGIs years ago there was a fast way to copy a block of pixels from part of the back buffer to another part of the back buffer.

Now I’m programming on PCs with NVIDIA and ATI cards. Today it’s an NVIDIA Quadro4 900 XGL.

I thought glCopyPixels was a fast way to copy the contents of the current read buffer to the current draw buffer at the current raster position.

For some reason I am getting way better performance with glDrawPixels, with an image in host memory, than I am from glCopyPixels, where the from and to are both in AGP mem. That is completely the opposite of what I expected.

A quick review of the docs says that glCopyPixels is changing each pixel color component to float during the transfer, which I’m guessing is the cause of the slowdown.

Is there any way to turn off this conversion to float and just make a fast pixel copy (BLiT) from/to the same buffer in the same format?

I noticed the pbuffer extension spec conformance test lists a step as “Blit from one buffer to the other” so I’m hoping this blit-like copy exists and I’m just missing something obvious.


[This message has been edited by robosport (edited 03-03-2004).]

CopyPixels is fast if there’s no bias, scaling, depth buffering, or something which is not a 1-to-1 copy, that is, if it’s a simple color blit.
There won’t be a conversion to float for such case, that’d be silly.

>>where the from and to are both in AGP mem.<<

That is impossible. At least one buffer lies in video memory: with glDrawPixels it is the destination, with glReadPixels the source, and with glCopyPixels both.

I think you’d have better luck creating a temporary texture, doing a glCopyTexSubImage2D into it, and then drawing a 2D quad with that texture where you want to display it.
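
For reference, the texture route described above might look roughly like the sketch below. It assumes a current OpenGL context, a projection where one unit equals one pixel, and a texture object that was already allocated with glTexImage2D (GL_NEAREST filtering, at least as large as the block). The function name and the `tex`, `texW` and `texH` parameters are made up for illustration.

```c
#include <GL/gl.h>   /* on Windows, include <windows.h> first */

/* Hypothetical helper: copy a w x h block at (srcX, srcY) into an existing
 * texture, then draw it as a quad at (dstX, dstY).
 * Assumes: current GL context, pixel-aligned ortho projection, and a texture
 * of size texW x texH (>= w x h) created earlier with GL_NEAREST filters. */
void copy_block_via_texture(GLuint tex, GLsizei texW, GLsizei texH,
                            GLint srcX, GLint srcY, GLsizei w, GLsizei h,
                            GLint dstX, GLint dstY)
{
    /* Fraction of the texture actually covered by the copied block. */
    GLfloat s = (GLfloat)w / (GLfloat)texW;
    GLfloat t = (GLfloat)h / (GLfloat)texH;

    glBindTexture(GL_TEXTURE_2D, tex);

    /* Read from the current read buffer into the texture's lower-left corner. */
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, srcX, srcY, w, h);

    /* Draw the texels unmodified (no modulation by the current color). */
    glEnable(GL_TEXTURE_2D);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2i(dstX,     dstY);
    glTexCoord2f(s,    0.0f); glVertex2i(dstX + w, dstY);
    glTexCoord2f(s,    t);    glVertex2i(dstX + w, dstY + h);
    glTexCoord2f(0.0f, t);    glVertex2i(dstX,     dstY + h);
    glEnd();

    glDisable(GL_TEXTURE_2D);
}
```

Whether this beats a plain glCopyPixels depends on the driver, so it is worth timing both on the target card.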


I don’t think so.
A simple 1-to-1 glCopyPixels from back to front for example should be faster than a glCopyTexSubImage from back and glBegin(GL_QUADS);… to front.

In theory it should be faster, but since I don’t think it’s a code path developers use often, some drivers might not optimize it well. To be safe you should test both.


From my benchmark:

5900 Ultra XP2000
glCopyPixels 4344 MB/sec
glDrawPixels 504 MB/sec
glCopyTexSubImage2D 3420 MB/sec

9800 Pro XP2500
glCopyPixels 1224 MB/sec
glDrawPixels 188 MB/sec
glCopyTexSubImage2D 5080 MB/sec

[This message has been edited by Adrian (edited 03-03-2004).]

Interesting. Is that pure copyteximage performance or was the texture actually used afterwards?
If not, what remains at the end if the texture is used to draw the exact same rectangle the copypixels blitted?

It’s pure copyteximage performance.

Thanks for the quick replies. That glCopyPixels benchmark is exactly the kind of performance I was expecting (and have seen in the past) from glCopyPixels.

Unfortunately on my Quadro4, with one of the latest drivers, it is really slow (multiple times slower than glDrawPixels).

Am I using glCopyPixels incorrectly? Here is the code for a back buffer to back buffer blit:

// trying to copy a square block from bottom left corner of buffer
int blockWidth = 200;
int blockHeight = 200;

// already set up ortho projection that matches screen/pixel coordinates

glReadBuffer( GL_BACK );
glDrawBuffer( GL_BACK );

// going to blit it just to the right
glRasterPos2i( blockWidth, 0);
glCopyPixels( 0, 0, blockWidth, blockHeight, GL_COLOR );

// swap buffer when completed to see result
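
One way to rule out the state-dependent slow paths mentioned earlier in the thread (bias, scaling, depth buffering, anything that is not a 1-to-1 copy) is to force neutral state around the call. The sketch below is only a guess at what a "simple color blit" setup could look like. It assumes a current OpenGL context and a pixel-aligned ortho projection, the helper name is made up, and no driver is guaranteed to take a fast path because of it.

```c
#include <GL/gl.h>   /* on Windows, include <windows.h> first */

/* Hypothetical helper: back-buffer-to-back-buffer copy with per-fragment
 * and pixel-transfer state forced to "do nothing" values. */
void blit_back_to_back(GLint srcX, GLint srcY, GLsizei w, GLsizei h,
                       GLint dstX, GLint dstY)
{
    static const GLenum scales[] = { GL_RED_SCALE, GL_GREEN_SCALE,
                                     GL_BLUE_SCALE, GL_ALPHA_SCALE };
    static const GLenum biases[] = { GL_RED_BIAS, GL_GREEN_BIAS,
                                     GL_BLUE_BIAS, GL_ALPHA_BIAS };
    int i;

    /* Save everything this function touches so the caller's state survives. */
    glPushAttrib(GL_ENABLE_BIT | GL_PIXEL_MODE_BIT |
                 GL_COLOR_BUFFER_BIT | GL_CURRENT_BIT);

    /* Copied pixels become fragments, so switch off per-fragment work. */
    glDisable(GL_DEPTH_TEST);
    glDisable(GL_STENCIL_TEST);
    glDisable(GL_ALPHA_TEST);
    glDisable(GL_BLEND);
    glDisable(GL_FOG);
    glDisable(GL_LIGHTING);
    glDisable(GL_TEXTURE_2D);

    /* Neutral pixel-transfer state: scale 1, bias 0, no color map, no zoom. */
    for (i = 0; i < 4; ++i) {
        glPixelTransferf(scales[i], 1.0f);
        glPixelTransferf(biases[i], 0.0f);
    }
    glPixelTransferi(GL_MAP_COLOR, GL_FALSE);
    glPixelZoom(1.0f, 1.0f);

    glReadBuffer(GL_BACK);
    glDrawBuffer(GL_BACK);

    /* The raster position is the lower-left corner of the destination. */
    glRasterPos2i(dstX, dstY);
    glCopyPixels(srcX, srcY, w, h, GL_COLOR);

    glPopAttrib();
}
```

For the case above the call would be blit_back_to_back(0, 0, blockWidth, blockHeight, blockWidth, 0).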


What figures do you get with my benchmark?

What kind of MB/sec are you getting in your application?


What a cool benchmark utility.

Here are the relevant numbers on my Quadro4:

glCopyPixels 3784 MB/s
glDrawPixels 660 MB/s
glReadPixels 155 MB/s
glReadPixels(PDR) 190 MB/s
glCopyTexImage2D 2297 MB/s
glCopyTexSubImage2D 2298 MB/s

Must be something else wrong because the code I posted up above is only giving me a dozen blits per second at most from/to back buffer (with no swapping between the blits).
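
For a rough sense of scale, the "dozen blits per second" figure can be converted into the same MB/sec unit the benchmark reports. This is only a back-of-the-envelope sketch: it assumes a 32-bit color buffer (4 bytes per pixel) and 1 MB = 1,000,000 bytes, neither of which is stated in the thread, and `blit_mb_per_sec` is a made-up helper name.

```c
/* Hypothetical helper: throughput of repeated block copies in MB/sec.
 * Assumes 1 MB = 1,000,000 bytes. */
double blit_mb_per_sec(int width, int height, int bytes_per_pixel,
                       double blits_per_sec)
{
    /* Bytes moved by a single copy of the block. */
    double bytes_per_blit = (double)width * (double)height
                          * (double)bytes_per_pixel;

    return bytes_per_blit * blits_per_sec / 1.0e6;
}
```

With the 200 x 200 block from the code above at 12 blits per second, blit_mb_per_sec(200, 200, 4, 12.0) works out to roughly 2 MB/sec, more than a thousand times below the 3784 MB/s the benchmark measured on the same card.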


You didn’t happen to have FSAA on before, did you? I just noticed a comment in my benchmark from when I had a GF4.

“// Copy pixels is running really slow with FSAA on for some reason (on a GF4600)”

These are the numbers I get now; I think the copypixels performance used to be a lot worse with FSAA on.

Readpixels doesn’t seem to like FSAA at all.

glCopyPixels 4352 MB/sec
glDrawPixels 505 MB/sec
glReadPixels 156 MB/sec
glReadPixels(PDR) 203 MB/sec

glCopyPixels 1550 MB/sec
glDrawPixels 470 MB/sec
glReadPixels 19 MB/sec
glReadPixels(PDR) 24 MB/sec

glCopyPixels 1 MB/sec
glDrawPixels 372 MB/sec
glReadPixels 19 MB/sec
glReadPixels(PDR) 20 MB/sec

[This message has been edited by Adrian (edited 03-03-2004).]

This post might be relevant.


Good catch. I am using the multisample extension. That’s the problem.

Now that I know, I can avoid glCopyPixels and glReadPixels when using the multisample extension.

Thanks again.

Can NVidia comment on this issue please?

This problem has been around for at least a year now so it seems that there is no easy fix.

This is a major issue since users can virtually break applications that use copypixels and make those using readpixels unusable.

It would be useful to have a document that explains what the problem is, why it occurs, whether it will ever be resolved, and what developers should do to minimise the issues.

The obvious temporary solution is to switch multisampling off, but this doesn’t prevent users from forcing it on. OK, users are warned when they force FSAA on that some apps may not work correctly.

It should be mentioned in this FAQ, in the “I’m using glDrawPixels and glReadPixels in OpenGL. I’m seeing poor performance. What should I do?” section.

I’m interested to know if ATI cards have a similar problem, can someone run the benchmark and report the readpixels/copypixels speed with and without FSAA on? Thx.

Incidentally, on my system CopyTexImage is also affected by FSAA, but not so badly.

Radeon 9600 Pro results.

glCopyPixels 867
glDrawPixels 51
glReadPixels 104

glCopyPixels 59
glDrawPixels 51
glReadPixels 39

glCopyPixels 60
glDrawPixels 51
glReadPixels 39

Catalyst 4.10.

[This message has been edited by ml (edited 03-04-2004).]

Interesting that ATIs have the same problem.

Is there a reason why you want to copy from one part to another?

It might be faster to scissor-test regions and render the same thing many times. Seems like an obvious solution.
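
A minimal sketch of that idea, assuming a current OpenGL context and a scene whose projection is set up for a block-sized viewport. `drawScene` is a hypothetical callback standing in for whatever the application renders. The function name is made up as well.

```c
#include <GL/gl.h>   /* on Windows, include <windows.h> first */

/* Hypothetical helper: render the same scene into one rectangular region
 * of the current draw buffer instead of copying pixels there afterwards. */
void render_into_region(GLint x, GLint y, GLsizei w, GLsizei h,
                        void (*drawScene)(void))
{
    /* Save viewport and scissor state so the caller is not affected. */
    glPushAttrib(GL_VIEWPORT_BIT | GL_SCISSOR_BIT);

    /* Map the scene onto the region and clip everything outside it. */
    glViewport(x, y, w, h);
    glScissor(x, y, w, h);
    glEnable(GL_SCISSOR_TEST);

    drawScene();

    glPopAttrib();
}
```

Calling it once per destination rectangle replaces each copy with a re-render, which only pays off when the scene is cheap to draw.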

glCopyPixels 1 MB/sec
glDrawPixels 372 MB/sec
glReadPixels 19 MB/sec
glReadPixels(PDR) 20 MB/sec

That’s incredibly bad for glCopyPixels.
There must be something wrong with the benchmark.

In theory, there shouldn’t be much difference between FSAA and non-FSAA except for the fact more data needs to be moved.

This is sad. Isn’t glCopyPixels supposed to be a blit function?

NVidia’s copypixels benchmark shows the same slow performance with FSAA if I change glCopyPixels(size, 0, size, size, copyType) to glCopyPixels(0, 0, size, size, copyType).

Running NVidia’s readpixels benchmark with FSAA initially showed normal performance. I compared their benchmark and mine, and the only difference was that theirs runs in windowed mode. When I added glutFullScreen() and switched on FSAA, the performance was as slow as with my benchmark.

[This message has been edited by Adrian (edited 03-04-2004).]

Clarification… this slowdown doesn’t only happen when FSAA is turned on via driver settings.

The glCopyPixels slowdown also happens when the context itself programmatically uses multisampled antialiasing, i.e. WGL_SAMPLE_BUFFERS_ARB with GL_MULTISAMPLE_ARB enabled.
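
An application that wants to avoid glCopyPixels and glReadPixels in this situation could ask the context whether it has sample buffers at all. The sketch below assumes a current OpenGL context and the ARB_multisample tokens from glext.h. The function name is made up, and whether driver-forced FSAA shows up in this query is not confirmed anywhere in this thread.

```c
#include <GL/gl.h>      /* on Windows, include <windows.h> first */
#include <GL/glext.h>   /* GL_SAMPLE_BUFFERS_ARB, GL_MULTISAMPLE_ARB */

/* Hypothetical helper: returns 1 if the current context has a multisample
 * buffer and multisampling is enabled, 0 otherwise. */
int context_is_multisampling(void)
{
    GLint sampleBuffers = 0;

    /* Non-zero when the pixel format was chosen with sample buffers. */
    glGetIntegerv(GL_SAMPLE_BUFFERS_ARB, &sampleBuffers);

    return sampleBuffers > 0 && glIsEnabled(GL_MULTISAMPLE_ARB) == GL_TRUE;
}
```

If it returns 1, falling back to glCopyTexSubImage2D or re-rendering might be worth testing, since the numbers above suggest those paths suffer less.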