CopyPixels broken?

I just hit an old code path where I did a CopyPixels on a 256x256 (32bpp) backbuffer area. The only potential culprit I can find is my use of glPixelZoom(1,-1), as I needed the rendered part upside-down in a texture (that I later got by glCopyTexImage2D).

I was more than a little surprised to find out that glCopyPixels took almost 50ms (!) using a 7600 (+93.71). That evaluates to roughly 10MB/s transfer speed! IIRC it ran at least 25-50 times as fast on an age-old ATI 9600.
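To make the arithmetic behind that figure explicit: a 256x256 area at 32bpp is 262144 bytes. Counting both the read and the write of the area over 50ms gives roughly 10MB/s; counting the area only once gives about half that. A minimal sketch in C, where the helper names are made up for illustration and 1MB is taken as 10^6 bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in a width x height pixel area at the given bytes per pixel. */
static size_t block_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Throughput in MB/s (1 MB = 10^6 bytes, an assumption of this sketch)
 * for moving `bytes` in `milliseconds`. */
static double rate_mb_per_s(size_t bytes, double milliseconds)
{
    assert(milliseconds > 0.0);
    return ((double)bytes / 1.0e6) / (milliseconds / 1000.0);
}
```

Calling rate_mb_per_s(2 * block_bytes(256, 256, 4), 50.0) covers the read plus the write; dropping the factor of 2 gives the one-way figure.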

I have checked states and I think they are all as expected, but even if everything and its grandma had been enabled… reaching a speed as low as 10MB/s? (for the curious - no, I haven’t underclocked the card to 2.5MHz :slight_smile: )

I did an emergency hack where I now simply glReadPixels the area, flip it in system memory using the CPU, and finally upload it to the texture. This lame hack turns out to be probably at least two orders of magnitude faster than a simple flip-upside-down blit, which by my estimate the card should be able to handle at least 10,000 times per second (and that may be a gross underestimate).
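The CPU-side flip in that workaround could look roughly like the sketch below. It is not the code from the engine, just an illustration: the function name, the fixed scratch-row size and the assumption of tightly packed rows (no padding between rows) are all made up for this example. In the workaround, the buffer would be filled by glReadPixels beforehand and uploaded to the texture afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Largest row this sketch handles: 4096 pixels at 4 bytes each (assumption). */
enum { MAX_ROW_BYTES = 4096 * 4 };

/* Reverse the row order of a tightly packed image in place, which turns
 * the image upside down. */
static void flip_rows(unsigned char *pixels, size_t width, size_t height,
                      size_t bytes_per_pixel)
{
    unsigned char tmp[MAX_ROW_BYTES];
    const size_t row_bytes = width * bytes_per_pixel;

    assert(pixels != NULL);
    assert(row_bytes <= sizeof tmp);

    /* Swap row y with its mirror row; the middle row of an odd-height
     * image stays where it is. */
    for (size_t y = 0; y < height / 2; ++y) {
        unsigned char *top = pixels + y * row_bytes;
        unsigned char *bottom = pixels + (height - 1 - y) * row_bytes;
        memcpy(tmp, top, row_bytes);
        memcpy(top, bottom, row_bytes);
        memcpy(bottom, tmp, row_bytes);
    }
}
```

For the 256x256 32bpp area the call would be flip_rows(buffer, 256, 256, 4), which touches only 256KB of memory per flip.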

Can anyone think of some state, or really just anything, that could produce such a truly horrible excuse for performance? I mean, when the most naive approach turns out to be around a hundred times as fast, something must be quite seriously broken here and I want to get to the root of this.

Anything but a direct copy is probably using some non-trivial path. Maybe pixel zoom just isn’t optimized as much as it should be.

But why are you drawing it upside down and then flipping it? Can’t you draw it right side up and not have to flip it at all?

I figured the question would surface. I’m not drawing it upside down, or rather I am doing it all over the place (flipped projection). It’s a Q&D hack for an old DX7 engine to get it to work in GL. As I currently can’t modify src data to flip either texture coordinates or models (it’s still needed for the DX7 path, especially when running DX and GL in parallel to verify behaviours), I need to read back this image “flipped” into the texture (it’s for an MFD in an airborne vehicle).

Suffice to say it’s a mess and needs to be rewritten from scratch. In the meantime though, I’m trying to figure out how a 7600 can reach levels of speed one probably can’t even reach if doing it using Mesa3D in software mode on a 386 (I haven’t tested :slight_smile: ).

If you can’t do anything about the texture coordinates, can you do something with the texture matrix?

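glMatrixMode(GL_TEXTURE);        /* operate on the texture matrix */
glTranslatef(0.0f, 1.0f, 0.0f);
glScalef(1.0f, -1.0f, 1.0f);     /* net effect: t' = 1 - t */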

Or maybe you need to scale before.
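On the question of order: each GL matrix call multiplies the current matrix on the right, so the call issued last acts on the coordinate first. The translate and scale calls above therefore map t to 1 - t, while the reverse order maps t to -t - 1, which differs by a whole number and so lands on the same texel only when the wrap mode is GL_REPEAT. The small C model below follows one axis of the matrix to show this; the type and helper names are invented for the illustration and are not GL functions.

```c
/* 1D affine map t' = a*t + b, standing in for the t row of the texture matrix. */
typedef struct { double a, b; } Affine1D;

static Affine1D affine_identity(void)
{
    Affine1D m = {1.0, 0.0};
    return m;
}

/* Models a translate call on one axis: M = M * T, i.e. new(t) = old(t + d). */
static Affine1D post_translate(Affine1D m, double d)
{
    m.b += m.a * d;
    return m;
}

/* Models a scale call on one axis: M = M * S, i.e. new(t) = old(s * t). */
static Affine1D post_scale(Affine1D m, double s)
{
    m.a *= s;
    return m;
}

/* Transform a single texture coordinate. */
static double affine_apply(Affine1D m, double t)
{
    return m.a * t + m.b;
}
```

Building the map as post_scale(post_translate(affine_identity(), 1.0), -1.0) gives a = -1 and b = 1, the plain flip; swapping the two calls gives a = -1 and b = -1.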

Bob, it’s only this single texture I need to “flip” from FB -> texture. Unfortunately I can’t single out this texture and flip its matrix (even if I could I wouldn’t do it as it would make this hack even more ugly). To work “properly” the engine has to be rewritten, but that will have to wait.

I do appreciate the ideas to solve this problem in other ways, but for this case my hands are quite tied (as I aim for cross-platform, pbuffers are out, and as this is intended to run on even TNT2-level h/w, where FBO hasn’t even been implemented by the vendors, that is out too). So while I, like any programmer, appreciate alternative solutions, right now I’m just trying to figure out how the h*ll a 7600 can be this slow just copying a 256x256 backbuffer area (non-overlapping, just to make it clear) while y-flipping it.

Bob answered it already: PixelZoom is not a widely used feature, so it is most likely implemented in software by the driver (and no one knows how ineffective this implementation is). Maybe the ATI drivers can handle this particular case better, or maybe ATI actually has hardware support for it (would surprise me, as this is utterly useless).

Did you try to NOT use glPixelZoom? Grab the frame as usual and do the flip on the CPU, or somehow force rendering to a pbuffer, then draw a screen-aligned quad with the flipped pbuffer texture on the backbuffer and grab that.