How to optimize glCopyTexSubImage2D?

I’m wondering which states (such as glPixelTransfer) must be set and which texture internal format must be used to make glCopyTexSubImage2D as fast as possible.
About the texture internal format: I use RGB8 in a 32-bit screen mode.
About glPixelTransfer: I use the default settings, which seem optimal to me.
But I’m afraid that some operations are still performed, because when I tried scaling the red component, for example, I got the scaled result with no further FPS drop! So it looks like the driver multiplies by ones and adds zeros anyway, and does many other things like that, instead of just copying the pixels as they are.
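
To make the "multiplies by ones and adds zeros" suspicion concrete, here is a minimal CPU-side sketch of the scale-and-bias step that glPixelTransfer controls for one color component (for red, the GL_RED_SCALE and GL_RED_BIAS parameters). The helper names clamp01 and transfer_component are hypothetical and only model the arithmetic; they say nothing about what a particular driver actually does internally.

```c
#include <assert.h>

/* Clamp a color component to the [0, 1] range, as the pixel transfer
   stage does after applying scale and bias. */
static float clamp01(float v)
{
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Hypothetical helper (not a GL call): models scale-and-bias on one
   normalized color component. */
static float transfer_component(float value, float scale, float bias)
{
    assert(value >= 0.0f && value <= 1.0f); /* input is a normalized color */
    return clamp01(value * scale + bias);
}
```

With the default state (scale 1, bias 0) the result equals the input, so a driver could in principle skip this step entirely; whether it does is exactly what the FPS observation above puts in doubt.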

Somebody said that he managed to do a 512*512 glCopyTexSubImage2D at 100 fps. How can this be achieved?
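
To put that figure in perspective, here is a rough back-of-the-envelope sketch of the data volume involved. It assumes 4 bytes per pixel (my assumption, based on the 32-bit screen mode) and one copy per frame; the function names copy_bytes and copy_bytes_per_second are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved by one copy of a width x height region. */
static size_t copy_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Bytes moved per second when one such copy is done every frame. */
static size_t copy_bytes_per_second(size_t width, size_t height,
                                    size_t bytes_per_pixel, size_t fps)
{
    return copy_bytes(width, height, bytes_per_pixel) * fps;
}
```

For 512*512 pixels at 4 bytes each, one copy is 1,048,576 bytes (1 MiB), so 100 copies per second is 104,857,600 bytes, about 100 MiB per second.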


Addition. Tested platforms:
WinNT 4.0: FPS drops from 20 to 6 (Creative driver; the NVidia driver failed to install on my HP Kayak).

Win2000: FPS drops from 20 to 17, much better! (NVidia driver)