FBOs and performance (again)

This is semi-related to my previous topic.
I’m curious what other people’s experiences are.

Tests at 1440x1080, NVIDIA GeForce 9500 (512 MB), 128-bit bus, Windows XP:

framebuffer, no AA: 117 fps
FBO, no AA: 107 fps

framebuffer, 4xAA: 99 fps
FBO, 4xAA: 27 fps!!!

framebuffer, 4xAA + copytexsubimage to copy the FB into a texture: 76 fps
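Since fps is not a linear scale, it can help to convert these numbers to milliseconds per frame before comparing them. The sketch below does only that arithmetic; the function names are made up for illustration and nothing here touches OpenGL. With the figures above, turning on 4xAA costs roughly 1.6 ms per frame on the window framebuffer but roughly 27.7 ms per frame on the FBO, and the copy into a texture adds roughly 3 ms.

```c
#include <stdio.h>

/* Convert a frame rate (frames per second) to milliseconds per frame. */
double fps_to_ms(double fps)
{
    return 1000.0 / fps;
}

/* Extra milliseconds per frame when the rate changes from fps_before to fps_after. */
double added_cost_ms(double fps_before, double fps_after)
{
    return fps_to_ms(fps_after) - fps_to_ms(fps_before);
}

/* Print the measurements from the post as frame times (hypothetical helper). */
void print_frame_times(void)
{
    printf("framebuffer no AA     : %.2f ms\n", fps_to_ms(117.0));
    printf("FBO no AA             : %.2f ms\n", fps_to_ms(107.0));
    printf("framebuffer 4xAA      : %.2f ms\n", fps_to_ms(99.0));
    printf("FBO 4xAA              : %.2f ms\n", fps_to_ms(27.0));
    printf("framebuffer 4xAA+copy : %.2f ms\n", fps_to_ms(76.0));
    printf("4xAA cost, framebuffer: %.2f ms\n", added_cost_ms(117.0, 99.0));
    printf("4xAA cost, FBO        : %.2f ms\n", added_cost_ms(107.0, 27.0));
    printf("copy cost             : %.2f ms\n", added_cost_ms(99.0, 76.0));
}
```

Calling print_frame_times() from main prints the table; the point is only that the FBO 4xAA case is an order of magnitude more expensive than the same AA on the window framebuffer.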

So what’s up?
It seems that at decent resolutions with AA I’m much better off just rendering to the FB and then doing a copytexture, which seems cockeyed to say the least.
What are other people’s experiences on various hardware?

NOTE: if I drop the resolution down to 640x480, then framebuffer 4xAA + copytexsubimage and FBO 4xAA are about the same speed.
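To put the two resolutions side by side, here is a rough size estimate for a 4x multisampled render target. The 8 bytes per sample (RGBA8 colour plus a 24/8 depth-stencil buffer) is purely an assumption, since the post doesn’t say which formats were used, and the function name is invented for the example. Under that assumption the 1440x1080 target is roughly 47 MiB against roughly 9 MiB at 640x480, about five times as much data. That doesn’t show where the bottleneck is, only how much more is being touched per frame.

```c
#include <stdint.h>

/* ASSUMPTION: 4 bytes of colour (RGBA8) + 4 bytes of depth/stencil per sample.
   The original post does not state the actual formats. */
enum { ASSUMED_BYTES_PER_SAMPLE = 8 };

/* Estimated storage for a multisampled colour + depth/stencil render target. */
uint64_t render_target_bytes(uint32_t width, uint32_t height, uint32_t samples)
{
    return (uint64_t)width * height * samples * ASSUMED_BYTES_PER_SAMPLE;
}
```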

ta zed

On my old ATI 9600 SE, which didn’t really support non-power-of-two textures, I made an FBO of size 2048x2048 and then rendered at something like 1024x786.
Performance was around 30 fps, whereas normally it was 100+.
I found out how to fix the bottleneck: just scissor to the viewport (i.e. glScissor).
Clearly there was some bottleneck somewhere.
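One possible reason the scissor helps (a guess, not something measured here): glClear ignores the viewport but does respect the scissor test, so without a scissor a clear covers the whole 2048x2048 FBO rather than just the part being drawn to. The sketch below only works out the pixel counts; Rect and the function names are hypothetical, and the GL calls appear in a comment only.

```c
#include <stdint.h>

/* A scissor/viewport rectangle in pixels (hypothetical helper type). */
typedef struct {
    int32_t x, y, width, height;
} Rect;

/* Number of pixels covered by a rectangle. */
int64_t rect_pixels(Rect r)
{
    return (int64_t)r.width * r.height;
}

/* How many times more pixels the full FBO holds than the region actually used. */
double touched_ratio(int32_t fbo_width, int32_t fbo_height, Rect used)
{
    return (double)((int64_t)fbo_width * fbo_height) / (double)rect_pixels(used);
}

/* In the real renderer the matching GL calls would be something like:
 *     glViewport(used.x, used.y, used.width, used.height);
 *     glEnable(GL_SCISSOR_TEST);
 *     glScissor(used.x, used.y, used.width, used.height);
 * issued before glClear, so the clear is limited to the used region. */
```

For the sizes in the post, a 1024x786 region is 804,864 pixels against 4,194,304 for the full 2048x2048 FBO, so an unscissored clear would touch a bit over five times as many pixels.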

On my NVIDIA card the scissor trick hasn’t really made any difference.