Dual rgba8 still faster than one rgba16f on G80?

I just moved this; I accidentally posted it to the wrong forum…

I have set up 2 frame buffer objects for rendering HDR in my current project. One has 2 rgba8 format textures, the other has 1 rgba16f texture. I render to those + depth, then bloom and tone map. I did it this way because the engine we are using used to support only rgba8 formats, and I have since added support for rgba16f. I am surprised to see that the double rgba8 multiple render target setup is still filling faster than the single rgba16f one. I was wondering if I’m doing something stupid…

These are all multisampled, with coverage sample support as well. So each setup actually has 2 fbos: a multisample version, and a normal version for the resolve blit. And we are rendering in stereo. None of these are getting mipmaps generated (I checked for fear that was the problem). Could I be running out of fast fbo memory on the 16f setup? Any thoughts?
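
As a rough sanity check on the memory question, here is a back-of-the-envelope sketch in Python. The resolution, sample count, depth format, and the idea that each eye gets its own storage are placeholder assumptions rather than my actual settings, and `fbo_bytes` is just a made-up helper. It also ignores coverage-sample data and any driver padding or compression. By raw byte count, 2 rgba8 targets and 1 rgba16f target both come to 8 bytes of color per sample.

```python
# Bytes per pixel for the formats discussed above.
RGBA8_BYTES = 4      # 4 channels x 8 bits
RGBA16F_BYTES = 8    # 4 channels x 16 bits
DEPTH_BYTES = 4      # assumption: 24-bit depth + 8-bit stencil


def fbo_bytes(width, height, color_bytes, color_targets, samples, eyes):
    """Raw storage for the color targets plus one depth buffer.

    Hypothetical helper: ignores coverage samples, padding and compression.
    """
    per_sample = color_bytes * color_targets + DEPTH_BYTES
    return width * height * per_sample * samples * eyes


if __name__ == "__main__":
    # Placeholder values, not my real settings.
    width, height, samples, eyes = 1920, 1200, 4, 2
    dual_rgba8 = fbo_bytes(width, height, RGBA8_BYTES, 2, samples, eyes)
    single_rgba16f = fbo_bytes(width, height, RGBA16F_BYTES, 1, samples, eyes)
    print("2 x rgba8  :", dual_rgba8 / 2**20, "MiB")
    print("1 x rgba16f:", single_rgba16f / 2**20, "MiB")
```

If that arithmetic is right, the two multisample setups should need the same raw amount of memory, so footprint alone would not seem to explain the difference in fill speed.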

The target hardware is the G80-based Quadro FX 5600.