Burning Performance in DrvCopyContext?

Hi all,

I have an application that makes extensive use
of FBOs and fragment shaders.

Video data is copied into textures, which are
then processed by ping-ponging between them
until the final result is displayed.

I switched to an AMD64 system with a GeForce FX 5200
card and noticed that the application uses
~40% of the CPU even when no video data is
being processed at all!

I thought using FBOs would reduce expensive
context switches.

Does anyone know how to track down where
these DrvCopyContext calls are triggered,
and how to reduce them?

Best regards,

(P.S. I am using NVIDIA driver 81.89.)