Readback texture from FBO (using PBO?)

I'm using an FBO to render to several textures. Now I need to read back some intermediate states of the render targets (the FBO textures) to host memory (PC RAM) while I'm rendering. I have read NVIDIA's brief (Fast Texture Downloads and Readbacks using Pixel Buffer Object in OpenGL) on how to use pixel buffer objects to read out the framebuffer and download data into textures faster.
My question is whether it is useful to use FBOs and PBOs TOGETHER to download and read back texture data. And which is better for reading out an FBO texture: glGetTexImage or glReadPixels?

You could use a PBO with glReadPixels; that function reads from the color buffer selected with glReadBuffer on the currently bound read framebuffer. However, this will likely not perform any better than using a PBO with glGetTexImage.
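Whichever call you use, when a PBO is bound to GL_PIXEL_PACK_BUFFER the data argument of glReadPixels or glGetTexImage is interpreted as a byte offset into that buffer, so the PBO has to be allocated with glBufferData (typically with the GL_STREAM_READ hint) large enough for the whole read. The sketch below shows only the size calculation, written in Python for brevity. `pack_buffer_size` is a hypothetical helper, not a GL function, and it assumes GL_PACK_ROW_LENGTH, GL_PACK_SKIP_PIXELS and GL_PACK_SKIP_ROWS are left at 0, so that GL_PACK_ALIGNMENT is the only setting that pads rows.

```python
def pack_buffer_size(width: int, height: int, bytes_per_pixel: int,
                     pack_alignment: int = 4) -> int:
    """Bytes to reserve in a GL_PIXEL_PACK_BUFFER for one readback.

    Hypothetical helper. Assumes GL_PACK_ROW_LENGTH, GL_PACK_SKIP_PIXELS and
    GL_PACK_SKIP_ROWS are 0, so only GL_PACK_ALIGNMENT (default 4) pads rows.
    """
    if pack_alignment not in (1, 2, 4, 8):
        raise ValueError("GL_PACK_ALIGNMENT must be 1, 2, 4 or 8")
    row_bytes = width * bytes_per_pixel
    # Each row starts on a multiple of the pack alignment (ceiling division).
    stride = -(-row_bytes // pack_alignment) * pack_alignment
    # stride * height is a safe upper bound; padding after the last row
    # is not strictly required, but reserving it costs at most a few bytes.
    return stride * height


# 256x256 GL_RGBA / GL_UNSIGNED_BYTE target: 4 bytes per pixel.
print(pack_buffer_size(256, 256, 4))  # 262144 bytes
# 5x2 GL_RGB / GL_UNSIGNED_BYTE: 15-byte rows are padded to 16.
print(pack_buffer_size(5, 2, 3))      # 32 bytes
```

Setting glPixelStorei(GL_PACK_ALIGNMENT, 1) removes the row padding entirely, which is convenient when the mapped data is copied into a tightly packed host array.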

The main drawback of using a PBO with glReadPixels from an FBO is that you cannot change the FBO while the read is in progress: if you bind a different FBO, the asynchronous read will (probably) have to synchronize. This is not the case with glGetTexImage, which reads from the texture object itself rather than from the bound framebuffer.

In my experience, glGetTexImage is much slower than glReadPixels. Before FBOs existed, I would draw the image to a pbuffer and then read it with glReadPixels rather than use glGetTexImage.
I also haven't seen any stalls when using glReadPixels with FBOs and PBOs…

I think the driver always knows whether you are reading from a texture, even with glReadPixels and a bound PBO, so technically there should be no stalling at all. All PBO reads and writes are asynchronous by definition, unless you map the PBO too soon (before the GPU has finished the transfer).
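A common way to avoid mapping too soon is to cycle through several PBOs: each frame issues a new readback into one buffer and maps the buffer that was filled N-1 frames earlier. The sketch below models only that round-robin schedule, in Python for brevity; `PboRing` is a hypothetical name, and the GL calls appear as comments at the points where a C renderer would issue them.

```python
class PboRing:
    """Round-robin schedule for `count` pixel-pack buffers (hypothetical helper).

    Each frame a readback is issued into one PBO, and the PBO filled
    count-1 frames earlier is mapped, giving the GPU count-1 frames to
    finish the copy before the CPU touches the data.
    """

    def __init__(self, count: int):
        if count < 2:
            raise ValueError("need at least two PBOs to overlap reads and maps")
        self.count = count
        self.frame = 0

    def step(self):
        """Return (write_slot, map_slot) for the current frame, then advance.

        map_slot is None until a buffer filled count-1 frames ago exists.
        """
        write_slot = self.frame % self.count
        # glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[write_slot]);
        # glReadPixels(0, 0, w, h, format, type, 0);  /* 0 = offset into PBO */
        map_slot = None
        if self.frame >= self.count - 1:
            # Oldest pending buffer: written count-1 frames ago.
            map_slot = (self.frame + 1) % self.count
            # glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[map_slot]);
            # ptr = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
            # ... copy the pixels out ..., then glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        self.frame += 1
        return write_slot, map_slot


ring = PboRing(4)  # four buffers: each map happens three frames after its read
for _ in range(6):
    print(ring.step())
```

A slot is always unmapped before the ring wraps around and reuses it for a new read. A fixed frame delay is only a heuristic; where OpenGL 3.2 or ARB_sync is available, an alternative is to insert glFenceSync after each glReadPixels and map the buffer only once glClientWaitSync with a zero timeout returns GL_ALREADY_SIGNALED.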

However, I recall that early ATI drivers had unnecessary synchronizations that made PBOs unusable (this should be fixed now).

These unnecessary synchronizations in the ATI driver probably weren't fixed. I tested on an ATI Radeon 4850 with Catalyst 9.6 and found the following:
When I postpone glMapBuffer by one frame, I get a 2-5 ms stall (I'm reading ~270Kb). When I wait two or three frames, the times are much worse.

But when I call glMapBuffer right after glReadPixels, the times are 0-1 ms. However, the stall still appears in 70% of readbacks.

Normally I postpone glMapBuffer by three frames, and this works perfectly on NVIDIA cards.
Can anybody confirm this behavior or suggest another way to avoid this weird problem?