Curious readback performance

While putting together a graph to measure the overhead of reading and writing non-contiguous subrectangles of a block of system memory (using glPixelStorei, which I only discovered today), I came across a curious effect:


Ignoring the Whole/Subrectangle disparities (the two perform near enough the same, as I’d expected), what’s interesting is the huge increase in readback performance at texture sizes of 1024x1024x1, 2048x2048x1 and 4096x4096x1. This test uses an FBO and glReadPixels with the appropriate read buffer to read data back to main memory.
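
For anyone unfamiliar with the subrectangle case, here is a minimal CPU-side sketch of where glReadPixels places each pixel in client memory once the pack state has been set through glPixelStorei. It makes no GL calls. The `PackState` struct and both function names are hypothetical, invented for illustration, and the stride rule is written for the common case where rows are simply rounded up to GL_PACK_ALIGNMENT bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the GL_PACK_* state set through glPixelStorei. */
typedef struct {
    size_t row_length;   /* GL_PACK_ROW_LENGTH; 0 means "use the read width" */
    size_t skip_pixels;  /* GL_PACK_SKIP_PIXELS */
    size_t skip_rows;    /* GL_PACK_SKIP_ROWS */
    size_t alignment;    /* GL_PACK_ALIGNMENT: 1, 2, 4 or 8 */
} PackState;

/* Bytes between the starts of two consecutive rows in client memory. */
static size_t pack_row_stride(const PackState *ps, size_t read_width,
                              size_t bytes_per_pixel)
{
    assert(ps->alignment == 1 || ps->alignment == 2 ||
           ps->alignment == 4 || ps->alignment == 8);

    /* A non-zero row length overrides the width passed to glReadPixels. */
    size_t pixels_per_row = ps->row_length > 0 ? ps->row_length : read_width;
    size_t raw = pixels_per_row * bytes_per_pixel;

    /* Round each row up to a multiple of the pack alignment. */
    return (raw + ps->alignment - 1) / ps->alignment * ps->alignment;
}

/* Byte offset, from the pointer passed to glReadPixels, at which pixel
 * (x, y) of the read rectangle is written. */
static size_t pack_pixel_offset(const PackState *ps, size_t read_width,
                                size_t bytes_per_pixel, size_t x, size_t y)
{
    size_t stride = pack_row_stride(ps, read_width, bytes_per_pixel);
    return (ps->skip_rows + y) * stride
         + (ps->skip_pixels + x) * bytes_per_pixel;
}
```

With a row length of 1024, 4 bytes per pixel, skip pixels of 16 and skip rows of 8, the first pixel of a 256-wide read lands 8 full rows plus 16 pixels into the destination buffer, which is how a subrectangle ends up inside a larger block of memory without an extra copy.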

I’m not using the PBO extension yet, mostly because I’ve never been able to get good performance out of it (usually it’s on par with, or slightly slower than, my regular readback). But I wonder what causes such a massive improvement in readback at power-of-two sizes, and whether a PBO might affect this?
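
To compare readback numbers across texture sizes it helps to normalise each timing to throughput, so that a bigger texture isn't just "slower" because it moves more bytes. A minimal sketch follows. Both helper names are hypothetical, and the bytes per channel is a parameter because the post doesn't say which data type was read back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True for 1, 2, 4, 8, ...; useful for flagging sizes such as 1024 or 2048. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Throughput in MiB/s for one readback of width x height x channels,
 * given the measured wall-clock time in seconds. */
static double readback_mib_per_s(size_t width, size_t height, size_t channels,
                                 size_t bytes_per_channel, double seconds)
{
    assert(seconds > 0.0);
    double bytes = (double)width * (double)height
                 * (double)channels * (double)bytes_per_channel;
    return bytes / (1024.0 * 1024.0) / seconds;
}
```

Plotting this figure against the texture side length, with the power-of-two sizes marked, should make it obvious whether the jump is confined to exactly those sizes or also shows up nearby.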