Diffuse convolution filter

Does anyone have an idea how to apply a diffuse convolution filter to a cube map in hardware in (near) real time? It does not need to be 100% accurate; a good approximation is enough.

I have tried these two approaches, but they are both more or less too slow:

Take stochastic samples from the cube map in a fragment shader. It works, but over a million samples are needed to get tolerable results.
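For reference, approach 1 amounts to Monte Carlo integration of the cosine-weighted irradiance integral. A minimal CPU sketch in Python, standing in for the fragment shader, with a hypothetical `sample_env(dir)` callback in place of the actual cube-map fetch:

```python
import math
import random

def cosine_sample_hemisphere():
    # Cosine-weighted direction on the hemisphere around the +Z axis.
    # A real shader would rotate this into each texel's normal frame.
    u1, u2 = random.random(), random.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1))

def diffuse_convolution(sample_env, num_samples=1024):
    # Estimates (1/pi) * integral of L(w) * cos(theta) over the hemisphere.
    # With cosine-weighted sampling, the cos(theta)/pdf terms cancel,
    # so the estimate is just the plain average of the environment samples.
    total = 0.0
    for _ in range(num_samples):
        total += sample_env(cosine_sample_hemisphere())
    return total / num_samples
```

For a constant environment the average returns the constant exactly; for varying environments the variance is what forces the sample counts mentioned above.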

Render the cube map from the point of view of each texel into a floating-point buffer with a near-180-degree FOV, read the result back using glReadPixels, then sum these pixels to get the filtered color of that texel. If the result cube map is 6*32*32 texels and the buffer used is 128*128, then about 100 million pixels are read back, which is too much (plus the time it takes to sum them with a simple loop).
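As a sanity check, the readback figure quoted above follows directly from those numbers:

```python
# Pixels read back per full convolution pass in approach 2.
texels = 6 * 32 * 32           # cube-map faces * face resolution
pixels_per_texel = 128 * 128   # floating-point render target per texel
total = texels * pixels_per_texel
print(total)  # 100663296, i.e. roughly 100 million
```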

Any better ways?

There’s an article by Gary King in GPU Gems 2 on doing this using spherical harmonics:
http://developer.nvidia.com/object/gpu_gems_2_home.html

Note that a source cubemap resolution of 6x32x32 is probably much more than you actually need - the diffuse convolution is effectively a very wide blur. You could probably filter that down to 6x8x8 and not be able to tell the difference.
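For what it's worth, the spherical-harmonic route boils down to projecting the environment onto the first 9 SH basis functions and scaling each band by the clamped-cosine lobe's coefficients, as in Ramamoorthi and Hanrahan's irradiance environment maps. A rough CPU sketch of the math in Python (the `sample_env(dir)` callback is a hypothetical stand-in for the cube-map fetch; the GPU version does the projection in shaders):

```python
import math
import random

def sh_basis(d):
    # First 9 real spherical harmonics evaluated at unit direction d = (x, y, z).
    x, y, z = d
    return [0.282095,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z * z - 1.0),
            1.092548 * x * z, 0.546274 * (x * x - y * y)]

def uniform_sphere():
    # Uniform random direction on the unit sphere.
    z = 2.0 * random.random() - 1.0
    phi = 2.0 * math.pi * random.random()
    s = math.sqrt(max(0.0, 1.0 - z * z))
    return (s * math.cos(phi), s * math.sin(phi), z)

def project_sh(sample_env, num_samples=20000):
    # Monte Carlo projection: c_i = integral of L(w) * Y_i(w) over the sphere.
    coeffs = [0.0] * 9
    for _ in range(num_samples):
        d = uniform_sphere()
        radiance = sample_env(d)
        for i, y in enumerate(sh_basis(d)):
            coeffs[i] += radiance * y
    weight = 4.0 * math.pi / num_samples  # uniform-sampling normalization
    return [c * weight for c in coeffs]

# Per-band scale factors of the clamped-cosine (diffuse) lobe.
A_HAT = [math.pi,
         2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0,
         math.pi / 4.0, math.pi / 4.0, math.pi / 4.0,
         math.pi / 4.0, math.pi / 4.0]

def diffuse_lookup(coeffs, n):
    # Reconstruct irradiance along normal n and divide by pi (Lambert BRDF).
    irradiance = sum(a * c * y for a, c, y in zip(A_HAT, coeffs, sh_basis(n)))
    return irradiance / math.pi
```

Because only 9 coefficients survive the cosine convolution, the per-texel evaluation is a handful of multiply-adds, which is why this can run in real time regardless of source resolution.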