What is currently the fastest box filter technique?
I’m looking to implement blurs that stay fast
for radii of 50 pixels and beyond.
I’m familiar with the separable 2-pass methods, but
the number of texture samples the shader would need to
take per pixel (e.g. >50) is not feasible here.
I’ve also tried multi-pass mip-mapping: using
LOD bias and automatic mip-map generation, repeated
several times. But the result doesn’t look good enough.
On the CPU, it is possible to have a linear-time
box filter (per-pixel cost independent of the filter
width) using a 2-pass method, where a scanline
‘accumulator’ scans through the X axis and then the Y
axis, adding the new sample from the right and
subtracting the old sample from the left. Is
it possible to implement this on the GPU? A naive
implementation would be to have 2 FBOs: one w x 1
and one 1 x h. That’d require rendering w * h quads
which I think is going to be too slow.
For an example of the linear-time scanline accumulator method:
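A minimal CPU sketch of the accumulator idea described above, in Python. The function names (box_blur_1d, box_blur_2d) are hypothetical, and clamping the edge pixels is an assumption on my part; other border handling works the same way.

```python
def box_blur_1d(row, radius):
    """Box-filter one scanline with a running sum (edges clamped)."""
    n = len(row)
    width = 2 * radius + 1
    # Prime the accumulator with the window centred on pixel 0.
    acc = sum(row[min(max(i, 0), n - 1)] for i in range(-radius, radius + 1))
    out = [0.0] * n
    for x in range(n):
        out[x] = acc / width
        # Slide the window one pixel: add the sample entering on the
        # right, subtract the sample leaving on the left.
        acc += row[min(x + radius + 1, n - 1)] - row[max(x - radius, 0)]
    return out


def box_blur_2d(image, radius):
    """Two passes: blur every row (X), then every column (Y)."""
    rows = [box_blur_1d(r, radius) for r in image]
    cols = [box_blur_1d(list(c), radius) for c in zip(*rows)]
    # Transpose back to row-major order.
    return [list(r) for r in zip(*cols)]
```

Each output pixel costs one add and one subtract per pass, whatever the radius. The catch for a GPU port is that every output depends on the previous accumulator value along the scanline, so the inner loop is serial.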