downsampling a texture

this is how i have performed bloom so far:

  • render to bilinear filtered fbo, 1/4 size of the screen
  • blur this several times (vertical/horizontal)
  • add to final image

this gives very blocky results, esp. when moving. i read in some article that i have to downsample the image, but i just thought the bilinear filtering would do that for me. i was wrong i guess? i also tried glGenerateMipmapEXT(), but then my fps dropped to 30%…

so how would i have to do that? fetch multiple texels in the fs?

Screenshot ?

How do you perform your blur passes ?
Sample texture FBO multiple times, between texels for bilinear interpolation (4 more samples), with gaussian weights ?

When you say 1/4 the size of the screen, do you mean exactly that, or are you using 1/4 width and 1/4 height? If the latter, you will get blocky results no matter how much you blur.
For optimal quality AND downsampling: blur first, then downsample, then blur some more.

ok let’s do it step by step:
i setup two fbo’s, the first one has the same width and height as the screen, the second 1/4 of the width and 1/4 of the height.
i render my scene into the first buffer.

then i render a fullscreen quad with the first buffer’s texture as input. the shader in this pass just reads from this texture.

left: original scene, right: fullscreen quad:

what to do next? how do i have to read from the first buffer to achieve better quality?


My approach:

  1. Render scene to 1280x1024 texture (to texture A)
  2. Downsample 4x to 320x256 texture (to texture B)
  3. Blur 320x256 texture horizontally (to texture C)
  4. Blur 320x256 texture vertically (to texture B again)
  5. Draw a fullscreen quad: 1.0 * texture A + 0.01 * texture B

I use 2048x1024 texture. You can use texture rectangle if you want
On GeForce 6 I downsample 4x in one pass - shader takes 4 samples, each sample is at the corner of 4 texels - so I’m averaging a block of 4x4 pixels in one pass.
On Radeon X1 I downsample 2x in first pass to 640x512 texture and 2x in second pass - shader averages a block of 2x2 pixels each time (there is no FP16 texture filtering on Radeon X1)
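To make the one-pass 4x case concrete, here is a minimal CPU sketch in Python (not shader code) of why 4 bilinear taps, each placed on the corner shared by 2x2 texels, give the plain average of a 4x4 block. The helper names `bilinear` and `downsample4x` are made up for illustration, and the sketch assumes texel centers at half-integer coordinates and an image whose size is a multiple of 4.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at texel-space position (x, y).
    Assumes texel (i, j) has its center at (i + 0.5, j + 0.5)."""
    h, w = len(img), len(img[0])
    fx, fy = x - 0.5, y - 0.5
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0

    def texel(ix, iy):
        # Clamp to the edge, like a clamp-to-edge wrap mode would.
        ix = min(max(ix, 0), w - 1)
        iy = min(max(iy, 0), h - 1)
        return img[iy][ix]

    top = texel(x0, y0) * (1 - tx) + texel(x0 + 1, y0) * tx
    bottom = texel(x0, y0 + 1) * (1 - tx) + texel(x0 + 1, y0 + 1) * tx
    return top * (1 - ty) + bottom * ty

def downsample4x(img):
    """Average each 4x4 block using only 4 bilinear taps per output pixel."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, 4):
        row = []
        for bx in range(0, w, 4):
            # Each tap sits on the corner shared by a 2x2 group of texels,
            # so bilinear filtering returns the mean of those 4 texels.
            taps = [bilinear(img, bx + dx, by + dy)
                    for dy in (1, 3) for dx in (1, 3)]
            row.append(sum(taps) / 4.0)
        out.append(row)
    return out
```

The 2x variant would be the same idea with a single tap at the center of each 2x2 block.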
For the blur I’m performing 11 texture fetches, which gives me a 21-pixel kernel.
There is one fetch at the center pixel and 5 fetches on each side. Every one of these side fetches samples between 2 texels, but not exactly at the center, giving a weighted average of 2 texels at a time.
Sampling coordinates are as follows:
-9.4, -7.4, -5.4, -3.4, -1.4, 0.0, +1.4, +3.4, +5.4, +7.4, +9.4
As you can see - each sample takes 40% of one pixel and 60% of another. Weights for every sample are as follows:
0.01 | 0.04 | 0.08 | 0.15 | 0.30 | 0.30 | 0.30 | 0.15 | 0.08 | 0.04 | 0.01
This gives final texel weights:
0.004, 0.006 | 0.016, 0.024 | 0.032, 0.048 | 0.060, 0.090 | 0.120, 0.180 | 0.300 | 0.180, 0.120 | 0.090, 0.060 …

These weights do not sum up to 1.0 - I’m darkening the image later a bit.
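The per-texel weights can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `texel_weights` is invented here, and it assumes a tap at fractional offset f between two texels contributes (1 - f) to the lower texel and f to the upper one, which is how linear filtering behaves in one dimension.

```python
import math

# Offsets and weights copied from the post above.
OFFSETS = [-9.4, -7.4, -5.4, -3.4, -1.4, 0.0, 1.4, 3.4, 5.4, 7.4, 9.4]
WEIGHTS = [0.01, 0.04, 0.08, 0.15, 0.30, 0.30, 0.30, 0.15, 0.08, 0.04, 0.01]

def texel_weights(offsets, weights):
    """Expand each linearly filtered tap into the texels it touches."""
    result = {}
    for off, w in zip(offsets, weights):
        lo = math.floor(off)       # texel on the lower side of the tap
        frac = off - lo            # share that goes to the texel above it
        for texel, share in ((lo, 1.0 - frac), (lo + 1, frac)):
            if share > 1e-9:       # the center tap touches a single texel
                result[texel] = result.get(texel, 0.0) + w * share
    return dict(sorted(result.items()))

if __name__ == "__main__":
    for texel, w in texel_weights(OFFSETS, WEIGHTS).items():
        print(f"{texel:+3d}: {w:.3f}")
```

The 11 taps expand to 21 texels (offsets -10 to +10), and the weights add up to 1.46, which is why the result needs darkening afterwards. Dividing every weight by that sum would be one way to normalize instead, if that is preferred.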

As you can see I’m not performing any brightness adjustments.
Also, note the contrast between the original picture and blurred overlay - I just don’t like it when every white object glows. Does your monitor bloom when you read black text on white background? If something is white it will lighten neighbour pixels by 1% and that’s it.
If something is 100x brighter than white then you’ll see bloom.
You can see it in my game. Sun and direct sun reflection are capable of producing pixels 10000x brighter than white.
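The arithmetic of that final pass is simple enough to sketch. The function name `composite` and the sample values are illustrative only; the 1.0 and 0.01 factors are the ones from step 5 above.

```python
def composite(scene, bloom, bloom_scale=0.01):
    """Final pass: 1.0 * texture A + bloom_scale * texture B, per channel."""
    return 1.0 * scene + bloom_scale * bloom

# A dark pixel (0.2) whose blurred neighbourhood is plain white (1.0)
# is barely lifted; the same pixel next to something 100x brighter
# than white gets pushed past 1.0.
near_white = composite(0.2, 1.0)
near_sun = composite(0.2, 100.0)
```

With these numbers `near_white` comes out at about 0.21 and `near_sun` at about 1.2, so only the very bright source produces visible bloom.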

Many games use some threshold, as suggested in the original HDR paper by NVIDIA. If something is 2x brighter than white it will bloom a lot.
But I remember one thing they wrote in that paper explaining why to use HDR: “Reality is not clamped to 0…1, neither should be CG”.
I say: “There are no ‘bloom thresholds’ in reality, neither should be in CG”.
So, pure filter, no bloom boosts - it has to be much brighter than white to blind you a little and to cause bloom effect.

You can see it all in action in my game.

then i render a fullscreen quad with the first buffer’s texture as input. the shader in this pass just reads from this texture.

I assume this quad is rendered to your smaller FBO.
At this step, the shader should theoretically sample 16 adjacent texels from the original large texture and average them.
The fastest way is to take advantage of bilinear filtering: sample only 4 times, each exactly between 4 texels, so that 16 texels get averaged into one.

Or try a 1/2 FBO, and sample only once, between 4 texels.
Performance wise it might be better or worse.
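For the tap positions, here is a small Python sketch that computes the texture coordinate offsets for both variants. The name `downsample_tap_offsets` is hypothetical, and the sketch assumes the source texture is exactly `factor` times the size of the target, so that the center of each target pixel maps to the center of its source block.

```python
def downsample_tap_offsets(factor, src_w, src_h):
    """UV offsets, relative to the target pixel center, for bilinear taps
    that each sit on a corner shared by 2x2 source texels."""
    if factor < 2 or factor % 2 != 0:
        raise ValueError("factor must be an even number >= 2")
    half = factor // 2
    # Tap positions in source texels, relative to the block center.
    # factor 2 -> [0], factor 4 -> [-1, 1], factor 8 -> [-3, -1, 1, 3]
    steps = [2 * k - (half - 1) for k in range(half)]
    return [(sx / src_w, sy / src_h) for sy in steps for sx in steps]

# 1/2 FBO: one tap, straight at the interpolated texture coordinate.
half_res = downsample_tap_offsets(2, 1280, 1024)
# 1/4 FBO: four taps, one source texel away diagonally.
quarter_res = downsample_tap_offsets(4, 1280, 1024)
```

In a fragment shader these offsets would be added to the interpolated texture coordinate and the fetched values averaged. For texture rectangles the offsets stay in texel units, so the division by the texture size would be dropped.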

wow thanks for the detailed explanation! i tried your filtering kernel and the result is indeed impressive, but i noticed some artifacts at the line where the two triangles of my fullscreen quad meet (i actually render a quad, but the driver is probably splitting it).

  1. Downsample 4x to 320x256 texture (to texture B)

that’s the step that’s still missing. what would the kernel for such a shader look like?

I can’t recommend the 4x downsampling. Spots with high light intensity (specular highlights) will start to flicker when the camera moves.
It may work with the usual LDR scenes, but it’s just horrible for HDR scenes with real HDR data from cubemaps or lightmaps.
I would use a 2x downsample and then add more blur to it to keep the quality as good as possible.

Make sure to get the texel-offsets right, I had some trouble with that (thread)