Glow & Texture scale/bias & Blur on old hardware

Hi there,

I’m working on a multi-platform, freeware arcade OpenGL game. We want our game to run on old computers too, which is why I decided not to use shaders or extensions. I’d like to add an HDR-like glow effect to our 2D engine. I tried these two approaches:

  1. Render everything. Copy back-buffer to texture (using glCopyTexSubImage2D). Add said texture multiple times to the screen.

  2. Render everything at 25% resolution. Copy back-buffer to texture (using glCopyTexSubImage2D). Re-render everything at screen resolution. Add the texture multiple times to the screen.

Approach 1 won’t run fast enough on fill-rate-limited integrated cards (like Intel GMA 965, …) at resolutions 1024x768 and upwards.

Approach 2 doesn’t look as good and our engine has to render everything twice.

Is there a faster way that will work on old cards? The computers I tested on (some notebooks and old desktops) spent most of their time in glCopyTexSubImage2D.

On a related note:

  • I wanted to bias/scale the texture to change the appearance of the glow. glPixelTransfer is non-accelerated on most HW. Is there another way to increase the contrast of the texture? Scaling by 3 and biasing by -2 gave nice results but is slow as heck.

  • I draw the texture multiple times to blur it. Is there a better/faster way to blur without using shaders?

Thanks for any tips.

Simon

Even cheap HDR is costly.
There was an interesting PDF detailing how fake HDR (in fact this is more like bloom) was done in the game Tron 2.0. Basically it is a variant of 2): only the glowing parts are drawn, sometimes even without texture.
Then they did a blur pass before adding it to the normal scene, which hides the low resolution.

I can’t find that PDF; this page is less detailed but will help you get the idea.
http://http.developer.nvidia.com/GPUGems/gpugems_ch21.html

The difficult part without shaders is the blur.
With simple blending, you can render the low-res texture multiple times with different alpha coefficients (still at low res), copy the blurred result to a texture, then continue with "Re-render everything at screen resolution" and add the texture to the screen only once.

This way all the costly operations (copy tex, blur) are done at low res, and only the normal scene + overlay quad are done at full resolution.

For cheaper blur, remember to sample between texels (free 1 pixel wide blur), and use multitexture (even old cards can do 4 textures at once) to do each 1D blur with only one pass.
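To see why sampling between texels gives a blur for free, here is a minimal CPU-side sketch in C of what linear texture filtering does along one row. The helper name `sample_linear` and the edge clamping are assumptions made for illustration; real hardware filters in 2D with fixed-point weights.

```c
#include <stddef.h>

/* Hypothetical helper: linearly filtered lookup into a 1D row of n texels
 * (n must be at least 1). pos is in texel units; texel i has its centre
 * at i + 0.5. Lookups past the ends are clamped to the edge texel. */
float sample_linear(const float *row, size_t n, float pos)
{
    float p = pos - 0.5f;                    /* texel centres now sit on integers */
    if (p < 0.0f) p = 0.0f;
    if (p > (float)(n - 1)) p = (float)(n - 1);

    size_t i0 = (size_t)p;                   /* left neighbour */
    size_t i1 = (i0 + 1 < n) ? i0 + 1 : i0;  /* right neighbour, clamped */
    float t = p - (float)i0;                 /* weight of the right neighbour */

    return row[i0] * (1.0f - t) + row[i1] * t;
}
```

Sampling at a texel centre returns that texel unchanged, while sampling exactly on the boundary between two texels returns their average. Shifting the quad's texture coordinates by half a texel therefore turns every lookup into a 2-texel box blur at no extra cost.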

Thanks for the very interesting link.

Hm… In my benchmarks my code spends most of its time in the glCopyTexSubImage2D function *). Wouldn’t your suggestion need a second call to this function?

*) When polycount is low and texture resolution is at least 512x256.

This reminds me of my attempts to do tangent-space bump mapping on a TNT2. It actually worked, in real time, after 3 passes… LOL!

Don’t use glCopyTexSubImage2D at all for best performance.

  1. Render your scene as normal at high resolution to an FBO.
  2. Set the viewport size to something like 256x256.
  3. Render a full-screen quad textured with said FBO to a 2nd, low-res 256x256 FBO (you can sample more than one texel when downsampling to avoid flicker).
    You now have a 256x256 texture of your original scene, but no blur yet.
  4. Render to a 3rd 256x256 FBO a full-screen quad sampling from the 2nd FBO, but sample texels x-5 to x+5, which gives a horizontally blurred image.
  5. Render a full-screen quad again, sampling from the 3rd FBO back into the now unused 2nd FBO, and sample y-5 to y+5 so that the horizontally blurred image is also blurred vertically.
    You now have a 256x256 blurred image.

Now render to your final full resolution framebuffer a full screen quad, sample in a shader from your original full resolution FBO and combine with your blurred low resolution FBO (combine this in a shader rather than blending for best performance).

So in a nutshell, 1) downsample full res to low res, 2) blur horizontally, 3) blur vertically, 4) combine with fullres in shader.
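The horizontal-then-vertical split can be illustrated without any GPU at all. The following is a rough CPU-side sketch in C, not the FBO/shader code itself: the function names, the grayscale float image, the equal (box) weights, and the edge clamping are all assumptions chosen to keep the example small.

```c
/* One 1D box-blur pass over a w*h grayscale image.
 * (dx, dy) selects the direction: (1, 0) horizontal, (0, 1) vertical.
 * Taps that fall outside the image are clamped to the nearest edge pixel. */
void blur_pass(const float *src, float *dst, int w, int h,
               int dx, int dy, int radius)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                int sx = x + k * dx;
                int sy = y + k * dy;
                if (sx < 0) sx = 0;
                if (sx >= w) sx = w - 1;
                if (sy < 0) sy = 0;
                if (sy >= h) sy = h - 1;
                sum += src[sy * w + sx];
            }
            dst[y * w + x] = sum / (float)(2 * radius + 1);
        }
    }
}

/* Horizontal pass from img into tmp, then vertical pass back into img,
 * mirroring the ping-pong between the 2nd and 3rd buffers. */
void blur_separable(float *img, float *tmp, int w, int h, int radius)
{
    blur_pass(img, tmp, w, h, 1, 0, radius);
    blur_pass(tmp, img, w, h, 0, 1, radius);
}
```

With radius 5 (the x-5 to x+5 range above), the two passes read 11 + 11 = 22 samples per pixel, where a single-pass 11x11 kernel would read 121.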

FBO + GLSL FTW! Don’t bother about glCopyTexSubImage2D or pbuffers.

And Google is your <s>friend</s> comrade, there are some excellent tutorials out there that explain this with illustrations.

He doesn’t want to use extensions. Really I’ve often thought that even an old card with a lot of fill rate could do wonders! But since this is not the case I think you should either disable the feature for low end cards or raise your minimum spec and use modern features. Besides it’s not as if you couldn’t, for instance, implement bump mapping with register combiners, it’s simply a lot easier with shaders.

My bad, I’m blind.

glCopyTexSubImage2D FTW!

That is true: 2 copytex calls minimum, 3 to take advantage of the separable blur.
But as I said, the blur allows the use of a much lower resolution texture.

I am surprised the copytex is so slow. I did tests a long time ago on an old GeForce3, and it was pretty quick: 12 copytex calls on a 512x512 texture still ran above 30 fps (so less than 2.8 ms per copy): http://dedebuffer.chez.com/

Maybe that was when using glPixelTransfer?

How to do tex*3-2 without the slow glPixelTransfer and without modern hardware:

// To stay within the clamped [0, 1] range, we have to subtract before multiplying:
// output = input * 3 - 2
// output / 3 = input - 2/3
// output = (input - 2/3) * 3
// Do this once the low-res scene has been drawn, and before the first copytex:
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);
glBlendEquation(GL_FUNC_REVERSE_SUBTRACT); // result = dst - src; supported even on Voodoo 3!
glColor3f(0.666666f, 0.666666f, 0.666666f); // src = 2/3
coverSceneWithQuad();
glBlendEquation(GL_FUNC_ADD); // back to default blendEq if needed
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); // back to your usual blendFunc if needed

Then you will still need to multiply this result by 3, but the blur passes give plenty of opportunities to do that (i.e. draw the textured quad 3 times with glBlendFunc(GL_ONE, GL_ONE)).
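The order of operations matters because the framebuffer clamps to [0, 1] after every blend. Here is a small CPU-side sketch in C of that arithmetic for a single colour channel. The function names are made up for illustration, and it assumes the glow is accumulated into a black destination.

```c
/* Framebuffer colours are clamped to [0, 1] after every blend. */
static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* GL_FUNC_REVERSE_SUBTRACT with glBlendFunc(GL_ONE, GL_ONE): dst - src */
float blend_reverse_subtract(float dst, float src)
{
    return clamp01(dst - src);
}

/* GL_FUNC_ADD with glBlendFunc(GL_ONE, GL_ONE): dst + src */
float blend_add(float dst, float src)
{
    return clamp01(dst + src);
}

/* Subtract 2/3 first, then add the result three times: (in - 2/3) * 3 */
float contrast_subtract_first(float in)
{
    float biased = blend_reverse_subtract(in, 2.0f / 3.0f);
    float out = 0.0f;
    for (int i = 0; i < 3; ++i)
        out = blend_add(out, biased);
    return out;
}

/* Naive order for comparison: multiply by 3 first, then subtract 2.
 * A colour cannot exceed 1, so the 2 is subtracted as 1 + 1. */
float contrast_multiply_first(float in)
{
    float out = 0.0f;
    for (int i = 0; i < 3; ++i)
        out = blend_add(out, in);          /* saturates at 1 */
    out = blend_reverse_subtract(out, 1.0f);
    out = blend_reverse_subtract(out, 1.0f);
    return out;
}
```

With the subtract-first order, an input of 1 stays at about 1, an input of 5/6 maps to about 0.5, and anything at or below 2/3 goes to 0. With the multiply-first order the sum saturates at 1 before the subtraction, so every input ends up at 0.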

Great tip. This works like a charm. I noticed however that Intel drivers only support the EXT versions, not the ARB ones. But that’s not really a problem.

Thanks for the idea!

For a really cheap blur, why not directly use a lower-resolution mipmap level? An alternative could be LOD bias, but really it’s the same thing.
This is definitely the fastest blur going, although not the best quality.
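The reason a smaller mipmap level acts as a blur is that each level is typically built by averaging blocks of the level above it. A minimal CPU-side sketch in C, assuming a plain 2x2 box filter, a grayscale float image, and even dimensions (the helper name is hypothetical, and drivers may filter differently):

```c
/* Hypothetical helper: build the next mip level of a w*h grayscale image
 * by averaging each 2x2 block. w and h are assumed to be even;
 * dst must hold (w/2) * (h/2) values. */
void downsample_2x2(const float *src, int w, int h, float *dst)
{
    int dw = w / 2;
    int dh = h / 2;
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            const float *p = src + (2 * y) * w + 2 * x;  /* top-left of the block */
            dst[y * dw + x] = 0.25f * (p[0] + p[1] + p[w] + p[w + 1]);
        }
    }
}
```

Applying this k times and then stretching the result back to full size with linear filtering spreads each source pixel over roughly 2^k pixels. The box filter is also why the result tends to look blocky compared with a proper separable blur.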