Blending multiple textures efficiently in a shader

scratt · October 19, 2008, 11:08pm

I have 4 or 5 textures I am blending over a gradient in my fragment shader.

Basically what I am doing is something like this…



// col and blender already defined at this point...

	if(blender < HighGroundBlend) col = mix(col, texture2D(Shore, txIndex), smoothstep(SeaBlend,MidGroundBlend,blender));
	if(blender < PeakBlend) col = mix(col, texture2D(MidGround, txIndex), smoothstep(ShoreBlend,HighGroundBlend,blender)); 
	col = mix(col, texture2D(HighGround, txIndex), smoothstep(MidGroundBlend,PeakBlend,blender));
		
	col = mix(col,gl_Color + gl_SecondaryColor * texture2D(BaseTex, gl_TexCoord[0].xy),0.5);

	gl_FragColor = col * BumpFactor;

Ugh! Right? I know it’s fugly, and in my gut I know there must be a better way.

Ignoring the fine points of the logic, some of which has been tweaked so some of the variable names don’t make much sense if you dissect the code: Basically I am cumulatively blending texture0, to texture1, to texture 2, to texture 3 using a lot of mix commands and a bias.

Without the conditions it runs much slower, although I hate having any conditions in shaders.

Can anyone suggest a better method, or a text where I can find one?

Thanks.

Brolingstanz · October 20, 2008, 12:58am

I’m not quite sure what your question is, scratt.

I agree that when something is fugly there’s usually a better way, but that could mean anything from a simple tweak to a complete overhaul.

scratt · October 20, 2008, 1:09am

Thanks for your reply…

Basically doing a bunch of mix commands, with smoothstep causes a massive slow down for this shader… Not surprisingly.

I have lots of ‘options’.

1(a). Doing some of the blending at the CPU level when I create the height map. (Probably not wise - and quite complex)
1(b). Providing some pre-calculated data to the shader in the alpha section of a texture height map I pass into the shader. (This will get me about 1 - 2 fps back as I may be able to avoid interpolating some info between the vertex and fragment shader.)

Those will only garner a small speed increase, and the first one is fraught with extra problems which the OpenGL texture engine removes for me.

2. Reducing the number of textures I blend. (The logic in the shader does this to some extent on the fly)
3. Playing with the logic and blending to get the optimum out of this method.
4. Relying on the fact that I am running this on an X1600 and it will be released in about 12 months, so it will actually run on much more capable hardware.
I am sure I can think of more!!

What I am looking for is if anyone with a bit more experience can spot a flaw in my logic, or knows of some algorithm for blending multiple texture sources that I don’t know…

Going ‘out there’ for a moment… In a dream world there would be a mix-from-multiple-textures shader command, which does the whole thing in hardware in one kludge.

In a nutshell, as it stands I have an interpolated height for an individual fragment, and 4 or 5 textures I want to blend based on the height of the fragment and each of the textures’ own respective base heights on the landscape…

What makes me most uncomfortable with my code is the fact that I am doing a linear sequence of blends, where I would really like to be able to parallelize in some way the combination of multiple textures.

Thinking out loud, I wonder if this would be more efficient…

((tex1 * mod1) + (tex2 * mod2) + (tex3 * mod3) + (tex4 * mod4)) / 4.0;

Or something along those lines.
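For anyone reading along, here is a minimal sketch of how a chain of mix() calls can be rewritten as one weighted sum. Expanding mix(mix(mix(t0, t1, a), t2, b), t3, c) gives four weights that always add up to 1.0, so no division by 4.0 is needed in that case. The sampler names Tex0..Tex3 and the BlendEdges uniform are made up for illustration and are not from the shader above.

```glsl
// Sketch only: Tex0..Tex3 and BlendEdges are hypothetical names.
uniform sampler2D Tex0, Tex1, Tex2, Tex3;
uniform vec4 BlendEdges;   // hypothetical: x < y < z < w height thresholds
varying float blender;     // interpolated height used to drive the blend

void main()
{
    vec2 uv = gl_TexCoord[0].xy;

    // One blend factor per transition between neighbouring layers
    float a = smoothstep(BlendEdges.x, BlendEdges.y, blender);
    float b = smoothstep(BlendEdges.y, BlendEdges.z, blender);
    float c = smoothstep(BlendEdges.z, BlendEdges.w, blender);

    // Weights obtained by expanding mix(mix(mix(t0, t1, a), t2, b), t3, c);
    // they sum to 1.0 for any a, b, c
    vec4 w = vec4((1.0 - a) * (1.0 - b) * (1.0 - c),
                  a * (1.0 - b) * (1.0 - c),
                  b * (1.0 - c),
                  c);

    gl_FragColor = texture2D(Tex0, uv) * w.x
                 + texture2D(Tex1, uv) * w.y
                 + texture2D(Tex2, uv) * w.z
                 + texture2D(Tex3, uv) * w.w;
}
```

This does the same arithmetic as the sequential version, so it is not automatically faster. Its main use is that the four weights become a single vec4 that could be computed elsewhere.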

_NK47 · October 20, 2008, 1:28am

you can reduce the texture count but still get the data by putting some pixels into e.g. alpha channels (if not used), and thus taking 3 alpha channels from 3 textures you get another RGB triple for the 4th texture. just an idea.
otherwise just try to avoid repeating the same calculations in your shader. for example, cache a value once and reuse it in later calculations.
btw. mix is a lerp and should not take long (but that depends).

scratt · October 20, 2008, 2:24am

Thanks for that… That is actually of use, but not in this case I suspect as each texture has a different level of blend, so packed ones would need to be split off and then blended in a separate operation anyway…

At the moment I am working on refining the logic / variables as you suggest, so I’ll see where that gets me…

Any other takers?

_NK47 · October 20, 2008, 2:34am

another idea is avoiding the calculations altogether and putting precalculated values into a texture. you can use that texture as a lookup table addressed with xy coordinates. useful for very expensive stuff, but a GPU can handle most things fast enough anyway.
btw. what gfx card do you have?

Y-tension · October 20, 2008, 7:03am

Similar to what _NK47 said, you may use a 1D texture as a lookup table based on the height value. This texture contains contribution factors for up to 4 textures. Then use a linear combination of your 4 textures based on that.
Using a 2D texture as _NK47 said has the advantage of allowing you to edit your texture palette and assign different contribution factors based not on height but on position.
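A rough sketch of the 1D lookup variant, assuming an RGBA lookup texture whose four channels hold the contribution factors for four layers. WeightLUT, Tex0..Tex3 and height01 are hypothetical names, and the height is assumed to be normalised to the 0..1 range before it reaches the fragment shader.

```glsl
// Sketch only: all names here are hypothetical.
uniform sampler1D WeightLUT;   // RGBA = precalculated weights for 4 layers
uniform sampler2D Tex0, Tex1, Tex2, Tex3;
varying float height01;        // height normalised to 0..1

void main()
{
    vec2 uv = gl_TexCoord[0].xy;

    // One fetch replaces the smoothstep calls and the blend logic
    vec4 w = texture1D(WeightLUT, height01);

    // Linear combination of the four layers
    gl_FragColor = texture2D(Tex0, uv) * w.r
                 + texture2D(Tex1, uv) * w.g
                 + texture2D(Tex2, uv) * w.b
                 + texture2D(Tex3, uv) * w.a;
}
```

If the stored weights sum to 1.0 at every height, the result needs no further normalisation. Note that this uses five texture units in total, which may matter on hardware with few units.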

scratt · October 21, 2008, 1:14am

Thanks for the suggestions…

My GPU at the moment is an ATI x1600. So I have some wriggle room as this is a fairly long term project and I’ll be upgrading to a new MBP soon. But it would be nice to rely on the speed boost I am going to get from a new GPU as little as possible, which is one of the reasons I am developing a lot of core stuff on an older machine for now…

The texture lookup idea is a good one. I can pre-calculate shading weights on the CPU as I put the height map together. That will remove some maths and interpolation from the shader. So thanks for that. My only problem now is a lack of texture units on the x1600, but that’s a problem I can work around for now, and will not have on the target GPUs.

My CPU is under very little load right now, so I have plenty of wriggle room there, and will look at using SSE to accelerate that also.

Can I assume that the bottom line is that I am not going to get any faster blending than using multiple mix() commands?

_NK47 · October 21, 2008, 1:36am

scratt:
“Can I assume that the bottom line is that I am not going to get any faster blending than using multiple mix() commands?”

actually yes. other than those techniques described above and some others i’m not familiar with.

“… and will look at using SSE to accelerate that also.”

i wouldn’t bother to implement SSE directly in assembly. compilers can do it for you. just make sure data is 128-bit (16-byte) aligned, if i recall correctly. (more reading recommended)

Ilian_Dinev · October 21, 2008, 1:51am


// fetch colors
vec4 c0 = texture2D(t0, txCoord);
vec4 c1 = texture2D(t1, txCoord);
vec4 c2 = texture2D(t2, txCoord);
vec4 c3 = vec4(c0.a,c1.a,c2.a,1.0);

// fetch 4 blending factors
vec4 bb = texture2D(t3_blend, txCoord);

vec4 result = c0*bb.x+c1*bb.y+c2*bb.z+c3*bb.w; // you can divide this by 4.0, if you like.

You precompute (not at runtime) the t3_blend texture according to any formula.

A dream-world is this one with shaders, not texture-combiners.

scratt · October 21, 2008, 2:37am

Oh, totally. I perhaps used the wrong phrasing.
I already have several pre-rolled c++ based functions for this.
I am more like a mechanic with that stuff, rather than an engineer. i.e. I plug some modules in that I know work.

scratt · October 21, 2008, 2:40am

Ilian,
That’s a really sweet solution I can work with. Thank you.

bertgp · October 21, 2008, 6:42am

You mentioned that your target hardware is not what you currently work with. I would advise against that however, since the optimal solution on your current GPU has a very good chance of being sub-optimal for your target platform.

If at any pixel you only have at a maximum 2 textures which you want to blend, newer hardware can gain much performance by using if statements to sample only the 2 textures which contribute to the final result. That way, you only get 2 samples + 1 mix operation (+ 1 or 2 ifs).

Also, you might get better performance evaluating a function in the pixel shader instead of sampling a texture for the weights, depending on whether your whole shader is texture fetch bound or ALU bound.
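As a sketch of what that could look like with three layers, assuming the thresholds are ascending (ShoreBlend < MidGroundBlend < HighGroundBlend) and that neighbouring layers only overlap within one band. The sampler and threshold names are borrowed from the first post, but the logic here is illustrative and not the original shader.

```glsl
// Sketch only: assumes ShoreBlend < MidGroundBlend < HighGroundBlend.
uniform sampler2D Shore, MidGround, HighGround;
uniform float ShoreBlend, MidGroundBlend, HighGroundBlend;
varying float blender;   // interpolated height

void main()
{
    vec2 uv = gl_TexCoord[0].xy;
    vec4 col;

    if (blender < MidGroundBlend) {
        // Lower band: only Shore and MidGround contribute
        col = mix(texture2D(Shore, uv), texture2D(MidGround, uv),
                  smoothstep(ShoreBlend, MidGroundBlend, blender));
    } else {
        // Upper band: only MidGround and HighGround contribute
        col = mix(texture2D(MidGround, uv), texture2D(HighGround, uv),
                  smoothstep(MidGroundBlend, HighGroundBlend, blender));
    }

    gl_FragColor = col;
}
```

Both branches return pure MidGround at blender == MidGroundBlend, so there is no seam at the switch. One caveat: mipmapped lookups inside a branch that varies from pixel to pixel can have undefined derivatives, so this would need checking on the target hardware.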

The bottom-line though is that you should really benchmark on your target platform only since any performance metric on other hardware is mostly meaningless. Your bottleneck can move back and forth between the CPU and GPU and even between different parts of the GPU when you switch hardware.

bertgp · October 21, 2008, 7:20am

You can also look into texture arrays (GL_EXT_texture_array) which implement an array of 2D textures. They use one more scalar float component to choose between the textures in the texture array.

Nvidia has a sample here

Texture arrays are also contained in GL_EXT_gpu_shader4 which is apparently available on the newest ATI drivers.
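A minimal sketch of sampling a texture array from a fragment shader, assuming the driver exposes GL_EXT_texture_array to GLSL. TerrainLayers and layerF are hypothetical names, and layerF is assumed to be a fractional layer index (for example 1.3 means 70% layer 1 and 30% layer 2).

```glsl
#extension GL_EXT_texture_array : enable

// Sketch only: TerrainLayers and layerF are hypothetical names.
uniform sampler2DArray TerrainLayers;
varying float layerF;   // fractional layer index, e.g. 1.3

void main()
{
    vec2 uv = gl_TexCoord[0].xy;

    // The third coordinate selects a whole layer; there is no filtering
    // between layers, so blend the two neighbours by hand
    float lo = floor(layerF);
    vec4 a = texture2DArray(TerrainLayers, vec3(uv, lo));
    vec4 b = texture2DArray(TerrainLayers, vec3(uv, lo + 1.0));

    gl_FragColor = mix(a, b, layerF - lo);
}
```

This costs two fetches per fragment regardless of how many layers the array holds, and only one sampler is bound.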

scratt · October 21, 2008, 8:50pm

bertgp,

All fair comments, however, my target hardware was not available in consumer form until a few days ago! Neither is the target API, actually! Waiting for that and then starting the project was not really practical.

As it stands I test most of my stuff on everything from old ATI9600s and up… culling features based on API sanity checks. For example on the x1600 I am doing pseudo-instancing right now, designed specifically to be a ‘flip-the-switch’ modification away from employing full instancing on shader 4 cards.

I don’t see the bottleneck moving to the CPU at all in this project, not for this aspect of it anyway, and that’s as it should be because the CPU has a lot of work coming its way later in the project.

In any case part of the joy of OpenGL is dealing with ATI / nVidia driver inconsistencies and I am quite used to wrapping API sanity checks around OpenGL, and loading different sets of shaders and classes to deal with that.

My design ethic is to set a ‘sub-base’ GPU (currently the x1600), and aim for the next gen GPU as the ‘base’, and hope to add some extra eye-candy for the gen replacing that. That’s just my personal choice, and may or may not be your cup of tea. As it stands, I expect a 2.5 fold speed increase from my current GPU when I move the project to an 8600, and a small bump with the newer GPUs just hitting stores in consumer form now.

I am already using 3D textures as a form of array. I have not read the shader4 spec yet on that, but from your description it sounds pretty similar.

bertgp · October 22, 2008, 6:18am

Well I guess you don’t have much of a choice then! I was advocating using your target hardware since my colleague recently lost almost a week benching our app on older hardware.

Texture arrays have two major advantages over 3d textures. With 3d textures, mipmaps shrink over all dimensions, including the 3rd dimension.

Let’s say you have a 3d texture of size 512x512x4 to emulate a texture array. The level 1 mipmap will be 256x256x2 and you only have 2 slices at that point, but you need 4 textures. You could compensate for that by doubling the 3rd dimension for the level 0 texture but memory would be wasted. Texture arrays let you have all the mipmap levels you want while keeping the number of slices constant.

Also, filtering is broken with a 3d texture unless you use nearest mode. With the 3d texture, filtering is also done across the 3rd dimension (i.e. slice boundaries). With the texture array, the filtering is exactly the same as for a 2d texture.

scratt · October 22, 2008, 11:03pm

If I only lose a week when I switch to the target GPU I’ll be happy.

I will definitely be looking at texture arrays during that time.

scratt · October 28, 2008, 12:22am

FWIW if anyone else works through a similar problem…

After a lot of trial and error (and benchmarking) I have found that for my particular application (which is real time procedural planet rendering on the ATI x1600) a 3D Texture pack (or two) for textures, and mipmapping those packs is still by far the best solution overall.

As long as you mipmap the 3D Texture, and use some pre-calculated tabled per fragment / per vertex mix values, texel lookup overall is a lot faster than trying to do the blending and logic in a shader.
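For illustration, a sketch of how a 3D texture pack could be addressed with a precalculated mix value, assuming linear filtering along the third axis. TexturePack, NumSlices and layerF are hypothetical names and this is not the actual shader from the project.

```glsl
// Sketch only: all names here are hypothetical.
uniform sampler3D TexturePack;
uniform float NumSlices;   // number of 2D slices stored in the pack
varying float layerF;      // precalculated mix value in 0 .. NumSlices - 1

void main()
{
    vec2 uv = gl_TexCoord[0].xy;

    // Slice i has its centre at (i + 0.5) / NumSlices along the third axis,
    // so a fractional layerF lands between two slice centres and the
    // hardware's linear filter blends the two neighbouring slices
    float r = (layerF + 0.5) / NumSlices;

    gl_FragColor = texture3D(TexturePack, vec3(uv, r));
}
```

The blend between neighbouring slices comes for free from the texture filter in a single fetch. At smaller mipmap levels the slices merge along the third axis, as described earlier in the thread, so the result depends on how the mipmaps are built.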

I am sure you could probably emulate the entire functionality of a 3D texture in shaders with enough work, but if it ain’t broke…
