Computing the tangent space in the fragment shader

Several threads here and on the beyond3d forums inspired me to do some tests on data compression. I implemented a simple shader using the Shader Designer (superb tool!) that shows how to do bump mapping without that annoying per-vertex tangent attribute :slight_smile: The tangent space is calculated per fragment and is used to transform the bump-map normal to camera space. Below is the fragment shader (the vertex shader just transforms light, position and normal to camera space, so I left it out).

uniform sampler2D ColorMap;
uniform sampler2D BumpMap;

varying vec2 texCoord;
varying vec3 g_normal;
varying vec3 g_light;
varying vec3 g_pos;

void main()
{
	vec3 color  = texture2D(ColorMap, texCoord).rgb;
	vec3 normal = texture2D(BumpMap, texCoord).rgb*2.0 - 1.0;
	// compute tangent T and bitangent B from screen-space derivatives
	vec3 Q1 = dFdx(g_pos);
	vec3 Q2 = dFdy(g_pos);
	vec2 st1 = dFdx(texCoord);
	vec2 st2 = dFdy(texCoord);
	vec3 T = normalize(Q1*st2.t - Q2*st1.t);
	vec3 B = normalize(-Q1*st2.s + Q2*st1.s);
	// texture-to-eye space matrix (GLSL's mat3 constructor takes columns)
	mat3 TBN = mat3(T, B, normalize(g_normal));
	// transform the normal to eye space (matrix times column vector)
	normal = normalize(TBN*normal);
	// diffuse term, clamped so back-lit fragments do not go negative
	float diffuse = max(dot(normalize(g_light), normal), 0.0);
	// and the color...
	gl_FragColor = vec4(diffuse*color, 1.0);
}
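
For anyone wondering where the T and B lines come from, here is a short derivation. It is only a sketch and assumes that, within a triangle, position is locally an affine function of the texture coordinates. Q_1 and Q_2 are the screen-space derivatives of position, (s_1, t_1) and (s_2, t_2) are those of the texture coordinate (same names as in the shader), and \Delta is a symbol introduced here for the determinant.

```latex
\begin{aligned}
Q_1 &= s_1\,T + t_1\,B, \qquad Q_2 = s_2\,T + t_2\,B,\\
\Delta &= s_1 t_2 - s_2 t_1,\\
T &= \frac{t_2\,Q_1 - t_1\,Q_2}{\Delta}, \qquad
B = \frac{-s_2\,Q_1 + s_1\,Q_2}{\Delta}.
\end{aligned}
```

The shader drops the 1/\Delta factor because it normalizes anyway. That preserves the direction only while \Delta > 0; when \Delta < 0 both vectors come out negated.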

As said, I only tried it out with the Shader Designer, and there is no noticeable difference from the standard approach (using precomputed vertex attributes). The question is, of course, the performance. Unfortunately, I have no “real” bump-mapping code at hand to benchmark it against, so I thought maybe someone here who already does camera-space bump mapping could give it a try :slight_smile:

Just a side note: I did some tests with the YCoCg color space and S3TC compression, and was able to store an RGB+height texture at a 4:1 compression rate without any visible quality loss. The normal can be generated from the height (this is pretty cheap too, only three texture fetches and some basic math). That is a 1024*1024 color+bump map in 1 MB! I am not sure if it is still too early for per-pixel bump-map computation, but YCoCg color decompression is almost free.
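
To make the side note concrete, here is a rough sketch of what the decode could look like. The channel layout (Co and Cg in red and green with a 0.5 bias, height in blue, Y in alpha) and the uniforms `PackedMap`, `texelSize` and `bumpScale` are assumptions made for illustration, not the layout actually used in the tests above.

```glsl
// Sketch only: the channel layout and uniform names below are assumptions.
uniform sampler2D PackedMap;   // hypothetical: R = Co + 0.5, G = Cg + 0.5, B = height, A = Y
uniform vec2  texelSize;       // hypothetical: 1.0 / texture resolution
uniform float bumpScale;       // hypothetical: height-to-slope scale factor

varying vec2 texCoord;

// Inverse YCoCg transform; Co and Cg are stored with a 0.5 bias.
vec3 decodeYCoCg(vec4 p)
{
	float Y  = p.a;
	float Co = p.r - 0.5;
	float Cg = p.g - 0.5;
	return vec3(Y + Co - Cg, Y + Cg, Y - Co - Cg);
}

void main()
{
	// three fetches: the texel itself plus one neighbor along s and one along t
	vec4  p0 = texture2D(PackedMap, texCoord);
	float hs = texture2D(PackedMap, texCoord + vec2(texelSize.x, 0.0)).b;
	float ht = texture2D(PackedMap, texCoord + vec2(0.0, texelSize.y)).b;

	// forward differences give the slope; the tangent-space normal is
	// (-dh/ds, -dh/dt, 1) normalized
	vec3 normal = normalize(vec3((p0.b - hs)*bumpScale, (p0.b - ht)*bumpScale, 1.0));

	vec3 color = decodeYCoCg(p0);
	// for display only: light pointing straight along the tangent-space z axis
	gl_FragColor = vec4(color*normal.z, 1.0);
}
```

The color decode is three additions and two subtractions of a bias, which is why it is close to free. The extra cost is in the two additional fetches for the height neighbors.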

This is very interesting code. I’m guessing that the tension, performance wise, is going to be between not passing two attributes (which means smaller vertex sizes and less data transfer) and the performance of the dFdx/y functions (and the added opcode costs).

Obviously, as hardware progresses, smaller vertex data is better, since bandwidth isn’t improving as fast as processing speed. How useful this is depends on how vertex bottlenecked you are, and how close to being fragment program bottlenecked.

“this is pretty cheap too, only three texture fetches and some basic math”
Three texture fetches isn’t an “only”, though at least in this case, they are fairly localized. This is a performance vs memory tradeoff.

So we’re trading 2 vertex attributes for ~18 instructions in the fragment shader? I don’t expect this to be faster anytime soon, if ever.

You are both right, of course. I don’t expect this code to be any faster than the classical approach, nor did I intend it to be. It is just that modern cards have plenty of horsepower, so why not utilize it? I am pretty sure that G80 and R600 have the resources to do it in real time. Of course, in the end this is trading performance for quality. But to be honest, I would prefer my game running at 50 FPS with nicer visuals (larger textures, more detailed meshes with the same memory usage) than at 80 FPS without them…

Maybe you could replace Doom3’s shaders so that they use your approach. You can’t prevent Doom from uploading the tangent-normals, so you don’t get the memory savings, but you could use it as a test case for how much performance your shader costs in a real-world game.

Or does your shader need some information that Doom doesn’t provide? As I understand the shader, it should be possible.

In the end the question is whether there is enough graphics memory to store everything that is currently visible, and how fast the cards can sample from the vertex arrays (which should be unbeatably fast). So as long as all the data you need to render a frame can reside in the GPU’s memory, you gain nothing.

I would rather see compressed vertex formats to save space. AFAIK D3D9 already supports “compressing” normals. In my opinion these are issues to be solved by IHVs, not by developers. Shaders are more and more written by people who don’t have a degree in engineering (“artists”). If we really want to have great visuals, a lot of stuff needs to be done behind the scenes.


There is an article about tangent calculation in vertex programs in ShaderX5 (or 4?). Maybe you want to try this instead. But it seems it is the same method in a vertex program (I haven’t read the complete article, just skimmed through it…)

I’m all for this sort of thing. I’m really looking forward to the days of worlds made from (mostly) procedural geometry and textures.

Originally posted by lodder:
There is an article about tangent calculation in vertex programs in ShaderX5 (or 4?). Maybe you want to try this instead. But it seems it is the same method in a vertex program (I haven’t read the complete article, just skimmed through it…)
Well, I saw someone mention this at beyond3d forums, that is why I got the idea in the first place… But I don’t have that book nor the money to buy it :-/ Vertex shader, you say… I can’t imagine how he does it without having access to neighbor vertices. But this should be possible in the geometry shader :slight_smile: Can’t wait for that 8600 to come out…

Originally posted by Leghorn:
I’m really looking forward to the days of worlds made from (mostly) procedural geometry and textures.
Well, I could see it make sense to use this technique for procedural geometry. But with geometry shaders I think it would normally be a better approach to compute it in the GS instead of for each pixel.

When you have procedural geometry, you can calculate the true normal/tangent/binormal per fragment instead of approximating it with linear interpolation.

Of course that’s rather expensive, so I guess we’ll have to wait a few years before that becomes possible :wink:

Well, this technique relies on interpolated position and normal too, so it’s not like it’ll give you a 100% correct tangent space on a curved surface either.

I didn’t mean the method proposed in the first post.

What I meant is that when we have procedural geometry, we usually have a closed formula for the vertex position, so we can also get a closed formula for normals, tangents and binormals.

This formula obviously has to be evaluated in the vertex shader and interpolated linearly for position. But anything else can be evaluated in the fragment shader, and this gives near-perfect results when the parameter space is chosen carefully (that is, as linear as possible).

My point is that with procedural geometry it doesn’t make sense to approximate tangents and binormals from interpolated normals when we can just calculate them directly.

“we usually have a closed formula for the vertex position, so we can also get a closed formula for normals, tangents and binormals.”
But the binormals and tangents are relative to the texture. So they need to be computed with regard to the computed texture coordinates, not the positions.

Ok, let me reformulate:

We have a function that calculates position from some kind of parameter space. That’s what procedural geometry is all about. And we can choose some arbitrary function from parameter space to texture space (after all, we are the ones who produce the texture).

So we can find a formula f(u,v) that calculates position from texture coordinates.

The tangent is then df(u,v)/du, the binormal is df(u,v)/dv and the normal is the cross product of both.

It’s just simple calculus :wink:
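
As a small illustration of the idea, here is a fragment shader sketch for one made-up surface. The surface f, the constants `A` and `K`, and the varying `uv` are all assumptions chosen to keep the example short; any surface with a closed formula would work the same way.

```glsl
// Sketch only: the surface f and the constants A and K are illustrative assumptions.
varying vec2 uv;   // hypothetical: surface parameters, also used as texture coordinates

const float A = 0.1;   // wave amplitude
const float K = 6.0;   // wave frequency

// f(u,v) = (u, v, A*sin(K*u)*cos(K*v)); the vertex shader would evaluate f for position
void main()
{
	// analytic partial derivatives of f, evaluated per fragment
	vec3 T = vec3(1.0, 0.0,  A*K*cos(K*uv.x)*cos(K*uv.y));   // df/du
	vec3 B = vec3(0.0, 1.0, -A*K*sin(K*uv.x)*sin(K*uv.y));   // df/dv
	vec3 N = normalize(cross(T, B));

	// visualize the object-space normal as a color
	gl_FragColor = vec4(N*0.5 + 0.5, 1.0);
}
```

Only `uv` is interpolated here, so the frame is exact for the formula at every fragment. Note that the result is in object space and would still need the usual transform before lighting in eye space.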

Remove the bitangent calculation and use a cross product instead; that will save some instructions. Some code to orthogonalize the tangent against the normal will also help to improve the quality.
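
Something along these lines, perhaps. This is a sketch that reuses the varyings from the first post and assumes the texture coordinates are not mirrored, since a plain cross product always produces a right-handed frame. The function name `tangentFrame` is made up for the example.

```glsl
// Sketch only: assumes right-handed (non-mirrored) texture coordinates.
uniform sampler2D BumpMap;

varying vec2 texCoord;
varying vec3 g_normal;
varying vec3 g_pos;

mat3 tangentFrame()
{
	vec3 N = normalize(g_normal);
	vec3 Q1 = dFdx(g_pos);
	vec3 Q2 = dFdy(g_pos);
	vec2 st1 = dFdx(texCoord);
	vec2 st2 = dFdy(texCoord);
	vec3 T = Q1*st2.t - Q2*st1.t;
	// Gram-Schmidt: remove the component of T along N, then renormalize
	T = normalize(T - N*dot(N, T));
	// bitangent from a cross product instead of a second derivative expression
	vec3 B = cross(N, T);
	return mat3(T, B, N);
}

void main()
{
	vec3 n = texture2D(BumpMap, texCoord).rgb*2.0 - 1.0;
	n = normalize(tangentFrame()*n);
	// visualize the eye-space normal as a color
	gl_FragColor = vec4(n*0.5 + 0.5, 1.0);
}
```

The trade-off is that the frame is forced to be orthonormal, so any shear or non-uniform scale in the texture mapping is thrown away.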

I’m hoping someone can help me figure out how to use this to convert texture-space offsets to world-space.

I’m rendering fur using the shells method (blending transparent shell textures with hair cross-sections splatted on each shell), and since the hairs are wavy and not uniform, I need to compute the normal of each hair per-pixel rather than interpolate from per-vertex. The normal depends on the du,dv offsets of the hair from the previous shell (precomputed and stored in offset textures during fur texture generation), which is in texture/tangent space, and on the separation between shells, which in my case is not constant because I implement fur dynamics per vertex (wind, gravity, stroking the fur). So I have to convert the du,dv offsets to world-space to add them to the world-space shell offset, and normalize to give me the actual hair normal. Using the tangent basis provided by the skinning library I’m using is impossible since it’s normalized and orthogonalized, so scaling and shear information is lost, making it useless for this case.

Looking at the code from the first post here, the normalization removes scaling information. But would it be correct for me to transform du,dv by simply multiplying by a TBN formed by the un-normalized T,B, and N? And how do I get the N scale correct, just cross the un-normalized T and B?

I’m really stuck on this so help would be greatly appreciated.

From what I’ve read on the subject (which isn’t much), one method is to use normal maps in the layers for this sort of variation, with translation to simulate combing-like effects (though I think that’s mostly for whiskers).

On the other hand “wavy” suggests to me something more like hair than fur. Nvidia’s Nalu demo (see GPU Gems 2) describes a more general method for hair that’s perhaps a bit more friendly to lighting/physics. I’d personally opt for something along these lines going forward, but I’m just a spectator in this area so far…

Normal maps encode only relative depth of a displacement, since it’s in texture space. When you use a normal map, depending on the texture distortion due to the UV-mapping, you get a varying effect.

In this case this is unacceptable since I need relatively precise normals to light hair pixels correctly, not ones that vary by large factors depending on how stretched the texture is etc. Moreover, the distance from one shell to the next is dynamic in my case, so I can’t just estimate a constant scaling value.

I have around a million hairs, so I cannot realistically render them with the Nalu method (which is specialized for long hair anyway, not short fur).

My best guess, though I’m not sure it is exactly correct, is to take the above code and use the UNnormalized T and B, then
mat3 TBN = mat3(T, B, cross(T, B));
vec2 offset = texture2D(dudvMap, gl_TexCoord[0].st).xy;
vec3 hairN = normalize(TBN * vec3(offset, 0.0) + shellOffset);
I think that should take care of the texture scale/shear. Not sure about the sign of the cross product…

How do I get the right handedness of the computed normal? In general one can’t assume that UV isn’t mirrored on some triangles, and then the cross product would give a normal pointing inwards…
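
One possible way to handle this, offered as a sketch rather than a tested answer: divide by the determinant of the texture-coordinate derivatives instead of normalizing, and orient the cross product against the interpolated normal. The varyings are the ones from the first post; treating `g_normal` as a reliable outward reference is an assumption.

```glsl
// Sketch only: treating g_normal as the outward reference direction is an assumption.
varying vec2 texCoord;
varying vec3 g_normal;
varying vec3 g_pos;

void main()
{
	vec3 Q1 = dFdx(g_pos);
	vec3 Q2 = dFdy(g_pos);
	vec2 st1 = dFdx(texCoord);
	vec2 st2 = dFdy(texCoord);

	// determinant of the texture-coordinate Jacobian; its sign flips on mirrored UVs
	// (not guarded against det == 0, which happens on degenerate mappings)
	float det = st1.s*st2.t - st2.s*st1.t;

	// dividing by det keeps both the scale and the sign of dP/ds and dP/dt
	vec3 T = (Q1*st2.t - Q2*st1.t)/det;
	vec3 B = (-Q1*st2.s + Q2*st1.s)/det;

	// cross(T, B) may point inward on mirrored triangles; flip it toward the interpolated normal
	vec3 N = normalize(cross(T, B));
	if (dot(N, g_normal) < 0.0)
		N = -N;

	// visualize the oriented normal as a color
	gl_FragColor = vec4(N*0.5 + 0.5, 1.0);
}
```

With T and B computed this way, a texture-space offset (du, dv) maps to the position offset T*du + B*dv, scale and shear included, which is what the fur case needs.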
