DOOM3 texture unit use.

The speed increase is very interesting, but I wonder how much artistry is bypassed without the lookups; you can do some pretty neat stuff with some fancy ramps. They may have used the lookups for more than speed; indeed, they may have been willing to take one on the chin for the artistic freedom it allows.

Nice boost though. I’d be interested in an explanation for the discrepancy. I get the strange feeling that this may be another case of IHV ankle-biting.
:slight_smile:

By the way, when you say a 40% increase, do you mean that brings it on par with the nVidia path?

All things being equal, I’d rather have the artistic freedom of textures, as opposed to relatively inflexible math (with cost in mind here). You can stuff some massively complex math into a texture. Of course, it all depends on what you’re doing; normalizations don’t need to be fancy.

Are you guys sure there aren’t multiple versions of the same shaders in there?

It was my understanding that NV recommends using lookup textures and normalization cubemaps instead of math, while on ATI GPUs it’s the other way around.
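
Roughly the trade-off, as a sketch (the texture unit and interpolant assignments here are my own guesses, not Doom3’s actual bindings):

!!ARBfp1.0
# normalize the per-fragment light vector two ways:
# (a) one fetch into a normalization cubemap, (b) three ALU instructions
TEMP La, Lb, lenSq;
ATTRIB toLight = fragment.texcoord[1];       # assumed: unnormalized to-light vector

TEX La, toLight, texture[0], CUBE;           # (a) cubemap returns a range-compressed direction
MAD La.xyz, La, 2.0, -1.0;                   #     expand from [0,1] back to [-1,1]

DP3 lenSq.x, toLight, toLight;               # (b) |L|^2
RSQ lenSq.x, lenSq.x;                        #     1/|L|
MUL Lb.xyz, toLight, lenSq.x;                #     L/|L|, no texture unit consumed

MOV Lb.w, 1;
MOV result.color, Lb;                        # output one of the two so the program is complete
END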

Oh well, I’m sure there is a 100MB service pack coming soon :slight_smile:

Good job discovering that Humus.

I tried the mod and since I have an FX 5600 card what I got was expected: the fps seemed a little lower as I walked around. This is kind of a no-duh thing, as everyone knows it’s better to use texture lookups rather than math on FX’s. I also got those little white dot artifacts in certain areas, which is probably due to the compression of spec maps in medium quality mode. I’m not 100% sure why using pow instead shows the compression artifacts like it does; maybe it’s due to the higher precision of the pow instruction? That’s my guess anyway.

When I get my 6800 GT I’ll try it again and see what happens. I don’t expect much of an increase in speed, but you never know; the NV40 fragment processor is pretty good.

Hmm, I thought something else was looking weird. Just as I expected, each surface has its own specular exponent map. No wonder certain surfaces’ specular part looked odd. Everything after the mod is using the same specular exponent, which is not right. :slight_smile: But as was said on that other board, this can be fixed by JC passing the exponent into the shader in a variable.
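
Something along these lines, I’d imagine (the env slot and interpolant assignments are just made up for illustration):

!!ARBfp1.0
# pass the per-surface specular exponent in as a program parameter instead of
# baking it into a lookup texture (env slot and interpolants are assumptions)
PARAM specExp = program.env[10];
TEMP dotNH, spec;
DP3 dotNH.x, fragment.texcoord[2], fragment.texcoord[3];   # assumed: normal and half-angle
MAX dotNH.x, dotNH.x, 0.0;
POW spec.x, dotNH.x, specExp.x;              # exponent can vary per surface again
MOV result.color, spec.x;
END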

It seems strangely absurd to me, the notion that after 5 years of development, they never thought to pick up a phone and ask ATi how their hardware works. I would assume that Id has a red-phone for every IHV in existence. I’m sure they could have the entire ATi driver team over for lunch, with charts, slides, and belly dancers.

Id is in a unique position to shape the industry. I suspect that is what they are doing; they’ve done it before. I’d be first in line to give them the benefit of the doubt in this case.

Anyway, to my mind, it’s clearly the case that (dependent) texture reads scale far better than brute force math. This may not be exactly critical in Doom3 (haven’t played with it yet), but it will be. Who was it that said that programming was the art of caching? (always liked that) :slight_smile:

Q, math is the way of the future. Using texture reads to implement math functions is powerful, but it’s clearly an ugly hack; ultimately the compiler could implement the LUT if it’s really faster (heck, that’s how some hardware works anyway). You need a really warped sense of aesthetics to think that shader writers should be loading texture units with ramps and then doing reads & dependent reads instead of calling a simple pow function.
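
To be concrete about the compiler implementing the LUT under the covers: these two fragments ask for the same result, and a driver would be free to turn one into the other (the texture unit, interpolants and baked table here are placeholders, not anything real):

!!ARBfp1.0
# the same specular term written two ways; a compiler/driver could pick either
TEMP dotNH, coord, spec;
DP3 dotNH.x, fragment.texcoord[2], fragment.texcoord[3];   # assumed: normal and half-angle
MAX dotNH.x, dotNH.x, 0.0;

POW spec.x, dotNH.x, 16.0;                   # what the shader writer should get to write

MOV coord, 0;                                # what could be substituted under the covers,
MOV coord.x, dotNH.x;                        # assuming texture[6] held pow(s, 16) along its s axis
TEX spec, coord, texture[6], 2D;

MOV result.color, spec.x;
END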

Dunno if id & ATI are on good terms after ATI leaked the E3 DOOM3 demo. Let’s remember that the math has artifacts; it may need the LUT for the reasons mentioned.

SirKnight, you sure about the specular exp map? There’s a gloss map, but an exponent map? That would be cool, but I dunno that it’s there. The fetch is a 2D fetch, but from recollection at least one of the shaders calls the other axis in the LUT the divergence factor (or similar), and I don’t recall a dependent read in there for that coord, so I could be way off; any exponent map would at least have to be a dependent texture read feeding the texcoord of the fetch. Very cool if he did this; it could be pretty inexpensive with, for example, a 2-component gloss map, but I just don’t think that’s what is going on. I’ll need to take another look.

Using texture reads to implement math functions is powerful, but it’s clearly an ugly hack; ultimately the compiler could implement the LUT if it’s really faster (heck, that’s how some hardware works anyway).
It’s beautiful to me. Are you going to calculate a BRDF in a shader?

You need a really warped sense of aesthetics to think that shader writers should be loading texture units with ramps and then doing reads & dependent reads instead of calling a simple pow function.
Warped? I call it art. Suppose you want a specular lookup in the form of a rose? Textures give you great artistic freedom. I agree that in the distant future math will prevail, as it does on the CPU today (though lookups are still quite common). And as I stated before, it depends on what you’re doing.
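
For instance, something like this, where the whole lighting response is whatever the artist painted into a 2D ramp (the texture unit and interpolant assignments are invented for the sketch):

!!ARBfp1.0
# index an artist-painted 2D ramp by (N.L, N.H); the curve can be anything you can draw
TEMP coord, ramp;
DP3 coord.x, fragment.texcoord[2], fragment.texcoord[3];   # assumed: normal and light vector
DP3 coord.y, fragment.texcoord[2], fragment.texcoord[4];   # assumed: normal and half-angle
MAX coord, coord, 0.0;
TEX ramp, coord, texture[5], 2D;             # texture[5] = the painted ramp (assumption)
MUL result.color, ramp, fragment.color;
END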

It’s beautiful to me. Are you going to calculate a BRDF in a shader?

Sure, if the performance was there.

Warped? I call it art. Suppose you want a specular lookup in the form of a rose? Textures give you great artistic freedom.
That’s reasonable for non-photorealistic rendering, but realistic rendering is based on actual mathematical functions. A look-up table in realistic rendering is just a performance optimization (when it actually helps performance) compared to actual math.

Sure, if the performance was there.
That’s all I meant to suggest. LUTs give us what they’ve always given us: the ability to do today what we would otherwise have to wait for tomorrow to experience. And to me, that’s a beautiful thing.

A look-up table in realistic rendering is just a performance optimization (when it actually helps performance) compared to actual math.
Absolutely. After all, it was the actual math that created the table entries in the first place.

I’m just a freak when it comes to LUTs. I’m a LUT freak.

Q, there’s a difference between a LUT where it is required and a LUT for everything. It is laborious to implement a pow function as a LUT when it should be a single operator or call in the program. Specious examples don’t make the case for texture LUTs as the optimal path; LUTs do have their legitimate uses. If a LUT is optimal for a basic supported math operator, then ultimately the shader compiler should use a LUT under the covers where resources permit; this has the added benefit of hardware abstraction and running well everywhere. When you toss out libmath and hand-implement cache-resident tables everywhere in your C program code, then I’ll believe you’re really a LUT nut. That’s ultimately the case you’re making.

Originally posted by dorbie:
SirKnight, you sure about the specular exp map? There’s a gloss map, but an exponent map? That would be cool, but I dunno that it’s there. The fetch is a 2D fetch, but from recollection at least one of the shaders calls the other axis in the LUT the divergence factor (or similar), and I don’t recall a dependent read in there for that coord, so I could be way off; any exponent map would at least have to be a dependent texture read feeding the texcoord of the fetch. Very cool if he did this; it could be pretty inexpensive with, for example, a 2-component gloss map, but I just don’t think that’s what is going on. I’ll need to take another look.
Right, there’s no specular exponent map, which would be difficult to make use of in older hardware paths.

Having looked at the nv20 path, the specular power is approximated via register combiner math (it seems to be roughly power 12, but shifted a bit, so it saturates earlier). The specular table in the arb2 path may be attempting to match this quasi-power function.

Re the divergence factor, that’s in the test.vfp shader, which I believe isn’t used by any of the rendering paths (it would be arb2 or possibly exp if any). The factor in question comes from the normal map’s alpha component; my guess is that this shader is an experiment to anti-alias specular highlights (which JC also talks about in an old .plan).

Dorbie, I think we’re on the same page. I’m not saying anything that hasn’t been said 10 million times already. Perhaps I said it badly. Sorry for the OT digression.

Q, it’s just a fun discussion, np.

sk, Ahh, the divergence makes sense then. You measure the rate of local vector change in the normal map, store it in alpha then adjust a texture LUT (I assume that adjustment would be a convolution filter for high exponents so an exponent LUT would look crisp & bright on one end and blurred diffuse & darker on the other). I see some funny related things: in one shader there is an attempt to move localnormal.a to localnormal.x (which seems like a swizzle & nothing else), but in another shader w is extracted and almost used :slight_smile:
How’s this for some legacy code:

MOV R1.y, localNormal.w;
MOV R1.y, 0.2;
MOV R1.w, 1;
TEX R1, R1, texture[6], 2D;

Must’ve slipped through the cracks. I suppose that 0.2 is a fixed constant convolution (& possible clamp) reducing sparkle on high exponents & might explain the quality diff with a straight math exponent vs LUT.

Years ago (maybe 5+ years now) I considered a similar alpha term in a different context, as a MIP LOD bias on bump mapped environment maps: you use the normal divergence as a MIP LOD bias term for the LOD selection, to reduce aliasing of the environment map in situations where you can’t predict the post-dependent-read derivatives of s & t in hardware. Ideally today you’d want to do something smarter considering the potential for anisotropic probes.
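
In ARB_fragment_program terms it would have looked something like this (the environment map unit, the reflection vector interpolant and the bias scale are all invented for the sketch):

!!ARBfp1.0
# use bump divergence (stored in the normal map alpha) as a MIP LOD bias on an environment fetch
TEMP localNormal, biased;
TEX localNormal, fragment.texcoord[1], texture[1], 2D;     # assumed: normal map unit
MOV biased, fragment.texcoord[2];            # assumed: interpolated reflection vector
MUL biased.w, localNormal.w, 4.0;            # scale divergence into a bias range (made-up factor)
TXB result.color, biased, texture[3], CUBE;  # blurrier environment where the normals diverge
END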

IIRC, the alpha and red components of localNormal are swapped because the alpha channel provides more bits when compressed using DXT.
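
Something like this at the top of the shader, roughly (a sketch only; the unit numbers are assumed and the renormalize is my own addition):

!!ARBfp1.0
# read a DXT-compressed normal map where X was stored in the alpha channel
TEMP localNormal, lenSq;
TEX localNormal, fragment.texcoord[1], texture[1], 2D;     # assumed: normal map unit
MOV localNormal.x, localNormal.w;            # put X (stored in alpha) back where the math expects it
MAD localNormal.xyz, localNormal, 2.0, -1.0; # expand from [0,1] to [-1,1]
DP3 lenSq.x, localNormal, localNormal;       # renormalize: compression shortens the vector a little
RSQ lenSq.x, lenSq.x;
MUL localNormal.xyz, localNormal, lenSq.x;
MOV localNormal.w, 1;
MOV result.color, localNormal;
END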

Originally posted by dorbie:
sk, Ahh, the divergence makes sense then. You measure the rate of local vector change in the normal map, store it in alpha then adjust a texture LUT (I assume that adjustment would be a convolution filter for high exponents so an exponent LUT would look crisp & bright on one end and blurred diffuse & darker on the other).

Right, that’s the general idea.

NVIDIA have a paper on a solution which uses the filtered normal length as a direct measure of variation:
http://developer.nvidia.com/object/mipmapping_normal_maps.html

Also “Algorithms for the Detection and Elimination of Specular Aliasing”:
http://www.cs.yorku.ca/~amana/research/

This talks about clamping the exponent as well.

And as I mentioned before, see JC’s last plan (near the end, which is quite some way) for his early take on this issue:
http://www.webdog.org/plans/1/

Originally posted by dorbie:
I see some funny related things: in one shader there is an attempt to move localnormal.a to localnormal.x (which seems like a swizzle & nothing else), but in another shader w is extracted and almost used :slight_smile:
How’s this for some legacy code:

MOV R1.y, localNormal.w;
MOV R1.y, 0.2;
MOV R1.w, 1;
TEX R1, R1, texture[6], 2D;

Must’ve slipped through the cracks. I suppose that 0.2 is a fixed constant convolution (& possible clamp) reducing sparkle on high exponents & might explain the quality diff with a straight math exponent vs LUT.
As CatAtWork said, that a-to-x move is there for compressed normal maps, and it isn’t worth eliminating (at a cost in complexity and maintainability) for the uncompressed case.

If that code is from test.vfp again – I don’t recall seeing it anywhere else – I wouldn’t classify it as a legacy code fragment so much as just debug assembler in an experimental shader, which seems to have slipped through as a whole.

Originally posted by dorbie:
Must’ve slipped through the cracks. I suppose that 0.2 is a fixed constant convolution (& possible clamp) reducing sparkle on high exponents & might explain the quality diff with a straight math exponent vs LUT.
Oh I think I see what you’re saying now: the production version (interaction.vfp) could be using a LUT which folds in a fixed ‘convolution’ based on the average divergence? That’s an interesting idea but I think you really want the convolution to vary per-pixel. I’m still inclined to believe that the 0.2 is just there from idle testing.

Originally posted by sk:
Having looked at the nv20 path, the specular power is approximated via register combiner math (it seems to be roughly power 12, but shifted a bit, so it saturates earlier). The specular table in the arb2 path may be attempting to match this quasi-power function.
JC confirms in an interview that the lookup is there to match the specular power approximation of older paths:
http://www.beyond3d.com/interviews/carmack04/index.php?p=2

Edit: Also my description of the power function above seems to be a bit off as I misread the RCs (not the easiest API to follow!). Other people seem to be doing a good job of matching the LUT with a couple of instructions.
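
For the curious, the sort of thing people are doing is a cheap clamped curve along these lines; the constants below are purely illustrative, not a fit of the actual table:

!!ARBfp1.0
# an illustrative stand-in for the specular table: clamp a steep ramp, then square it
TEMP dotNH, spec;
DP3 dotNH.x, fragment.texcoord[2], fragment.texcoord[3];   # assumed: normal and half-angle
MAD_SAT spec.x, dotNH.x, 4.0, -3.0;          # made-up scale/bias: zero below 0.75, ramps to 1
MUL spec.x, spec.x, spec.x;                  # squaring sharpens the falloff
MOV result.color, spec.x;
END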

It used to vary per texel; look at the code I pasted: “localNormal” is the bumpmap fragment, but he replaced it with a fixed value and kept the legacy texel-based MOV in there. That’s why I was saying it slipped through the cracks; it’s erm… less than optimal. Hopefully something in the driver optimizes it out, but it’s interesting to notice that it has been changed from a per-texel value to a constant ‘convolution’ (or whatever) that ignores the bump alpha, AND that the constant isn’t 0 or 1, but 0.2, hinting strongly at some constant modification of the exponent function (probably one that reduces specular aliasing through frequency and/or contrast reduction post-exponent).

Originally posted by dorbie:
It used to vary per texel; look at the code I pasted: “localNormal” is the bumpmap fragment, but he replaced it with a fixed value and kept the legacy texel-based MOV in there. That’s why I was saying it slipped through the cracks; it’s erm… less than optimal. Hopefully something in the driver optimizes it out, but it’s interesting to notice that it has been changed from a per-texel value to a constant ‘convolution’ (or whatever) that ignores the bump alpha, AND that the constant isn’t 0 or 1, but 0.2, hinting strongly at some constant modification of the exponent function (probably one that reduces specular aliasing through frequency and/or contrast reduction post-exponent).
Perhaps I wasn’t clear before; optimality isn’t an issue here, as firstly this is an unused test shader, and secondly it’s the sort of code one writes in the middle of testing/debugging.

I understand it looks like experimental code and it’s obviously the kind of thing you get with a work in progress. The shader code fragment I posted was intercepted on the way to the graphics card while retail DOOM3 was running with a GeForce 6800. That doesn’t guarantee it was actually used for much rendering but it is still informative.

P.S. I missed one of your earlier posts, thanks for the links.