Cg to NV_fragment_program performance

I have written the Cg program below to perform a color lookup on a 3D texture, and I compile it with the fp30 profile.
The resulting fp30 assembly code is used in my program exactly as the compiler generated it.

The Cg program is as follows:

half4 main(
    uniform half AlphaModulation,
    half3 TexCoord0 : TEXCOORD0,
    uniform sampler3D Volume3D,
    uniform sampler1D ColorTable) : COLOR
{
    half4 outcolor = tex3D(Volume3D, TexCoord0);
    outcolor = tex1D(ColorTable, outcolor.x);
    outcolor.z *= AlphaModulation;
    return outcolor;
}

Graphics card: GeForce FX Go5650
AlphaModulation is used to modulate the final alpha value, which is obtained from ColorTable.
With the line outcolor.z *= AlphaModulation in place, the program runs at 5.0 fps; when I omit it, performance jumps to 11.9 fps.
Is this normal? I can achieve 11.9 fps, including alpha modulation, using texture shaders and register combiners.


Why don’t you use fixed? It is faster and, for color values in [0, 1] like these, precise enough to give the same result as half.

I have tried “fixed” as well.
It is faster, at about 6.7 fps, but still not on par with the 11.9 fps I expect.

My question is why a single multiplication can cause such a drop in performance on the GeForce FX.


Using the output register as a source operand may be a problem for the Cg compiler (it’s not even allowed in ARB_fragment_program).

Can you try using a temporary?
Or better yet, post the compiled code?

Here are the updated Cg program and the generated fp30 code. I am still getting the same performance.

=== CG script =======
half4 main(
    uniform fixed AlphaModulation,
    half3 TexCoord0 : TEXCOORD0,
    uniform sampler3D Volume3D,
    uniform sampler1D ColorTable) : COLOR
{
    fixed4 outcolor = tex3D(Volume3D, TexCoord0);
    outcolor = tex1D(ColorTable, outcolor.x);
    fixed alpha = outcolor.w;
    outcolor.w = alpha * AlphaModulation;
    return outcolor;
}

== FP30 code ===

# NV_fragment_program generated by NVIDIA Cg compiler
# cgc version 1.1.0003, build date Jul 7 2003 11:55:19
# command line args: -profile fp30
#vendor NVIDIA Corporation
#version 1.0.02
#profile fp30
#program main
#semantic main.AlphaModulation
#semantic main.Volume3D
#semantic main.ColorTable
#var fixed AlphaModulation : : : 0 : 1
#var half3 TexCoord0 : $vin.TEXCOORD0 : TEXCOORD0 : 1 : 1
#var sampler3D Volume3D : : texunit 0 : 2 : 1
#var sampler1D ColorTable : : texunit 1 : 3 : 1
#var half4 main : $vout.COLOR : COLOR : -1 : 1
DECLARE AlphaModulation;
TEX R0.x, f[TEX0].xyzx, TEX0, 3D;
TEX R0, R0.x, TEX1, 1D;
MOVR H0.xyz, R0.xyzx;
MULX H1.x, R0.w, AlphaModulation.x;
MOVX H0.w, H1.x;

# 6 instructions, 1 R-regs, 2 H-regs.
# End of program

The generated program is a bit odd. Try this:

DECLARE AlphaModulation;
TEX H0, f[TEX0].xyzx, TEX0, 3D;
TEX H0, H0.x, TEX1, 1D;
MULX H0.w, H0.w, AlphaModulation.x;

I think it does the same thing…

Thanks, Zengar.
It runs at 11.9 fps.
Is there something I’m not specifying correctly to Cg to get output that performs like yours? Or do I have to learn hardcore fp30 assembly coding to get the best performance?
In what ways was the fp30 code generated by Cg weird?


The Cg compiler does a great job of optimising, but sometimes it also does odd things. In your case, for example, it used one more temporary than it needed. BTW, what version of Cg do you have? I would advise you to post-optimise programs after compilation. Cg is good, but it still has bugs… I’m afraid of glslang.
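One way to picture that kind of post-optimisation is a small peephole pass over the compiler's text output. The following is only a minimal Python sketch, not an existing tool: the function name fold_scalar_moves and its folding rules are assumptions made for illustration. It folds a scalar MUL or ADD into a temporary, immediately followed by a MOV of that temporary, into one instruction. It does not attempt the other change in the hand-written version above (using H0 directly instead of R0).

```python
import re

# One fp30-style instruction: opcode, destination register with write mask,
# then comma-separated source operands, terminated by a semicolon.
INSTR = re.compile(r"^\s*(\w+)\s+(\w+)\.(\w+)\s*,\s*(.+?)\s*;\s*$")
SCALAR_SRC = re.compile(r"^[\w\[\]]+\.[xyzw]$")  # e.g. R0.w or AlphaModulation.x


def _can_fold(cur, nxt, rest):
    """True if 'OP tmp.c, ...; MOV out.d, tmp.c;' can become 'OP out.d, ...;'."""
    op, tmp, tmp_mask, srcs = cur.groups()
    mov, _, dst_mask, mov_src = nxt.groups()
    sources = [s.strip() for s in srcs.split(",")]
    return (
        op[:3] in ("MUL", "ADD")                 # component-wise arithmetic only
        and mov[:3] == "MOV"
        and op[3:] == mov[3:]                    # same precision suffix (R/H/X)
        and len(tmp_mask) == 1 and len(dst_mask) == 1
        and mov_src == f"{tmp}.{tmp_mask}"
        and all(SCALAR_SRC.match(s) for s in sources)  # scalar swizzles only
        and not any(re.search(rf"\b{tmp}\b", r) for r in rest)  # tmp is dead
    )


def fold_scalar_moves(lines):
    """Hypothetical peephole pass: remove a redundant scalar temporary."""
    out = []
    i = 0
    while i < len(lines):
        cur = INSTR.match(lines[i])
        nxt = INSTR.match(lines[i + 1]) if i + 1 < len(lines) else None
        if cur and nxt and _can_fold(cur, nxt, lines[i + 2:]):
            op, _, _, srcs = cur.groups()
            _, dst_reg, dst_mask, _ = nxt.groups()
            out.append(f"{op} {dst_reg}.{dst_mask}, {srcs};")
            i += 2  # both instructions were replaced by one
        else:
            out.append(lines[i])
            i += 1
    return out


if __name__ == "__main__":
    pair = ["MULX H1.x, R0.w, AlphaModulation.x;", "MOVX H0.w, H1.x;"]
    print("\n".join(fold_scalar_moves(pair)))
```

Run on the last two instructions of the generated program, this leaves a single MULX that writes H0.w directly. The scalar-swizzle check matters: with vector sources, changing the write mask would select different components and change the result.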

Something I’ve noticed with Cg is that it REALLY likes to put in useless instructions.
I keep getting junk instructions like:

PARAM c0={ 1.0, 0.0, 0.0, 0.0 };
MUL r0.x, r0.x, c0.x;

when doing complex calculations…
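Junk like that multiply by 1.0 is easy to strip mechanically. Here is a rough Python sketch, assuming the program text is available as a list of lines; drop_multiplies_by_one is a made-up helper for illustration, not part of Cg or any toolkit, and it only handles the single-component case shown above.

```python
import re

PARAM = re.compile(r"^\s*PARAM\s+(\w+)\s*=\s*\{([^}]*)\}\s*;\s*$")
# MUL dst.c, src.c, const.s; restricted to single-component operands.
MUL = re.compile(
    r"^\s*MUL\s+(\w+\.[xyzw])\s*,\s*(\w+\.[xyzw])\s*,\s*(\w+)\.([xyzw])\s*;\s*$"
)


def drop_multiplies_by_one(lines):
    """Remove 'MUL r.c, r.c, k.s;' when constant k holds 1.0 in component s."""
    # First pass: collect the literal values of every PARAM constant.
    consts = {}
    for line in lines:
        m = PARAM.match(line)
        if m:
            consts[m.group(1)] = [float(v) for v in m.group(2).split(",")]

    # Second pass: keep everything except in-place multiplies by 1.0.
    kept = []
    for line in lines:
        m = MUL.match(line)
        if m:
            dst, src, name, comp = m.groups()
            values = consts.get(name, [])
            idx = "xyzw".index(comp)
            if dst == src and idx < len(values) and values[idx] == 1.0:
                continue  # x * 1.0 written back onto x is a no-op
        kept.append(line)
    return kept


if __name__ == "__main__":
    program = [
        "PARAM c0={ 1.0, 0.0, 0.0, 0.0 };",
        "MUL r0.x, r0.x, c0.x;",
    ]
    print("\n".join(drop_multiplies_by_one(program)))
```

The PARAM line is deliberately left in place, since other instructions may still read the constant.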

And it really likes to add a tremendous number of useless, arbitrary swizzles.
Look here.

I wouldn’t recommend it to anyone. It’s just horrible.

I’m using compiler Release 1.1. Isn’t this the latest?
I guess I’d better learn the assembly way, then. I need to implement shading etc., so I guess Cg does not fit the bill for now.
I thought Cg would output the best code possible…

Thanks for all the help, especially Zengar.