I am hand-tuning a Cg fragment program
to reduce the number of instructions
generated. However, the frame rate dropped
(121 fps -> 115 fps) even though the
instruction count of the fragment program
went down from 128 to 108. Any idea why?
FYI, the number of texture accesses is the
same; only the math changes. I merged two
scalar calls to atan2 into a single atan2 on
float2 operands, plus a few other math changes.
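As a rough sketch of this kind of change (hypothetical code, not the actual shader; the entry point and the way the inputs are packed into one texcoord are assumptions made for illustration):

```cg
// Hypothetical fragment program, for illustration only.
// Assumes two (x, y) pairs are packed into one texcoord as (x0, y0, x1, y1).
float4 main(float4 p : TEXCOORD0) : COLOR
{
    // Before: two scalar calls.
    //   float a0 = atan2(p.y, p.x);
    //   float a1 = atan2(p.w, p.z);

    // After: one call on float2 operands (component-wise atan2).
    float2 a = atan2(p.yw, p.xz);

    return float4(a, 0.0, 1.0);
}
```

Whether the vector form is actually cheaper depends on how the compiler and driver expand atan2 for the target profile.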
Thanks in advance.
Could you post the Cg and ASM shaders?
Sorry, I can't, but I appreciate your help.
After I rewrote some of the conditional code
following the two atan2 calls, the problem
went away and performance went up.
Remember that there’s quite a lot of optimization that goes on inside the driver. Reducing the number of instructions won’t necessarily improve performance. The assembly generated by the compiler only bears a slight resemblance to what’s actually executed by the hardware these days.
But sometimes there is a problem, such as a texture lookup with comparison (shadow) at half precision, which isn't supported in Cg (possibly a bug) but is well supported in assembly. It's a funny situation when I have to use ASM instead of GLSL (or Cg).