I’m battling with a pixel shader for 4+ lights.
The problem I currently have is its performance.
So far I’m doing the attenuation coefficient calculation, the light distance calculation and the transform to TBN space in the fragment shader, as I’m running out of varyings otherwise.
The shader is pretty costly as it’s based on relief mapping plus multi-point lighting/attenuation.
What I would like to implement is some sort of conditional that guards the lighting calculation.
What I have tried so far is:
calc diff, calc spec
But that doesn’t seem to work well; the overhead seems too large to give any speedup.
What can I do to get the frame rate up?
Multiple draw buffers and conditionals based on one of them?
Use a vertex program to get the number of active lights, then pass an array of them to the fragment program and iterate through it (though the interpolated length != the length calculated from interpolated positions in the fp)?
Often there is only one light or two lights covering the region…
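On the parenthetical about interpolated length: here is a minimal CPU-side sketch in Python of why a per-vertex distance, once interpolated, need not match the distance computed per fragment from the interpolated vector. The two to-light vectors are made-up values, and plain linear interpolation stands in for varying interpolation (perspective correction is ignored).

```python
import math

def lerp(a, b, t):
    """Component-wise linear interpolation between two vectors."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))

# Hypothetical vertex-to-light vectors at the two ends of an edge;
# the light sits one unit above the midpoint of the edge.
to_light_a = (-1.0, 1.0, 0.0)
to_light_b = (1.0, 1.0, 0.0)

t = 0.5  # fragment halfway along the edge

# Distance computed per vertex and then interpolated.
interpolated_length = (1.0 - t) * length(to_light_a) + t * length(to_light_b)

# Vector interpolated first, distance computed per fragment.
length_of_interpolated = length(lerp(to_light_a, to_light_b, t))

print(interpolated_length, length_of_interpolated)
```

At the midpoint the interpolated distance is still sqrt(2), while the true distance is 1, so attenuation driven by the interpolated value would be too dark there.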
The platform needs real branching instruction support, or it’s going to unroll the loop and multiply by zero, doing the math anyway. There are also possible negative effects of branching on pipelining/parallel execution. You may be better off stitching a program together based on light count and recompiling, but that only gets you fixed light switches, not the attenuation-based optimization that I see you’re attempting.
Moving the thread to shaders where you might get a better response.
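The "stitch a program together based on light count" idea could look roughly like this on the host side. This is only a sketch in Python: build_fragment_source, get_variant and the GLSL function shade_light are hypothetical names, the per-light code itself is not shown in the thread, and the actual compile step (glShaderSource / glCompileShader) is left out.

```python
def build_fragment_source(num_lights, body):
    """Prepend a compile-time light count to a GLSL fragment shader body."""
    if num_lights < 1:
        raise ValueError("need at least one light")
    return "#define NUM_LIGHTS %d\n%s" % (num_lights, body)

# Hypothetical shader body. shade_light() stands in for the real per-light
# diffuse/specular/attenuation code and would be defined in another
# shader object linked into the same program.
FRAGMENT_BODY = """\
vec3 shade_light(int i);

void main()
{
    vec3 color = vec3(0.0);
    for (int i = 0; i < NUM_LIGHTS; ++i)
        color += shade_light(i);
    gl_FragColor = vec4(color, 1.0);
}
"""

_variant_cache = {}

def get_variant(num_lights):
    """Return the source for a light count, building each variant only once."""
    if num_lights not in _variant_cache:
        _variant_cache[num_lights] = build_fragment_source(num_lights, FRAGMENT_BODY)
    return _variant_cache[num_lights]

print(get_variant(2))
```

With a constant loop bound the compiler is free to unroll, so each variant pays only for the lights it was built for; the application then picks the variant per object based on how many lights reach it.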
I had a similar problem, and it seems that the current compiler always uses dynamic branching for the loop, which is very expensive. A REP instruction in an ARB_fp shader, which the compiler should ideally generate, shows almost no speed decrease.
The if statement seems to use dynamic branching only if it contains many instructions. With Cg you can set a threshold. You can force a dynamic branch in GLSL by replacing the if with a while loop.
I examined the assembly output of the GLSL program and it seems that it’s compiled under the fp40 profile:
# cgc version 1.3.0001, build date Mar 17 2005 15:50:22
# command line args:
#vendor NVIDIA Corporation
so loops shouldn’t be unrolled. What was the instruction for conditional looping, so I can check?
BTW, how can I get GLSL to use the NRM instruction?
NRM works only with the fp16 format. You have to use the half3 type.
I know, so if both input and output are half3, then the compiler uses NRM?
I don’t know if it depends on the output format, but I checked the assembly output and there was an NRM instruction for the normalize() function.
The compiler used the BRK instruction for dynamic branching out of the for loop, and it also introduced a maximum iteration count of 256.
NRM normalizes BOTH FP16 and FP32.
NRMH is what you’re looking for to normalize your SHORT TEMPs.