I have an application that uses around 500-1000 shader programs.
I tried to compile the shaders at startup instead of when needed.
The problem is that there is a huge performance hit when doing that, around 40% less fps.
I have a GTX 580 (Windows 7, 64-bit) and tried it on drivers 285.62 and 290.36; there was no difference.
The only difference I can see is that when I use the lazy (when needed) pattern, I upload the uniforms from the material shortly after the compile/link step.
Do you know if there are any usage patterns or guidelines for this type of scenario?
I can only think that the driver has decided to ‘release’ some of the pre-compiled shaders from GPU memory, based on the fact that they weren’t used immediately.
The lazy method has the advantage that the GL driver knows the shaders are needed and that they end up in the optimal memory location.
There is nothing I know of which can influence driver memory optimisation or shader compiling.
Just a random thought - could the state at compile time affect the compilation? You could verify this by resetting all OpenGL state during compilation, in both the lazy and up-front cases, and see if that makes any difference.
Yes, very good point by tksuoran. A lot of OpenGL states can in fact affect the resulting shader; the likely reason pre-compilation is slower is that the driver has to recompile your shader because of state differences between the point where you compiled the shader and the point where you use it.
Interesting… but if the affected shaders are recompiled, shouldn’t performance gradually improve, especially if you hold the camera still? I tested and waited several minutes and performance was exactly the same. It took 30-40 seconds to pre-compile all the shaders, so 3-4 minutes should have been more than enough, I believe.
agnuep, you mentioned that “A lot of OpenGL states can in fact affect the resulting shader”. Do you have more information on this?
Actually, it heavily depends on the GL implementation and the hardware. What first comes to mind are the many deprecated features that are sometimes emulated by driver-baked shader patches (AFAICT), like alpha testing, quads as input primitives, point sprites, line stipple and many others. But I think in practice there are a lot of other, more general cases where this might be needed.
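To make the alpha-testing case concrete: fixed-function GL_ALPHA_TEST was removed from core GL, so on hardware without a dedicated path the driver typically patches a compare-and-discard into your fragment shader behind your back, and the patched variant depends on the alpha-test state at draw time. Doing it explicitly in the shader keeps the compiled program state-independent. An illustrative GLSL fragment shader (names are made up for the example):

```glsl
#version 330 core

// Explicit replacement for glAlphaFunc(GL_GREATER, ref).
uniform sampler2D u_texture;
uniform float u_alphaRef;   // plays the role of the alpha-test reference value
in vec2 v_texCoord;
out vec4 fragColor;

void main() {
    vec4 color = texture(u_texture, v_texCoord);
    if (color.a <= u_alphaRef)
        discard;            // same effect as alpha test with GL_GREATER
    fragColor = color;
}
```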
My guess for why performance doesn’t improve over time is that the recompiled version is not cached and reused, though this is pure speculation. It would help if we knew which states you set and what type of rendering you perform.
AFAIK I don’t use any deprecated features; I do my best to stay away from them, and it actually helps in the long run: for me the code becomes much cleaner.
The thing that worries me is that performance could actually depend on WHEN (and/or in what general state) you compile. If it were a minor performance degradation I would accept it, but when fps drops from 45 to 28 it becomes a problem.
I know that topic well, believe me, but Eric Lengyel never said whether this also holds for, say, the Evergreen Radeon GPUs. I’m not 100% sure whether any of the newer Radeon or Intel GPUs support fixed-function alpha testing in hardware, but considering it was removed from core GL I really believe there are some that don’t.