When do shaders relink?

I am adding more general-purpose shader support and I was wondering how to handle uniforms. I have a ShaderProgram class with setUniform methods. Do I need to set all uniforms every time I use the program, in case the shader has been relinked and the values reset? Or am I missing something? Or does setting the values every time cost so little that it’s not worth bothering about?

I thought you only had to set the uniforms when:

  • the value of a uniform changes
  • the program has been relinked

Correct: once a uniform is set, its value is kept. If a shader has uniforms that will never change, they can be set once after a successful link. If the shader gets linked again, the uniform locations can (and will) change, so the locations have to be queried again; that’s the reason glGetUniformLocation is called after linking. Avoid setting uniform values every time if nothing has changed, since this is immediate mode, sort of like glVertex3f/glNormal3f/glUniform3f etc.
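As a runnable illustration of the behavior described above, here is a stand-in `FakeProgram` class (hypothetical, not real GL) that mimics what linking does to uniforms: locations may be reassigned and values are reset, which is why you re-query locations and re-set constant uniforms after each link:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a GL program object, simulating the behavior
// described above: linking may reassign uniform locations and resets all
// uniform values. Not real GL; the point is the relink semantics.
struct FakeProgram {
    int nextLoc = 0;
    std::map<std::string, int> locations;
    std::map<int, float> values;

    void link() {                       // ~ glLinkProgram
        nextLoc += 100;                 // simulate the driver picking new locations
        locations.clear();
        values.clear();                 // uniform values reset on link
    }
    int getUniformLocation(const std::string& name) {  // ~ glGetUniformLocation
        auto it = locations.find(name);
        if (it != locations.end()) return it->second;
        return locations[name] = nextLoc++;
    }
    void uniform1f(int loc, float v) {  // ~ glUniform1f: value kept until next link
        values[loc] = v;
    }
};
```

In real code the consequence is the same: query locations and set never-changing uniforms once right after a successful glLinkProgram, and repeat that only if you link again.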

I guess uniform buffers give a great answer to that!

Neither. As _NK47 said, after you glLinkProgram, you set them once, and then there’s no reason for you ever to set any of those uniforms again unless the values change.

Put some lazy state-setting code in your app so that GL uniforms never get set again unless the value changes.
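A minimal sketch of such lazy state-setting, with a stand-in for glUniform3f so the example runs without a GL context (the counter and class are illustrative, not a real API):

```cpp
#include <array>
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for glUniform3f that counts how many calls actually
// reach the driver, so this sketch is runnable without a GL context.
static int g_driverCalls = 0;
static void driverUniform3f(int /*loc*/, float, float, float) { ++g_driverCalls; }

// Lazy state setter: forwards to the driver only when the value changes.
class UniformCache {
public:
    void setUniform3f(int loc, float x, float y, float z) {
        std::array<float, 3> v{x, y, z};
        auto it = cache_.find(loc);
        if (it != cache_.end() && it->second == v)
            return;                     // value unchanged: skip the GL call
        cache_[loc] = v;
        driverUniform3f(loc, x, y, z);  // real code: glUniform3f(loc, x, y, z)
    }
    // Call after glLinkProgram: linking resets every uniform, so forget the
    // cached values and let the next setUniform3f go through.
    void invalidate() { cache_.clear(); }
private:
    std::unordered_map<int, std::array<float, 3>> cache_;
};
```

The invalidate-on-relink hook is the important part; without it the cache would happily skip uniforms the link just wiped.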

For some reason I thought the driver was free to relink the program under certain conditions, e.g. when a uniform changes. Or did I dream it?

For some reason I thought the driver was free to relink the program under certain conditions, e.g. when a uniform changes. Or did I dream it?

The driver is free to do that sort of stuff, but it must behave to you as though it hadn’t. That is, it cannot execute the internal equivalent of glLinkProgram. It can modify the actual program bytecode itself, but it cannot cause any of the changes that glLinkProgram can (resetting uniforms, rebinding locations, etc).

Exactly. And in fact, to the original point, real drivers sometimes do perform this kind of “late optimization”, as late as inside the first draw call to use a shader with a specific uniform setting permutation.

For us graphics engineers trying to keep frame rates regular, that is of course way too late. Which is why we end up pre-rendering with shaders (just as with textures, for the same reason) before we really need them, to make the driver get off its butt and perform the actions we already told it to, and to make clear that yes, we’re serious about wanting them on the card now. This is of course one of the obstacles to real-time shader compilation and loading.
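The pre-warming trick can be sketched with a hypothetical `Driver` model (not real GL) in which the final compile happens inside the first draw call that uses a program:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model of a driver that defers final shader compilation until
// the first draw call that uses the program.
struct Driver {
    std::set<int> compiled;
    int compileHitches = 0;             // compiles that happened inside a draw
    void draw(int program) {
        if (!compiled.count(program)) { // late "optimization" on first use
            compiled.insert(program);
            ++compileHitches;
        }
        // ... issue the draw ...
    }
};

// Warm-up pass: draw trivial geometry once with every program at load time,
// so the hitches land here instead of mid-frame.
void warmUp(Driver& d, const std::vector<int>& programs) {
    for (int p : programs) d.draw(p);   // e.g. one off-screen triangle each
}
```

In a real app the warm-up draws would go to an off-screen or scissored-out target; the point is simply that the first-use cost is paid during loading, not during gameplay.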

Perhaps we need an ARB_really_finish extension, which augments glFinish() with a glFinish__I_Mean_It__Dont_Make_Me_Come_in_There(). Or at least pre-compiled shaders, which should get rid of much of this deferred run-time parse tree optimization.

Perhaps we need an ARB_really_finish extension

That won’t help at all. OpenGL doesn’t specify performance. The specification of “really link it this time” will just be what GL normally does.

Or at least pre-compiled shaders, which should get rid of much of this deferred run-time parse tree optimization.

No, it won’t. Pre-compiled shaders don’t guarantee that the driver won’t re-optimize the compiled form based on uniforms or whatever. Indeed, the pre-compiled form could simply be a parse tree, as any pre-compile extension will almost certainly not define exactly what hardware vendors have to put in it.

I’m sorry, but I think that such an “extension” would help greatly. The “theoretical” specification is all nice and dandy, but in the end what really matters is the actual behavior you get in the real world.

If I understand correctly, what Dark Photon (and I for that matter) would like is a way to tell the driver: “don’t try to second guess my actions, just do what I ask when I ask for it.” We spend a lot of time getting to know the quirks of drivers in order to get around some “optimizations” like only uploading textures for real on first draw. A flag that disables all this logic and switches to “advanced API usage mode” would be a godsend for some applications.

As for the actual implementation of this mode, I don’t care if it’s a driver option, a vendor extension, etc. I just want it to work. :)

If I understand correctly, what Dark Photon (and I for that matter) would like is a way to tell the driver: “don’t try to second guess my actions, just do what I ask when I ask for it.”

Even if you had such an extension or core feature, you cannot force drivers to abide by it. If they want to re-optimize their shaders whenever, they will do it. There is no specification wording that would allow you to point to something and call it non-conformant behavior. After all, you don’t know why the application slowed down on rendering that one time; all you know is that it was slower than subsequent renderings.

That’s most of the problem right there!

The thing is that the customer doesn’t care about that detail. They only want glitch-free rendering… So to make it happen you have to take precious time dissecting the command stream sent to OpenGL to reproduce the exact conditions that triggered the glitch. After that comes the tricky part: you have to work around the driver behavior somehow.

Here’s an example of a workaround: draw some fake geometry with a shader every N frames to make sure the driver doesn’t “flush” its cached, GPU-compiled version. Believe it or not, we had a test app demonstrating that after a certain number of frames, with a certain overall number of shaders, the driver would have to recompile a shader if it hadn’t been used for a number of frames.
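That workaround can be modeled with a hypothetical `CachingDriver` (not a real driver, just an illustration of the observed behavior) that evicts a compiled shader once it has gone unused for `evictAfter` frames; a throwaway draw every N < evictAfter frames keeps it resident:

```cpp
#include <cassert>
#include <map>

// Hypothetical model of a driver that throws away a compiled shader after it
// has gone unused for `evictAfter` frames, forcing a recompile on next use.
struct CachingDriver {
    int frame = 0, evictAfter = 0, recompiles = 0;
    std::map<int, int> lastUsed;        // program -> frame of last draw

    explicit CachingDriver(int evictAfterFrames) : evictAfter(evictAfterFrames) {}

    void draw(int program) {
        auto it = lastUsed.find(program);
        if (it == lastUsed.end() || frame - it->second > evictAfter)
            ++recompiles;               // shader fell out of the cache
        lastUsed[program] = frame;
    }
    void endFrame() { ++frame; }
};
```

The keep-alive draws trade a tiny fixed per-frame cost for never paying an unpredictable recompile hitch mid-run.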

I agree that many OpenGL drivers over-complicate their resource management, and this can cause problems like the one you experienced.

The simple fact is that OpenGL cannot dictate to drivers that they not do this. Even if you added such an API, drivers could simply ignore it.

Absolutely right. I made the unjustified assumption that the form of precompiled shaders would be some kind of already-hyper-optimized cross-vendor assembly (post-parse-tree pruning and sharing). Similar to what you get from the NVidia Cg compiler:

cgc -oglsl -profile gpu_vp vert.glsl

except of course this yields NVidia-specific NV_vertex_program4 assembly, not cross-vendor assembly. We perhaps need something like ARB_{vertex,fragment}_program refreshed for modern GPUs.

Thanks for the correction.

We perhaps need something like ARB_{vertex,fragment}_program refreshed for modern GPUs.

The ARB has already shown that they are interested only in improving GLSL, not defining a new shading language.

Also, it would be no guarantee of help. The driver can still re-compile these shaders if it feels the need to.

Regardless, whether cross-vendor assembly or not, it is likely to be a post-optimized assembly form of some type, in which case drivers wasting the time to “reoptimize” it is very unlikely, as it would defeat the whole purpose of having precompiled shaders.

Regardless, whether cross-vendor assembly or not, it is likely to be a post-optimized assembly form of some type, in which case drivers wasting the time to “reoptimize” it is very unlikely, as it would defeat the whole purpose of having precompiled shaders.

The writers of GL implementations want one thing: for programs on their hardware to run fast. They will therefore do whatever it takes, even if it requires later re-optimization, to get to this.

Also, why would it be “post-optimized”? An IHV who assumes that any assembly they get has already been run through a compiler would be foolish. They will run their own, separate, independent optimizers just to make sure: dead-code elimination, peephole optimization, and so forth.

And what makes you think that “post-optimized” assembly would even be possible? It could only be possible if the assembly language were reasonably close to the final hardware assembly. By this point, no assembly language is close, and they will be even further apart when things like Larrabee start showing up.

NV_{gpu,geometry,vertex,fragment}_program4 is apparently close enough for NVidia’s taste (used by Cg), which discounts your last point.

And while it was probably obvious, my point wasn’t that there is no work left for the driver to do to use the assembly source (assembling it, perhaps?); just try timing the load/compile/link/first-render of a non-trivial GLSL ubershader program with the load/first-render of the same program pre-compiled into assembly. From what I gather, you would be shocked.

In any case, you and I could pick nits all day to no avail, as ultimately it’s up to the vendors. Will be interesting to see what they come up with for pre-compiled shaders.

just try timing the load/compile/link/first-render of a non-trivial GLSL ubershader program with the load/first-render of the same program pre-compiled into assembly. From what I gather, you would be shocked.

You’re missing the point. Of course assembly compiles faster than GLSL. That is expected and not really the problem (or at least it isn’t a surprise). The problem is the “first-render” issue.

Whose fault is it that the “first-render” of a large GLSL program takes longer than the assembly version? It is the fault of the driver. It has nothing to do with assembly vs. GLSL.

Also, have you done this test on ATI hardware? Do ATI drivers have the same anti-GLSL bias as NVIDIA ones? Somehow, I doubt it.

Maybe that’s because ATI isn’t trying to promote an assembly language the way NVIDIA does. Maybe the problem is that NVIDIA doesn’t care about making GLSL run fast. Maybe the problem isn’t the complexity of compiling and linking GLSL; maybe NVIDIA just wants to promote NV_assembly, so they make GLSL slower by comparison.

Will be interesting to see what they come up with for pre-compiled shaders.

It will not be by exposing an assembly language. The ARB has shown that they have no interest in that.