Unbound sampler declaration and GLSL shader code

You got that twisted. Conditional statements that are dynamically uniform and evaluate to false will not trigger execution of the code. Since your statement isn’t dynamically uniform, potentially both branches are executed, as GClements pointed out, and if the code executed that way triggers undefined behavior that the implementation doesn’t handle gracefully, you get into trouble.

Here’s why your condition isn’t dynamically uniform:

  • it’s not a constant expression, so the trivial case is out
  • gl_FragCoord is not a uniform but a built-in fragment shader input:
in vec4 gl_FragCoord;

The GLSL Spec states:

A fragment-shader expression is dynamically uniform if all fragments evaluating it get the same resulting value.

It should be obvious that this is generally not (if ever) the case for an expression involving gl_FragCoord, even though it will always evaluate to false.

I didn’t get it twisted. My problem came from the samplers not being bound, regardless of whether I was using dynamically evaluated uniforms or something else (I was indeed using a dynamically evaluated ‘isBound’ uniform to prove that the problem indeed came from the unbound samplers). That was my original problem and question, let’s not forget that.

Now, the discussion slipped towards GPU execution with your reply:

and the one of GClements. I am learning something new, as I didn’t know the GPU could execute code in the wrong code path. However, I can’t take that for granted just because you say it. Can you prove it to me? Can you explain the reason, the rationale for the GPU executing the wrong code path? Especially since it doesn’t have a branch prediction unit?

[b]How can the GPU execute both branches? How is this possible?[/b]

From http://docs.nvidia.com/cuda/pdf/ptx_isa_3.1.pdf:

"If threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete, the threads converge back to the same execution path"

I understand the code is not executed when it shouldn’t be. Fortunately not! imageStore() calls had better not be executed!

The issue is that OpenGL just defines the GLSL specification, and not the underlying implementation. So it’s quite possible that leaving a sampler unbound will work on some platforms and not others. Even driver updates could potentially change the behaviour of your program if the compiler changes significantly enough (I have had this happen in other cases).

For example, one implementation might recompile the shader to optimize out uniform conditionals. Another might serialize the cases, and a third might run both cases for all pixels/vertices but only commit results for the ones that passed the conditional test. All are valid compiler strategies. You just don’t know the exact details of how the shader is compiled and executed, so it’s possible that bad things might happen if you leave a sampler unbound.

The rationale is that GPUs typically use a “Single Instruction, Multiple Data” (SIMD) architecture (similar to e.g. MMX or SSE, or std::valarray in C++). Each (non-uniform) variable in a shader is actually an array of (typically 32 or 64) such variables. Each operation (addition, multiplication, etc) is performed on entire arrays, element-wise. GLSL models this as multiple shader invocations running in parallel.

Any condition may be true for some elements and false for others. To implement branching, the GPU evaluates both branches but any side-effects (e.g. assignment, imageStore()) are limited to elements for which the condition is true.
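To make the "array of variables" picture concrete, here is a minimal C++ sketch using std::valarray, as mentioned above. It is only a toy model of one SIMD group, not how any real GPU or driver is implemented; the names `Lanes`, `Mask` and `branch_both_sides` are made up for illustration. It models `if (x > 0.5) y = x * 2.0; else y = x + 1.0;` by computing both sides for every element and committing each result only where the condition selects it.

```cpp
#include <cstddef>
#include <valarray>

// Toy model: one shader variable is an array with one element per invocation.
// (Hypothetical names; real hardware typically uses 32 or 64 elements.)
using Lanes = std::valarray<float>;
using Mask  = std::valarray<bool>;

// Models:  if (x > 0.5) y = x * 2.0; else y = x + 1.0;
Lanes branch_both_sides(const Lanes& x) {
    Mask cond(x > 0.5f);              // the condition, evaluated per element
    Lanes then_result(x * 2.0f);      // "then" side, computed for ALL elements
    Lanes else_result(x + 1.0f);      // "else" side, computed for ALL elements

    Lanes y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Only the side-effect (the assignment to y) is selected by the mask.
        y[i] = cond[i] ? then_result[i] : else_result[i];
    }
    return y;
}
```

Note that both `then_result` and `else_result` are fully computed even when the condition is false for every element; only the final assignment is conditional.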

[QUOTE=fred_em;1252183]
[b]How can the GPU execute both branches? How is this possible?[/b]

From http://docs.nvidia.com/cuda/pdf/ptx_isa_3.1.pdf:

"If threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete, the threads converge back to the same execution path"

I understand the code is not executed when it shouldn’t be. Fortunately not! imageStore() calls had better not be executed![/QUOTE]
Both branches are executed. “Threads” on the wrong path are “disabled”, i.e. side-effects (such as assignments or imageStore()s) are ignored. However, while stores are side-effects, loads aren’t. In the wrong branch, the result of any load will ultimately be ignored. But if the load triggers an error, that may propagate for the remainder of the shader’s execution.
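A rough C++ sketch of that distinction between stores and loads, under loud assumptions: the `Warp` struct, the sticky `faulted` flag and the null pointer standing in for an unbound sampler are all invented for illustration, and what actually happens on an unbound sampler is implementation-specific. The point is only that the store is masked per lane while the load is issued regardless.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // toy size; hypothetical

struct Warp {
    std::array<float, kLanes> color{};  // per-lane output, starts at 0
    bool faulted = false;               // sticky error state (an assumption)
};

// Hypothetical load: a null texture pointer models an unbound sampler.
float load_texel(const float* texture, Warp& warp) {
    if (texture == nullptr) {
        warp.faulted = true;  // the error is NOT undone by the lane mask
        return 0.0f;
    }
    return *texture;
}

// Executes the branch body for the whole warp; 'active' is the lane mask.
void run_branch(Warp& warp, const std::array<bool, kLanes>& active,
                const float* texture) {
    for (std::size_t i = 0; i < kLanes; ++i) {
        float texel = load_texel(texture, warp);  // load happens for every lane
        if (active[i]) {
            warp.color[i] = texel;                // store is masked per lane
        }
    }
}
```

With every lane inactive and a null texture, `color` is left untouched but `faulted` ends up set, which is the "error may propagate" case in this toy model.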

The problem is that a condition which involves only uniforms shouldn’t be treated as “data-dependent”. However, older (GLSL 3.x) hardware treats all conditions in the same manner, data-dependent or not. Newer (GLSL 4.x) hardware can perform a “real” branch in cases where a condition has the same value for all elements.

How? My isBound uniform value is invariably 0, under all circumstances. No thread is trying to access the wrong branch.

Which ones? What are these threads you are talking about? Again, my isBound uniform value is invariably 0, under all circumstances.

If I have 30 fragments to process in a triangle, the GPU will create 1 warp of 32 threads and block off the last two threads (32-2=30 fragments to process). The last two threads will still see that isBound=0 and as a result, they won’t take the wrong path.

[QUOTE=GClements;1252199], i.e. side-effects (such as assignments or imageStore()s) are ignored. However, while stores are side-effects, loads aren’t. In the wrong branch, the result of any load will ultimately be ignored. But if the load triggers an error, that may propagate for the remainder of the shader’s execution.

The problem is that a condition which involves only uniforms shouldn’t be treated as “data-dependent”. However, older (GLSL 3.x) hardware treats all conditions in the same manner, data-dependent or not. Newer (GLSL 4.x) hardware can perform a “real” branch in cases where a condition has the same value for all elements.[/QUOTE]
I got that part earlier, believe me. But that does not answer the two questions, above.

How? My isBound uniform value is invariably 0, under all circumstances. No thread is trying to access the wrong branch.

Yes, but the hardware simply executes both branches and disables side effects on the not-taken one. As far as I understand it, the reason is that there is only a single instruction pointer for a whole warp, so individual threads cannot have their own control flow. Now, in your case something could detect that all threads in the warp actually branch the same way and there really is no need to execute the not-taken branch at all, but that is purely an optimization, not a correctness thing (at least for the case where both branches are error free).
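As a sketch of the two behaviours described here, assuming a made-up four-lane warp and invented names (`run_predicated`, `run_with_uniform_skip`), the following C++ records which sides of a branch get executed. It is not a description of any particular GPU, just an illustration of why both are valid when both sides are error free.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 4;  // toy size; hypothetical
using LaneMask = std::array<bool, kWarpSize>;

// Records which sides of an if/else the warp actually executed.
struct Trace {
    bool ran_then;
    bool ran_else;
};

bool any_lane(const LaneMask& mask) {
    return std::any_of(mask.begin(), mask.end(), [](bool b) { return b; });
}

// Strategy A: always execute both sides, rely on the mask for side-effects.
Trace run_predicated(const LaneMask& /*cond*/) {
    return Trace{true, true};
}

// Strategy B: skip a side entirely when no lane would take it.
Trace run_with_uniform_skip(const LaneMask& cond) {
    LaneMask inverted{};
    for (std::size_t i = 0; i < kWarpSize; ++i) {
        inverted[i] = !cond[i];
    }
    return Trace{any_lane(cond), any_lane(inverted)};
}
```

For a condition that is false on every lane, strategy A still runs the "then" side (and so still touches whatever it loads from), while strategy B never enters it.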

On a more practical note: Unless a high profile game developer runs into the same problem, I would not hold my breath for vendors to fix it, even if it is a legitimate bug :wink:

Given that all threads always branch the same way, how come the GPU tries to execute both branches?

You know what… don’t feel obliged to answer here :wink: I am on the brink of drawing the conclusion that I am too tired. Also the bold words are not there to annoy anybody, really.

If samplers must be bound at all times I am OK with it. I wouldn’t call it a bug, just a lack of precision in the spec.

If samplers must be bound at all times I am OK with it.

This thread started with a shader freeze which you thought was caused by an inactive path that would access an unbound sampler if executed.

I run lots of shaders with un-attached samplers when I know that shader logic will not try to fetch from the sampler. The shaders do not freeze. Also even if I do access the sampler I just get junk colours.
This is on both nVidia and AMD drivers.
