Varying float accuracy

Sumaleth · August 14, 2010, 12:36am

I’m attempting to do this in a vertex shader:

varying highp float v_myvalue;

void main()
{
    v_myvalue = 120.0;
    // ... rest of the vertex shader, including writing gl_Position
}

And this in the corresponding fragment shader:

varying highp float v_myvalue;

void main()
{
    if(v_myvalue == 120.0)
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); // red
    else
        gl_FragColor = vec4(0.0, 0.0, 1.0, 1.0); // blue
}

Unexpectedly, the 3D models end up a mess of red and blue pixels, i.e. v_myvalue == 120.0 doesn’t always test true.

I’m using OpenGL ES 2.0 on a Mac Mini and the iPhone simulator, so I can’t use varying integers or bools.

Ultimately, I’d like to pack the float with a variety of flags, and use those to control the fragment calculations (with/without specular, size of specular, with/without diffuse, etc.).

Is there a way to do this without resorting to range testing (> && <)?

Rowan.

arekkusu · August 14, 2010, 6:51am

If you want a variable to be constant in the fragment stage, use a uniform.

However, you are attempting to write an uber shader.

Don’t do that.

Instead of branching per pixel (a dozen branches * a billion fragments = a dozen billion branches), hoist your flow control logic back up to the CPU. Build a dozen fragment shaders with exactly the logic you need. Branch on the CPU, pick the correct shader, then draw with it (a dozen branches + a billion fragments = a dozen branches).
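Picking a shader per object on the CPU usually goes together with sorting the draws by shader, so that each program is bound once per frame rather than once per object. A minimal C sketch, with a hypothetical DrawItem record standing in for real scene objects; the actual glUseProgram and glDraw* calls are left out so it runs without a GL context.

```c
#include <stdlib.h>

/* Hypothetical draw record: which pre-built shader variant an object needs. */
typedef struct {
    unsigned program_index;  /* index into a table of compiled programs */
    unsigned object_id;
} DrawItem;

static int by_program(const void *a, const void *b)
{
    const DrawItem *x = a, *y = b;
    return (x->program_index > y->program_index) -
           (x->program_index < y->program_index);
}

/* Sort draws so objects sharing a shader are adjacent, then count how many
   times the bound program would change (i.e. how many glUseProgram calls). */
static size_t sort_and_count_switches(DrawItem *items, size_t n)
{
    qsort(items, n, sizeof *items, by_program);
    size_t switches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || items[i].program_index != items[i - 1].program_index)
            ++switches;
    }
    return switches;
}
```

Five objects using three distinct shaders then cost three program switches, however they were ordered in the scene.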

DmitryM · August 14, 2010, 7:14am

If you want to use real branching, your hardware probably supports integer operations, so you can use a regular integer bit-flag scheme.
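With integer support (desktop GLSL 1.30 and later has the bitwise operators), the flags from the original question, including a multi-bit “size of specular” field, fit into one integer. A C sketch of packing and testing them; the field layout and function names are hypothetical choices.

```c
#include <stdint.h>

/* Hypothetical layout: bit 0 = diffuse, bit 1 = specular,
   bits 2..5 = specular size level (0..15). */
#define MAT_DIFFUSE      (1u << 0)
#define MAT_SPECULAR     (1u << 1)
#define SPEC_LEVEL_SHIFT 2u
#define SPEC_LEVEL_MASK  0xFu

static uint32_t pack_material(int diffuse, int specular, uint32_t level)
{
    uint32_t bits = 0;
    if (diffuse)  bits |= MAT_DIFFUSE;
    if (specular) bits |= MAT_SPECULAR;
    bits |= (level & SPEC_LEVEL_MASK) << SPEC_LEVEL_SHIFT;
    return bits;
}

/* Same test a shader would do with (bits & flag) != 0u. */
static int has_bit(uint32_t bits, uint32_t flag)
{
    return (bits & flag) != 0;
}

static uint32_t spec_level(uint32_t bits)
{
    return (bits >> SPEC_LEVEL_SHIFT) & SPEC_LEVEL_MASK;
}
```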

If you want branches converted to straight-line code, consider doing it yourself: keep a float value (0.0 or 1.0) for each flag, compute both branches, and select the result with “mix(rez0,rez1,flag)”.
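GLSL defines mix(x, y, a) as x * (1 - a) + y * a and step(edge, x) as 0.0 when x < edge, else 1.0. The C sketch below reproduces those two built-ins on the CPU and uses them for a branch-free “add specular only if the flag is set”; the shade() function is a hypothetical example, not part of any API.

```c
/* C versions of the GLSL built-ins, following their spec definitions. */
static float mix(float x, float y, float a) { return x * (1.0f - a) + y * a; }
static float step(float edge, float x)      { return x < edge ? 0.0f : 1.0f; }

/* Branch-free "if (has_specular) color += specular": both results are
   computed, and the flag (0.0 or 1.0) selects one of them. */
static float shade(float diffuse, float specular, float has_specular)
{
    return mix(diffuse, diffuse + specular, has_specular);
}
```

Passing step(0.5, v_flag) as the selector also turns a slightly-off interpolated flag back into an exact 0.0 or 1.0.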

Typically, properties such as specular contribution, shininess and diffuse texture are set on a per-material basis, so they should be passed as uniforms.

In my engine (link below) there are various GLSL functions such as get_diffuse, get_specular, etc. that implement the behavior based on the material state. At the material linking stage, the root shader is linked together with these small implementation shaders to compose a final program. This way my root shaders don’t know about the material specifics; they just call the functions (e.g. for Phong lighting), and no branches are involved.
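A related way to get the same composition at the source level (not necessarily how DmitryM’s engine does it): glShaderSource concatenates the array of strings it is given, so the application can pick one implementation snippet per material and pass it together with the root shader. In this C sketch all snippet strings, uniform names and the compose_sources function are hypothetical.

```c
#include <stddef.h>

/* Hypothetical snippets. Each defines the same function, get_diffuse(). */
static const char *const PRECISION = "precision mediump float;\n";
static const char *const DIFFUSE_TEXTURED =
    "uniform sampler2D u_diffuseMap;\n"
    "varying vec2 v_uv;\n"
    "vec3 get_diffuse() { return texture2D(u_diffuseMap, v_uv).rgb; }\n";
static const char *const DIFFUSE_FLAT =
    "uniform vec3 u_diffuseColor;\n"
    "vec3 get_diffuse() { return u_diffuseColor; }\n";

/* Root shader: only calls get_diffuse(), knows nothing about the material. */
static const char *const ROOT =
    "void main() { gl_FragColor = vec4(get_diffuse(), 1.0); }\n";

/* Fill `out` with the source strings for one material, in the order they
   would be handed to glShaderSource. Returns the number of strings. */
static size_t compose_sources(int textured, const char *out[3])
{
    out[0] = PRECISION;
    out[1] = textured ? DIFFUSE_TEXTURED : DIFFUSE_FLAT;
    out[2] = ROOT;
    return 3;
}
```

The result would then be passed as glShaderSource(shader, (GLsizei)n, out, NULL) before compiling, giving one branch-free program per material combination.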

Sumaleth · August 14, 2010, 7:20am

My understanding was that you want to have as few glDraw*() calls as possible, so I’m basically working to single-batch all of my 3D objects; an uber shader as you put it.

It’s difficult to get a feel for what a glDraw*() call costs, compared to using if() statements in the fragment shader. Should I use two glDraws instead of even a single ‘if’?

Thanks for the feedback.

Rowan.

ZbuffeR · August 14, 2010, 7:30am

The important thing to keep in mind is that a fragment shader is evaluated at every fragment (pixel), so as resolution increases, the cost increases too.
Whereas the glDraw cost is fixed.
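To put rough numbers on this: a per-fragment branch is paid once per shaded fragment, so its total grows with width x height x overdraw, while the CPU overhead of a draw call does not. The resolutions in this C sketch (320x480 and 640x960, the original iPhone and iPhone 4 screens) are just examples, and an overdraw of 1 assumes every pixel is shaded exactly once.

```c
/* Fragments shaded per frame: one full-screen pass times an overdraw factor. */
static unsigned long fragments_per_frame(unsigned width, unsigned height,
                                         unsigned overdraw)
{
    return (unsigned long)width * height * overdraw;
}
```

Doubling both dimensions quadruples the fragment count (153,600 to 614,400), and with it every per-fragment branch, while the number of draw calls is unchanged.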

Sumaleth · August 14, 2010, 8:00am

That’s an interesting idea. I’ll have to think on it a bit.

Yeah, I guess I’m trying to single-batch lots of different material setups, to avoid having a couple of dozen different glDraw calls.

Apple have said that if you have more than 10 glDraws in your iOS project, you have too many. Looks like I just need to be clever with the way I calculate lighting, and that mix() function might have potential to that end.

Rowan.

Dark_Photon · August 14, 2010, 9:24am

Testing floats for exact equality (v_myvalue == 120.0) is never a good idea, except for special values such as 0.0, due to obvious issues with floating-point round-off. And on top of that, GPUs aren’t necessarily IEEE math compliant, so you have no guarantees as to how they will even represent the values.
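If a float has to carry a discrete code, compare it with a tolerance instead of ==. This C sketch uses a simplified barycentric interpolation as a stand-in for what the rasterizer does (real hardware also applies perspective correction), so the interpolated value may not come out as exactly 120.0; the tolerant test accepts it either way. The function names and the 0.5 tolerance (suitable for whole-number codes) are assumptions.

```c
#include <math.h>

/* Simplified stand-in for interpolating a varying across a triangle. */
static float interpolate(float a, float b, float c, float w0, float w1)
{
    return a * w0 + b * w1 + c * (1.0f - w0 - w1);
}

/* GLSL equivalent: abs(v_myvalue - 120.0) < 0.5 */
static int approx_equal(float value, float expected, float tolerance)
{
    return fabsf(value - expected) < tolerance;
}
```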

As DmitryM suggests, try to use integer flags or integer “enum values” for branch tests. You’ll be much happier.

There is nothing wrong with ubershaders; they save a lot of dev time and maintenance. An ubershader merely means having shared source code for multiple shading paths.

However, these shading paths (“if” expressions) can either be evaluated at run-time on the GPU (as Sumaleth is doing) OR at compile time. I’d suggest the latter. That is, instead of branching based on the result of VARYING or UNIFORM expressions, branch based on the result of CONSTANT expressions. For instance:


// constants
const int  FOGMODE_PIX_LIN  = 0;
const int  FOGMODE_PIX_EXP  = 1;
const int  FOGMODE_PIX_EXP2 = 2;

// shader permutation
const int  FOG_MODE = FOGMODE_PIX_EXP2;

// shader code
if ( FOG_MODE == FOGMODE_PIX_LIN )
{
  ...
}
else if ( FOG_MODE == FOGMODE_PIX_EXP )
{
  ...
}
else if ( FOG_MODE == FOGMODE_PIX_EXP2 )
{
  ...
}

Then you just pre-build a shader for each permutation you need by changing the shader permutation variables. So you can have one ubershader that results in N GL program objects, each of which often has no run-time branching.
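One way to change the permutation variables without editing the file: drop the `const int FOG_MODE = ...;` line from the shared source and have the application prepend it. glShaderSource concatenates the strings it is given, so the generated header and the shared body can be passed as two strings. A C sketch of generating that header; make_permutation_header is a hypothetical name, and if the shader has a #version line it must still come first.

```c
#include <stdio.h>

/* Permutation values matching the GLSL constants above. */
enum { FOGMODE_PIX_LIN = 0, FOGMODE_PIX_EXP = 1, FOGMODE_PIX_EXP2 = 2 };

/* Write the "shader permutation" line that is prepended to the shared
   ubershader source. Returns snprintf's result (length, or negative on error). */
static int make_permutation_header(char *buf, size_t size, int fog_mode)
{
    return snprintf(buf, size, "const int FOG_MODE = %d;\n", fog_mode);
}
```

Calling this once per fog mode and compiling {header, shared_body} each time yields the N program objects, with the untaken if branches removed by the GLSL compiler.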

Dark_Photon · August 14, 2010, 9:40am

10 draw calls per frame! Are you serious? That’s either really low-end hardware or a paranoid manufacturer (naturally Apple doesn’t want perf on their devices to appear to suck, so they’ll steer you away from doing anything non-trivial).

What a difference from the desktop world, where we’re talking millions of polys with potentially thousands of draw calls and hundreds of groups of state changes, and all at 60Hz.

I guess if 10 draw calls is your absolute top-level requirement, performance be damned, then leaning toward ubershaders with run-time evaluated conditions (or as DmitryM suggests – always doing both and “mixing in” contributions as needed, if the paths are cheap and conditional branches are expensive or not supported) makes sense to get your batch count down.

If I were you, I’d be tempted to say heck with Apple edicts and feel out the perf of more batches/draw calls vs. more complex shaders so you can be sure their edicts make sense.

DmitryM · August 14, 2010, 5:58pm

Dark Photon, your proposal to use constants doesn’t fit: Sumaleth renders combined meshes with different materials, so his parameters may change per vertex.

Dark_Photon · August 14, 2010, 8:20pm

Yes, I acknowledged this in my last post:

I guess if 10 draw calls is your absolute top-level requirement, performance be damned, then leaning toward ubershaders with run-time evaluated conditions (or as DmitryM suggests – always doing both and “mixing in” contributions as needed, if the paths are cheap and conditional branches are expensive or not supported) makes sense to get your batch count down.

The key point is run-time evaluated conditions (“real” shader ifs), not compile-time evaluated conditions as I was originally suggesting (ifs removed at compile time by the GLSL compiler). And all this to keep his batch count down, at the expense of shader complexity.

Alfonse_Reinheart · August 14, 2010, 9:30pm

Apple have said that if you have more than 10 glDraws in your iOS project, you have too many.

I’d love to see a link to that recommendation. I rather doubt that most iOS games jump through the hoops necessary to distill all of their rendering down to 10 or fewer draw calls.

I’d also like to see some benchmarks that test this theory.

Sumaleth · August 14, 2010, 10:15pm

It was mentioned a couple of times in Apple’s WWDC 2010 presentation titled “Game Design and Development for iPhone OS, Part 2”, which is available via iTunes to Apple developers.

He doesn’t give any benchmarks, only notes that glUseProgram and glDraw calls are expensive and that “you should be thinking less than 10 draw calls per frame”.

Even if it’s only a rough guide, it’s still useful to know what ballpark I want to be aiming for. And I think I can do what I want using mix() and step() and a couple of extra draws, so I appreciate the feedback.

Rowan.

nickels · August 16, 2010, 1:55pm

Is the GPU not able to make branching on uniforms very fast? For instance:

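uniform int lightType;

void main()
{
    // POINT and SPOT are assumed to be integer constants defined elsewhere
    vec4 color = vec4(0.0);
    if (POINT == lightType) {
        color = GetPointLight(…);
    } else if (SPOT == lightType) {
        color = GetSpotLight(…);
    }
    gl_FragColor = color;
}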

I am wondering because this causes no divergent branching: all pixels/vertices/whatever take the same path, which is known at glDraw dispatch time.
Even in this case, is it better to compile different versions of the shader? And why can’t the branches be optimized away?
Hopefully this isn’t off topic; it’s closely related anyway.

arekkusu · August 16, 2010, 5:25pm

See all of the threads on this forum about NVIDIA recompiling shaders mid-frame when a uniform is changed to zero.

Dark_Photon · August 17, 2010, 4:43am

And note that’s a GeForce 7 and earlier phenomenon AFAIK. I haven’t yet seen it on the last 5 generations of cards.

nickels · August 17, 2010, 6:54am

Actually, that would be great for me, provided they kept the old compiled versions around in a quick-to-access hash table. It would save the work of building the variants!!
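The same caching can also live in the application: build each variant on first use and keep it in a lookup table. In this C sketch, compile_variant is a hypothetical placeholder for the real compile/link step, and a direct-indexed array stands in for the hash table because the flag space is assumed to fit in 6 bits.

```c
#define MAX_VARIANTS 64u  /* flags are assumed to fit in 6 bits */

static unsigned compile_count = 0;

/* Hypothetical stand-in for building the permutation source and calling
   glCompileShader/glLinkProgram; returns a fake, non-zero program handle. */
static unsigned compile_variant(unsigned flags)
{
    ++compile_count;
    return flags + 1u;
}

/* Lazily filled table: 0 means "not compiled yet". */
static unsigned variant_cache[MAX_VARIANTS];

static unsigned get_variant(unsigned flags)
{
    flags &= MAX_VARIANTS - 1u;
    if (variant_cache[flags] == 0u)
        variant_cache[flags] = compile_variant(flags);
    return variant_cache[flags];
}
```

Each flag combination is compiled at most once; later requests are a single array lookup.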

system · October 19, 2021, 7:26pm

This topic was automatically closed 183 days after the last reply. New replies are no longer allowed.