Out var of vertex shader disconnected from in var of fragment shader?

Hello,

OpenGL ES 3.0 here. I’ve got a weird problem. The program I am writing works on some 50 different (mobile) devices I’ve checked, but recently it has been brought to my attention that the program does not work on the Samsung Galaxy J4+ (Adreno 308). I’ve got access to this model and indeed, I can reproduce what has been reported.

Using the tactic of the ‘smallest code that reproduces the problem’ I’ve been shaving various pieces of code off and I’ve arrived at a very simple program which essentially displays a textured quad. Here’s what it does:

Vertex shader:

#version 300 es

// …
in vec2 a_TexCoordinate; // Per-vertex texture coordinate.
out vec2 v_TexCoordinate;

// …
void main()
{
// …
v_TexCoordinate = a_TexCoordinate;
// …
}

Fragment Shader:

#version 300 es

precision mediump float;

in vec2 v_TexCoordinate;
out vec4 fragColor;
uniform sampler2D u_Texture;

void main()
{
fragColor = texture(u_Texture,v_TexCoordinate);
}

Using those shaders, when run on any ‘normal’ device, my textured-quad-program displays the following:

Which is exactly what I expect - this is the input texture.

The very same program, when run on the ‘suspicious’ Galaxy J4+ however, displays the following:

Which is the texture moved by vec2(0.5,0.5) [ I’m using GL_CLAMP_TO_EDGE for texture wrapping ]

So it really looks like the J4+ somehow moves the texture coordinates by (0.5,0.5). I’ve then tried the following for the fragment shader:

#version 300 es

precision mediump float;

in vec2 v_TexCoordinate;
out vec4 fragColor;
uniform sampler2D u_Texture;

void main()
{
fragColor = texture(u_Texture, vec2(0.5,0.5) );
}

and that displays (on any device including the J4+)

Which is expected - this is the color of the middle of the texture. Then I’ve tried

Vertex shader:

#version 300 es

// …
in vec2 a_TexCoordinate; // Per-vertex texture coordinate.
out vec2 v_TexCoordinate;

// …
void main()
{
// …
v_TexCoordinate = vec2(0.5,0.5);
// …
}

Fragment Shader:

#version 300 es

precision mediump float;

in vec2 v_TexCoordinate;
out vec4 fragColor;
uniform sampler2D u_Texture;

void main()
{
fragColor = texture(u_Texture,v_TexCoordinate);
}

i.e. hardcoding all the texture coordinates to (0.5,0.5) in the vertex shader and passing those on to the fragment shader - the effect on any ‘normal’ device is expected (blue rectangle just like above), but on the J4+ it displays

i.e. again the input texture moved by (0.5,0.5).

So it would seem that on the J4+ the out ‘v_TexCoordinate’ variable of the vertex shader is completely disconnected from the in ‘v_TexCoordinate’ variable of the fragment shader.

How is that possible?

So if it does not take ‘v_TexCoordinate’ in the fragment shader from the ‘v_TexCoordinate’ in the vertex shader, then where does it take it from? I am pretty sure it takes it from the values of the vertex coordinates themselves. In case of the quad, they are (-0.5,-0.5) [lower left] to (0.5,0.5) [upper right].

Try incorporating the value of v_TexCoordinate itself in the output colour.
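A minimal sketch of what that test could look like, assuming the shader source is embedded as a raw string the way the ANGLE tests do it. The names kDebugFragmentShader and expectedDebugColour are made up for illustration and are not from the original program. The shader writes the interpolated coordinate into red and green, so whatever actually arrives in v_TexCoordinate becomes visible on screen:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical debug fragment shader (not from the original program):
// shows the interpolated coordinate as colour instead of sampling u_Texture.
constexpr std::string_view kDebugFragmentShader = R"(#version 300 es
precision mediump float;

in vec2 v_TexCoordinate;
out vec4 fragColor;

void main()
{
    // With coordinates running 0..1 the quad shades from black at (0,0)
    // to red at (1,0), green at (0,1) and yellow at (1,1).
    fragColor = vec4(v_TexCoordinate, 0.0, 1.0);
}
)";

struct Rgba { float r, g, b, a; };

inline float clamp01(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

// Colour the shader above should produce for a given coordinate, assuming a
// normalized colour buffer that clamps each channel to [0,1].
inline Rgba expectedDebugColour(float u, float v)
{
    assert(u == u && v == v);  // NaN coordinates have no meaningful colour
    return Rgba{clamp01(u), clamp01(v), 0.0f, 1.0f};
}
```

If the fragment shader were really receiving values in the range -0.5 to 0.5, the lower-left quarter of the quad would come out black and the upper-right corner would only reach half-intensity yellow.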

With the original vertex shader, what happens if you vary the values being fed to a_TexCoordinate? Do you get a +0.5 shift, or do you get a result which ignores the attribute value? Or something else?

Issue (mostly) worked-around.

So this turned out to be two separate issues, both IMHO in the Adreno 308 driver V@331 ( OpenGL ES 3.0 V@331.0 (GIT@e6de3b7, I22091d40c2) (Date:08/07/19) ).

First, when one uses Transform Feedback, this particular driver messes up the mappings between the corresponding ‘out’ vars of the vertex shader and ‘in’ vars of the fragment shader. This is why my ‘v_TexCoordinate’ in fragment var was mapped not to the corresponding ‘v_TexCoordinate’ out vertex var, but to another vertex var declared two rows up.
Removing a call to ‘glTransformFeedbackVaryings()’ completely fixes the mappings.

Second, this driver only supports a UBO of max size 1920 bytes. Anything larger just disappears into the void.
glGetIntegerv(GL_MAX_UNIFORM_BLOCK_SIZE) returns 65536 i.e. 64k.

I managed to work around both issues, the first one completely, and the second one - at the cost of some functionality. Now I am scratching my head over how to detect this bug programmatically and only switch on the workaround if it is actually needed.
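One blunt way to gate a workaround is to match the driver strings. The sketch below assumes the two inputs come from glGetString(GL_RENDERER) and glGetString(GL_VERSION). The version text is the one quoted above, but the renderer format ("Adreno" and "308" appearing somewhere in the string) is an assumption, and isSuspectAdrenoDriver is a hypothetical helper name:

```cpp
#include <cassert>
#include <string_view>

// Returns true when the strings look like the driver build named in this thread.
// renderer: glGetString(GL_RENDERER), assumed to contain "Adreno" and "308".
// version:  glGetString(GL_VERSION), e.g. "OpenGL ES 3.0 V@331.0 (...)".
inline bool isSuspectAdrenoDriver(std::string_view renderer, std::string_view version)
{
    const bool isAdreno308 = renderer.find("Adreno") != std::string_view::npos &&
                             renderer.find("308") != std::string_view::npos;
    const bool isBuild331 = version.find("V@331") != std::string_view::npos;
    return isAdreno308 && isBuild331;
}
```

String matching only catches the exact builds you already know about, so it is a stopgap rather than real detection of the bug.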

Correction: after some more debugging it turns out that the nature of the 2nd bug is slightly different: it’s not max 1920 bytes per single UBO, but max 6000 bytes for all uniforms in the vertex shader.

Shaders compile, glGetError() keeps returning NO_ERROR, the whole thing runs, but if I allocate more than 6000 bytes for uniform variables in the vertex shader, my objects get drawn only partially or not at all.

Is there a way to query the driver for its limit on number of uniforms in the vertex shader?

Assuming GLSL-ES supports it (and if you haven’t already tried it), you might try specifying the transform feedback varying output specification in the shader:

Thanks DarkPhoton!

Actually the transform feedback part doesn’t bother me too much. I have managed to completely work around that.

A much bigger problem is the second bug: my objects only get partially drawn (or not at all) if I try allocating more than 6000 bytes for uniform variables in the vertex shader, and that includes the UBOs.

ES 3.0 is supposed to support at least 1024 components in the default block plus at least 16384 bytes (4096 components) for each uniform block, with at least 12 uniform blocks available to the vertex shader.

Are you allowing for padding in calculating the size of uniform variables? UBOs align almost everything to the size of a vec4, which can result in a lot of padding for smaller types. In particular, a float[] will align each element to a four-word boundary, meaning that 75% of the space is padding.
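To put numbers on the padding, here is a small sketch of the std140 array rule for scalars and vectors only (matrices and structs follow extra rules that are not covered). The helper names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>

// std140: the stride of an array of scalars or vectors is the element size
// rounded up to a multiple of 16 bytes (the size of a vec4).
constexpr std::size_t std140ArrayStride(std::size_t elementBytes)
{
    return (elementBytes + 15) / 16 * 16;
}

// Total bytes occupied by an array of `count` such elements.
constexpr std::size_t std140ArrayBytes(std::size_t elementBytes, std::size_t count)
{
    return std140ArrayStride(elementBytes) * count;
}

// Share of each array slot that is padding, in whole percent.
constexpr std::size_t paddingPercent(std::size_t elementBytes)
{
    return 100 * (std140ArrayStride(elementBytes) - elementBytes) /
           std140ArrayStride(elementBytes);
}

static_assert(paddingPercent(4) == 75, "float[]: 12 of every 16 bytes are padding");
static_assert(paddingPercent(8) == 50, "vec2[]: half of every slot is padding");
static_assert(paddingPercent(16) == 0, "vec4[]: no padding");
```

So a float[100] costs 1600 bytes in a std140 block, the same as a vec4[100].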

I have three UBOs in the vertex shader, each one composed entirely of one array of vec4’s:

layout (std140) uniform vUniformEffects
{
vec4 vEffects[3*NUM_VERTEX];
};

So no, there is no padding.

And yes, the ‘1024 components in the default block plus 16384 bytes (4096 components) for each uniform block’ thing holds true 99% of the time, except on the Adreno 308 driver V@331 ( OpenGL ES 3.0 V@331.0 (GIT@e6de3b7, I22091d40c2) (Date:08/07/19) ).

Given your finding on Adreno OpenGL ES drivers, I find this gem in the ANGLE project (main page link) test code very curious:

// Test to transfer a uniform block large array member as an actual parameter to a function.
TEST_P(UniformBlockWithOneLargeArrayMemberTest, MemberAsActualParameter)
{
    ANGLE_SKIP_TEST_IF(IsAdreno());

    constexpr char kVS[] = R"(#version 300 es
...
layout(std140) uniform UBO1{
    mat4x4 buf1[90];
} instance;

layout(std140) uniform UBO2{
    mat4x4 buf2[90];
};
...
})";
...
    ANGLE_GL_PROGRAM(program, kVS, kFS);
    EXPECT_GL_NO_ERROR();
}

Notice that this test would pass in UBO storage totaling 11,520 bytes, as well as the complete skip of this test if running on Adreno drivers. It would seem someone may have hit similar “UBO size” troubles with Qualcomm Adreno drivers…
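For reference, the 11,520 figure works out as follows, assuming std140 stores a mat4x4 as four vec4 columns. The 6000-byte constant is the threshold observed earlier in this thread, not a documented limit:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMat4Bytes = 4 * 16;          // four vec4 columns, 16 bytes each
constexpr std::size_t kArrayLength = 90;            // buf1[90] and buf2[90] in the test
constexpr std::size_t kBlockBytes = kMat4Bytes * kArrayLength;
constexpr std::size_t kTotalBytes = 2 * kBlockBytes;
constexpr std::size_t kObservedBudgetBytes = 6000;  // empirical threshold from this thread

static_assert(kBlockBytes == 5760, "one UBO of mat4x4[90]");
static_assert(kTotalBytes == 11520, "both UBOs together");
static_assert(kBlockBytes < kObservedBudgetBytes, "a single block fits on its own");
static_assert(kTotalBytes > kObservedBudgetBytes, "the pair exceeds the observed budget");
```

Each block alone would fit under 6000 bytes, but the two together would not, which matches a total-across-all-uniforms limit rather than a per-UBO one.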

Random thought. You might query the values of these implementation limits:

  • GL_MAX_VERTEX_UNIFORM_COMPONENTS (== 1024? 1536?)
  • GL_MAX_COMBINED_VERTEX_UNIFORM_COMPONENTS (== 197504?)

While only the 2nd is supposed to include total space for UBOs (OpenGL ES 3.0 Spec link), I wonder if the Adreno driver folks might have incorrectly treated the 1st as the limit including UBOs. If the value of GL_MAX_VERTEX_UNIFORM_COMPONENTS were 1536, that’d be pretty close to 6000 bytes (1536*4 = 6144).
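If that guess is right, a conservative pre-flight check could look like the sketch below. It assumes you pass in the value returned by glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS) and deliberately counts UBO bytes against it. That is the speculated driver behaviour, not what the spec requires, and fitsConservativeBudget is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// One uniform component is one 32-bit value, i.e. 4 bytes.
constexpr std::size_t componentsToBytes(std::size_t components) { return components * 4; }

// Conservative check: treats default-block uniforms AND every vertex-stage UBO
// as sharing the GL_MAX_VERTEX_UNIFORM_COMPONENTS budget.
inline bool fitsConservativeBudget(std::size_t maxVertexUniformComponents,
                                   std::size_t defaultBlockBytes,
                                   std::initializer_list<std::size_t> uboBytes)
{
    std::size_t total = defaultBlockBytes;
    for (std::size_t bytes : uboBytes) total += bytes;
    return total <= componentsToBytes(maxVertexUniformComponents);
}
```

The observed cutoff was around 6000 bytes rather than exactly 6144, so leaving some margin below the computed budget is probably wise.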

Also, I’d be remiss if I didn’t suggest you post a short repro for this bug to this Qualcomm Adreno forum:

Responses from knowledgeable Qualcomm folks are hit-and-miss. But you might get something useful from them out of it. Or another Adreno dev might follow up with the workarounds they’ve already found for this bug.

Hmm…

Their Vulkan guide contains the same recommendation (never mind that Vulkan GLSL doesn’t have “default uniform block uniforms”). And further they recommend also preferring UBOs “over push constants on Adreno hardware for performance reasons”:

I wonder how big the “hardware constant RAM” is on Adreno 308 GPUs/graphics drivers, and if it’s configurable…

Wild speculation:

In OpenCL land on Adreno (not necessarily the same)…

Hmm. I wonder if internally their GLSL compiler prefers to locate UBOs in Adreno GPU on-chip constant memory, but there’s a flaw in its falling back to off-chip system RAM when the total UBO space required is too large for constant memory.

Anyway…

If you can’t find some reasonable workaround for the “Max 6000 bytes total vertex uniform space incl UBOs” bug/feature on Adreno graphics drivers, the above at least suggests trying a fallback to using SSBOs instead. SSBOs may not suffer this same bug. And given the above, it sounds like it’s possible that if the UBO space required is > 3KB, the driver may punt the UBO out to off-chip system RAM anyway, which might render it roughly equivalent in access performance to an SSBO on Adreno drivers. That’s something to check anyway…

On the troublesome platform, we have

GL_MAX_VERTEX_UNIFORM_COMPONENTS = 1536
GL_MAX_COMBINED_VERTEX_UNIFORM_COMPONENTS = 230400

So indeed, your suspicion might be true.

I have now managed to just about fit (with - I guess - about 50 bytes to spare) inside the ‘permitted’ 6000 bytes by basically converting things like

layout (std140) uniform vUniformProperties
{
vec4 vProperties[NUM]; // properties of the effect, x=name, y=unused, z=height, w=unused
};

to

layout (packed) uniform vUniformProperties
{
vec2 vProperties[NUM]; // properties of the effect, x=name, y=height
};

Now I need to dynamically figure out the stride in the array, and to make things worse I need the CPU-side of it already created before I compile the shaders, so I need to guess that the stride will be 8, allocate the CPU side, compile the shaders and figure out the stride, and re-allocate the CPU side if that turns out not to be 8.

But fortunately on the problematic Adreno driver the stride is equal to 8.
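A sketch of the guess-then-fix-up step, assuming the stride has already been obtained from glGetActiveUniformsiv with GL_UNIFORM_ARRAY_STRIDE after linking, and that the array starts at offset 0 in the block. Vec2, packVec2Array and readFloat are illustrative names, not from the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vec2 { float x, y; };

// Lays out `values` in a byte buffer with `strideBytes` between consecutive
// elements. Call once with the guessed stride (8) and again only if the
// driver reports something else.
inline std::vector<unsigned char> packVec2Array(const std::vector<Vec2>& values,
                                                std::size_t strideBytes)
{
    assert(strideBytes >= sizeof(Vec2));  // elements must not overlap
    std::vector<unsigned char> bytes(values.size() * strideBytes, 0);
    for (std::size_t i = 0; i < values.size(); ++i)
        std::memcpy(bytes.data() + i * strideBytes, &values[i], sizeof(Vec2));
    return bytes;
}

// Reads one float back from the packed buffer (handy for checking the layout).
inline float readFloat(const std::vector<unsigned char>& bytes, std::size_t offset)
{
    assert(offset + sizeof(float) <= bytes.size());
    float value;
    std::memcpy(&value, bytes.data() + offset, sizeof(float));
    return value;
}
```

With a stride of 8 the buffer is tightly packed; if another driver reports 16, the same data simply gets re-packed with 8 bytes of zero padding after each element.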

Interesting! Thanks for following up! I find that fascinating.

Yeah, sometimes dev’ing on OpenGL and OpenGL ES feels like divination, trying to guess what the heck is going on down there in the driver.

layout (packed) uniform vUniformProperties
{
vec2 vProperties[NUM];  // properties of the effect, x=name, y=height
};

Good idea.

You could also instead use std140, vec4, and div2 / mod2 math to locate the correct property. That’d avoid having to query offsets.
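Roughly, the CPU side of that idea could look like this. It is a sketch with hypothetical names (packPairs, lookup): two vec2 properties share one vec4 slot, and the lookup mirrors what the shader would do with integer division and modulo:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };
using Vec4 = std::array<float, 4>;

// Packs two logical vec2 properties into each vec4 slot:
// even index -> .xy, odd index -> .zw. Needs (NUM + 1) / 2 slots.
inline std::vector<Vec4> packPairs(const std::vector<Vec2>& props)
{
    std::vector<Vec4> slots((props.size() + 1) / 2, Vec4{0.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < props.size(); ++i) {
        Vec4& slot = slots[i / 2];
        const std::size_t base = (i % 2) * 2;
        slot[base] = props[i].x;
        slot[base + 1] = props[i].y;
    }
    return slots;
}

// CPU mirror of the shader-side lookup:
//   vec4 s = vProperties[i / 2];
//   vec2 p = (i % 2 == 0) ? s.xy : s.zw;
inline Vec2 lookup(const std::vector<Vec4>& slots, std::size_t i)
{
    assert(i / 2 < slots.size());
    const Vec4& s = slots[i / 2];
    return (i % 2 == 0) ? Vec2{s[0], s[1]} : Vec2{s[2], s[3]};
}
```

This uses the same 8 bytes per property as the packed vec2 array, but the layout is fixed by std140, so the CPU-side buffer can be built before the shaders are compiled.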

And it’s too bad UBOs in GL/GL-ES don’t support std430 packing, like they do in Vulkan. That’d be the cleanest solution to this problem.