Issues with UBO std140 shared between different shaders using latest nVidia drivers

Ido_Ilan · July 7, 2013, 4:55am

Hi,

I have two programs: one is a fixed-pipeline emulator, and the other is an extension that adds functionality to it. I want both to share the same state (matrices, lights, etc.), so I put all of these uniforms in one big uniform block (std140) and shared it with the other program.
One uniform (the modelview inverse matrix) is used only in the second program. I used the std140 layout so that this uniform would not be removed by optimization. This worked with nVidia 310.xx drivers but not with the latest versions (the second program doesn’t see that uniform).

Is this the right way to share a UBO?

Thanks
Ido

malexander · July 7, 2013, 7:17am

As of driver 319, Nvidia stopped reporting unused members of a uniform block. If you inspect the structure and size of the UBO, you should notice that the used uniforms are in the same locations and the UBO is the same size in both cases. The driver just isn’t reporting the unused uniforms to you. This is consistent with the intent of the GL_UNIFORM_BLOCK_ACTIVE_UNIFORMS enum (ie, “Active”), but isn’t very convenient in the cases where you’re sharing UBOs. The UBO with the ‘missing uniforms’ should still be compatible with both shaders and contain the data of the unused uniforms.
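
Because std140 fixes the layout from the block declaration alone, member offsets can also be computed on the CPU, independent of which members a driver reports as active. What follows is only a minimal sketch under assumptions: the member names (`modelview`, `modelviewInverse`, `lightPos`, `shininess`, `ambient`) and the helper names are hypothetical, and only a few scalar, vector and mat4 types are covered (no arrays or nested structs).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Small subset of GLSL types, enough for a typical shared state block.
enum class Std140Type { Float, Vec2, Vec3, Vec4, Mat4 };

struct Std140Rule { std::size_t align; std::size_t size; };

// std140 base alignment and size in bytes for the types above.
inline Std140Rule std140Rule(Std140Type t) {
    switch (t) {
        case Std140Type::Float: return {4, 4};
        case Std140Type::Vec2:  return {8, 8};
        case Std140Type::Vec3:  return {16, 12}; // aligned like a vec4, occupies 12 bytes
        case Std140Type::Vec4:  return {16, 16};
        case Std140Type::Mat4:  return {16, 64}; // four vec4 columns
    }
    return {16, 16};
}

struct Std140Member { std::string name; Std140Type type; };

struct Std140Layout {
    std::vector<std::pair<std::string, std::size_t>> offsets; // declaration order
    std::size_t endOffset = 0; // first byte after the last member
};

// Walks the members in declaration order, rounding each offset up to the
// member's base alignment. The authoritative buffer size should still be
// queried with GL_UNIFORM_BLOCK_DATA_SIZE.
inline Std140Layout computeStd140Layout(const std::vector<Std140Member>& members) {
    Std140Layout layout;
    std::size_t cursor = 0;
    for (const auto& m : members) {
        const Std140Rule r = std140Rule(m.type);
        cursor = (cursor + r.align - 1) / r.align * r.align;
        layout.offsets.emplace_back(m.name, cursor);
        cursor += r.size;
    }
    layout.endOffset = cursor;
    return layout;
}
```

Offsets computed this way can be compared against whatever each program reports, which makes it easier to tell a layout mismatch from a member that is merely not reported.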

Ido_Ilan · July 7, 2013, 11:24pm

Hi,

Thanks for the reply.

I’ve installed the latest driver, 320.49, which behaves a little differently from the previous drivers that broke my code.
The UBO size returned covers all uniforms, active or not, so my allocated buffer is the correct size. As you said, I can see the active and inactive uniforms correctly in each program, but still nothing renders and I see no error.

Is there anything specific about this change that I need to pay attention to in order to get this working? The code has been working for almost two years and I’m lost.

Thanks,
Ido

malexander · July 8, 2013, 7:08am

The problem I had with the new driver was that my code was no longer initializing the uniform block members that were unused by the first shader that reported the block, so the second shader would use uninitialized data and produce bad results. Are you certain that all members are initialized?
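
One way to make sure every member is initialized, whichever shader reports it, is to keep a CPU-side mirror of the whole block and upload it in one piece. This is a rough sketch only: the block contents and the names `SharedState` and `makeDefaultSharedState` are hypothetical, and the struct matches std140 only for this particular member order.

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirror of a hypothetical block:
//   layout(std140) uniform SharedState {
//       mat4 modelview; mat4 modelviewInverse;
//       vec3 lightPos; float shininess; vec4 ambient;
//   };
struct SharedState {
    float modelview[16];
    float modelviewInverse[16];
    float lightPos[3];
    float shininess;   // packs into the 4 bytes left after the vec3
    float ambient[4];
};

// Guard against the C++ layout drifting away from the std140 offsets.
static_assert(offsetof(SharedState, modelviewInverse) == 64, "std140 offset mismatch");
static_assert(offsetof(SharedState, lightPos) == 128, "std140 offset mismatch");
static_assert(offsetof(SharedState, shininess) == 140, "std140 offset mismatch");
static_assert(offsetof(SharedState, ambient) == 144, "std140 offset mismatch");
static_assert(sizeof(SharedState) == 160, "unexpected block size");

// Writes a 4x4 identity matrix (diagonal indices are 0, 5, 10, 15).
inline void setIdentity(float (&m)[16]) {
    for (int i = 0; i < 16; ++i) m[i] = (i % 5 == 0) ? 1.0f : 0.0f;
}

// Every member gets a defined value, including ones that only a second
// shader reads.
inline SharedState makeDefaultSharedState() {
    SharedState s{}; // value-initialization zeroes all members
    setIdentity(s.modelview);
    setIdentity(s.modelviewInverse);
    s.ambient[3] = 1.0f;
    return s;
}
```

With a GL context, the whole struct could then be uploaded in a single call such as `glBufferData(GL_UNIFORM_BUFFER, sizeof(SharedState), &state, GL_DYNAMIC_DRAW)`, so no member depends on a particular program having reported it.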

Ido_Ilan · July 8, 2013, 10:20am

Hi,

I query the uniform location, offset, size, stride and type in each program. Although I use std140, the code queries everything from OpenGL. This may not be the most efficient approach with std140, but it makes it easier to change the shader code and the client. Is that what you mean by initialization of members?
Actually, when I first wrote this I was not sure whether it is legal to query only once for all shared programs, and querying per program looked like good practice in case I wanted to use other layouts.

I can share part of the code by mail if someone is willing to browse it and help (and not laugh at it).

Malexander, I really appreciate your answers, but I’m lost.

Thanks,
Ido

imported_tonyo_au · July 8, 2013, 6:31pm

If you want to protect yourself from different programs having different versions of a shared buffer, I suggest you write a pre-processor that includes the shared buffer in each program as it is compiled. Then they must all match, and your CPU code can just use a standard include for the shared buffer. I use a directive of the form:

#pragma include file.h

The file is the .h header from my C++ code, which I parse into GLSL code.
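
A minimal sketch of such a pre-processing step might look like the following. It is not the poster's actual tool: the function name `expandIncludes` and the in-memory file table are hypothetical, the included text is assumed to already be GLSL, and the translation from a C++ header to GLSL is left out.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Replaces every line of the form "#pragma include <name>" with the text
// registered for <name>. A real tool would read the files from disk.
inline std::string expandIncludes(const std::string& source,
                                  const std::map<std::string, std::string>& files) {
    const std::string directive = "#pragma include ";
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, directive.size(), directive) == 0) {
            const std::string name = line.substr(directive.size());
            const auto it = files.find(name);
            if (it != files.end()) {
                out << it->second << '\n';
                continue;
            }
            // Unknown file: keep the directive so the GLSL compiler reports it.
        }
        out << line << '\n';
    }
    return out.str();
}
```

Running every shader source through the same expansion before compiling it means each program gets an identical block declaration.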

malexander · July 9, 2013, 7:17am

Ido_Ilan wrote: “Malexander I really appreciate your answers but I’m lost.”

Perhaps you could post the shader that’s giving you problems?

Also, have you tried narrowing down the problem by using various shader debugging techniques?

- Set a dummy vec4(1.0) output from the fragment shader to determine whether fragment processing is the problem (if your model then appears white and in the correct position, it is).
- Remove or replace complex calculations with a simple result until the shader ‘works’ again.
- Use transform feedback to read the gl_Position output, or other varyings, to ensure that the vertex processing is correct.

Those are the steps I usually take when I’m baffled by a shader.

imported_tonyo_au wrote: “If you want to protect yourself from having different programs having different versions of a shared buffer.”

The problem is that the UBO structures are the same between both programs - but as of Nvidia 319, only the members used by the shader itself are returned by queries. So two shaders with the same block structure may report different members depending on how the shaders access the structure. It’s not really a question of the same structure in two shaders getting out of sync (though what you suggest would certainly help with that).

skynet · July 9, 2013, 10:51am

One way to deal with this is to write a dummy shader that uses all the UBO members and doesn’t give the compiler a chance to optimize them away (i.e. all members must contribute to some shader output). The dummy shader doesn’t have to make sense.
Compile this shader, query the offsets and then use these offsets for all other shaders sharing the same uniform block.
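
As a rough illustration of the dummy-shader idea, the shader source could be generated from a member list so that every member feeds gl_Position. The helper names and the member list are hypothetical, only a few types are handled, and whether a particular compiler keeps every member active still has to be checked by querying the program.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns a GLSL expression that folds a member of the given type into a vec4.
inline std::string foldToVec4(const std::string& type, const std::string& name) {
    if (type == "float") return "vec4(" + name + ")";
    if (type == "vec2")  return "vec4(" + name + ", 0.0, 0.0)";
    if (type == "vec3")  return "vec4(" + name + ", 0.0)";
    if (type == "vec4")  return name;
    if (type == "mat4")  return name + " * vec4(1.0)";
    return "vec4(0.0)"; // unsupported type: the member will NOT be referenced
}

// Builds a vertex shader whose only purpose is to reference every block
// member. 'members' holds (GLSL type, member name) pairs.
inline std::string makeDummyVertexShader(
        const std::string& blockDecl,
        const std::vector<std::pair<std::string, std::string>>& members) {
    std::string src = "#version 330\n" + blockDecl + "\n";
    src += "void main() {\n    vec4 acc = vec4(0.0);\n";
    for (const auto& m : members) {
        src += "    acc += " + foldToVec4(m.first, m.second) + ";\n";
    }
    src += "    gl_Position = acc;\n}\n";
    return src;
}
```

The generated source would be compiled and linked once, and the offsets queried from that program reused for every program sharing the block.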

Ido_Ilan · July 11, 2013, 6:04am

Hi all,

Thanks for all the help.

I have found the issue with my code and the new nVidia drivers. Although it eventually turned out to be a simple mistake, it took a few hours of debugging and code changes once I fully grasped the driver change with your help and online resources. The problem was that my programs had functions that “share” the uniform storage and binding point, and they assumed that all uniforms are active. When I tried to update the buffer with my “shared” uniforms through one program for use in the other, it failed: each program parsed its own uniforms (and now the IDs and counts differed), so my code didn’t update the buffer correctly.

My current approach is to share a more complex object that parses the uniform block in each program it is shared with and caches all the offsets and metadata needed for upload. The programs update the uniforms through this shared object rather than through their own caches, so each sharing program sees a full view of the uniform block, including members that are not active in it.
When I have the time, I may try to statically parse and generate the update code (and structure) so that it is completely decoupled from the program, as was suggested, but for now I’m content with the application working again.
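
A shared object of this kind could be sketched roughly as follows. This is not the actual code from the application: the class name `SharedBlockCache` and its methods are hypothetical, and the per-program maps are assumed to have been filled from OpenGL queries (for example glGetActiveUniformsiv with GL_UNIFORM_OFFSET).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Merged view of one uniform block shared by several programs.
class SharedBlockCache {
public:
    explicit SharedBlockCache(std::size_t blockSize) : data_(blockSize, 0) {}

    // Records the (name -> byte offset) pairs one program reports as active.
    // Returns false as soon as an offset conflicts with an earlier program;
    // entries merged before the conflict are kept.
    bool mergeProgram(const std::map<std::string, std::size_t>& activeOffsets) {
        for (const auto& entry : activeOffsets) {
            const auto inserted = offsets_.insert(entry);
            if (!inserted.second && inserted.first->second != entry.second) {
                return false;
            }
        }
        return true;
    }

    // Writes a value into the CPU staging copy. Returns false if no program
    // reported the member or the write would run past the block.
    bool set(const std::string& name, const void* src, std::size_t bytes) {
        const auto it = offsets_.find(name);
        if (it == offsets_.end() || it->second + bytes > data_.size()) return false;
        std::memcpy(data_.data() + it->second, src, bytes);
        return true;
    }

    const std::vector<unsigned char>& bytes() const { return data_; }
    std::size_t knownMembers() const { return offsets_.size(); }

private:
    std::map<std::string, std::size_t> offsets_;
    std::vector<unsigned char> data_;
};
```

Each program contributes only the members it sees as active, while updates go through the merged table, so a member that is inactive in one program can still be written.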

A side question: is nVidia’s implementation according to the spec? I’ve read it again and again, and it does not say whether all members should be reported as active or not. From a user’s point of view I think the old approach is nicer; maybe the spec could be updated?

Again,
Thank you,
Ido

malexander · July 11, 2013, 8:31am

Good that you tracked it down.

By using the “active” keyword in the enum, the spec implies that the driver only needs to report the uniforms actually used. As I reported in this post, http://www.opengl.org/discussion_boards/showthread.php/181747-Nvidia-319-320-drivers-and-shared-layout-UBOs?p=1251070#post1251070, it’s not very convenient for shared-style blocks. In response, Alphonse submitted a spec bug for shared UBOs (The Khronos Group · GitHub).