Linking GLSL fragment shader causes SIGSEGV on NVIDIA (but not Intel) GPU

I’ve been trying to track down the source of this issue for several days now, and I’ve finally narrowed it down enough that I think I can coherently ask for help. I have a shader that I compile and link for use in my rendering program. When my program runs on my Intel iGPU, the shader links fine and the program continues on its merry way. When it runs on my NVIDIA dGPU, linking the shader (not compiling it, linking it) causes a SIGSEGV, like this:

* thread #12, name = 'render', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
    frame #0: 0x00007fffe8512e78 libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol7134 + 1688
libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol7134:
->  0x7fffe8512e78 <+1688>: movq   (%r12), %rax
    0x7fffe8512e7c <+1692>: movq   (%rbx), %r15
    0x7fffe8512e7f <+1695>: movl   0x10(%rax), %eax
    0x7fffe8512e82 <+1698>: andl   $0x6000000, %eax ; imm = 0x6000000 

Here’s the full backtrace:

* thread #12, name = 'render', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x0)
  * frame #0: 0x00007fffe8512e78 libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol7134 + 1688
    frame #1: 0x00007fffe84dc66a libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol6590 + 250
    frame #2: 0x00007fffe8513d2b libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol7153 + 59
    frame #3: 0x00007fffe8513e30 libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol7155 + 176
    frame #4: 0x00007fffe848b66f libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol5634 + 703
    frame #5: 0x00007fffe8490cef libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol5667 + 575
    frame #6: 0x00007fffe84916e5 libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol5669 + 373
    frame #7: 0x00007fffe8a78fe2 libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol30933 + 674
    frame #8: 0x00007fffe8a7c010 libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol30945 + 560
    frame #9: 0x00007fffe8a80dea libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol30946 + 5434
    frame #10: 0x00007fffe8a5b2ab libnvidia-eglcore.so.550.76`___lldb_unnamed_symbol30768 + 1499
    frame #11: 0x000055555566dcae embryo`gl::bindings::Gl::LinkProgram::hb7cf7bef01549de7(self=0x00005555566654c0, program=8) at bindings.rs:4015:88
    frame #12: 0x000055555563862d embryo`embryo::render_gl::shaders::Program::from_shaders::h100c8e20c499bb4e(gl=0x00007fffd65ff438, shaders=&[embryo::render_gl::shaders::Shader] @ 0x00007fffd65feea8) at shaders.rs:115:13
    frame #13: 0x00005555556854c5 embryo`embryo::render_thread::RendererState::shader::h6864b58321fc928d(self=0x00007fffd65ff1c0, shader_name=9, shaders=&[&str] @ 0x00007fffd65ff0a8) at render_thread.rs:288:13
    frame #14: 0x00005555556856ca embryo`embryo::render_thread::RendererState::load_shaders::h46a123c07f1355ca(self=0x00007fffd65ff1c0) at render_thread.rs:298:9
    frame #15: 0x000055555560272f embryo`embryo::main::_$u7b$$u7b$closure$u7d$$u7d$::ha823e9af17ed340e at main.rs:259:21
    frame #16: 0x000055555566f6f6 embryo`std::sys_common::backtrace::__rust_begin_short_backtrace::ha4980b53eab5a9d3(f=<unavailable>) at backtrace.rs:155:18
    frame #17: 0x0000555555656c73 embryo`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::haa7681a8b0a82454 at mod.rs:529:17
    frame #18: 0x00005555555f17f3 embryo`_$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h13438f39cd484106(self=<unavailable>, (null)=<unavailable>) at unwind_safe.rs:272:9
    frame #19: 0x000055555565776f embryo`std::panicking::try::do_call::h712dfbdc03c650bf(data="\U00000001") at panicking.rs:554:40
    frame #20: 0x0000555555657a6b embryo`__rust_try + 27
    frame #21: 0x0000555555657588 embryo`std::panicking::try::h17759d502c347510(f=<unavailable>) at panicking.rs:518:19
    frame #22: 0x0000555555656512 embryo`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::h1ec9e359f4461b12 [inlined] std::panic::catch_unwind::h154cea2eef7de5f0 at panic.rs:142:14
    frame #23: 0x000055555565650d embryo`std::thread::Builder::spawn_unchecked_::_$u7b$$u7b$closure$u7d$$u7d$::h1ec9e359f4461b12 at mod.rs:528:30
    frame #24: 0x00005555555d8e2e embryo`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h7105474c005e7222((null)=0x0000555555fbcbc0, (null)=<unavailable>) at function.rs:250:5
    frame #25: 0x0000555555bab245 embryo`std::sys::pal::unix::thread::Thread::new::thread_start::h3631815ad38387d6 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h6b630278c760b971 at boxed.rs:2015:9
    frame #26: 0x0000555555bab23d embryo`std::sys::pal::unix::thread::Thread::new::thread_start::h3631815ad38387d6 [inlined] _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h70462b441b6c0e1f at boxed.rs:2015:9
    frame #27: 0x0000555555bab236 embryo`std::sys::pal::unix::thread::Thread::new::thread_start::h3631815ad38387d6 at thread.rs:108:17
    frame #28: 0x00007ffff7b041f7 libc.so.6`start_thread + 887
    frame #29: 0x00007ffff7b8642c libc.so.6`__clone3 + 44

As you can see, the crash occurs inside the driver’s implementation of glLinkProgram, called through the LinkProgram binding. I’ve also confirmed this with the OpenGL debug mode that logs every call made.

This is despite the fact that I compile and link eight other shaders in my program, and it only crashes on this, the final one.

Here’s the code for the shader:


#version 430 core

#define RGB_TO_LUM vec3(0.2125, 0.7154, 0.0721)

layout (location = 0) out vec4 FragColor;
layout (location = 1) out vec4 BrightColor;

layout (binding = 0, rgba16f) uniform readonly image2D gPosition;
layout (binding = 1, rgba16f) uniform readonly image2D gNormal;
layout (binding = 2, rgba16f) uniform readonly image2D gDiffuseColor;
layout (binding = 3, rgba16f) uniform readonly image2D gSpecShininess;
uniform vec2 bloomThreshold = vec2(0.0, 1.2);

layout (binding = 4, std140) uniform Light {
    // ...
} light;

uniform vec3 cameraDirection;

subroutine void RenderLight(
    vec3 position,
    vec3 normal,
    out float specular,
    out float diffuse,
    out float attenuation
);

layout(location = 0) subroutine uniform RenderLight renderLight;

layout(index = 0) subroutine(RenderLight) void ambientLight(
    vec3 position,
    vec3 normal,
    out float specular,
    out float diffuse,
    out float attenuation
) {
    // ...
}

layout(index = 1) subroutine(RenderLight) void directionalLight(
    vec3 position,
    vec3 normal,
    out float specular,
    out float diffuse,
    out float attenuation
) {
    // ...
}

layout(index = 2) subroutine(RenderLight) void pointLight(
    vec3 position,
    vec3 normal,
    out float specular,
    out float diffuse,
    out float attenuation
) {
    // ...
}

layout(index = 3) subroutine(RenderLight) void spotLight(
    vec3 position,
    vec3 normal,
    out float specular,
    out float diffuse,
    out float attenuation
) {
   // ...
}

void main()
{
    // ...

    renderLight(
        // ...
    );

    // ...
    FragColor = vec4(rgb, 1.0);
    BrightColor = vec4(rgb * 4.0 * smoothstep(bloomThreshold.x, bloomThreshold.y, dot(rgb, RGB_TO_LUM)), 1.0);
}

Does anyone have an idea what might be causing the segmentation fault at link time? I’d assume it has to be in the in, out, and uniform definitions, right? Maybe something doesn’t match up correctly with the vertex shader it’s linked with? So here’s the vertex shader:

#version 430 core

layout (location = 0) in vec3 aPos;

uniform mat4 model_matrix;
uniform mat4 view_matrix;
uniform mat4 projection_matrix;

void main() {
    gl_Position = projection_matrix * view_matrix * model_matrix * vec4(aPos, 1.0);
}

I’ve confirmed I’m not giving a bad ID to the library, either:

(lldb) frame select 11
frame #11: 0x000055555566dcae embryo`gl::bindings::Gl::LinkProgram::hb7cf7bef01549de7(self=0x0000555556664d30, program=8) at bindings.rs:4015:88
   4012	#[allow(non_snake_case, unused_variables, dead_code)]
   4013	            #[inline] pub unsafe fn LineWidth(&self, width: types::GLfloat) -> () { __gl_imports::mem::transmute::<_, extern "system" fn(types::GLfloat) -> ()>(self.LineWidth.f)(width) }
   4014	#[allow(non_snake_case, unused_variables, dead_code)]
-> 4015	            #[inline] pub unsafe fn LinkProgram(&self, program: types::GLuint) -> () { __gl_imports::mem::transmute::<_, extern "system" fn(types::GLuint) -> ()>(self.LinkProgram.f)(program) }
   4016	#[allow(non_snake_case, unused_variables, dead_code)]
   4017	            #[inline] pub unsafe fn LogicOp(&self, opcode: types::GLenum) -> () { __gl_imports::mem::transmute::<_, extern "system" fn(types::GLenum) -> ()>(self.LogicOp.f)(opcode) }
   4018	#[allow(non_snake_case, unused_variables, dead_code)]
(lldb) p program
(unsigned int) 8
(lldb) 

You’ve got some good info here. To summarize:

ERROR       : SIGSEGV: address not mapped to object (fault address: 0x0)
CRASH IN    : libnvidia-eglcore.so.550.76
THREAD      : "render"
FUNCTION    : glLinkProgram()
INSTRUCTION : movq   (%r12), %rax

GPU Vendor  : NVIDIA
GPU         : ????
DRIVER      : 550.76 (NVIDIA)
GL          : ???? (OpenGL ES? OpenGL?)
OS          : Linux (likely X64/AMD64, not ARM64)

NVIDIA’s drivers are generally pretty solid, but I have occasionally found bugs in their GLSL compiler/linker.

First, make sure that you have a GL Debug Callback plugged in, so the NVIDIA driver can notify your application immediately when it finds a problem (ERROR) or wants to offer a warning, performance tip, or informational message about something the app is doing that may be problematic. This saves you from manually checking for GL errors after each call and gives you much more useful information than errors alone.

I suggest this just to make sure that the GL / GLES driver hasn’t detected some error condition that might hint at why you’re getting a crash in glLinkProgram().

Also make sure you are querying the GLSL shader compile logs and logging them. There could be a warning or tip in there that hints at what is causing problems for glLinkProgram().

And of course, make very sure that your glCompileShader() success vs. failure checking is working properly. It’d be unfortunate to go down the glLinkProgram() crash rabbit hole only to find that glCompileShader() actually failed for one of your shader stages.
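The compile-checking advice above can be sketched in Rust, the language of the original poster’s program. This is a minimal sketch under stated assumptions, not the poster’s code: BuildOutcome, InfoLogSource, check_build and FakeShader are hypothetical names. In a real program the trait would be implemented by calling GetShaderiv (with COMPILE_STATUS and INFO_LOG_LENGTH) and GetShaderInfoLog on the generated gl bindings, or the GetProgram* equivalents after a link. The point it illustrates is that the log is read unconditionally, so messages left by a successful compile are not discarded.

```rust
/// Result of checking one shader compile (or program link), keeping any
/// log text the driver produced, even when the step succeeded.
#[derive(Debug, PartialEq)]
enum BuildOutcome {
    Clean,            // succeeded with an empty log
    Warnings(String), // succeeded, but the driver left messages in the log
    Failed(String),   // failed; the log should say why
}

/// Hypothetical abstraction over the GL queries involved, so the decision
/// logic can be exercised without a live GL context.
trait InfoLogSource {
    /// GL_COMPILE_STATUS (or GL_LINK_STATUS) reported as a bool.
    fn succeeded(&self) -> bool;
    /// GL_INFO_LOG_LENGTH: length including the NUL terminator, or 0 if there is no log.
    fn log_length(&self) -> usize;
    /// Fills `buf` like glGetShaderInfoLog: NUL-terminated, returns bytes written excluding the NUL.
    fn read_log(&self, buf: &mut [u8]) -> usize;
}

/// Always read the log first, then classify the outcome.
fn check_build(src: &impl InfoLogSource) -> BuildOutcome {
    let len = src.log_length();
    let log = if len > 1 {
        let mut buf = vec![0u8; len];
        // Never read past the byte before the terminator.
        let written = src.read_log(&mut buf).min(len - 1);
        String::from_utf8_lossy(&buf[..written]).trim_end().to_string()
    } else {
        String::new()
    };
    match (src.succeeded(), log.is_empty()) {
        (false, _) => BuildOutcome::Failed(log),
        (true, true) => BuildOutcome::Clean,
        (true, false) => BuildOutcome::Warnings(log),
    }
}

/// Hypothetical in-memory stand-in for a shader object, used only for testing.
struct FakeShader {
    compiled: bool,
    log: &'static str,
}

impl InfoLogSource for FakeShader {
    fn succeeded(&self) -> bool {
        self.compiled
    }
    fn log_length(&self) -> usize {
        if self.log.is_empty() { 0 } else { self.log.len() + 1 }
    }
    fn read_log(&self, buf: &mut [u8]) -> usize {
        if buf.is_empty() {
            return 0;
        }
        // Copy at most buf.len() - 1 bytes, then NUL-terminate, as GL does.
        let n = self.log.len().min(buf.len() - 1);
        buf[..n].copy_from_slice(&self.log.as_bytes()[..n]);
        buf[n] = 0;
        n
    }
}
```

The trait exists only to keep the sketch testable without a GL context; calling the bindings directly inside check_build works just as well in a real renderer.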

Finally, in the absence of any other clues from the above, in my experience tracking down GLSL compiler/linker bugs, you’ll want to pare back your shader gradually (or better: build it up from scratch) to isolate the specific thing in your GLSL code that triggers the crash. Then you can focus specifically on that GLSL syntax and what about it might be causing the linker grief.

Maybe it’s your specific use of shader subroutines? Maybe you’re using a huge amount of UBO data (I have seen that crash the NV compiler; use an SSBO instead)? Maybe there’s an error in your GLSL code that the GLSL compiler isn’t catching (but should be), and the linker is then unprepared to handle the compiler’s output?
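Paring the shader back does not have to mean hand-editing the file for every attempt. One possible approach, sketched here in Rust under stated assumptions: wrap each suspect section of the GLSL (for example the renderLight(...) call) in #if/#endif on a macro, and inject #define lines when loading the source. The function with_defines and any macro names such as USE_SUBROUTINES are hypothetical. The defines go after the #version line because #version must precede everything in a shader except comments and whitespace. The search for "#version" is naive and assumes the directive is not mentioned in an earlier comment.

```rust
/// Returns a copy of `source` with one `#define NAME VALUE` line per entry,
/// placed right after the `#version` line (or at the top if there is none).
fn with_defines(source: &str, defines: &[(&str, &str)]) -> String {
    let mut block = String::new();
    for (name, value) in defines {
        block.push_str(&format!("#define {} {}\n", name, value));
    }
    match source.find("#version") {
        Some(start) => {
            // Split just past the newline that ends the #version line.
            let end = source[start..]
                .find('\n')
                .map_or(source.len(), |i| start + i + 1);
            let (head, tail) = source.split_at(end);
            // Handle a #version line at end of input with no trailing newline.
            let sep = if head.ends_with('\n') { "" } else { "\n" };
            format!("{head}{sep}{block}{tail}")
        }
        None => format!("{block}{source}"),
    }
}
```

Each variant can then be compiled and linked in turn until the crash appears or disappears. The injected lines shift the line numbers reported in compile logs by the number of defines.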


Thank you for the in-depth reply! I’m really happy to see it, I was half expecting no one to ever respond to this just because it seems like such a weird bug. Also yeah, I did my best to provide what information I could :smiley:

I already did that. The OpenGL bindings I’m using for Rust have a mode that turns on a detailed debug callback logging everything that happens (GL function calls and any errors/warnings), and I’ve set up a feature flag for it. Running my program with it enabled only confirmed that the problem occurs when linking this specific shader.

I also ran my program through RenderDoc, and it didn’t show any OpenGL warnings or informational notices, even with basically every checking option RenderDoc has turned on.

I do collect and log the GL shader compilation messages, and I correctly check that the compile step succeeded. That’s what’s stumping me: compilation appears to be fully successful, and yet linking crashes. Can there be shader compilation messages even when compilation succeeds? If so, I can print those out in case any warnings give me a hint.

That’s definitely an idea that occurred to me, I just figured I’d ask on here first in case there was something I was missing before I embarked on that challenge.

I suspect it’s the subroutines, because this current version of the shader is actually a rewritten version of a previous shader that was compiling and linking perfectly fine, and the only new variable involved is the subroutine itself, so I guess I can start there. I’m not sending that much data in the UBO.

Okay, I commented out everything related to the subroutines, and now the shader links just fine, so the problem is somewhere in those. I just have no idea what the issue could be or how I could fix it. Time for some mad science!

Edit: it turns out that if I leave all the subroutine declarations in but comment out the call through the subroutine uniform, everything’s fine, but as soon as I reintroduce that call, it segfaults. Maybe I’ve done something subtly incorrect?

Edit: Okay, figured it out. Apparently, NVIDIA’s GLSL implementation doesn’t like it when you pass dynamic expressions (here, imageLoad() calls) directly as arguments to a call through a subroutine uniform. So this didn’t work:

    renderLight(
        imageLoad(gPosition, ivec2(gl_FragCoord.xy)).rgb,
        imageLoad(gNormal, ivec2(gl_FragCoord.xy)).rgb,
        specular,
        diffuse,
        attenuation
    );

but this did:

    vec3 position = imageLoad(gPosition, ivec2(gl_FragCoord.xy)).rgb;
    vec3 normal = imageLoad(gNormal, ivec2(gl_FragCoord.xy)).rgb;

    renderLight(
        position,
        normal,
        specular,
        diffuse,
        attenuation
    );

In any case, this seems to qualify as a bug in the driver’s shader linker: it should not segfault even on invalid input, but diagnose it instead (I haven’t tried to work out whether your input is valid or not). You could try reporting it to NV; their developer forums have a section for driver bugs (or did the last time I looked).


No problem!

I have a vague recollection of seeing this before, yes. After each compile, even a successful one, you can check for log messages anyway. For example, with:

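    // Core-profile equivalents of glGetObjectParameterivARB / glGetInfoLogARB.
    GLint   length = 0;
    GLsizei bytesWritten = 0;
    GLchar  logStr[ 4096 ];

    // Check the log even if GL_COMPILE_STATUS is GL_TRUE; the driver may
    // still have left warnings in it.
    glGetShaderiv( obj, GL_INFO_LOG_LENGTH, &length );

    if ( length > 0 )
    {
        glGetShaderInfoLog( obj, sizeof( logStr ), &bytesWritten, logStr );
        // logStr now holds a NUL-terminated log; print or log it here.
    }

where obj is the GL shader object handle. For a program object, the equivalents are glGetProgramiv and glGetProgramInfoLog.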

Good find!
