glCompileShader causing glibc free invalid next?

OS…Linux/Fedora 11 x64
Compiler…GCC 4.1.2
GL Version…2.1.2 NVIDIA 173.14.12

Issue: glCompileShader() call throws glibc detected error, specifically - free(): invalid next size (fast): followed by a 64-bit address.

Hello all. Bit of an odd problem here that I’m not sure how to go about resolving, especially given that the issue wasn’t always there. This isn’t code that I wrote, but up until a couple of days ago it was working just fine.

There’s a function InitializeShaders() which does the standard glCreateShader, glShaderSource, glCompileShader routine for the vertex shader and the fragment shader.

Goes a bit like this:


void InitializeShaders()
{
    /////////////////////////////////
    //
    // Create our vertex shader...
    //
    /////////////////////////////////

    const char *vertexShaderStrings[1];
    GLint bVertCompiled = false;

    m_VertexShader = glCreateShader(GL_VERTEX_SHADER);

    std::string vertexShaderFilename = "/shaders/vert_shader.vert";
    
    // ReadShaderFile() returns the contents of the file as a cstring
    unsigned char *vertexShaderAssembly = ReadShaderFile( vertexShaderFilename.c_str() );

    vertexShaderStrings[0] = (char*)vertexShaderAssembly;

    glShaderSource(m_VertexShader, 1, vertexShaderStrings, NULL);    

    glCompileShader( m_VertexShader );  // works okay here!

    delete vertexShaderAssembly;


    // Did it compile okay?
    glGetShaderiv(m_VertexShader, GL_COMPILE_STATUS, &bVertCompiled);
    if( bVertCompiled == false )
    {
        PrintShaderInfoLog(m_VertexShader);
    }




    //////////////////////////////////////
    //
    // Create our fragment shader...
    //
    //////////////////////////////////////
    const char *fragmentShaderStrings[1];
    GLint bFragCompiled = false;

    m_FragmentShader = glCreateShader(GL_FRAGMENT_SHADER);

    std::string fragShaderFilename = "/shaders/frag_shader.frag";

    unsigned char *fragmentShaderAssembly = ReadShaderFile( fragShaderFilename.c_str() );                

    fragmentShaderStrings[0] = (char*)fragmentShaderAssembly;

    glShaderSource(m_FragmentShader, 1, fragmentShaderStrings, NULL);    

    glCompileShader( m_FragmentShader ); // CRASH CRASH CRASH

    delete fragmentShaderAssembly;

    // Did it compile okay?
    glGetShaderiv(m_FragmentShader, GL_COMPILE_STATUS, &bFragCompiled);
    if( bFragCompiled == false )
    {
        PrintShaderInfoLog(m_FragmentShader);
    }

    /////////////////////////////////
    //
    // Create a program object and attach our compiled shaders to it...
    //
    /////////////////////////////////

    GLint bLinked = false;
    m_ShaderProgram = glCreateProgram();
    
    glAttachShader( m_ShaderProgram, m_VertexShader );
    
    glAttachShader( m_ShaderProgram, m_FragmentShader );
    
    //
    // Link the program object and print out the info log...
    //
    glLinkProgram( m_ShaderProgram );
    
    glGetProgramiv(m_ShaderProgram, GL_LINK_STATUS, &bLinked);
    
    if( bLinked == false )
    {
        PrintProgramInfoLog( m_ShaderProgram );
    }
}

I’ve narrowed it down to being the second glCompileShader through liberal use of cout and cout.flush… it’s definitely breaking somewhere in that glCompile call. I can’t understand what would cause a failed free() call in the shader, though, given there are no pointers or arrays of any kind being used in the shader. I’m not sure I can post the shader code for proprietariness (new word!) reasons, but… if anyone can suggest something to look for…?

The only parts of the code that have been touched are way deeper in the code where I experimented with changing a couple of variables LONG after glCompileShader has been called. I only commented them and changed a 1.0 to a 0.0, and have since changed them back, so I’m very very very hesitant to think this is the cause.

I’d think maybe libc or something had been updated, but this machine isn’t network connected so it has no way to pull updates…

EDIT: I’ve also cout’ed and verified that the shaders ARE being copied in correctly to the shaderAssembly/shaderStrings variables… and like I said, no pointer use in the shaders, so I don’t know where a free() call would be happening…

Memory corruption. Likely something your app is doing, and NVidia is just getting stuck with the resulting garbage state, but it’s possible it’s a driver bug.

…especially given that the issue wasn’t always there. This isn’t code that I wrote, but up until a couple of days ago it was working just fine.

More ammo that the mem corruption likely isn’t caused here, but elsewhere.

To cut to the chase, run valgrind on your app. This should be a standard package you can install.

Instead of running:

myapp

(where myapp is an ELF executable) you’d run:

valgrind myapp

Some options I like to run with:

valgrind -v --smc-check=all --tool=memcheck --num-callers=16 --trace-children=yes --error-limit=no --leak-check=full myapp

You can run this on libtool wrapper scripts too, but you probably don’t care.

Usually this will point you directly to the bug. Just route the output to a file, and then once complete, just search from the top for the word “Invalid”. Boom! – there’s your problem.

The malloc lib also has some built-in debugging tools, but in practice I’ve never found them useful, so I recommend you just use valgrind.

Well how bout that… it turns out valgrind was already on the machine the code was on and I did some grindin’… I got… interesting results. It essentially wrote YourCodeSucks.txt with invalid complaints all OVER - BUT the app worked fine and got past the error…? Is that normal?

I don’t understand the invalid read/writes that it’s complaining about. A lot of the functions it complains about don’t even have any pointer use other than setting a few pointers to null (though in the code they actually use 0 rather than null). Does the fact that these function calls are in libraries matter?

Lots of complaints of an invalid write size of 8, or something similar, or an invalid read size of 1, or something similar. I’m having trouble making most of this out… arg.

:slight_smile:

…with invalid complaints all OVER - BUT the app worked fine and got past the error…? Is that normal?

That kind of output is only normal if you’ve got lots of memory bugs in your code (in my experience). That’s not uncommon if you’ve got an app that’s been around forever and folks haven’t been running a memory checker to make sure these bugs aren’t creeping in.

But to your question, yes, it’s not uncommon when you have invalid memory write/read problems for the program to seemingly “work” when you run it under valgrind but core dump outside of it. This is because the memory organization shifts around, and it’s less likely that you’re going to stomp something running inside valgrind that’ll cause the program to belly-up. Though it can happen.

I don’t understand the invalid read/writes that it’s complaining about. … Does the fact that these function calls are in libraries matter?

Not in my experience.

Normally I valgrind with an app compiled “-g -O2”. But if you want more detailed stack traces (inline functions not inlined) and guaranteed accurate line numbers (unaffected by compiler optimizations), then I’ll compile without optimization at all and full debug: “-O0 -g3 -ggdb3 -fno-inline”.

Also, before even going there, let the compiler help catch as much as it can. Compile with “-Wall -Werror” and fix anything that comes up. If necessary, you can selectively disable the detection of specific warnings with other options (e.g. -Wno-non-virtual-dtor, -Wno-parentheses, etc.).

Lots of complaints of an invalid write size of 8, or something similar, or an invalid read size of 1, or something similar. I’m having trouble making most of this out… arg.

Uh oh. :frowning: That’s usually not good. What I often find this refers to is a point in the code where the code has written (“Invalid write of size…”) or read (“Invalid read of size…”) outside the bounds of any valid memory block. That’s bad. This can of course crash the program (on invalid addresses), and those “invalid writes” are the worst because they can silently corrupt memory, leaving you to core dump or assert later someplace that has nothing to do with the original error other than it depends on correct memory contents, which have been corrupted.

Lots of complaints of an invalid write size of 8, or something similar, or an invalid read size of 1, or something similar.

Sounds like an off-by-one error. Like you’re reading/writing beyond the bounds of an array somewhere.
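To make the off-by-one idea concrete: a loop that uses <= where it should use < walks one element past the end of an array, which is the kind of access valgrind reports as an invalid read or write. Below is a small sketch (the function name and the buggy flag are made up for illustration) that uses std::vector::at() so the out-of-range access throws instead of silently corrupting memory.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fills 'values' with 45.0f. The 'buggy' flag reproduces the classic
// off-by-one (looping to size + 1) so the difference can be observed safely.
// Returns true if every access stayed in bounds.
bool FillChecked(std::vector<float>& values, bool buggy)
{
    const std::size_t limit = buggy ? values.size() + 1 : values.size();
    try {
        for (std::size_t i = 0; i < limit; ++i)
            values.at(i) = 45.0f;   // at() throws std::out_of_range past the end
    } catch (const std::out_of_range&) {
        return false;               // with operator[] this would be a silent invalid write
    }
    return true;
}
```

With a raw array or operator[] the same loop would compile and often appear to work, which is why a memory checker is needed to catch it.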

I threw the debug tag onto the compiler for the library and rebuilt it, got some more specific information on where some of these complaints were happening… and I’m even more confused.

I was expecting something like writing 0 to a pointer to be where it was complaining, but I got three complaints of an invalid write of size 8 on a line where the number 45.0 was being written to a GLfloat. The same thing is being done on the line above it (first is x, second is y)… something is awry here. >_>

There are other similarly weird errors popping up elsewhere… are these possibly false positives or something more sinister? They’re the first handful to pop up so…

Ok, if you post a little code, I and other folks can help. First-off, writing 45.0 to a GLfloat should be a 4-byte write, not 8, if it’s truly a GLfloat you’re writing to.

And whatever you’re writing to, question the memory this variable is contained in. Maybe you’ve written to an object with a bogus pointer.

Also, many times below those invalid write stack traces there’s a comment that tells you more about the address you’re writing to (e.g. “which is 10 bytes past the end of a block allocated from here <stack trace>”). That usually gives you more clues as to what you did wrong.
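On the 4-byte versus 8-byte point, a quick sanity check can rule out the type itself. This sketch uses plain float and double as stand-ins, on the assumption that gl.h typedefs GLfloat to float and GLdouble to double; the helper names are hypothetical.

```cpp
#include <cstddef>

// Stand-ins for the gl.h typedefs (assumed: GLfloat is float, GLdouble is double).
typedef float  GLfloatLike;
typedef double GLdoubleLike;

// Size in bytes of a single store to each type.
inline std::size_t FloatWriteSize()  { return sizeof(GLfloatLike); }
inline std::size_t DoubleWriteSize() { return sizeof(GLdoubleLike); }
```

If these come back as 4 and 8, then a reported 8-byte write on a line that assigns to a GLfloat suggests the member being written is not the one the source line seems to name, which points back at the object's memory rather than at the type.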

This thread may be on a temporary hiatus… I’m away from my work PC and won’t be able to get at the code / valgrind it for about a week…

It is writing to a GLfloat, though this is on x64… maybe that explains the 8-byte write? I’m not sure. Though why would valgrind think it wrong then… Hmm…

My advice is: learn C++ and use smart pointers/arrays and std and boost containers. The memory problems then disappear.

Okay, I’ve gone back and had some more time to valgrind / read through the errors / check the code… just about ALL the errors I can find are related to either:

  1. An invalid read/write size to a GLfloat / GLdouble
    or
  2. A conditional jump or move depending on an uninitialized value from something in libGLcore

I’ve looked through gl.h and from what I can see, GLfloat is just a typedef’d float (remember I’m still new to opengl so I’d never actually looked at that before… be gentle on me :P) so I’m not sure why this would be having size issues… I’m guessing it’s probably something related to the libGLcore problems?

Yeah, the 2nd is pretty typical IIRC, so long as it occurs inside of libGL (not in your code). Strictly speaking, this could indicate that some values you’re passing to OpenGL are uninitialized, but not necessarily so.

The first errors are by far the worse of the two. Whenever I’ve seen this, it’s a real, serious problem in our code. I never see this in the GL driver. But on that thread…

What options are you passing valgrind? For NVidia drivers, you need to make sure that you’re passing --smc-check=all. Per the NVidia driver README:

Valgrind

    The NVIDIA OpenGL implementation makes use of self modifying code. To
    force Valgrind to retranslate this code after a modification you must run
    using the Valgrind command line option:

    --smc-check=all

    Without this option Valgrind may execute incorrect code causing incorrect
    behavior and reports of the form:

    ==30313== Invalid write of size 4

If this isn’t your problem, perhaps you could pull out a small section of code that demonstrates the problem (i.e. right where valgrind is telling you an invalid write occurs in “your” code, not libGL’s), so folks here can see the problem, reproduce it, and advise.

has anyone actually looked at the code he’s posted?!


delete vertexShaderAssembly;

so you say that the ReadShaderFile function returns a cstring, and presumably it’s allocating the returned string using the ‘new’ operator? In which case you should be deleting it with:


delete [] vertexShaderAssembly;

same for the fragment shader bit.

that code is so awful it brought tears to my eyes. Rewrite it as soon as possible.
But first, tell us more about the ReadShaderFile function. Post the source to it here, as I would imagine that too is riddled with confusion and (frankly) school boy errors.
Where’s the error checking? The code doesn’t even check if it’s successfully loaded the bleedin’ shader source from file…unless it’s throwing an exception that’s caught somewhere further down the line, but to be honest I doubt that very much.
BTW, I know you didn’t write this rubbish, omdown. I really would be hesitating to write your predecessor a reference, truth be told.
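On the delete point above: memory from new[] has to be released with delete[], and the simplest way to avoid the mismatch is to not own a raw array at all. Here is a minimal sketch, assuming ReadShaderFile can be changed to hand back a std::string. The name ReadShaderSource and the bool return convention are hypothetical, not taken from the original code. It also reports a failed load instead of passing garbage on to glShaderSource.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical replacement for ReadShaderFile(): reads the whole file into
// 'out' and returns false if the file could not be opened or read.
bool ReadShaderSource(const std::string& path, std::string& out)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file)
        return false;               // file missing or unreadable

    std::ostringstream contents;
    contents << file.rdbuf();       // copy the entire stream into the buffer
    if (file.bad())
        return false;               // low-level read error

    out = contents.str();           // std::string owns and frees the memory
    return true;
}
```

The caller would then take source.c_str() into a const char* and pass its address to glShaderSource, and there is nothing left to delete afterwards.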

I’ve been using the smc-check=all flag… I’ll post a small code chunk below of where the first flurry of errors are being reported by Valgrind…

BAH. No, he may have written it, but I should have caught that. There are actually a lot of different shaders being loaded that had the same issue, went back and fixed all of those, completely overlooked it. Dumb error. That said I agree with you on error checking / etc. The code I posted isn’t a DIRECT copy paste, a little debug code and whatnot removed, but for the most part in the program (all 14Kish lines), any error checking consists of :

  1. reporting the error and continuing along like nothing happened as long as the program can go without crashing.
    or
  2. just returning 0 (or something similar) and crashing.

Been begging for time to go back and actually clean up the existing code rather than adding NEW code for some time now… but the thing is behind schedule and there’s a demo and blah blah blah. lol oh the joys of contracting.

Anyway, the example of where valgrind is throwing the first errors… I can’t DIRECTLY copy / paste for the sake of the code being private and whatnot, but… this is close enough, the names and faces have just been changed to protect the innocent. :wink:


Render::Render() {
    m_Scene = NULL;
    m_Database = NULL;
    m_Near = 0.1;
    m_Far = 10000000.0;
    m_FOVy = 45.0;
    m_Aspect = 1.0;
    m_Width = 640;
    m_Height = 480;
    m_xPixels = m_Width;
    m_yPixels = m_Height;
    m_xFOV = 45.0;
    m_yFOV = 45.0;
}


//Render.h
class Render {
    Scene *m_Scene;
    Database *m_Database;

    GLdouble m_Near;
    GLdouble m_Far;
    GLdouble m_FOVy;
    GLdouble m_Aspect;

    GLint m_Width;
    GLint m_Height;

    GLint m_xPixels;
    GLint m_yPixels;

    GLfloat m_xFOV;
    GLfloat m_yFOV;

    //
    //...
    //
};

Valgrind complains of an invalid write for every single line in the Render constructor other than m_Scene and m_Database - only GL* related variables…

I’ve been trying to recreate it too in a small side standalone app actually, thus far no success (or failure?)…

Edit: By the way… I’m not sure if any of this is relevant or not, I’ve never used much of anything like Valgrind before, but there’s this chunk of text that comes up just before the first set of errors:


--8089-- REDIR: 0x3a9467f830 (rindex) redirected to 0x4a07990 (rindex)
--8089-- REDIR: 0x3a94681070 (memset) redirected to 0x4a08c60 (memset)
--8089-- REDIR: 0x3a9467f400 (strlen) redirected to 0x4a07da0 (strlen)
--8089-- REDIR: 0x3a9467a780 (malloc) redirected to 0x4a07570 (malloc)
--8089-- REDIR: 0x3a94682510 (memcpy) redirected to 0x4a081c0 (memcpy)
--8089-- REDIR: 0x3a9467edf0 (index) redirected to 0x4a07ab0 (index)
--8089-- REDIR: 0x3a94679880 (free) redirected to 0x4a06270 (free)
--8089-- REDIR: 0x3a94679e30 (calloc) redirected to 0x4a05340 (calloc)
--8089-- REDIR: 0xffffffffff600000 (???) redirected to 0x3803c8b3 (vgPlain_amd64_linux_REDIR_FOR_vgettimeofday)
--8089-- REDIR: 0x3a94684e80 (rawmemchr) redirected to 0x4a08d70 (rawmemchr)
--8089-- REDIR: 0x3a94680920 (memchr) redirected to 0x4a08180 (memchr)
--8089-- REDIR: 0x3a9467b4f0 (realloc) redirected to 0x4a07690 (realloc)
--8089-- REDIR: 0x3a9467ee70 (strcmp) redirected to 0x4a08090 (strcmp)
--8089-- REDIR: 0xffffffffff600400 (???) redirected to 0x3803c8bd (vgPlain_amd64_linux_REDIR_FOR_vtime)
--8089-- REDIR: 0x3aa30c28a0 (operator new[](unsigned long)) redirected to 0x4a067f0 (operator new[](unsigned long))
--8089-- REDIR: 0x3aa30c2770 (operator new(unsigned long)) redirected to 0x4a06f30 (operator new(unsigned long))
--8089-- REDIR: 0x3a9467f5d0 (strncmp) redirected to 0x4a08020 (strncmp)
--8089-- REDIR: 0x3a94681ba0 (mempcpy) redirected to 0x4a08d90 (mempcpy)
--8089-- REDIR: 0x3aa30c08a0 (operator delete(void*)) redirected to 0x4a05d70 (operator delete(void*))

I’ve looked around online and I see a lot of places where that’s showing up in other people’s valgrind logs, but not much on what it means. I’d presume it’s just moving the function pointers to different locations… But for what?

I would imagine because you’re using the shared library libc. It statically links to stubs, which then get relinked to the so versions of the functions at runtime when they’re mapped into your process address space. Equivalent of the implicit linking seen on Windows, which effectively issues a load of GetProcAddress calls on application startup. It’s all perfectly normal, and nothing to worry about.

What this suggests to me is that at some point, you’re using a block of memory that’s only 8 bytes long (if 32-bit arch) or 16 bytes long (if 64-bit arch) behind a “Render” pointer or reference.

Try moving the m_Scene and m_Database members down and see if you now get complaints about m_Near and m_Far.

I’ve been trying to recreate it too in a small side standalone app actually, thus far no success (or failure?)…

Suggests there’s some funny business going on with your memory blocks in your main app, but not in your test program.

Got any custom allocators in your main app? e.g. operator new? Maybe have multiple definitions of class/struct Render in your app, one that is only 8 or 16 bytes long?

By the way… I’m not sure if any of this is relevant or not, I’ve never used much of anything like Valgrind before, but there’s this chunk of text that comes up just before the first set of errors:


--8089-- REDIR: 0x3a9467f830 (rindex) redirected to 0x4a07990 (rindex)
...

That’s all normal. This is valgrind saying it’s plugging itself into various system routines to perform instrumentation.

I’ve looked around online and I see a lot of places where that’s showing up in other people’s valgrind logs, but not much on what it means. I’d presume it’s just moving the function pointers to different locations… But for what?

To plug in its own implementation (or a wrapper around the default implementation). The reason being so it can check what you’re doing with memory more effectively. For instance, to plug in its own malloc implementation so it can keep track of what blocks you’ve allocated.

  • Moved them down… doesn’t complain about those two at all, still complains about the others. I’m beyond being worried about it being GLfoo related, tried changing them all to non-GL variables and got the same errors, so…

  • No, no custom new or anything.

  • Wrote an instance of the Render that gets the same results. But works fine on another machine with extremely similar specs…? Valgrind doesn’t complain about the writes on those lines anymore…

I found a spot there was an array getting written out of bounds and repaired that, but that actually went from bad to worse… right now valgrind itself is crashing and telling me “the impossible happened”. lol

I’m about to run splint on it and see what it comes up with. Thanks to everyone who’s sticking around with me on this insanity. :slight_smile:

Interesting. Don’t know what’s going on there with “the impossible happened”. I haven’t seen that for many years, since valgrind’s been stable.

Might try downloading OpenSUSE 11.2 and installing it on a spare partition. Should only take you about an hour. Then retry. Been running/developing on OpenSUSE for years and the dev tools including valgrind work great.

The backtrace I get when the app crashes is it fails trying to fclose a file pointer… which is strange, given it JUST opened it successfully, read / loaded everything from it successfully, and can print the address of fp after it opens and just immediately before it closes and they match… as we say in the south… “somethin’ jus’ dun sit right about this.” :stuck_out_tongue:

Solved! For anyone still following this godforsaken issue… the problem was the last place I’d have thought of: the make files.

As valgrind was giving me errors right off the bat in the Render’s constructor, saying it was writing outside the X bytes allocated space, I thought “hm, let’s cout << sizeof(Render) and see what I get”. In the driver app, I got the correct size. When I cout << sizeof(Render) in the CONSTRUCTOR of the Render (which is built to a library, a separate project as you’ll remember), I got a difference of over 100 bytes! So no wonder Render was accessing memory outside what it had been allocated!

SOOO I couldn’t figure out what could be causing this discrepancy until someone asked the question that turned out to be the key: Are there, by chance, some #ifdef’s surrounding variable declarations in the class declaration in Render? And of course, there were. And of course the macros were defined when Render was built and not when the driver app was built, so the two builds disagreed about the size and layout of the class. I’m kicking myself for not realizing it sooner. Hahaha.
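For anyone who lands here with the same symptom, here is a minimal sketch of how a conditionally compiled member changes a class’s size. The member m_Extra and the namespace names are made up for illustration. The two structs stand in for the single Render declaration as seen by the library (macro defined) and by the driver app (macro not defined). In the real bug both views share one name, so the app allocates the smaller size and the library’s constructor writes past the end of it.

```cpp
#include <cstddef>

// What the library sees: header compiled with the macro defined.
namespace library_view {
struct Render {
    void*  m_Scene;
    void*  m_Database;
    double m_Extra[16];   // stands in for members inside the #ifdef block
    double m_Near;
    double m_Far;
};
}

// What the driver app sees: same header, macro not defined.
namespace app_view {
struct Render {
    void*  m_Scene;
    void*  m_Database;
    double m_Near;
    double m_Far;
};
}

// Number of bytes the library would touch beyond what the app allocated.
inline std::size_t RenderSizeMismatch()
{
    return sizeof(library_view::Render) - sizeof(app_view::Render);
}
```

Printing sizeof(Render) from both sides, as described above, is a cheap way to detect this. The fix is to pass the same -D flags to every project that includes the header.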

So yeah. I just wanted to post the resolution just in case anyone ever runs into something similar. :slight_smile: Manymanymany thanks to everyone for following and offering their input, especially for pointing me to Valgrind! :slight_smile: It is by far one of my new favorite apps, and I only wish they had it for Windows! (I have a development team that LOVES some MFC, shudder, so I do a lot of dev in Windows…)
