Segfault in glDrawArrays

Hello everyone,

I’ve been stuck on a segfault for a few days. I tried many things to find and debug the problem; it was crashing randomly, but I finally found which case triggers the segfault.

It’s a basic 2D application that draws only triangles. I have two shaders, one for basic polygons and another one for text.
I have one draw call for each of those two categories. To be able to do only one draw call per category, I made a C++ class that handles an OpenGL buffer with a preallocated size. A ManagedBuffer is able to increase the size of its buffer; it basically doubles the size (like std::vector).
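
To give an idea of the shape of this class, here is a stripped-down sketch of its interface (the member names that don’t appear elsewhere in this thread are just illustrative, not my exact code):

    #include <QGLBuffer>
    #include <QMutex>
    #include <deque>

    class ManagedBuffer {
    public:
        // Adds one item, growing the GL buffer if no free slot is left
        // (illustrative signature; the real one takes my item type).
        int add(const void *itemData);

        // Number of items currently stored (used for glDrawArrays).
        int count() const;

        // Binds the underlying QGLBuffer before drawing.
        bool bind() { return mBuffer.bind(); }

    private:
        // Doubles the GL buffer and copies the old contents (code further down).
        void increaseBufferSize();

        QGLBuffer mBuffer;     // the actual OpenGL buffer object
        int mElementSize;      // size of one item, in bytes
        int mSize;             // current capacity, in items
        std::deque<int> mFree; // free item indexes
        QMutex mMutex;         // protects every operation on the buffer
    };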

The segfault happens when a certain number of items is reached. More specifically, it happens when the triangle buffer has increased its size 3 times and the text buffer 2 times. I should point out that the segfault only happens if the text buffer has increased its size.
From that moment on, the next draw call for triangles crashes.
BUT, if I comment out the text draw call, it does not crash! The same goes if I comment out the triangle draw call.

After some trial and error, I figured out that one line in my text shader leads to the segfault.

    #version 110
    
    // 0b10000000000000000000000000000000 written in hex,
    // because binary literals are not supported in GLSL 110
    #define ALL_VIEWS_VISIBILITY_FLAG 0x80000000

    // view_id and flags should really be uint view_id and uvec2 flags,
    // but those types are not supported by GLSL 110

    uniform mat3 camera;
    uniform vec2 viewport;
    uniform int view_id;

    attribute vec4 vertex;
    attribute vec4 color;
    attribute vec2 char_position;
    attribute vec2 flags;

    varying vec2 textureCoordFrag;
    varying vec4 colorFrag;

    // Trick to get a bit from a float number, as OpenGL 2.0 does not support bitwise operations.
    // (A quick CPU-side sanity check of this trick is shown right after the shader.)
    float getBit(float num, int position) {
       bool tmp = fract(floor(num / pow(2., float(position))) / 2.) != 0.;

       // We can't return a bool in GLSL
       return tmp ? 1. : 0.;
    }

    void main() {
       // Get the visibility of this character. Bit 22 is the MSB here.
       // The MSB and the bit associated with the view must both be 1 to draw the character.
       // This line could be written with an if statement, but branching is inefficient on the GPU.
       float visibility = getBit(flags.x, 22) * getBit(flags.x, 22 - view_id);

       // If visibility is 0, v will be equal to (0,0):
       // all vertices of the glyph collapse to (0,0) and the glyph is not drawn.
       vec2 v = visibility * ((camera * vec3(char_position.xy, 1)).xy + vertex.xy / viewport * 22.5 * 2.);
       v.y *= -1.;

       // z component as close as possible so that text is always drawn over other items
       gl_Position = vec4(v, 0, 1);
       textureCoordFrag = vertex.zw;
       colorFrag = color; <---- THIS LINE
    }
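
As a quick sanity check of the getBit() trick referenced above, the same math can be verified on the CPU for any integer small enough to be stored exactly in a float (this is a standalone check, not part of the renderer):

    #include <cassert>
    #include <cmath>

    // Same idea as the GLSL getBit(): test one bit of an integer stored in a float.
    float getBitF(float num, int position) {
        float q = std::floor(num / std::pow(2.0f, float(position))) / 2.0f;
        bool set = (q - std::floor(q)) != 0.0f;   // fract(q) != 0
        return set ? 1.0f : 0.0f;
    }

    int main() {
        // Exact for values well below 2^24, where a float still holds exact integers.
        for (unsigned int n = 0; n < 1000; ++n)
            for (int bit = 0; bit < 23; ++bit)
                assert(getBitF(float(n), bit) == float((n >> bit) & 1u));
        return 0;
    }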

If I comment out the copy of the color attribute, I don’t get the segfault anymore. For example:

    colorFrag = vec4(vertex.xy, 1, 1);

works perfectly.

I checked my buffers to see if I was asking OpenGL to draw more items than the buffers actually hold, but it seems to be correct.
Also, the SAME program works perfectly on another machine.

This is how I draw the text:

    mProgram->bind();
    buffer.bind();

    mProgram->setAttributeBuffer("vertex", GL_FLOAT, 0, 4, sizeof(GLfloat) * 12);
    mProgram->enableAttributeArray("vertex");
    mProgram->setAttributeBuffer("color", GL_FLOAT, sizeof(GLfloat) * 4, 4, sizeof(GLfloat) * 12);
    mProgram->enableAttributeArray("color");
    mProgram->setAttributeBuffer("char_position", GL_FLOAT, sizeof(GLfloat) * 8, 2, sizeof(GLfloat) * 12);
    mProgram->enableAttributeArray("char_position");
    mProgram->setAttributeBuffer("flags", GL_FLOAT, sizeof(GLfloat) * 10, 2, sizeof(GLfloat) * 12);
    mProgram->enableAttributeArray("flags");

    glBindTexture(GL_TEXTURE_2D, font.getTexture());
    mProgram->setUniformValue("tex", 0);

    auto mat = camera.getMatrix();
    gl.glUniformMatrix3fv(mProgram->uniformLocation("camera"), 1, false, (GLfloat *) &mat);

    glm::vec2 viewport = camera.getViewport();
    mProgram->setUniformValueArray("viewport", (GLfloat *) &viewport, 1, 2);

    mProgram->setUniformValue("view_id", (unsigned int) viewId);

    // buffer.count() returns the number of characters, and there are 6 vertices per character
    glDrawArrays(GL_TRIANGLES, 0, buffer.count() * 6);
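
For reference, the offsets and the sizeof(GLfloat) * 12 stride above assume an interleaved layout of 12 floats per vertex, i.e. something like this (illustrative struct, not my exact code):

    #include <GL/gl.h>

    // One vertex of a glyph: 12 floats, matching the offsets 0, 4, 8 and 10
    // passed to setAttributeBuffer above.
    struct TextVertex {
        GLfloat vertex[4];         // corner offset in xy, texture coords in zw
        GLfloat color[4];          // RGBA color
        GLfloat char_position[2];  // position of the character in the scene
        GLfloat flags[2];          // flags.x holds the visibility bits
    };
    static_assert(sizeof(TextVertex) == sizeof(GLfloat) * 12, "stride mismatch");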

In GDB, the segfault happens inside the NVIDIA shared library, with no usable stack.
The machine runs 32-bit Linux with an NVIDIA Quadro FX 1500.

I also tried checking the memory with Valgrind and the OpenGL errors with apitrace, without any success…

Any ideas? :confused:

Are you checking to make sure that your shaders compile and link successfully? And checking for OpenGL errors?

If you fail to locate your bug, I’d suggest posting a short GLUT (or GLFW) stand-alone test program that illustrates the problem.

Where did you get that you can’t return a bool in GLSL 1.1? Sure you can:

bool getBit(float num, int position) {
   bool tmp = fract(floor(num / pow(2., float(position))) / 2.) != 0.;

   // We CAN return a bool in GLSL!!
   return tmp;
}
...
    bool  visibility = ( getBit(flags.x, 22) && getBit(flags.x, 22 - view_id) );

Thanks for your answer!

Yes I do; the program would stop if something were wrong with the shaders, and likewise on any OpenGL error.
I check at least once per frame using glGetError.

Mmmhh, you are right, I’ll modify it!

About my problem, I have more information: I figured out that it’s the buffer recreation that leads to the segfault. If I allocate the buffer with enough memory at the start of the application, everything works well.
I first thought it was a concurrency problem (the buffer is used in multiple threads, so I protected it with a mutex), but after adding more safeguards, the problem is still there.

The point is that I destroy the old buffer, but that can’t happen while I’m rendering, thanks to the mutex.
So here is my question: does OpenGL keep a reference to this old buffer and try to use it anyway?

This is how I increase the buffer size. There is no mutex here because this function is called from another function (add(item)) which is protected by a mutex (the same one for every action):

    void ManagedBuffer::increaseBufferSize() {
       // Create the new buffer
       QGLBuffer newBuffer;
       newBuffer.setUsagePattern(QGLBuffer::DynamicDraw);
       newBuffer.create();

       void *oldBufferMap;
       void *newBufferMap;

       // Bind the old buffer and map its memory
       mBuffer.bind();
       oldBufferMap = mBuffer.map(QGLBuffer::ReadOnly);

       // Allocate the memory for the new buffer and map it
       newBuffer.bind();
       newBuffer.allocate(0, mElementSize * mSize * 2);
       newBufferMap = newBuffer.map(QGLBuffer::WriteOnly);

       // Copy data from old buffer to new buffer
       memcpy(newBufferMap, oldBufferMap, mElementSize * mSize);

       // Unmap the new buffer while it is still bound
       newBuffer.unmap();

       // Bind old buffer, unmap it, and delete it
       mBuffer.bind();
       mBuffer.unmap();
       mBuffer.release();
       mBuffer.destroy();

       mBuffer = newBuffer;

       // Generate free indexes
       for (int i = mSize * 2 - 1; i >= mSize; --i) {
           mFree.push_front(i);
       }

       mSize *= 2;
    }

Is this the best way?

Deleting a buffer object (via glDeleteBuffers) won’t release its data store for re-use so long as there are pending commands which may access it. But this should be transparent to the application. A buffer object cannot be accessed through the OpenGL API after deletion.

Copying data between buffers should be done with glCopyBufferSubData or glCopyNamedBufferSubData rather than using the CPU. If you can’t guarantee support for the latter, you’ll need to bind them to different targets (conventionally GL_COPY_READ_BUFFER and GL_COPY_WRITE_BUFFER) rather than using QGLBuffer::bind.
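
For example, the resize copy would look roughly like this with the copy-buffer targets (untested sketch; oldBuf, newBuf and the byte sizes stand in for your own names):

    // Grow a buffer entirely on the GPU (OpenGL 3.1 or ARB_copy_buffer).
    GLuint newBuf;
    glGenBuffers(1, &newBuf);

    glBindBuffer(GL_COPY_READ_BUFFER, oldBuf);
    glBindBuffer(GL_COPY_WRITE_BUFFER, newBuf);

    // Allocate the larger store, then copy the old contents without a CPU round trip.
    glBufferData(GL_COPY_WRITE_BUFFER, newSize, NULL, GL_DYNAMIC_DRAW);
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0, oldSize);

    glBindBuffer(GL_COPY_READ_BUFFER, 0);
    glBindBuffer(GL_COPY_WRITE_BUFFER, 0);
    glDeleteBuffers(1, &oldBuf);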

Yes, that’s what I wanted to do when I started working on this project. But unfortunately those functions don’t exist in OpenGL 2.1, which is the maximum version I can use because of the hardware the program runs on (an old Matrox card and an NVIDIA card).
In OpenGL 2.1 there are only copy functions for textures and colors. The only way I found is to map the buffers.
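
In raw GL calls, what my QGLBuffer code above does is roughly this (simplified sketch, with oldBuf, newBuf and the sizes as placeholders):

    // GL 2.1 fallback: copy through client memory by mapping both buffers.
    glBindBuffer(GL_ARRAY_BUFFER, oldBuf);
    void *src = glMapBuffer(GL_ARRAY_BUFFER, GL_READ_ONLY);

    glBindBuffer(GL_ARRAY_BUFFER, newBuf);
    glBufferData(GL_ARRAY_BUFFER, newSize, NULL, GL_DYNAMIC_DRAW);
    void *dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);

    memcpy(dst, src, oldSize);

    glUnmapBuffer(GL_ARRAY_BUFFER);   // unmaps newBuf, which is currently bound

    glBindBuffer(GL_ARRAY_BUFFER, oldBuf);
    glUnmapBuffer(GL_ARRAY_BUFFER);   // unmaps oldBuf
    glDeleteBuffers(1, &oldBuf);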

Then my code should work; I must be missing something else. I didn’t mention it before because I didn’t think it was relevant, but I use context sharing to share buffers across different views. Could that be a problem?
Thanks for your reply :slight_smile:

It “could be” a problem in the same sense that using multiple threads could be a problem. Both create additional possibilities for introducing bugs. But of the two, threading is far more likely to be an issue than sharing data across contexts.

So, after a long investigation and a lot of debugging, I finally found the solution.

It seems that on this driver (304.xx NVIDIA, 32-bit), OpenGL doesn’t keep the deleted buffer around even though a queued draw call still needs it.
The solution was to synchronize my program with OpenGL using glFinish().

Bear in mind that glFinish can have a massive performance penalty.

FWIW, I suspect that this is a threading-related issue. Are you calling glFlush before releasing mutexes?
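
I.e. something along these lines around every block that touches the shared buffers (just a sketch, assuming a std::mutex named bufferMutex guards them):

    {
        std::lock_guard<std::mutex> lock(bufferMutex);

        // ... create / resize / fill the shared buffer ...

        // Hand the queued commands to the driver before the mutex is
        // released and another thread starts using the same objects.
        glFlush();
    }   // mutex released here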

I came back to this work this morning and hit the same segfault again. It seems it only works sometimes.
So I 100% agree with you, it’s a threading-related issue!
But I don’t understand why: every action on the Scene/Buffers is mutex protected. Following your advice, this morning I added a glFlush before each mutex release.

What’s also strange is that it only crashes when the size increases from 4718592 bytes to 9437184 bytes. But the buffer creation does not fail, and glMapBuffer returns a valid pointer.