Fast Readbacks on Intel and NVIDIA

One more update: I modified your doReadbackFAST algorithm to use glGetTexImage as follows:

void WaylandEgl::doFastReadBackTexture() // 12.4  cpu load :)
{
    // Work-around for NVidia driver readback crippling on GeForce.

    if (!buffCreated)
    {
        qDebug() << "Height" << mWinHeight << "Width" << mWinWidth;
        pbo_size = mWinHeight * mWinWidth * 2;   // RGB565: 2 bytes per pixel
        nBytesPerLine = mWinWidth;               // row length in pixels (despite the name)
        Readback_buf = (GLchar *) malloc( pbo_size );

        glGenFramebuffers(1, &textFrameBuffer);
        glBindFramebuffer(GL_FRAMEBUFFER, textFrameBuffer);

        glGenTextures(1, &boundTex);
        glBindTexture(GL_TEXTURE_2D, boundTex);
        glTexImage2D(GL_TEXTURE_2D, 0,GL_RGB, mWinWidth, mWinHeight, 0,GL_RGB, GL_UNSIGNED_SHORT_5_6_5, 0);

        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,GL_TEXTURE_2D, boundTex, 0);

        GLenum fbStatus = glCheckFramebufferStatus(GL_FRAMEBUFFER);
        if (fbStatus != GL_FRAMEBUFFER_COMPLETE)
        {
            // Report the status itself; glGetError() does not return it.
            qDebug() << "Framebuffer incomplete, status" << fbStatus;
        }
        else
        {
            qDebug() << "Texture Framebuffer is OK";
        }
        buffCreated = true;

        glBindFramebuffer(GL_FRAMEBUFFER, 0);

        glGenBuffers( PBO_COUNT, pboIds );

        // Buffer #0: glReadPixels target
        GLenum target = GL_PIXEL_PACK_BUFFER;

        glBindBuffer( target, pboIds[0] );
        glBufferData( target, pbo_size, 0, GL_STATIC_COPY );


        glGetBufferParameterui64vNV = (PFNGLGETBUFFERPARAMETERUI64VNVPROC)eglGetProcAddress("glGetBufferParameterui64vNV");
        if (!glGetBufferParameterui64vNV)
        {
            qDebug() << "glGetBufferParameterui64vNV not found!";
            return;
        }

        glMakeBufferResidentNV = (PFNGLMAKEBUFFERRESIDENTNVPROC)eglGetProcAddress("glMakeBufferResidentNV");
        if (!glMakeBufferResidentNV)
        {
            qDebug() << "glMakeBufferResidentNV not found!";
            return;
        }

        glUnmapBufferARB = (PFNGLUNMAPBUFFERARBPROC)eglGetProcAddress("glUnmapBufferARB");
        if (!glUnmapBufferARB)
        {
            qDebug() << "glUnmapBufferARB not found!";
            return;
        }

        glGetBufferSubData = (PFNGLGETBUFFERSUBDATAPROC)eglGetProcAddress("glGetBufferSubData");
        if (!glGetBufferSubData)
        {
            qDebug() << "glGetBufferSubData not found!";
            return;
        }

        qDebug() << "Running the 16-bit optimizations";


        GLuint64EXT addr;
        glGetBufferParameterui64vNV( target, GL_BUFFER_GPU_ADDRESS_NV, &addr );
        glMakeBufferResidentNV( target, GL_READ_ONLY );

        // Buffer #1: glCopyBuffer target
        target = GL_COPY_WRITE_BUFFER;
        glBindBuffer( target, pboIds[1] );
        glBufferData( target, pbo_size, 0, GL_STREAM_READ );

        glMapBufferRange( target, 0, 1, GL_MAP_WRITE_BIT);
        glUnmapBufferARB( target );
        glGetBufferParameterui64vNV( target, GL_BUFFER_GPU_ADDRESS_NV, &addr );
        glMakeBufferResidentNV     ( target, GL_READ_ONLY );
        buffCreated = true;

        int rowL;
        glGetIntegerv(GL_PACK_ROW_LENGTH, &rowL);
        qDebug() << "Rowl before" << rowL;

        glPixelStorei( GL_PACK_ALIGNMENT, 1 );
        glPixelStorei(GL_PACK_ROW_LENGTH,nBytesPerLine);

        qDebug() << "Pixel st" << glGetError();
        glGetIntegerv(GL_PACK_ROW_LENGTH, &rowL);
        qDebug() << "Rowl after" << rowL;
    }


    glFinish();
    Timer t1;
    t1.start();

    glBindFramebuffer(GL_READ_FRAMEBUFFER,mwindow->openglContext()->defaultFramebufferObject());
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER,textFrameBuffer);
    glBlitFramebuffer(0, 0, mWinWidth, mWinHeight, 0, 0, mWinWidth, mWinHeight, GL_COLOR_BUFFER_BIT, GL_LINEAR);

    // Read the color texture back into BUF OBJ #0
    glBindBuffer( GL_PIXEL_PACK_BUFFER, pboIds[0] );
    glBindTexture(GL_TEXTURE_2D, boundTex);
    //glReadPixels( 0, 0, mWinWidth, mWinHeight,
      //            GL_RGB, GL_UNSIGNED_SHORT_5_6_5, 0 );
    glGetTexImage(GL_TEXTURE_2D,0,GL_RGB,GL_UNSIGNED_SHORT_5_6_5,0);

    t1.stop();
    readTime = t1.getElapsedTimeInMilliSec();

    t1.start();
    // Copy from BUF OBJ #0 to BUF OBJ #1
    glBindBuffer( GL_COPY_WRITE_BUFFER, pboIds[1] );
    glCopyBufferSubData( GL_PIXEL_PACK_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0,
                         pbo_size );

    // Do the readback from BUF OBJ #1 to app CPU memory
    glGetBufferSubData( GL_COPY_WRITE_BUFFER, 0, pbo_size,
                        Readback_buf );

    //sendImage((unsigned char*)Readback_buf,pbo_size);
    t1.stop();
    processTime = t1.getElapsedTimeInMilliSec();
    glBindBuffer( GL_PIXEL_PACK_BUFFER, 0 );
    //qDebug() << "Read Time " << readTime;
    //qDebug() << "Process Time " << processTime;
}

and glGetTexImage consumes 12-14% CPU. Read Time is 0.216 ms and Process Time is 3.296 ms.
That is similar to the CPU load of the performRenderBuffer16 algorithm, which is a good number :slight_smile:

Regards

Hello ,

I have some more questions as the following:

I cannot find GL_BUFFER_GPU_ADDRESS_NV, GLuint64EXT, etc. inside the <GLES*> includes. They exist in #include <GL/glext.h>, which is for desktop. Since I want to run the algorithms on an embedded device, what should I do? Should I add the <GL/gl.h> and <GL/glext.h> includes?

Also, glReadPixels returns the pixels in the wrong (bottom-to-top) order because of its internal convention. To reorder the pixels I implemented an algorithm of my own, which adds some CPU load as well. Do you know if there is a ready-made OpenGL function to put the pixels in the correct order before sending them to another device?

Best Regards

If you want to run code on non-NVIDIA platforms, then don’t use NVIDIA-only extensions, like most of their bindless stuff.

Hello ,

I am sorry, but I could not understand what you mean by "don’t use NVIDIA-only extensions". If you mean not using GL_BUFFER_GPU_ADDRESS_NV because it does not exist in the <GLES*> includes on the NVIDIA platform, do you have any other suggestions? I used it because it seems helpful for reducing CPU usage on the NVIDIA Xavier NX platform. Also, is it reasonable to add <GL/gl.h> and <GL/glext.h> when running the algorithm on an embedded platform?

Also, glReadPixels returns the pixels in the wrong (bottom-to-top) order because of its internal convention. To reorder the pixels I implemented an algorithm of my own, which adds some CPU load as well. Do you know if there is a ready-made OpenGL function to put the pixels in the correct order before sending them to another device?

Best Regards

I can’t suggest an alternative because I don’t know your code. I don’t know what your algorithm is, nor do I know why you’re using bindless memory and such like this.

I could say to use SSBOs, but that wouldn’t explain how to use them to get equivalent behavior, since I have no idea what that behavior would be. Without knowing anything about your particular use case, I can’t say.

You said that before, but it still doesn’t make sense. What is the “wrong direction” exactly, and what would the right direction be?

Hello,

In fact I described my requirement and algorithms in previous posts, so I will try to summarize them as follows:

I am trying to save a screenshot of a Qt Quick Controls application on the NVIDIA Jetson Xavier NX platform (running Qt on Wayland) using native OpenGL functions. What I need is to get the 16-bit RGB color buffer pixels and send them to another device that has no DMA-BUF or OpenGL support, so I have to send the color pixels as a byte array. I managed to implement this with a GL_RGB565 renderbuffer and a PBO with asynchronous readback, and I tried to reduce the CPU load with the different optimizations mentioned in previous posts. My two algorithms are the following:

First Algorithm:

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
void WaylandEgl::createRenderBuffer16()
{
    if (!buffCreated)
    {
        qDebug() << "Height" << mWinHeight << "Width" << mWinWidth;
        pbo_size = mWinHeight * mWinWidth * 2;   // RGB565: 2 bytes per pixel
        nBytesPerLine = mWinWidth;               // row length in pixels (despite the name)
        Readback_buf = (GLchar *) malloc( pbo_size );

        glInfo glInfo;
        glInfo.getInfo();
        glInfo.printSelf();

        glGenRenderbuffers( 1, &renderBuffer16 );
        glBindRenderbuffer( GL_RENDERBUFFER, renderBuffer16 );
        glRenderbufferStorage( GL_RENDERBUFFER, GL_RGB565, mWinWidth, mWinHeight );
        glBindRenderbuffer(GL_RENDERBUFFER, 0);

        // glGetError() clears the error flag, so call it only once.
        GLenum err = glGetError();
        if (err == GL_NO_ERROR)
        {
            qDebug() << "Render buff storage is OK";
        }
        else
        {
            qDebug() << "Render buff storage error is" << err;
        }

        glGenFramebuffers( 1, &frameBuffer16 );
        glBindFramebuffer( GL_FRAMEBUFFER, frameBuffer16);
        glFramebufferRenderbuffer( GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, renderBuffer16);

        GLenum fbStatus = glCheckFramebufferStatus(GL_FRAMEBUFFER);
        if (fbStatus != GL_FRAMEBUFFER_COMPLETE)
        {
            // Report the status itself; glGetError() does not return it.
            qDebug() << "Framebuffer incomplete, status" << fbStatus;
        }
        else
        {
            qDebug() << "Framebuffer is OK";
        }
        buffCreated = true;

        GLint format = 0, type = 0;
        glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_FORMAT, &format);
        glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_TYPE, &type);

        qDebug() << "Format" << format;
        qDebug() << "Type" << type;

        int rowL;

        glGetIntegerv(GL_PACK_ROW_LENGTH, &rowL);
        qDebug() << "Rowl before" << rowL;

        glPixelStorei( GL_PACK_ALIGNMENT, 1 );
        glPixelStorei( GL_UNPACK_ALIGNMENT, 1 );
        glPixelStorei(GL_PACK_ROW_LENGTH,nBytesPerLine);
        qDebug() << "Pixel st" << glGetError();
        glGetIntegerv(GL_PACK_ROW_LENGTH, &rowL);
        qDebug() << "Rowl after" << rowL;

        glGetBufferSubData = (PFNGLGETBUFFERSUBDATAPROC)eglGetProcAddress("glGetBufferSubData");
        if (!glGetBufferSubData)
        {
            qDebug() << "glGetBufferSubData not found!";
            return;
        }

        glBindFramebuffer(GL_FRAMEBUFFER, 0);

        glGenBuffers(PBO_COUNT,pboIds);
        glBindBuffer(GL_PIXEL_PACK_BUFFER,pboIds[0]);
        glBufferData(GL_PIXEL_PACK_BUFFER, pbo_size, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER,pboIds[1]);
        glBufferData(GL_PIXEL_PACK_BUFFER, pbo_size, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }
}

void WaylandEgl::performRenderBuffer16()
{
    Timer t1;
    createRenderBuffer16();

    glFinish();
    t1.start();

    glBindFramebuffer(GL_READ_FRAMEBUFFER,mwindow->openglContext()->defaultFramebufferObject());
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER,frameBuffer16);
    glBlitFramebuffer(0, 0, mWinWidth, mWinHeight, 0, 0, mWinWidth, mWinHeight, GL_COLOR_BUFFER_BIT, GL_LINEAR);

    t1.stop();
    blitTime = t1.getElapsedTimeInMilliSec();

    t1.start();
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pboIds[0]);
    glBindFramebuffer( GL_FRAMEBUFFER, frameBuffer16);

    //glReadBuffer(frameBuffer16); // frameBuffer16 also works
    //glBindBuffer(GL_PIXEL_PACK_BUFFER, pboIds[0]);

    glReadPixels( 0, 0, mWinWidth, mWinHeight, GL_RGB, GL_UNSIGNED_SHORT_5_6_5, 0);
    //glBindBuffer(GL_PIXEL_PACK_BUFFER, pboIds[0]);

    GLubyte *ptr = (GLubyte*)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, pbo_size, GL_MAP_READ_BIT);
    if (ptr)
    {
        memcpy(Readback_buf, ptr, pbo_size);

        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    else
    {
        qDebug() << "glMapBufferRange returned NULL";
    }
    t1.stop();
    processTime = t1.getElapsedTimeInMilliSec();
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    //eglMakeCurrent(eglGetCurrentDisplay(), eglGetCurrentSurface(EGL_DRAW), eglGetCurrentSurface(EGL_READ), eglGetCurrentContext());
    //qDebug() << "Err"<< eglGetError();

    //t1.stop();
    //readTime = t1.getElapsedTimeInMilliSec();

    qDebug() << "Blit Time " << blitTime;
    qDebug() << "Read Time " << processTime;
}
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Second Algorithm:

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
void WaylandEgl::initFastBuffers16()
{
    if (!buffCreated)
    {
        checkTypes();
        createRenderBuffer16();

        glGenBuffers( PBO_COUNT, pboIds );

        // Buffer #0: glReadPixels target
        GLenum target = GL_PIXEL_PACK_BUFFER;

        glBindBuffer( target, pboIds[0] );
        glBufferData( target, pbo_size, 0, GL_STATIC_COPY );


        glGetBufferParameterui64vNV = (PFNGLGETBUFFERPARAMETERUI64VNVPROC)eglGetProcAddress("glGetBufferParameterui64vNV");
        if (!glGetBufferParameterui64vNV)
        {
            qDebug() << "glGetBufferParameterui64vNV not found!";
            return;
        }

        glMakeBufferResidentNV = (PFNGLMAKEBUFFERRESIDENTNVPROC)eglGetProcAddress("glMakeBufferResidentNV");
        if (!glMakeBufferResidentNV)
        {
            qDebug() << "glMakeBufferResidentNV not found!";
            return;
        }

        glUnmapBufferARB = (PFNGLUNMAPBUFFERARBPROC)eglGetProcAddress("glUnmapBufferARB");
        if (!glUnmapBufferARB)
        {
            qDebug() << "glUnmapBufferARB not found!";
            return;
        }

        glGetBufferSubData = (PFNGLGETBUFFERSUBDATAPROC)eglGetProcAddress("glGetBufferSubData");
        if (!glGetBufferSubData)
        {
            qDebug() << "glGetBufferSubData not found!";
            return;
        }

        qDebug() << "Running the 16-bit optimizations";


        GLuint64EXT addr;
        glGetBufferParameterui64vNV( target, GL_BUFFER_GPU_ADDRESS_NV, &addr );
        glMakeBufferResidentNV( target, GL_READ_ONLY );

        // Buffer #1: glCopyBuffer target
        target = GL_COPY_WRITE_BUFFER;
        glBindBuffer( target, pboIds[1] );
        glBufferData( target, pbo_size, 0, GL_STREAM_READ );

        glMapBufferRange( target, 0, 1, GL_MAP_WRITE_BIT);
        glUnmapBufferARB( target );
        glGetBufferParameterui64vNV( target, GL_BUFFER_GPU_ADDRESS_NV, &addr );
        glMakeBufferResidentNV     ( target, GL_READ_ONLY );
        buffCreated = true;
    }
}

void WaylandEgl::doReadbackFAST16() // perfect on intel..
{
    initFastBuffers16();

    glFinish();
    Timer t1;
    t1.start();

    glBindFramebuffer(GL_READ_FRAMEBUFFER,mwindow->openglContext()->defaultFramebufferObject());
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER,frameBuffer16);
    glBlitFramebuffer(0, 0, mWinWidth, mWinHeight, 0, 0, mWinWidth, mWinHeight, GL_COLOR_BUFFER_BIT, GL_LINEAR);

    // Do a color readback to BUF OBJ #0
    glBindBuffer( GL_PIXEL_PACK_BUFFER, pboIds[0] );
    glBindFramebuffer(GL_FRAMEBUFFER,frameBuffer16);
    glReadPixels( 0, 0, mWinWidth, mWinHeight,
                  GL_RGB, GL_UNSIGNED_SHORT_5_6_5, 0 );
    t1.stop();
    readTime = t1.getElapsedTimeInMilliSec();

    t1.start();
    // Copy from BUF OBJ #0 to BUF OBJ #1
    glBindBuffer( GL_COPY_WRITE_BUFFER, pboIds[1] );
    glCopyBufferSubData( GL_PIXEL_PACK_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0,
                         pbo_size );

    // Do the readback from BUF OBJ #1 to app CPU memory
    glGetBufferSubData( GL_COPY_WRITE_BUFFER, 0, pbo_size,
                        Readback_buf );

    //sendImage((unsigned char*)Readback_buf,pbo_size);
    t1.stop();
    processTime = t1.getElapsedTimeInMilliSec();
    glBindBuffer( GL_PIXEL_PACK_BUFFER, 0 );
    //qDebug() << "Read Time " << readTime;
    //qDebug() << "Process Time " << processTime;
}
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

So I think that using the NVIDIA-specific function calls reduces the CPU load when I run the algorithm
on the NVIDIA platform; maybe I am wrong, because I could not find some of the functions in the <GLES*> includes. For this
reason I asked whether adding the <GL/gl.h> and <GL/glext.h> includes is reasonable when running the algorithm on
embedded platforms. Should <GL/gl.h> and <GL/glext.h> be used only on desktop systems, or can they also be added
when running the algorithms on embedded devices?

Since I will transfer the color pixels to another device, the pixels read by glReadPixels have to be flipped:
glReadPixels returns the rows bottom-to-top, which makes the screenshot come out upside down. For this reason I have to
flip the pixels before sending them to the other device, and I want to know if there is a ready-made function inside the OpenGL libraries for that.

Best Regards

For ES, you shouldn’t use any of those headers, only <GLES3/gl3*.h>.

ES extensions are distinct from desktop OpenGL extensions.

glReadPixels always returns data starting at the lower-left corner (you can’t specify a negative value for GL_PACK_ROW_LENGTH). You can use glBlitFramebuffer to flip an image by specifying Y1<Y0 for either the source or destination rectangle. That may or may not be faster than performing the flip in software. You could also read the data a row at a time with multiple glReadPixels calls, but I suspect that would be slow.
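If the flip ends up being done in software on the readback buffer, it is just a row-order reversal. A minimal sketch for the tightly packed 2-bytes-per-pixel RGB565 case (the function name and in-place approach are illustrative, not the poster's actual code):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Reverse the row order of a tightly packed image in place.
// bytesPerRow = width * bytesPerPixel (2 for GL_UNSIGNED_SHORT_5_6_5
// with GL_PACK_ALIGNMENT set to 1).
void flipRowsInPlace(unsigned char *pixels, std::size_t bytesPerRow,
                     std::size_t rows)
{
    std::vector<unsigned char> tmp(bytesPerRow);
    for (std::size_t y = 0; y < rows / 2; ++y) {
        unsigned char *top    = pixels + y * bytesPerRow;
        unsigned char *bottom = pixels + (rows - 1 - y) * bytesPerRow;
        std::memcpy(tmp.data(), top, bytesPerRow);
        std::memcpy(top, bottom, bytesPerRow);
        std::memcpy(bottom, tmp.data(), bytesPerRow);
    }
}
```

The blit alternative mentioned above is a single swapped pair of Y coordinates, e.g. `glBlitFramebuffer(0, 0, w, h, 0, h, w, 0, GL_COLOR_BUFFER_BIT, GL_NEAREST)`, which flips during the copy on the GPU.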

You could try rendering the image inverted in Y. Then the glReadPixels() should give you the pixel order you want.

Read up on a Y-inverted projection matrix and glFrontFace().

Hello,

As you suggested, I removed #include <GL/gl.h> and #include <GL/glext.h> for ES and am only using the <GLES3*> includes. But some functions, such as glGetTexImage, do not exist inside the <GLES*> includes, and that function does work on my NVIDIA Xavier platform, since a test with it was successful. So for these kinds of functions, should I create a header file and put in the declaration, such as:

    GLAPI void GLAPIENTRY glGetTexImage( GLenum target, GLint level,
                                         GLenum format, GLenum type,
                                         GLvoid *pixels );

or what option do I need to use?


Also, I used glBlitFramebuffer for the flip, which works and costs little extra CPU, since I already had a glBlitFramebuffer call in my algorithm.

Best Regards

Hello,

I will check Y-inverted projection matrix and glFrontFace().

Best Regards

It’s academic where the header prototypes come from. The issue is which API you are targeting: OpenGL or OpenGL ES. This determines:

  1. which physical library you link with: libGLESv2 or libGL,
  2. which API prototypes you need to match the APIs in that library (whether included or not), and
  3. which type of graphics context you create.

You just have to choose.

According to the NVIDIA Linux Jetson Developer’s Guide (Software Features : Graphics), the platform supports both OpenGL 4.6 and OpenGL ES 3.2. So you have a choice.

Flipping back and forth between them dynamically on one graphics context like there’s no difference might work on some vendors. But AFAIK, this is not required to work per spec. Your app could start crashing or misbehaving at any time. To avoid problems, you should choose one and stick with it.

If you’ve decided that you do need glGetTexImage(), then you should choose OpenGL, as this doesn’t exist in OpenGL ES. This then suggests you should be:

  • including the GL includes (e.g. <GL/gl.h> and <GL/glext.h>, not <GLES/*>),
  • linking with the GL library (e.g. -lGL, not -lGLESv2)
  • creating an OpenGL context (not an OpenGL ES context).
  • calling OpenGL APIs (not OpenGL-ES APIs).

Hello,

Thank you for the good and clear description. Everything is really clear in my mind now.
With the above algorithms, I have observed an issue with glBlitFramebuffer on NVIDIA: it gives me GL_INVALID_OPERATION for GL_COLOR_BUFFER_BIT. According to the glBlitFramebuffer documentation:

GL_INVALID_OPERATION is generated if mask contains GL_COLOR_BUFFER_BIT and any of the following conditions hold:

The read buffer contains fixed-point or floating-point values and any draw buffer contains neither fixed-point nor floating-point values.

The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values.

The read buffer contains signed integer values and any draw buffer does not contain signed integer values.

One of the above is my problem, but I could not find a fix for it. Do you have any idea how I can fix this issue? Since the user interface is drawn by Qt, I do not know how to debug/handle it.
Also, I checked that GL_SAMPLE_BUFFERS is 0 and the framebuffer is complete, so those are not the issue in my case.

Best Regards

You can probe this out and figure out what’s going on.

I would first verify that these cases are not coming into play:

GL_INVALID_OPERATION is generated if filter is GL_LINEAR and the read buffer contains integer data.

GL_INVALID_OPERATION is generated if the value of GL_SAMPLES for the read and draw buffers is not identical.

GL_INVALID_OPERATION is generated if GL_SAMPLE_BUFFERS for both read and draw buffers is greater than zero and the dimensions of the source and destination rectangles are not identical.

  • What is GL_SAMPLE_BUFFERS for this framebuffer: mwindow->openglContext()->defaultFramebufferObject()
  • And how are you checking that?
    (We know that GL_SAMPLE_BUFFERS for frameBuffer16 should be 0.)
  • What is the color format of that framebuffer? Is it fixed-point, floating-point, or integer?
  • What happens if you blit with GL_NEAREST instead of GL_LINEAR?
  • What happens if you blit to an FBO with GL_RGBA8 format instead of GL_RGB565? Does that work?

Also, you can plug in a GL debug callback (KHR_debug / glDebugMessageCallback) to get more information from the NVIDIA GL driver on why it’s emitting this GL_INVALID_OPERATION error.
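A minimal sketch of such a callback, assuming a desktop GL 4.3+ context (on ES it would be glDebugMessageCallbackKHR). The registration calls need a live context, so they appear only as comments, and the typedefs stand in for the ones the GL headers provide:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in typedefs so this sketch is self-contained; in a real build
// they come from the GL headers.
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int          GLsizei;
typedef char         GLchar;

std::string g_lastGlDebugMessage;  // kept around so the app can inspect it

// Matches the GLDEBUGPROC signature from KHR_debug / GL 4.3.
void glDebugCb(GLenum source, GLenum type, GLuint id, GLenum severity,
               GLsizei length, const GLchar *message, const void *userParam)
{
    (void)source; (void)type; (void)id; (void)severity; (void)userParam;
    g_lastGlDebugMessage = (length > 0)
        ? std::string(message, static_cast<std::size_t>(length))
        : std::string(message);    // length <= 0: treat as null-terminated
    std::fprintf(stderr, "GL debug: %s\n", g_lastGlDebugMessage.c_str());
}

// With a live context you would register it roughly like this:
//   glEnable(GL_DEBUG_OUTPUT);
//   glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS);  // fire at the offending call
//   glDebugMessageCallback(glDebugCb, nullptr);
```

With GL_DEBUG_OUTPUT_SYNCHRONOUS enabled, NVIDIA's driver typically reports a human-readable reason at the exact glBlitFramebuffer call that raises GL_INVALID_OPERATION.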

Hello ,

Sorry for the late answer. The issue seems to be somewhat Qt-related, or my wrong usage of some Qt API, since previously I was manually calling the algorithm after the swapBuffers call. Now I am using the following connection to copy the color pixels:

connect(m_quickWindow, &QQuickWindow::afterRendering, this, &FrameBufferHelper::performUICopy);

and it works without an issue.

I will try to check your suggestions and the debugging hint from the link you provided for OpenGL error checking.

Thanks and Best Regards

Resurrecting this thread, as I had some trouble getting glReadPixels to be asynchronous on an NVIDIA Jetson Nano, though it was working on other platforms.

It turns out that it worked once I started using 2 PBOs instead of a single one (as explained in this article).

I don’t know why it works on more powerful platforms with a single PBO and not on a less powerful one; I’m not even sure it’s related to how powerful they are (I will check again whether it still fails with a single PBO).

Anyway, my experience may prove useful to others.
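For anyone landing here with the same problem: the two-PBO scheme boils down to issuing glReadPixels into one buffer while mapping the buffer filled on the previous frame. The GL calls need a context, so they appear only as comments below; the index bookkeeping is the whole of the CPU-side trick (names are illustrative):

```cpp
// Ping-pong bookkeeping for a two-PBO asynchronous readback. Each frame,
// glReadPixels targets pbo[writeIndex] (and returns immediately), while
// pbo[readIndex] holds the previous frame's pixels, which the GPU has had
// a whole frame to finish writing, so mapping it does not stall.
struct PboPingPong {
    int writeIndex = 0;  // glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIndex]);
                         // glReadPixels(0, 0, w, h, fmt, type, nullptr);
    int readIndex  = 1;  // glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIndex]);
                         // ptr = glMapBufferRange(...); memcpy(dst, ptr, size);
                         // glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    void advance() { writeIndex = 1 - writeIndex; readIndex = 1 - readIndex; }
};
```

Note that on the very first frame the read-side buffer has never been written, so its contents should be skipped; from the second frame on, each map returns the frame captured one frame earlier.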