Pixel buffer objects to read FBO each frame

john_connor · February 13, 2022, 11:07am

hi,

i've read that OpenGL is very asynchronous; it can be the case that there is a delay of “several” frames between the current rendering commands and what is displayed on the default framebuffer.

is that correct?

assuming it is, i'd like to know how many pixel buffer objects are “necessary” to avoid implicit synchronization when reading from one (unbound) PBO while writing into another (bound) PBO …

currently, i have 2 (flip-flopping each frame):

void Render()
{
    float aspect_ratio = float(framebuffer_size.x) / framebuffer_size.y;

    /* Bind framebuffer */
    glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);

    /* Clear framebuffer */
    float clearcolor[] = { 0.3f, 0.4f, 0.8f, 0.0f };
    int clearid[] = { -1, -1, -1, -1 };
    float cleardepth[] = { 1.0f };

    glClearNamedFramebufferfv(framebuffer, GL_COLOR, 0, clearcolor);
    glClearNamedFramebufferiv(framebuffer, GL_COLOR, 1, clearid);
    glClearNamedFramebufferfv(framebuffer, GL_DEPTH, 0, cleardepth);

    /* Set render state */
    glEnable(GL_DEPTH_TEST);

    glUseProgram(program);
    glBindVertexArray(vertexarray);

    /* Set camera transformation */
    mat4 Projection = perspective(1.57f, aspect_ratio, 0.1f, 100.0f);
    mat4 View = lookAt(vec3(0, 1, 2), vec3(0, 0, 0), vec3(0, 1, 0));
    
    glUniformMatrix4fv(0, 1, GL_FALSE, value_ptr(Projection));
    glUniformMatrix4fv(4, 1, GL_FALSE, value_ptr(View));

    /* Draw some triangles */
    for (int i = 0; i < 20; i++)
    {
        /* Set triangle transformation */
        vec3 position = vec3(i, 0, -i) * 0.3f;
        glUniformMatrix4fv(8, 1, GL_FALSE, value_ptr(translate(position)));

        /* Set triangle id */
        ivec4 id = { i, 33, 44, 55 };
        glUniform4iv(9, 1, value_ptr(id));

        glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, nullptr);
    }

    /* Reset render state (not really necessary) */
    glBindVertexArray(0);
    glUseProgram(0);

    glDisable(GL_DEPTH_TEST);

    /* Present color attachment of framebuffer to "default framebuffer" */
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);

    glReadBuffer(GL_COLOR_ATTACHMENT0);
    glBlitFramebuffer(
        0, 0, framebuffer_size.x, framebuffer_size.y,
        0, 0, framebuffer_size.x, framebuffer_size.y,
        GL_COLOR_BUFFER_BIT, GL_LINEAR);

    /* Read ID on cursor position */
    if (0 <= cursor.x && cursor.x < framebuffer_size.x &&
        0 <= cursor.y && cursor.y < framebuffer_size.y)
    {
        /* switch pixelbuffer */
        static int flipflop = 0;
        flipflop = 1 - flipflop;

        glBindBuffer(GL_PIXEL_PACK_BUFFER, pixelbuffers[flipflop]);

        /* read id from unbound pixelbuffer */
        ivec4 id = { -1, -1, -1, -1 };
        glGetNamedBufferSubData(pixelbuffers[1 - flipflop], 0, sizeof(ivec4), &id);

        /* write id to bound pixelbuffer */
        glReadBuffer(GL_COLOR_ATTACHMENT1);
        glReadPixels(cursor.x, cursor.y, 1, 1, GL_RGBA_INTEGER, GL_INT, nullptr);

        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

        /* show id in console */
        if (id != triangle_id)
        {
            triangle_id = id;
            cout 
                << "triangle_id: " 
                << triangle_id.x << " "
                << triangle_id.y << " " 
                << triangle_id.z << " " 
                << triangle_id.w << " " 
                << endl;
        }
    }

    /* unbind framebuffer */
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}

thanks in advance for any advice!

Dark_Photon · February 14, 2022, 1:49am

Yes, though how many frames depends on your GPU, driver, driver configuration, and how your app uses the API.

But it’s not just OpenGL. It’s CPU-GPU interaction in general. Some amount of latency is required for CPU-GPU interaction if you want to achieve maximum throughput.

Where are the contents of the PBOs coming from? Rendered results that are read back? If so and assuming 1 readback per frame…

A few approaches you could try (use one):

Pre-allocate 3 PBOs and round-robin readback into this set on different frames, 1 PBO per frame (PBO_num = frame_num % 3). Should be good enough.
After reading back into the PBO, drop a sync object (glFenceSync()), and defer readback from that PBO until the GPU reaches this point. When you’re ready to do the next readback, if the sync object for a PBO hasn’t been “signaled”, use a different PBO. If you don’t have any others, grow the PBO pool.
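To make the second approach more concrete, here is a rough sketch of just the bookkeeping, with no GL calls in it. The names `PboPool`, `Fence`, and `is_signaled` are made up for illustration: `Fence` stands in for a `GLsync` returned by `glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)`, and `is_signaled` stands in for a non-blocking check such as `glClientWaitSync(fence, 0, 0)`. Treat it as one possible shape under those assumptions, not a tested implementation. The fixed three-PBO variant needs none of this, since its slot is simply `frame_num % 3`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for a growable PBO pool guarded by fences.
// Slot index i is assumed to map to the i-th PBO name kept elsewhere.
template <typename Fence>
class PboPool {
public:
    using SignaledFn = std::function<bool(const Fence&)>;

    PboPool(std::size_t initial, SignaledFn is_signaled)
        : slots_(initial), is_signaled_(std::move(is_signaled)) {
        assert(initial > 0);
    }

    std::size_t size() const { return slots_.size(); }

    // Pick a slot for the next glReadPixels(): reuse one with no pending
    // readback, otherwise grow the pool (real code would create a PBO here).
    std::size_t acquire_for_write() {
        for (std::size_t i = 0; i < slots_.size(); ++i)
            if (!slots_[i].pending) return i;
        slots_.emplace_back();
        return slots_.size() - 1;
    }

    // Call right after glReadPixels() with the fence dropped behind it.
    void mark_written(std::size_t i, Fence fence) {
        assert(i < slots_.size() && !slots_[i].pending);
        slots_[i].fence = std::move(fence);
        slots_[i].pending = true;
    }

    // Find the first slot (in index order) whose fence has signaled, so its
    // contents can be read without stalling. Returns false if none is ready.
    bool take_ready(std::size_t& out) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].pending && is_signaled_(slots_[i].fence)) {
                slots_[i].pending = false;  // real code: glDeleteSync(fence)
                out = i;
                return true;
            }
        }
        return false;
    }

private:
    struct Slot {
        Fence fence{};
        bool pending = false;  // a readback into this slot awaits its fence
    };
    std::vector<Slot> slots_;
    SignaledFn is_signaled_;
};
```

Per frame you would call `take_ready()` first and read whatever slot it hands back, then `acquire_for_write()`, bind that PBO, issue `glReadPixels()`, drop a fence, and pass it to `mark_written()`. The pool only grows when every slot is still waiting on the GPU, so its final size is a rough indication of how far ahead the driver let you queue.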

The issue is: the maximum number of PBOs is driven by the max frame queue-ahead of the driver and your app. This max is not advertised by the OpenGL driver formally. However…

You can often “sense” it dynamically with Timer Queries (OpenGL Wiki Link).
Or sometimes sense it from the implementation using driver-specific settings (e.g. on NVIDIA, Max Prerendered Frames [not the best name], or whatever they’re calling it nowadays – I forgot – “Low Latency Mode”, I think) and your app’s usage (e.g. Full-Screen Exclusive or not, on Windows).
You can also cap it in your app manually to less than the max the driver will enforce by using Sync Objects (OpenGL Wiki Link).
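For the last option, capping queue-ahead manually, the usual pattern is to drop one fence per frame and block on the oldest one once too many are outstanding. Below is a small sketch of that logic only. `FrameLimiter`, `Fence`, and `wait` are hypothetical names: `Fence` stands in for a `GLsync`, and `wait` stands in for a blocking `glClientWaitSync()` call with `GL_SYNC_FLUSH_COMMANDS_BIT` followed by `glDeleteSync()`. It assumes you create the fence at the end of every frame, around your buffer swap.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical limiter: keeps at most `max_in_flight` frame fences
// outstanding by waiting on the oldest fence whenever the limit is exceeded.
template <typename Fence>
class FrameLimiter {
public:
    using WaitFn = std::function<void(const Fence&)>;

    FrameLimiter(std::size_t max_in_flight, WaitFn wait)
        : max_in_flight_(max_in_flight), wait_(std::move(wait)) {
        assert(max_in_flight > 0);
    }

    // Call once per frame with the fence created for that frame.
    void end_frame(Fence fence) {
        in_flight_.push_back(std::move(fence));
        while (in_flight_.size() > max_in_flight_) {
            wait_(in_flight_.front());  // blocks until the GPU passed that frame
            in_flight_.pop_front();
        }
    }

    std::size_t in_flight() const { return in_flight_.size(); }

private:
    std::size_t max_in_flight_;
    WaitFn wait_;
    std::deque<Fence> in_flight_;
};
```

With a limit of N, a PBO filled N or more frames ago is known to be complete, because the fence dropped after it has already been waited on. That gives you an upper bound on how many PBOs the round-robin needs, at the cost of some throughput if N is set very low.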

As a for-instance, on Windows when running windowed, the driver doesn’t queue ahead very far as you’re displaying through DWM’s swap chain behavior (0-1 frames queue-ahead IIRC). But in full-screen exclusive (on NVIDIA at least), you can queue-ahead considerably further potentially (subject to the Max Prerendered Frames/Low Latency Mode setting; up to 2-3 IIRC) … that is, if your app doesn’t trigger implicit sync thwarting max queue-ahead, or cap it manually using explicit sync (sync objects, glFinish(), etc.)
