Pixel buffer objects to read FBO each frame

hi,

i've read that opengl is very asynchronous: it can be the case that there is a delay of “several” frames between the current rendering commands and what is displayed on the default framebuffer.

is that correct?

assuming it is, i'd like to know how many pixel buffer objects are “necessary” to avoid implicit synchronization when reading from one (unbound) PBO while writing into another (bound) PBO …

currently, i have 2 (flip-flopping each frame):

void Render()
{
    float aspect_ratio = float(framebuffer_size.x) / framebuffer_size.y;

    /* Bind framebuffer */
    glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);

    /* Clear framebuffer */
    float clearcolor[] = { 0.3f, 0.4f, 0.8f, 0.0f };
    int clearid[] = { -1, -1, -1, -1 };
    float cleardepth[] = { 1.0f };

    glClearNamedFramebufferfv(framebuffer, GL_COLOR, 0, clearcolor);
    glClearNamedFramebufferiv(framebuffer, GL_COLOR, 1, clearid);
    glClearNamedFramebufferfv(framebuffer, GL_DEPTH, 0, cleardepth);

    /* Set render state */
    glEnable(GL_DEPTH_TEST);

    glUseProgram(program);
    glBindVertexArray(vertexarray);

    /* Set camera transformation */
    mat4 Projection = perspective(1.57f, aspect_ratio, 0.1f, 100.0f);
    mat4 View = lookAt(vec3(0, 1, 2), vec3(0, 0, 0), vec3(0, 1, 0));
    
    glUniformMatrix4fv(0, 1, GL_FALSE, value_ptr(Projection));
    glUniformMatrix4fv(4, 1, GL_FALSE, value_ptr(View));

    /* Draw some triangles */
    for (int i = 0; i < 20; i++)
    {
        /* Set triangle transformation */
        vec3 position = vec3(i, 0, -i) * 0.3f;
        glUniformMatrix4fv(8, 1, GL_FALSE, value_ptr(translate(position)));

        /* Set triangle id */
        ivec4 id = { i, 33, 44, 55 };
        glUniform4iv(9, 1, value_ptr(id));

        glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, nullptr);
    }

    /* Reset render state (not really necessary) */
    glBindVertexArray(0);
    glUseProgram(0);

    glDisable(GL_DEPTH_TEST);

    /* Present color attachment of framebuffer to "default framebuffer" */
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);

    glReadBuffer(GL_COLOR_ATTACHMENT0);
    glBlitFramebuffer(
        0, 0, framebuffer_size.x, framebuffer_size.y,
        0, 0, framebuffer_size.x, framebuffer_size.y,
        GL_COLOR_BUFFER_BIT, GL_LINEAR);

    /* Read ID on cursor position */
    if (0 <= cursor.x && cursor.x < framebuffer_size.x &&
        0 <= cursor.y && cursor.y < framebuffer_size.y)
    {
        /* switch pixelbuffer */
        static int flipflop = 0;
        flipflop = 1 - flipflop;

        glBindBuffer(GL_PIXEL_PACK_BUFFER, pixelbuffers[flipflop]);

        /* read id from unbound pixelbuffer */
        ivec4 id = { -1, -1, -1, -1 };
        glGetNamedBufferSubData(pixelbuffers[1 - flipflop], 0, sizeof(ivec4), &id);

        /* write id to bound pixelbuffer */
        glReadBuffer(GL_COLOR_ATTACHMENT1);
        glReadPixels(cursor.x, cursor.y, 1, 1, GL_RGBA_INTEGER, GL_INT, nullptr);

        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

        /* show id in console */
        if (id != triangle_id)
        {
            triangle_id = id;
            cout 
                << "triangle_id: " 
                << triangle_id.x << " "
                << triangle_id.y << " " 
                << triangle_id.z << " " 
                << triangle_id.w << " " 
                << endl;
        }
    }

    /* unbind framebuffer */
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}

thanks in advance for any advice!

Yes, depending on your GPU, driver, driver configuration, and how your app uses the API.

But it’s not just OpenGL. It’s CPU-GPU interaction in general. Some amount of latency is required for CPU-GPU interaction if you want to achieve maximum throughput.

Where are the contents of the PBOs coming from? Rendered results that are read back? If so and assuming 1 readback per frame…

A few approaches you could try (use one):

  1. Pre-allocate 3 PBOs and round-robin readback into this set on different frames, 1 PBO per frame (PBO_num = frame_num % 3). Should be good enough.
  2. After reading back into the PBO, drop a sync object (glFenceSync()), and defer readback from that PBO until the GPU reaches this point. When you’re ready to do the next readback, if the sync object for a PBO hasn’t been “signaled”, use a different PBO. If you don’t have any others, grow the PBO pool.
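To make the second approach a bit more concrete, here is a minimal sketch of the pool bookkeeping only. It is not tied to a GL context: `ReadbackPool`, `FakeBackend` and their member names are made up for illustration, and the comments note which real calls (glFenceSync, glClientWaitSync, glDeleteSync) I'd assume go behind each hook. Treat it as one possible layout under those assumptions, not the definitive way to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in backend so the sketch compiles without OpenGL (hypothetical names).
// Assumed mapping in a real program:
//   create_buffer()  -> glCreateBuffers + glNamedBufferStorage for one PBO
//   insert_fence()   -> glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
//   is_signaled(f)   -> glClientWaitSync(f, 0, 0) returns GL_ALREADY_SIGNALED
//                       or GL_CONDITION_SATISFIED
//   destroy_fence(f) -> glDeleteSync(f)
struct FakeBackend {
    using Buffer = int;
    using Fence  = int;
    int buffers_created = 0;
    int fences_inserted = 0;
    int gpu_progress    = 0;  // fences with id <= gpu_progress count as signaled

    Buffer create_buffer() { return ++buffers_created; }
    Fence  insert_fence()  { return ++fences_inserted; }
    bool   is_signaled(Fence f) const { return f <= gpu_progress; }
    void   destroy_fence(Fence) {}
};

template <class Backend>
class ReadbackPool {
public:
    using Buffer = typename Backend::Buffer;
    using Fence  = typename Backend::Fence;

    explicit ReadbackPool(Backend& backend) : backend_(backend) {}

    // Returns a slot with no readback in flight; grows the pool if all are busy.
    std::size_t acquire() {
        for (std::size_t i = 0; i < slots_.size(); ++i)
            if (!slots_[i].pending) return i;
        slots_.push_back(Slot{backend_.create_buffer(), Fence{}, false, 0});
        return slots_.size() - 1;
    }

    Buffer buffer(std::size_t slot) const { return slots_[slot].buffer; }

    // Call right after issuing glReadPixels into buffer(slot).
    void submit(std::size_t slot) {
        assert(slot < slots_.size() && !slots_[slot].pending);
        slots_[slot].fence   = backend_.insert_fence();
        slots_[slot].pending = true;
        slots_[slot].serial  = next_serial_++;
    }

    // Picks the oldest in-flight slot whose fence has signaled and marks it
    // free. Read buffer(slot_out) before the next acquire() reuses it.
    bool try_collect(std::size_t& slot_out) {
        bool found = false;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const Slot& s = slots_[i];
            if (!s.pending || !backend_.is_signaled(s.fence)) continue;
            if (!found || s.serial < slots_[slot_out].serial) {
                slot_out = i;
                found = true;
            }
        }
        if (found) {
            backend_.destroy_fence(slots_[slot_out].fence);
            slots_[slot_out].pending = false;
        }
        return found;
    }

private:
    struct Slot {
        Buffer        buffer;
        Fence         fence;
        bool          pending;
        std::uint64_t serial;
    };
    Backend&          backend_;
    std::vector<Slot> slots_;
    std::uint64_t     next_serial_ = 0;
};
```

Per frame you would call acquire(), bind that buffer to GL_PIXEL_PACK_BUFFER, issue glReadPixels, call submit(), and then poll try_collect() to see whether an older result can be fetched without stalling. The pool only grows when every existing PBO still has an unsignaled fence, so its final size reflects how far the GPU actually lags behind.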

The issue is: the maximum number of PBOs you need is driven by the max frame queue-ahead of the driver and your app. This max is not formally advertised by the OpenGL driver. However…

  • You can often “sense” it dynamically with Timer Queries (OpenGL Wiki Link).
  • Or sometimes infer it from driver-specific settings (e.g. on NVIDIA, “Max Prerendered Frames” [not the best name], which I think is now called “Low Latency Mode”) and from your app’s usage (e.g. Full-Screen Exclusive or not, on Windows).
  • You can also cap it in your app manually to less than the max the driver will enforce by using Sync Objects (OpenGL Wiki Link).
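For the last bullet, capping queue-ahead in the app could look roughly like the sketch below. Again this is only the bookkeeping, with no GL context: `FrameLimiter` and `CountingBackend` are hypothetical names, and I'm assuming the wait hook would be a glClientWaitSync with GL_SYNC_FLUSH_COMMANDS_BIT followed by glDeleteSync.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Stand-in backend so the sketch compiles without OpenGL (hypothetical names).
// Assumed mapping in a real program:
//   insert_fence()      -> glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
//   wait_and_destroy(f) -> glClientWaitSync(f, GL_SYNC_FLUSH_COMMANDS_BIT,
//                          timeout) until signaled, then glDeleteSync(f)
struct CountingBackend {
    using Fence = int;
    int inserted = 0;
    int waited   = 0;
    Fence insert_fence() { return ++inserted; }
    void  wait_and_destroy(Fence) { ++waited; }
};

template <class Backend>
class FrameLimiter {
public:
    FrameLimiter(Backend& backend, std::size_t max_in_flight)
        : backend_(backend), max_in_flight_(max_in_flight) {
        assert(max_in_flight_ > 0);
    }

    // Call before submitting a frame's commands: blocks on the oldest fence
    // until fewer than max_in_flight frames are still queued.
    void begin_frame() {
        while (in_flight_.size() >= max_in_flight_) {
            backend_.wait_and_destroy(in_flight_.front());
            in_flight_.pop_front();
        }
    }

    // Call after the frame's last GL command (around the buffer swap).
    void end_frame() { in_flight_.push_back(backend_.insert_fence()); }

    std::size_t frames_in_flight() const { return in_flight_.size(); }

private:
    Backend&                            backend_;
    std::size_t                         max_in_flight_;
    std::deque<typename Backend::Fence> in_flight_;
};
```

With a cap of N frames in flight, a readback issued N frames ago sits behind a fence the limiter has already waited on, so on that assumption N + 1 PBOs used round-robin should be enough to never read from a buffer the GPU is still writing.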

For instance, on Windows when running windowed, the driver doesn’t queue ahead very far because you’re displaying through DWM’s swap chain (0-1 frames queue-ahead IIRC). But in full-screen exclusive (on NVIDIA at least), you can potentially queue ahead considerably further (subject to the Max Prerendered Frames/Low Latency Mode setting; up to 2-3 IIRC) … that is, unless your app triggers implicit sync that thwarts max queue-ahead, or caps it manually using explicit sync (sync objects, glFinish(), etc.)
