Optimization advice


I’m currently working on updating/optimizing an old simulation. This simulation creates a random constellation of points in 2D screen coordinates and then projects the points back into 3D using the depth buffer as the Z dimension. The 3D dots are then rendered in left/right stereo, which makes the scene invisible monocularly but visible stereoscopically. Here is the basic outline of the draw loop:

Render 3d Scene

for numPoints
    generate random screen coordinate
    use glReadPixels to determine value of depth buffer at that coordinate
    use gluUnProject to determine 3d position of point based on screen coordinate and depth value


Render 3D points in stereo

The program works fine, but the performance drops when we render a lot of points (more than 1000).

I’m thinking that glReadPixels is the bottleneck. I will try to call it once and read in the entire depth buffer, but I’m not sure how much that will help.
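
A minimal sketch of that read-once idea, in C. It assumes the whole depth buffer has already been copied into a float array by a single call such as glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, data); that call is left in a comment so the snippet compiles without OpenGL. The names DepthImage, ScreenPoint, depth_at, and sample_random_points are made up for illustration and are not from the original program.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for a depth buffer read back in one call, e.g.
 *   glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, data);
 * Rows are stored bottom-to-top, matching OpenGL window coordinates. */
typedef struct {
    int width;
    int height;
    const float *data;
} DepthImage;

typedef struct { float x, y, depth; } ScreenPoint;

/* Look up the depth at an integer window coordinate (plain array index). */
float depth_at(const DepthImage *img, int x, int y)
{
    assert(x >= 0 && x < img->width && y >= 0 && y < img->height);
    return img->data[(size_t)y * (size_t)img->width + (size_t)x];
}

/* Fill `out` with `count` random window coordinates and their depths.
 * rand() is used only for brevity. */
void sample_random_points(const DepthImage *img, ScreenPoint *out, int count)
{
    for (int i = 0; i < count; ++i) {
        int x = rand() % img->width;
        int y = rand() % img->height;
        out[i].x = (float)x;
        out[i].y = (float)y;
        out[i].depth = depth_at(img, x, y);
    }
}
```

With this layout the per-point cost is one array index instead of one driver call; each ScreenPoint can then be passed to gluUnProject as before.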

I don’t have much experience with vertex/fragment programs and was wondering if it would be possible to utilize them to gain performance. From doing a little research it seems as though reading from the depth buffer is not possible within a shader program. Is this correct? Are there any other optimizations that can be done?

Thanks in advance

The main bottleneck, IMHO, is indeed the glReadPixels call. Because you call it once per point, the fixed overhead of each call (which typically also forces the pipeline to finish) adds up quickly.

Here is an optimization:

The gluUnProject call is basically nothing more than a multiplication of a 4x4 matrix with the vector (screenx, screeny, depth, 1.0), followed by a division by the resulting w component. So you can precalculate this 4x4 matrix from the viewport parameters and the projection matrix (and possibly the modelview matrix, but I’m assuming you want eye coordinates rather than world coordinates). Don’t forget to update the matrix whenever the projection matrix or the viewport changes.
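
As a rough illustration, here is one way such a matrix could be precalculated in C. This is only a sketch under stated assumptions: the projection was set with glFrustum(l, r, b, t, n, f) (so its inverse has a closed form), the depth range is the default glDepthRange(0, 1), and the result is in eye coordinates. Mat4, Vec3, make_unproject_matrix, and unproject are hypothetical names.

```c
#include <assert.h>

/* Row-major 4x4 matrix: m[row * 4 + col]. */
typedef struct { double m[16]; } Mat4;
typedef struct { double x, y, z; } Vec3;

/* Build the matrix that maps window coordinates (sx, sy, depth, 1) to
 * homogeneous eye coordinates. Assumes a glFrustum(l, r, b, t, n, f)
 * projection, the default depth range [0, 1], and a viewport
 * (vx, vy, vw, vh). The result is inverse(P) * windowToNdc. */
Mat4 make_unproject_matrix(double l, double r, double b, double t,
                           double n, double f,
                           double vx, double vy, double vw, double vh)
{
    /* Closed-form inverse of the glFrustum projection matrix. */
    const double pinv[16] = {
        (r - l) / (2 * n), 0, 0, (r + l) / (2 * n),
        0, (t - b) / (2 * n), 0, (t + b) / (2 * n),
        0, 0, 0, -1,
        0, 0, -(f - n) / (2 * f * n), (f + n) / (2 * f * n)
    };
    /* Window coordinates -> normalized device coordinates in [-1, 1]. */
    const double ndc[16] = {
        2 / vw, 0, 0, -2 * vx / vw - 1,
        0, 2 / vh, 0, -2 * vy / vh - 1,
        0, 0, 2, -1,
        0, 0, 0, 1
    };
    Mat4 out;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            double sum = 0;
            for (int k = 0; k < 4; ++k)
                sum += pinv[i * 4 + k] * ndc[k * 4 + j];
            out.m[i * 4 + j] = sum;
        }
    }
    return out;
}

/* Apply the matrix to (sx, sy, depth, 1) and divide by w. */
Vec3 unproject(const Mat4 *u, double sx, double sy, double depth)
{
    const double in[4] = { sx, sy, depth, 1.0 };
    double o[4];
    for (int i = 0; i < 4; ++i) {
        o[i] = 0;
        for (int k = 0; k < 4; ++k)
            o[i] += u->m[i * 4 + k] * in[k];
    }
    assert(o[3] != 0.0);
    Vec3 p = { o[0] / o[3], o[1] / o[3], o[2] / o[3] };
    return p;
}
```

The matrix only needs rebuilding when the frustum or viewport changes; per point, the cost is one matrix-vector product and one divide.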

  1. Construct a 32x32 floating-point texture (1024 points) containing the 1024 random screen coordinates. The RGBA values contain (x, y, 0, 1).

  2. Render depth to a texture.

  3. Activate a floating point pbuffer

  4. Render a 32x32 quad textured with the random coordinate texture.

    fragment program with unproject matrix as a parameter:

    perform texture lookup in random coordinate texture -> assign to temp.
    use temp.xy as a dependent texture lookup into the depth texture -> assign to the z value of temp.

    transform temp with unproject matrix, then divide by the resulting w

    move temp to output color

  5. Use one 32x32 floating-point glReadPixels call to obtain the 3D vertex values.

  6. Deactivate the pbuffer.

  7. Render the vertices using the stereo modelview matrix.

Optionally, the readback in step 5 can be avoided and the data kept on the GPU by using pixel buffer objects (PBOs) to copy the resulting values into a vertex array.
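
To make the fragment program in step 4 concrete, here is a CPU reference in C of what each fragment would compute. It is not shader code, only a sketch of the same data flow: two lookups, a matrix transform, and a perspective divide. It assumes the random coordinates are integer pixel positions and that the unproject matrix is row-major; Texel and unproject_pass are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Texel;   /* one RGBA float texel */

/* CPU reference of the per-fragment work in step 4:
 *  coords    : random-coordinate texture, `count` texels of (x, y, 0, 1)
 *  depth     : depth image, row-major, depth_w * depth_h floats
 *  unproject : row-major 4x4 matrix (window -> eye, before the w divide)
 *  out       : `count` texels receiving (x, y, z, 1) in eye space */
void unproject_pass(const Texel *coords, size_t count,
                    const float *depth, int depth_w, int depth_h,
                    const float unproject[16], Texel *out)
{
    for (size_t i = 0; i < count; ++i) {
        /* Lookup 1: fetch the random screen coordinate. */
        Texel t = coords[i];
        int px = (int)t.x, py = (int)t.y;
        assert(px >= 0 && px < depth_w && py >= 0 && py < depth_h);

        /* Lookup 2 (dependent): fetch the depth at that coordinate. */
        t.z = depth[(size_t)py * (size_t)depth_w + (size_t)px];

        /* Transform by the unproject matrix. */
        const float in[4] = { t.x, t.y, t.z, t.w };
        float o[4];
        for (int r = 0; r < 4; ++r) {
            o[r] = 0.0f;
            for (int c = 0; c < 4; ++c)
                o[r] += unproject[r * 4 + c] * in[c];
        }

        /* Perspective divide, then write to the "output color". */
        out[i].x = o[0] / o[3];
        out[i].y = o[1] / o[3];
        out[i].z = o[2] / o[3];
        out[i].w = 1.0f;
    }
}
```

Running this on the CPU against a depth buffer read back once is also a handy way to check the GPU output for a few known points.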

For even more speedup, when rendering in standard stereo vision, there is no need to calculate and apply the full unprojection matrix; you only need the relation between the depth value and the amount of screen x-offset.
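
The post does not spell out that relation, so the following C sketch is only one possible reading. It assumes a common stereo model: two parallel cameras separated by eye_sep with off-axis frusta that give zero parallax at distance conv_dist, a perspective projection with near/far planes n and f, and the default depth range [0, 1]. All names and parameters here are assumptions made for illustration.

```c
#include <assert.h>

/* Convert a depth-buffer value in [0, 1] to eye-space distance, assuming a
 * standard perspective projection with near/far planes n and f and the
 * default depth range [0, 1]. */
double depth_to_distance(double depth, double n, double f)
{
    double z_ndc = 2.0 * depth - 1.0;
    return 2.0 * f * n / (f + n - z_ndc * (f - n));
}

/* Total horizontal parallax in pixels between the left and right images
 * for a point with the given depth-buffer value.
 *  eye_sep        : distance between the two cameras (assumed model)
 *  conv_dist      : distance of the zero-parallax plane (assumed model)
 *  near_width     : frustum width at the near plane (r - l)
 *  viewport_width : viewport width in pixels
 * Positive values mean the point lies behind the zero-parallax plane. */
double parallax_pixels(double depth, double n, double f,
                       double eye_sep, double conv_dist,
                       double near_width, double viewport_width)
{
    assert(conv_dist > 0.0 && near_width > 0.0);
    double z = depth_to_distance(depth, n, f);
    /* Parallax measured on the zero-parallax plane, in scene units. */
    double parallax = eye_sep * (z - conv_dist) / z;
    /* Visible width of the scene at the zero-parallax plane. */
    double view_width = near_width * conv_dist / n;
    return parallax * viewport_width / view_width;
}
```

Under this model each eye's image of a point would be shifted by half the returned value in opposite directions, so the random 2D point and its depth are enough to place both dots without a full unprojection.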