Depth Buffer frustration revisited

Hi all,

Here’s the problem. I’m rendering a scene (512x512) on my nVidia GF3 at about 750 FPS.
What I want is the range from the eye position to every pixel in the scene, and I would like to store that in a 512x512 array. So I used the following code (mostly from previous posts in this forum):

glReadPixels(0, 0, WindowX, WindowY, GL_DEPTH_COMPONENT, GL_FLOAT, winz);

glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelMatrix);
glGetDoublev(GL_PROJECTION_MATRIX, projMatrix);
int count = 0;
for (int i = 0; i < WindowX; i++) {
    for (int j = 0; j < WindowY; j++) {

        gluUnProject((GLdouble)i, (GLdouble)j, (GLdouble)winz[count],
                     modelMatrix, projMatrix, viewport,
                     &objx, &objy, &objz);

        distance2eyepoint = (GLfloat)sqrt((objx-eyex)*(objx-eyex) +
                                          (objy-eyey)*(objy-eyey) +
                                          (objz-eyez)*(objz-eyez));
        count++;
    }
}

I must admit that I was shocked at the drop in FPS. I didn’t expect it to be this bad:
Just rendering: 750 FPS
Rendering with glReadPixels only: 115 FPS
All of the above + the for loops: 4.4 FPS

There must be a way to extract true range information without killing performance, isn’t there? Can someone please point it out for me? How about extensions, would they help? It’s driving me nuts…


I can’t say anything about the drop when using glReadPixels… Only the manufacturer of the gfx card and the guy who writes the drivers can do something in this field.

On the other hand, I am sure you can optimize your loop: you are calling gluUnProject 512x512 times !!! I am not sure about what the function does (I am going to look at the code in MESA) but surely there are things it does 512x512 times that could be done only once !!! Maybe expensive things such as matrix inversion…

Your best bet is to look at the gluUnProject source code (MESA) and adapt it to your needs (if a matrix invert is needed you just have to calculate it once !!!).

I’ll try to find some more info.



Thanks Eric.

The reason for the double loop is that gluUnProject takes in one z value at a time and I have 512x512 of them. I might use some form of interpolation since I know the near and far clipping planes and I know the clipped z value from glReadPixels. This is supposed to be real-time code and anything below 30 or 40 Hz is not gonna cut it. I’ve got a lot of tinkering to do. By the way, I installed the nVidia drivers on Linux to do hardware acceleration. Does that affect Mesa in any way (or vice versa)?

Thanks again,

write your own gluUnProject…
code is out there, search for it… mesa, google… what ever

gluUnProject is inverting Projection*Modelview
every time you call it. You can definitely do this
faster by inverting it yourself only once.
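For what it’s worth, the per-pixel part of a hand-rolled gluUnProject might look roughly like this (a sketch; `unprojectFast` and its layout are my names, and `invPM` is assumed to be the Projection*Modelview inverse computed once per frame):

```cpp
#include <cassert>
#include <cmath>

// Sketch: per-pixel unproject with a precomputed inverse. invPM is the
// inverse of Projection*Modelview, computed ONCE per frame instead of
// once per call; matrices are column-major like OpenGL's.
void unprojectFast(double winx, double winy, double winz,
                   const double invPM[16], const int viewport[4],
                   double *objx, double *objy, double *objz)
{
    // window coords -> normalized device coords in [-1, 1]
    double in[4];
    in[0] = 2.0 * (winx - viewport[0]) / viewport[2] - 1.0;
    in[1] = 2.0 * (winy - viewport[1]) / viewport[3] - 1.0;
    in[2] = 2.0 * winz - 1.0;
    in[3] = 1.0;

    // out = invPM * in (column-major matrix*vector)
    double out[4] = {0.0, 0.0, 0.0, 0.0};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row] += invPM[col * 4 + row] * in[col];

    // the perspective divide -- gluUnProject does this too
    *objx = out[0] / out[3];
    *objy = out[1] / out[3];
    *objz = out[2] / out[3];
}
```

Note the divide by out[3] at the end; dropping it is an easy mistake to make when adapting the MESA source.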

Hi again,

I implemented some of the advice given in the replies above. Unfortunately that did not change much of anything. Here’s what I did:

glReadPixels(0, 0, WindowX, WindowY, GL_DEPTH_COMPONENT, GL_FLOAT, winz);

glGetIntegerv(GL_VIEWPORT, viewport);
glGetDoublev(GL_MODELVIEW_MATRIX, modelMatrix);
glGetDoublev(GL_PROJECTION_MATRIX, projMatrix);
fMatrix model(4), proj(4), PM(4), InvPM(4);
fVector b(4), obj(4);

count = 0;
for (i = 0; i < 4; i++) {
    for (j = 0; j < 4; j++) {
        model(j,i) = modelMatrix[count];
        proj(j,i) = projMatrix[count];
        count++;
    }
}

//PM = model * proj;
PM = proj * model;

// Get the inverse of the product
InvPM = Inv(PM);

count = 0;
for (i = 0; i < WindowX; i++) {
    for (j = 0; j < WindowY; j++) {
        // Compute the vector to multiply with as described in the blue book
        b(0) = 2.0f*(i - viewport[0])/viewport[2] - 1.0f;
        b(1) = 2.0f*(j - viewport[1])/viewport[3] - 1.0f;
        b(2) = 2.0f*winz[count] - 1.0f;
        b(3) = 1.0f;

        // Compute the world coordinates
        obj = InvPM * b;
        // Divide by the homogeneous coordinate (gluUnProject does this too)
        obj(0) /= obj(3); obj(1) /= obj(3); obj(2) /= obj(3);

        // Get the distance to the eyepoint
        distance2eyepoint = sqrt((obj(0)-eyex)*(obj(0)-eyex) +
                                 (obj(1)-eyey)*(obj(1)-eyey) +
                                 (obj(2)-eyez)*(obj(2)-eyez));
        count++;
    }
}


I think the problem is the double loop. Here the inverse is outside the loop, computed only once per frame. The things that change in the double loop are the pixel locations (i and j) and winz. Unfortunately I cannot interpolate the range values from one pixel to the next; I still need the distance from every pixel to the eyepoint. How about extensions, do they allow you to do this in hardware?

Any ideas on how to improve this …?

Thanks for your suggestions and time.


i think generating the rays for every pixel can be done quite fast… at least, i do it for my rtrt on 320x240 without bigger problems…

my suggestion:
generate the points on the near plane, generate the points on the far plane, and simply linearly interpolate by the z-value (is it linear? dunno, try…)

generating the points on near and far is afaik simple: find the corner points of the frustum and bilinearly interpolate, too

it’s a trilinear interpolation per vertex you have to do between 8… try sse
if you know more about the scene you can probably precalculate a lot of the stuff and the result is then a linear interpolation per pixel…

no exts for this
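One caveat on davepermen’s “is it linear? dunno” question: with a perspective projection, window-space depth is hyperbolic, not linear, in eye space, so it has to be un-warped before interpolating between the near and far planes. A sketch under assumptions (standard glFrustum/gluPerspective projection, default glDepthRange(0, 1); the function name is mine):

```cpp
#include <cassert>
#include <cmath>

// Sketch: recover eye-space depth from a window-space depth value.
// winz == 0 maps to the near plane, winz == 1 to the far plane,
// with a hyperbolic (not linear) ramp in between.
double eyeDepthFromWindowZ(double winz, double zNear, double zFar)
{
    return (zNear * zFar) / (zFar - winz * (zFar - zNear));
}
```

With eye-space depth in hand, the interpolated ray direction per pixel gives the range directly, without any matrix work in the inner loop.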

Here are some suggestions:

  1. First, factor out the OpenGL performance and see if you can really compute what you want “in real time”. Make a test program that does width*height sqrt() calls and see how many “frames” per second you can do. Think about whether you can use distance squared instead.

  2. Notice that your (i,j,z) -> b() transformation can be expressed as another matrix. Write out that matrix as W and combine it with InvPM: InvPM*W. That will save you a few operations. Then expand the transformation of b to explicit equations. You can probably shave some operations off of what a general matrix*vector transformation routine will cost.

  3. If all you are interested in is distance from the eye, don’t bother doing all this math in world coordinates; just use camera coordinates. Ignore the modelview matrix and just use the projection matrix. The camera is at (0,0,0) in camera coordinates, so that simplifies your distance (or distance squared) computation.

  4. Reverse the order of your for loops for better memory coherency:

    for (j = 0; j < WindowY; j++) {
    for (i = 0; i < WindowX; i++) {

So that each winz[] you access is sizeof(GLfloat) bytes away from the last one, instead of WindowX*sizeof(GLfloat) bytes.

  5. Given that you are now using i as the inner loop index, recognize that once you compute the first value for a row, b=M*(i,j,winz[count]), the next value can be computed incrementally

    b  = M*(i,j,winz[count])
    b' = M*(i+1,j,winz[count+1])
       = M*(i,j,winz[count]) + M*(1,0,0) + M*(0,0,winz[count+1]-winz[count])

The second term is just the 1st column vector of M, the third term is the 3rd column vector of M, scaled. So you can compute b’ from b faster. Don’t forget to apply all the operations to b[3] too.
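The incremental step above might be coded roughly like this (a sketch; `stepB`, `mCol0`, and `mCol2` are my names for the update and for M’s 1st and 3rd columns):

```cpp
#include <cassert>

// Sketch of the incremental update: stepping i by one adds M's first
// column, and the change in winz adds M's third column scaled by dz,
// so each pixel costs a handful of adds/multiplies instead of a full
// 4x4 matrix*vector product. b holds all four components, b[3] included.
void stepB(double b[4], const double mCol0[4], const double mCol2[4],
           double dz)
{
    for (int k = 0; k < 4; ++k)
        b[k] += mCol0[k] + dz * mCol2[k];
}
```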

  6. Since you are working in camera coordinates now, you only have to invert the projection matrix. A projection matrix has enough zero elements that you can work out the closed-form solution, which will have less error than a generic matrix inversion.

  7. If sqrt() is a major factor and you don’t need a full 7 digits of precision, consider computing 2 or 3 iterations of a Newton-Raphson solution for it. You’ll still get 3-4 digits of precision. And I think I heard somewhere that 1/sqrt(x) converges much faster – compute that if you are just going to divide some other number by the distance.
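The Newton-Raphson idea for 1/sqrt(x) looks roughly like this (a sketch; the names and the explicit seed parameter are mine, and real implementations get the starting guess from a table or bit-level trick rather than taking it as an argument):

```cpp
#include <cassert>
#include <cmath>

// Sketch of Newton-Raphson for 1/sqrt(x): each iteration of
// y' = y * (1.5 - 0.5 * x * y * y) roughly doubles the number of
// correct digits, so 2-3 iterations from a decent seed give 3-4
// digits of precision.
float fastInvSqrt(float x, float seed)
{
    float y = seed;
    for (int iter = 0; iter < 3; ++iter)
        y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

If the plain square root is still needed, sqrt(x) = x * (1/sqrt(x)), which is one extra multiply.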

Good Luck

[This message has been edited by rlskinner (edited 03-28-2002).]

The 1 million matrix multiplies and 1/4 million square roots per frame aren’t helping your performance.

Won’t (objy-eyey)*(objy-eyey) and (objx-eyex)*(objx-eyex) be the same each frame for each respective pixel? Stick the sum in a 512x512 table – update on window resize if you really must have that flexibility.

I’m assuming camera-space here – see a previous rlskinner post – so this will actually just be objy*objy + objx*objx.

Concentrate on the sqrt as the standard C library function is very costly against more optimised versions.

[This message has been edited by SnowKrash (edited 03-30-2002).]

rlskinner: Thanks for the long, detailed suggestion. I’ll look into it.

SnowKrash: Don’t I wish! The eye position (look-at) is changing per frame.

davepermen: Thanks for the suggestion dude. I’ll also look into it. I need to solve this issue once and for all…

Thanks folks,

What I want is the range from the eye position to every pixel in the scene

I feel it’s important to reiterate what rlskinner said regarding working in camera space rather than world space - maybe you’re missing something conceptually. In camera space the eye position is the origin and the eye direction is fixed (along the z axis).

Rather than invert Projection*Modelview, you could just invert Projection to get back to camera space. Your range calculations should be the same, you’ll just be in a different coordinate system to world space.

If you’re fully aware of this already and there’s a particular reason why you can’t do this in your application, then sorry if I sound patronising. I’d be interested to know why, if that’s the case.
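Concretely, inverting the projection alone can even be done in closed form (a hedged sketch; names are mine, and it assumes a gluPerspective-style matrix with no skew, column-major with P[0]=a, P[5]=b, P[10]=c, P[14]=d, P[11]=-1 and everything else zero):

```cpp
#include <cassert>
#include <cmath>

// Sketch: unproject normalized device coords straight to CAMERA space.
// Solving clip = P * eye for eye gives a closed form, no 4x4 inversion:
//   z_eye = -d / (c + z_ndc)
//   x_eye = -z_eye * x_ndc / a
//   y_eye = -z_eye * y_ndc / b
void unprojectCamera(double xn, double yn, double zn, const double P[16],
                     double *xe, double *ye, double *ze)
{
    const double a = P[0], b = P[5], c = P[10], d = P[14];
    *ze = -d / (c + zn);
    *xe = -(*ze) * xn / a;
    *ye = -(*ze) * yn / b;
}
```

Since the eye is at the origin in camera space, the range per pixel is then just the length of (x_eye, y_eye, z_eye).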


You don’t sound patronising at all. I need to follow some of the suggestions and of course try to think about it from a different perspective. Maybe I am missing something??
Thanks and best regards,