Point Occlusion with Stencil/Depth Buffer

In my application I draw the geometry with the z-buffer and depth test enabled, and then need to determine the visibility of some 3D points in the scene (~1K points). I can either calculate the z-value of each point and compare it to the value in the depth buffer, or render the points with the stencil test and check whether they were drawn. The problem is that reading back the depth or the stencil buffer seems very slow. On a Radeon 7200 it takes 175 msec to read the stencil buffer (unsigned_byte) of a 640x480 window; on an NVIDIA GF2 MX200 it was down to ~20 msec. I could read individual pixels instead of the whole buffer, but then the function call overhead kills me again.
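For reference, the depth-comparison variant described above can be sketched on the CPU side roughly as follows. This is only a minimal sketch under assumptions: it presumes the whole depth buffer was already read back once as floats in [0, 1] (for example with a single glReadPixels call using GL_DEPTH_COMPONENT and GL_FLOAT), that each point was already projected to window coordinates (for example with gluProject), and that row 0 is the bottom row of the window. The names point_visible and count_visible and the bias parameter are hypothetical helpers, not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a point at window pixel (x, y) with
 * window-space depth z is not behind the stored depth, 0 otherwise.
 * depth: width*height floats in [0, 1], row 0 = bottom row (assumed).
 * bias:  small tolerance against depth quantization (an assumption). */
int point_visible(const float *depth, int width, int height,
                  int x, int y, float z, float bias)
{
    assert(depth != NULL && width > 0 && height > 0);

    /* Points outside the window or the depth range count as not visible. */
    if (x < 0 || x >= width || y < 0 || y >= height)
        return 0;
    if (z < 0.0f || z > 1.0f)
        return 0;

    return z <= depth[(size_t)y * (size_t)width + (size_t)x] + bias;
}

/* Hypothetical helper: tests n points against one read-back buffer and
 * writes 1/0 into visible[i]; returns the number of visible points. */
size_t count_visible(const float *depth, int width, int height,
                     const int *xs, const int *ys, const float *zs,
                     size_t n, float bias, int *visible)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        visible[i] = point_visible(depth, width, height,
                                   xs[i], ys[i], zs[i], bias);
        count += (size_t)visible[i];
    }
    return count;
}
```

The per-point test itself is cheap; the cost being asked about is the read-back that fills the depth array, which this sketch does not address.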

Does anybody have a suggestion on how to get point visibility information fast? I read about NVIDIA's occlusion culling extension, but noticed that you can get the result of only one query at a time, so it will still have the function call overhead.

Thanks for your help!

[This message has been edited by beko (edited 06-13-2002).]

Reading back pixels is always a bad thing to do. Try to avoid it as much as you can.
Also, reading the stencil back as unsigned_byte may be slower, since the stencil can be packed together with the z-buffer to align it to 32 bits, in which case the stencil bits have to be extracted from each packed value during the read.
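To illustrate the packing point, here is a rough sketch of what separating the two components looks like once packed values are in client memory. It assumes a 24-bit depth / 8-bit stencil layout with depth in the upper 24 bits and stencil in the low 8 bits, which is the layout used by the packed depth-stencil pixel type (GL_UNSIGNED_INT_24_8_NV in the NV_packed_depth_stencil extension). Whether a given card stores its buffers this way internally is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stencil value from one packed pixel (assumed layout: low 8 bits). */
uint8_t stencil_of(uint32_t packed)
{
    return (uint8_t)(packed & 0xFFu);
}

/* Depth in [0, 1] from one packed pixel (assumed layout: upper 24 bits). */
float depth_of(uint32_t packed)
{
    return (float)(packed >> 8) / 16777215.0f; /* 2^24 - 1 */
}

/* Hypothetical helper: counts pixels whose stencil equals ref. */
size_t count_stencil_hits(const uint32_t *packed, size_t n, uint8_t ref)
{
    assert(packed != NULL || n == 0);

    size_t hits = 0;
    for (size_t i = 0; i < n; ++i) {
        if (stencil_of(packed[i]) == ref)
            ++hits;
    }
    return hits;
}
```

If the hardware does store depth and stencil interleaved, reading both in one packed transfer and splitting them like this may avoid a per-pixel conversion in the driver, though that would need to be measured on the cards in question.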