Getting on-screen size of millions of objects fast


Suppose we have millions of objects that must be sorted by their size on screen (in pixels). What would be the best way to estimate the on-screen size? We are using a perspective projection (with an orthographic projection there is no need for this). We are already doing frustum culling and small-object culling.

Currently, we compute the screen-space bounding rectangle of the bounding-box vertices projected with gluProject, and it takes too long.
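For reference, the per-object work of that approach looks roughly like the sketch below. It is a minimal C++ reimplementation of the gluProject mapping, assuming a combined column-major model-view-projection matrix and an axis-aligned box; `projectToWindow` and `bboxScreenRect` are hypothetical helpers, not GLU functions. It makes the cost visible: eight full 4x4 transforms and eight perspective divides per object.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <limits>

// Column-major 4x4 matrix, the same memory layout OpenGL uses.
using Mat4 = std::array<double, 16>;

struct Rect { double minX, minY, maxX, maxY; };

// Hypothetical helper: maps a point to window coordinates the way gluProject
// does (clip = mvp * (x, y, z, 1), NDC = clip / w, then the viewport mapping).
// Assumes the point is in front of the camera (w > 0).
static void projectToWindow(const Mat4& mvp, const int viewport[4],
                            double x, double y, double z,
                            double& winX, double& winY) {
    double cx = mvp[0] * x + mvp[4] * y + mvp[8] * z + mvp[12];
    double cy = mvp[1] * x + mvp[5] * y + mvp[9] * z + mvp[13];
    double cw = mvp[3] * x + mvp[7] * y + mvp[11] * z + mvp[15];
    assert(cw > 0.0 && "point behind the camera");
    winX = viewport[0] + viewport[2] * (cx / cw + 1.0) * 0.5;
    winY = viewport[1] + viewport[3] * (cy / cw + 1.0) * 0.5;
}

// Hypothetical helper: screen-space bounding rectangle of an axis-aligned box,
// built from all eight projected corners.
Rect bboxScreenRect(const Mat4& mvp, const int viewport[4],
                    const double bmin[3], const double bmax[3]) {
    Rect r{std::numeric_limits<double>::max(), std::numeric_limits<double>::max(),
           std::numeric_limits<double>::lowest(), std::numeric_limits<double>::lowest()};
    for (int i = 0; i < 8; ++i) {
        // Bits of i select min or max along each axis.
        double x = (i & 1) ? bmax[0] : bmin[0];
        double y = (i & 2) ? bmax[1] : bmin[1];
        double z = (i & 4) ? bmax[2] : bmin[2];
        double wx, wy;
        projectToWindow(mvp, viewport, x, y, z, wx, wy);
        r.minX = std::min(r.minX, wx); r.maxX = std::max(r.maxX, wx);
        r.minY = std::min(r.minY, wy); r.maxY = std::max(r.maxY, wy);
    }
    return r;
}
```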

Is there a better approach on the GPU? What else can be done on the CPU?



I’m not sure what you mean. Do you wish to calculate how many pixels are used by those objects?

I need to determine what is big (and therefore important to be drawn) and what is not.
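Once every object has a size estimate, separating the big ones from the rest is a filter followed by a sort. A minimal C++ sketch, assuming the per-object pixel sizes are already stored in a `std::vector<float>`; `importantBySize` and the `minPixels` threshold are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices of objects whose estimated
// on-screen size (pixels) is at least minPixels, largest first.
// sizes[i] is an already-computed estimate for object i.
std::vector<std::size_t> importantBySize(const std::vector<float>& sizes, float minPixels) {
    assert(minPixels >= 0.0f);
    std::vector<std::size_t> idx;
    idx.reserve(sizes.size());
    for (std::size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= minPixels) idx.push_back(i);  // small-object culling
    std::sort(idx.begin(), idx.end(),
              [&](std::size_t a, std::size_t b) { return sizes[a] > sizes[b]; });
    return idx;
}
```

If only the N largest objects are drawn, `std::partial_sort` or `std::nth_element` avoids sorting the whole list.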

One very simple and cheap approach is to track a bounding sphere for your objects and project it. This can be used not only for estimating size (e.g. LOD computation) but also for super-cheap on-GPU frustum culling.
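A bounding sphere can be derived once per object from the existing bounding box, and the culling test against it costs one dot product per frustum plane. The C++ sketch below assumes an axis-aligned box given by its min/max corners and six frustum planes that have already been extracted and normalized with inward-facing normals; `Sphere`, `sphereFromAabb` and `sphereOutsideFrustum` are hypothetical names. The same arithmetic can be written in a shader for the GPU variant.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical bounding-sphere record.
struct Sphere { double cx, cy, cz, radius; };

// Conservative sphere around an axis-aligned box: centre at the box midpoint,
// radius equal to half the box diagonal. Computed once per object.
Sphere sphereFromAabb(const double bmin[3], const double bmax[3]) {
    double dx = bmax[0] - bmin[0], dy = bmax[1] - bmin[1], dz = bmax[2] - bmin[2];
    assert(dx >= 0.0 && dy >= 0.0 && dz >= 0.0);
    return {0.5 * (bmin[0] + bmax[0]), 0.5 * (bmin[1] + bmax[1]), 0.5 * (bmin[2] + bmax[2]),
            0.5 * std::sqrt(dx * dx + dy * dy + dz * dz)};
}

// Plane a*x + b*y + c*z + d = 0, unit normal (a, b, c) pointing into the frustum.
using Plane = std::array<double, 4>;

// The sphere is outside as soon as its centre lies more than `radius`
// behind any plane.
bool sphereOutsideFrustum(const Sphere& s, const std::array<Plane, 6>& planes) {
    for (const Plane& p : planes) {
        double dist = p[0] * s.cx + p[1] * s.cy + p[2] * s.cz + p[3];
        if (dist < -s.radius) return true;
    }
    return false;
}
```

The half-diagonal sphere is looser than the box, so a few more objects survive culling, but each test is far cheaper than projecting eight corners.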

Thank you!

What exactly do you mean by:

to track a bounding sphere for your objects and project it

Do you mean projecting the start and end points of the sphere radius to get two XY points on screen?

When you say:

for super-cheap on-GPU frustum culling

Are you referring to culling with geometry shaders, something like this: Instance culling using geometry shaders – RasterGrid?



There are lots of ways to do this. For one, take $\mathrm{pos}_{eye} = (r, 0, z_{eye}, 1)$, where $r$ is the sphere radius and $z_{eye}$ is the sphere center’s eye-space Z, multiply it by the (parameterized) perspective projection transform, and do the perspective divide on the result to get a formula for $\mathrm{pos}_{ndc}$. The screen-space size you seek is $\mathrm{pos}_{ndc}.x$.
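Carried out with the standard OpenGL glFrustum projection matrix (an assumption; the thread does not name a specific matrix), and writing the sphere radius as $r_s$ to keep it apart from the frustum’s right plane $r$, the steps are:

$$
P = \begin{pmatrix}
\frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\
0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\
0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\
0 & 0 & -1 & 0
\end{pmatrix},
\qquad
\mathrm{pos}_{clip} = P \begin{pmatrix} r_s \\ 0 \\ z_{eye} \\ 1 \end{pmatrix}
$$

$$
x_{clip} = \frac{2n}{r-l}\, r_s + \frac{r+l}{r-l}\, z_{eye}, \qquad w_{clip} = -z_{eye}
$$

$$
\mathrm{pos}_{ndc}.x = \frac{x_{clip}}{w_{clip}} = -\frac{2n}{r-l}\,\frac{r_s}{z_{eye}} - \frac{r+l}{r-l}
$$

The second term is only where the view axis lands in NDC, so the projected radius is the first term; it is positive for a visible sphere because $z_{eye} < 0$. Since NDC x spans $[-1, 1]$ across the viewport, multiplying by half the viewport width converts it to pixels.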

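For a symmetric perspective frustum, you get a very simple formula for the size of the sphere radius in NDC: basically a constant times $(x_{eye} / z_{eye})$, where $x_{eye}$ is the sphere radius and the constant depends on the frustum parameters $(r, l, n)$ (here the frustum’s right, left and near values, not the sphere radius). With the standard glFrustum matrix, the constant is $-\frac{2n}{r-l}$.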
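In the symmetric case the estimate is one multiply and one divide per object. A C++ sketch, assuming a glFrustum-style symmetric projection ($l = -r$); `SymmetricFrustum`, `sphereRadiusNdc` and `sphereRadiusPixels` are hypothetical names, and `right`/`zNear` are the frustum’s r and n, not the sphere radius:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameters of a symmetric frustum (l = -right), as passed to
// glFrustum. `right` is the frustum's right plane, not the sphere radius.
struct SymmetricFrustum { double right, zNear; };

// Projected sphere radius in NDC units along x:
// -(2n / (r - l)) * radius / zEye, which with l = -r is (n / r) * radius / -zEye.
// zEye is the eye-space Z of the sphere centre (negative in front of the camera).
double sphereRadiusNdc(const SymmetricFrustum& f, double radius, double zEye) {
    assert(zEye < 0.0 && "sphere centre must be in front of the camera");
    return (f.zNear / f.right) * radius / -zEye;
}

// NDC x spans [-1, 1] across the viewport, so one NDC unit is width / 2 pixels.
double sphereRadiusPixels(const SymmetricFrustum& f, double radius, double zEye,
                          int viewportWidth) {
    return sphereRadiusNdc(f, radius, zEye) * 0.5 * viewportWidth;
}
```

Only `radius / -zEye` changes from object to object, so sorting by that ratio alone gives the same order as sorting by pixel size. It is an estimate: a sphere away from the view axis projects to an ellipse, not a circle of exactly this radius. Along y, the frustum’s t takes the place of r.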

Something like that, or compute shaders, or any other technique that lets you discard geometry.

If your geometry were spatially grouped into batches (a good idea for coarse-grained culling), you could instead perform most of this culling and LOD selection on the CPU in large chunks, and potentially not need GPU-side culling and LOD selection at all.
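With spatially grouped batches, the same size estimate can be evaluated once per batch on the CPU. A C++ sketch, assuming each batch stores a world-space bounding sphere and a count of precomputed LOD levels, a column-major view matrix, and a symmetric frustum; `Batch`, `cullAndPickLod` and the halving LOD rule are hypothetical choices for illustration. Only the near-plane and size tests are shown; side-plane tests would go in the same loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batch record: spatially close objects sharing one world-space
// bounding sphere, with lodCount precomputed levels (0 = most detailed).
struct Batch {
    double cx, cy, cz, radius;
    int lodCount;
};

struct BatchDecision {
    std::size_t batch;  // index into the input vector
    int lod;            // level of detail to draw
};

// view: column-major 4x4 view matrix; only its third row is used, to get the
// eye-space Z of each sphere centre. pixelScale: (n / r) * viewportWidth / 2
// for a symmetric frustum, so that pixels = pixelScale * radius / -zEye.
std::vector<BatchDecision> cullAndPickLod(const std::vector<Batch>& batches,
                                          const double view[16], double zNear,
                                          double pixelScale, double minPixels,
                                          double fullDetailPixels) {
    assert(pixelScale > 0.0 && fullDetailPixels > 0.0);
    std::vector<BatchDecision> out;
    for (std::size_t i = 0; i < batches.size(); ++i) {
        const Batch& b = batches[i];
        double zEye = view[2] * b.cx + view[6] * b.cy + view[10] * b.cz + view[14];
        // Even the farthest point of the sphere is closer than the near plane.
        if (zEye - b.radius > -zNear) continue;
        double depth = -zEye;
        // Camera inside or very close to the sphere: treat the batch as huge.
        double pixels = depth > b.radius ? pixelScale * b.radius / depth : 1e30;
        if (pixels < minPixels) continue;  // small-object culling for the whole batch
        // Hypothetical LOD rule: one level coarser each time the projected size
        // halves below fullDetailPixels, clamped to the coarsest level.
        int lod = 0;
        for (double p = fullDetailPixels; pixels < p && lod + 1 < b.lodCount; p *= 0.5) ++lod;
        out.push_back({i, lod});
    }
    return out;
}
```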

Hi @Dark_Photon,

I tried to implement the algorithm you suggested, but I’m not quite sure about it. Is this what you meant?

// screen size
vec4 eye= view * vec4(, 1.0);
vec4 clipCoords_radius=proj * vec4(sphereRadius, 0.0, eye.z, 1.0);
sphereSize = clipCoords_radius.x/clipCoords_radius.w;

Does it make sense to use the result for rough frustum culling like this?

vec4 clipCoords_center = proj * eye;
float max=max(clipCoords_center.x, max(clipCoords_center.y, clipCoords_center.z)) / 
float min=min(clipCoords_center.x, min(clipCoords_center.y, clipCoords_center.z)) / 
if(max - sphereSize > 1.0)
else if(min + sphereSize < -1.0)
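One way such a rough test could look, written as a C++ sketch (the arithmetic is the same in GLSL): compare the NDC centre, after dividing by w, against $[-1, 1]$ widened by the projected radius, using a separate radius per axis because x and y scale differently unless the aspect ratio is 1. It assumes the centre is in front of the camera and leaves out NDC z, which is non-linear in depth; `roughlyOutsideXY` is a hypothetical name. Near the edges of a wide field of view the radius estimate can be slightly off, so a small safety margin may be worth adding.

```cpp
#include <cassert>

// Hypothetical rough x/y visibility test for a bounding sphere in NDC.
// ndcX, ndcY: sphere centre after the perspective divide (clip.xy / clip.w).
// radiusNdcX, radiusNdcY: projected radius per axis, e.g.
// (n / r) * radius / -zEye and (n / t) * radius / -zEye for a symmetric frustum.
// Spheres crossing the near plane need a separate eye-space test.
bool roughlyOutsideXY(double ndcX, double ndcY, double radiusNdcX, double radiusNdcY) {
    assert(radiusNdcX >= 0.0 && radiusNdcY >= 0.0);
    return ndcX - radiusNdcX > 1.0 || ndcX + radiusNdcX < -1.0 ||
           ndcY - radiusNdcY > 1.0 || ndcY + radiusNdcY < -1.0;
}
```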

Thanks in advance for your help; your answers are always really helpful.

I’d also like to ask one more thing. I can’t use compute shaders in my application (for compatibility reasons), but I’m facing a problem like the one described by @devdept: hundreds of thousands of objects that cannot easily be grouped together and need to be sorted by screen size. Currently I’m using the vertex/fragment shader pair to perform my calculations in parallel and reading the results back as values from a bitmap. The performance is really promising, but I know it’s a dirty hack, so I’d like to ask you (and the other experienced graphics programmers on the forum) whether this approach makes sense, what could possibly go wrong, and what alternatives come to mind.
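If the results are integers packed into an RGBA8 render target, the CPU side of the readback is a byte-reassembly loop. A C++ sketch under a stated assumption: the fragment shader writes one 32-bit unsigned value per pixel, most significant byte in R, and the buffer comes from glReadPixels with GL_RGBA / GL_UNSIGNED_BYTE. The packing scheme and `decodeRgba8` are hypothetical, not something the post specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoder. Assumed layout (must match what the shader writes):
// one value per pixel, bytes R, G, B, A from most to least significant,
// tightly packed RGBA8 as returned for GL_RGBA / GL_UNSIGNED_BYTE.
std::vector<std::uint32_t> decodeRgba8(const std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0);
    std::vector<std::uint32_t> values(pixels.size() / 4);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::uint8_t* p = &pixels[4 * i];
        values[i] = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                    (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
    }
    return values;
}
```

Where float color buffers (for example GL_RGBA32F) are supported, reading back GL_FLOAT data avoids the packing step altogether.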