Is one-pass culling possible?

Hi, All!

I’m currently trying to optimize my 3D engine.

The first step is already done: frustum culling via transform feedback (as in this example: http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/). It raised performance nicely.

Another large part of rendering is ‘cascaded shadow mapping’ (CSM) with three cascades. I tried to apply the same technique there, but the three extra culling queries decreased performance, so that approach looks useless.

So now I’m looking for solutions.
My current idea is a single culling pass that handles all of the culling tests:

  • culling for camera view.
  • culling for each cascade of the CSM.
  • culling for every light (spot and point).

My question is: can I have several outputs from the vertex shader?
I mean, I want to determine from an instance’s position and bounding box whether it is visible for the ‘camera’, a ‘CSM view’, a ‘point light’, etc., and write its instance ID into the matching buffer (one of several).
Or something like that.

It looks impossible in OpenGL, but maybe it isn’t?

Thanks for any advice! :)

Frustum culling through transform feedback is typically done via a Geometry Shader, not a vertex shader. A GS can have multiple output streams; these are sometimes used for doing LOD-ing (that is, different streams represent different mesh resolutions; the GS only outputs to a specific resolution). However, a GS has a limit on the number of output streams, and it’s not a large limit. No hardware provides more than four streams.

If you need a larger number of outputs, and you find that multiple culling passes are inefficient (you can do multiple culling tests in each pass; just no more than 4 at once), you could switch to Compute Shaders, writing their data to SSBOs using atomic counters and such. Compute Shaders don’t guarantee an output order the way feedback does, but I don’t believe order is particularly important to your use case.
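To make the compute-shader idea concrete, here is a minimal CPU-side sketch in C++ of what each invocation would do: test one instance against every view and append its ID to that view’s list through an atomic counter. It only models the logic and is not GLSL or an OpenGL API; every name in it (`Plane`, `Sphere`, `Frustum`, `CullOutput`, `cullInstance`, `boxFrustum`) is hypothetical, and bounding spheres are assumed instead of bounding boxes to keep the test short.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names below are hypothetical and exist only for this illustration.
struct Plane  { float nx, ny, nz, d; };    // a point p is inside when dot(n, p) + d >= 0
struct Sphere { float x, y, z, radius; };  // bounding sphere of one instance
using Frustum = std::array<Plane, 6>;

// CPU stand-in for the GPU outputs: one atomic counter and one ID list per view.
struct CullOutput {
    std::vector<std::atomic<std::uint32_t>> counts;
    std::vector<std::vector<std::uint32_t>> ids;

    CullOutput(std::size_t viewCount, std::size_t instanceCount)
        : counts(viewCount),
          ids(viewCount, std::vector<std::uint32_t>(instanceCount)) {
        for (auto& c : counts) c.store(0);
    }
};

// Conservative sphere/frustum test: reject only if fully behind some plane.
bool sphereInFrustum(const Sphere& s, const Frustum& f) {
    for (const Plane& p : f) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// Body of one "invocation": a single pass over the instance covers every view.
void cullInstance(std::uint32_t id, const Sphere& bounds,
                  const std::vector<Frustum>& views, CullOutput& out) {
    for (std::size_t v = 0; v < views.size(); ++v) {
        if (sphereInFrustum(bounds, views[v])) {
            // Reserve a slot atomically; the resulting order is unspecified
            // when invocations run in parallel.
            const std::uint32_t slot =
                out.counts[v].fetch_add(1, std::memory_order_relaxed);
            out.ids[v][slot] = id;
        }
    }
}

// Helper for demos: an axis-aligned box expressed as six inward-facing planes.
Frustum boxFrustum(float minX, float maxX, float minY, float maxY,
                   float minZ, float maxZ) {
    Frustum f = {{
        Plane{ 1.0f,  0.0f,  0.0f, -minX}, Plane{-1.0f,  0.0f,  0.0f, maxX},
        Plane{ 0.0f,  1.0f,  0.0f, -minY}, Plane{ 0.0f, -1.0f,  0.0f, maxY},
        Plane{ 0.0f,  0.0f,  1.0f, -minZ}, Plane{ 0.0f,  0.0f, -1.0f, maxZ},
    }};
    return f;
}
```

Calling `cullInstance` once per instance fills one list per view (camera, each cascade, each shadow-casting light), so the number of views is not capped by the four-stream limit. On the GPU each list would live in an SSBO and each counter would be an atomic counter or an atomically updated SSBO field.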

Also, are you sure that you need to do frustum culling “for every light (spot and point)?” Culling for shadow maps is necessary because you’re going to render the shadow map from the perspective of the light. But for lights that don’t cast shadows, you don’t need to cull based on them.

Yes, I did it that way. The VS detects whether the instance should be culled and sends a flag to the GS; the GS accepts or rejects the instance based on the received flag.

As far as I know, the GS is a fairly slow stage, and developers should avoid it where possible.

Sounds interesting, thank you. I will investigate this approach.

That point is optional. Not all lights need to cast shadows, but several of them should, so the engine has to support shadow casting for point lights.

Also, after some measurements I found that transform feedback with a query is slow. As far as I understand, that is because of the query, since it sends data back to the CPU.
Can you answer one more question: is there a way to avoid using a query after transform feedback? Maybe some way to leave the resulting instance count on the GPU and use it directly from there later?

[QUOTE=nimelord;1293282]
Also, after some measurements I guess that transform feedback with a query is slow. As far as I understand, that is because of the query, since it sends data back to the CPU.
Can you answer one more question: is there a way to avoid using a query after transform feedback? Maybe some way to leave the resulting instance count on the GPU and use it directly from there later?[/QUOTE]
Retrieving the value from a query object isn’t inherently slow, but it may force synchronisation if you attempt to read the value before the result is available. You can call glGetQueryObject() with GL_QUERY_RESULT_AVAILABLE to determine whether the value can be retrieved immediately (without synchronisation). Also, with OpenGL 4.4 and later, if a buffer is bound to GL_QUERY_BUFFER, query results are stored in that buffer rather than in client memory. This allows the glGetQueryObject() call to be enqueued in the command stream rather than requiring synchronisation.

[QUOTE=nimelord;1293282]Also, after some measurements I found that transform feedback with a query is slow. As far as I understand, that is because of the query, since it sends data back to the CPU.
Can you answer one more question: is there a way to avoid using a query after transform feedback? Maybe some way to leave the resulting instance count on the GPU and use it directly from there later?[/QUOTE]

It all depends on how you intend to use the query results.

If you’re doing transform feedback to generate data structures appropriate to indirect rendering, then you don’t really need the query results per se. What is sometimes done is you create a buffer object that contains more indirect rendering structures than you would ever need. You clear the buffer to zeros before doing the feedback operation. That operation fills in some number of them, but you don’t care how many because you’re going to call glMultiDrawArraysIndirect (or glMultiDrawElementsIndirect) with all of them. The ones that didn’t get filled in don’t actually render anything.
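The zero-cleared buffer trick can be sketched on the CPU as follows. The `DrawArraysIndirectCommand` layout matches the record that glMultiDrawArraysIndirect reads; everything else (`IndirectBuffer`, `makeClearedBuffer`, `writeSurvivors`, `submitAll`) is a hypothetical stand-in for the GPU buffer, the feedback pass, and the draw call, written only to show why untouched records are harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Record layout consumed by glMultiDrawArraysIndirect.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // vertices per instance
    std::uint32_t instanceCount;  // 0 means the record draws nothing
    std::uint32_t first;
    std::uint32_t baseInstance;
};

// Hypothetical CPU stand-in for the indirect buffer object.
using IndirectBuffer = std::vector<DrawArraysIndirectCommand>;

// Allocate the worst-case number of records, all cleared to zero.
IndirectBuffer makeClearedBuffer(std::size_t maxDraws) {
    return IndirectBuffer(maxDraws);  // value-initialization zeroes every field
}

// Stand-in for the feedback pass: packs the surviving records at the front.
std::size_t writeSurvivors(IndirectBuffer& buffer, const IndirectBuffer& survivors) {
    std::size_t written = 0;
    for (; written < survivors.size() && written < buffer.size(); ++written) {
        buffer[written] = survivors[written];
    }
    return written;
}

// Stand-in for the multi-draw: every record is visited, but a record with
// zero vertices or zero instances produces no work.
std::size_t submitAll(const IndirectBuffer& buffer) {
    std::size_t drawn = 0;
    for (const DrawArraysIndirectCommand& cmd : buffer) {
        if (cmd.count != 0 && cmd.instanceCount != 0) ++drawn;
    }
    return drawn;
}
```

The CPU never learns how many records survived; it always submits the full capacity. The cost is that the GPU still has to step over the empty records.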

It’s not a perfect solution. A better solution would be to employ the ARB_indirect_parameters extension. That extension allows you to source the number of draws from a buffer object. The idea here is that you use an asynchronous query buffer to get the feedback stream count into a location of a buffer object. You then use that buffer data as the parameter count with ARB_indirect_parameters. You still need to specify the maximum number of objects in the indirect buffer as above, but the idea is that if the feedback operation has completed well in advance of the query operation, then it can use the queried count and avoid having to read a bunch of zero data structures.
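As a rough illustration of why the maximum is still needed, the sketch below models how many indirect records get processed in each case. It assumes the behaviour described for glMultiDrawArraysIndirectCountARB, where the count read from the parameter buffer is clamped to the maximum passed by the application; the two function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Plain multi-draw: the GPU steps through the full capacity, zeros included.
std::uint32_t recordsReadPlain(std::uint32_t maxDrawCount) {
    return maxDrawCount;
}

// Count sourced from a buffer: only the queried number of records is
// processed, and never more than the maximum given by the application.
std::uint32_t recordsReadWithCount(std::uint32_t countInBuffer,
                                   std::uint32_t maxDrawCount) {
    return std::min(countInBuffer, maxDrawCount);
}
```

With a capacity of 1024 records and 3 survivors, the plain path walks all 1024 records while the counted path walks 3. The clamp also keeps a bogus count from running past the end of the buffer.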

But this assumes you’re doing indirect rendering. If you’re not, then there’s really no way to avoid some level of GPU/CPU sync and communication. You can use feedback query objects to avoid a stall (so that the CPU can keep doing other things between the start of the feedback operation and the start of the indirect call).