MultiDrawIndirect vs Instancing

Hi,

I have a rendering pipeline that uses MultiDrawIndirect and a compute shader to cull against the view frustum.

Now I want to add instancing, but that breaks the approach because it interferes with my culling flag (instance count = 0).

However, since I can easily implement instancing via MultiDrawIndirect, why bother?

So, hardware compatibility aside, why would I want to use instancing if I can use MultiDrawIndirect?

Best Regards

With real instancing, the whole draw command is considered a single invocation group. With multi-draw instancing, each separate draw operation is a separate invocation group.

That strongly suggests that hardware is doing something very different between these two circumstances. Something that permits it to share shader resources between separate instances, but not between separate draw commands. As such, it stands to reason that real instancing will be faster than fake instancing (all other things being equal, of course).
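To make the two cases concrete, here is a minimal sketch of how the indirect buffer differs between them. The struct mirrors the five 32-bit fields OpenGL reads per command for glMultiDrawElementsIndirect; the two helper functions and their names are hypothetical, and using baseInstance to tell the copies apart in the multi-draw case is just one possible choice.

```cpp
#include <cstdint>
#include <vector>

// Layout of one indirect command as consumed by glMultiDrawElementsIndirect:
// five tightly packed 32-bit values.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices per instance
    std::uint32_t instanceCount;  // 0 means this draw produces nothing
    std::uint32_t firstIndex;     // offset into the index buffer
    std::int32_t  baseVertex;     // value added to every fetched index
    std::uint32_t baseInstance;   // offset applied to instanced attribute fetches
};

// "Real" instancing (hypothetical helper): one command that draws n instances.
std::vector<DrawElementsIndirectCommand>
makeInstancedCommand(std::uint32_t indexCount, std::uint32_t n) {
    return { {indexCount, n, 0u, 0, 0u} };
}

// "Fake" instancing (hypothetical helper): n commands of one instance each.
// baseInstance is used here so each copy can fetch its own per-instance data.
std::vector<DrawElementsIndirectCommand>
makeMultiDrawCommands(std::uint32_t indexCount, std::uint32_t n) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(n);
    for (std::uint32_t i = 0; i < n; ++i)
        cmds.push_back({indexCount, 1u, 0u, 0, i});
    return cmds;
}
```

In the first case culling has to change instanceCount from n to the number of survivors; in the second it only has to zero individual commands, which is why the instance count = 0 flag stops working once real instancing is used.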

[QUOTE]Now I want to add instancing, but that breaks the approach because it interferes with my culling flag (instance count = 0).[/QUOTE]

Then you need to re-think how your instance building works. When executing a compute shader for an instanced object, each CS invocation would decide whether to cull one instance. If that instance needs to be rendered, the invocation atomically increments the instance count; if it doesn’t, the invocation does nothing.

Yes, this means that you’ll need a compute shader specially designed for dealing with instanced objects. And each CS dispatch could only deal with a single kind of instanced object. But that should be fine.
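As a rough illustration of that counting scheme, here is a CPU-side sketch in C++. It is not a compute shader: the loop stands in for the dispatch, and std::atomic stands in for an atomic add on the instanceCount field of the indirect command. The Sphere type, the 1D interval used as a "frustum", and all function names are assumptions made up for the example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-instance bounds, reduced to one dimension for brevity.
struct Sphere { float x, radius; };

// Hypothetical stand-in for a frustum test: the "frustum" is the interval [lo, hi].
bool isVisible(const Sphere& s, float lo, float hi) {
    return s.x + s.radius >= lo && s.x - s.radius <= hi;
}

// Body of one "invocation": handles exactly one instance and bumps the shared
// counter only if that instance survives culling.
void cullInvocation(std::size_t i, const std::vector<Sphere>& bounds,
                    float lo, float hi,
                    std::atomic<std::uint32_t>& instanceCount) {
    if (i >= bounds.size()) return;  // guard for padded dispatch sizes
    if (isVisible(bounds[i], lo, hi))
        instanceCount.fetch_add(1u, std::memory_order_relaxed);
}

// Stand-in for the dispatch. The counter must start at zero every frame.
std::uint32_t cullAll(const std::vector<Sphere>& bounds, float lo, float hi) {
    std::atomic<std::uint32_t> instanceCount{0u};
    for (std::size_t i = 0; i < bounds.size(); ++i)
        cullInvocation(i, bounds, lo, hi, instanceCount);
    return instanceCount.load();
}
```

Because every invocation only ever adds to the counter, the result does not depend on the order in which invocations run, which is what makes this safe to do in parallel.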

Thank you for your reply.

Deciding whether to cull or not seems rather straightforward.

However, the rendering of culled instances seems difficult. Assuming I want to cull the odd instances, I would have to create new buffers holding only the instance-specific data for the surviving instances, as the instance IDs are consecutive. (?)

That will probably require an additional scan and compression shader.
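For reference, the scan-and-compact idea could look roughly like the following sequential C++ sketch. On a GPU the scan would be a parallel prefix sum in its own pass; here it is a plain loop, and the function names are hypothetical. It assumes the flags are 0 or 1 and that data and flags have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum over the visibility flags: offsets[i] is the number of
// surviving instances before instance i, i.e. its slot in the compacted buffer.
std::vector<std::uint32_t> exclusiveScan(const std::vector<std::uint32_t>& flags) {
    std::vector<std::uint32_t> offsets(flags.size());
    std::uint32_t running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        offsets[i] = running;
        running += flags[i];
    }
    return offsets;
}

// Scatter every surviving element to the slot computed by the scan.
template <typename T>
std::vector<T> compact(const std::vector<T>& data,
                       const std::vector<std::uint32_t>& flags) {
    const std::vector<std::uint32_t> offsets = exclusiveScan(flags);
    const std::size_t total =
        flags.empty() ? 0 : offsets.back() + flags.back();
    std::vector<T> out(total);
    for (std::size_t i = 0; i < flags.size(); ++i)
        if (flags[i]) out[offsets[i]] = data[i];
    return out;
}
```

With the odd instances culled, flags {1, 0, 1, 0, 1} give offsets {0, 1, 1, 2, 2}, so instances 0, 2 and 4 land in slots 0, 1 and 2. This variant keeps the survivors in their original order.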

Or is there a feature that I am missing which would make this simpler?

Best Regards

[QUOTE]That will probably require an additional scan and compression shader.[/QUOTE]

Or the culling shader could also build that data. Since, after all, it has to have at least some of the per-object data in order to do culling.

So just have it copy the per-object data into a buffer, using the index it got from the atomic increment.
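A minimal sketch of that single-pass approach, again on the CPU in C++ rather than in a real compute shader: the value returned by the atomic increment is used directly as the write slot in the compacted buffer. PerInstance, its fields, the 1D visibility test and the function name are all assumptions for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-instance record: bounds used for culling plus some payload.
struct PerInstance {
    float x, radius;
    std::uint32_t materialId;
};

// Culls against the 1D interval [lo, hi] (a stand-in for the frustum) and
// writes each survivor into the slot handed out by the atomic increment.
std::uint32_t cullAndCompact(const std::vector<PerInstance>& all,
                             float lo, float hi,
                             std::vector<PerInstance>& survivors) {
    survivors.resize(all.size());  // worst case: everything is visible
    std::atomic<std::uint32_t> instanceCount{0u};

    for (std::size_t i = 0; i < all.size(); ++i) {  // one iteration = one invocation
        const PerInstance& p = all[i];
        const bool visible = p.x + p.radius >= lo && p.x - p.radius <= hi;
        if (!visible) continue;
        // fetch_add returns the previous value, so every survivor gets a unique slot.
        const std::uint32_t slot =
            instanceCount.fetch_add(1u, std::memory_order_relaxed);
        survivors[slot] = p;
    }

    const std::uint32_t count = instanceCount.load();
    survivors.resize(count);
    return count;  // this is the value that ends up as the draw's instance count
}
```

One difference from a scan: when the invocations really run in parallel, the order of the survivors in the output buffer is not deterministic. That only matters if the rendering depends on instance order.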

Yes, if atomics are fast enough.

I have no experience using them, they sound evil and a scan sounds like superfast magic.

[QUOTE=Christoph;41055]Yes, if atomics are fast enough.

I have no experience using them, they sound evil and a scan sounds like superfast magic.[/QUOTE]

If you have no experience with something, why would you assume that executing a second dispatch operation would be faster than using them?

harmful sciolism and uneducated assumptions