Vertex buffer management with indirect drawing

noizex · January 14, 2019, 12:18pm

Hello,
I’m going into indirect land and this has not been the easiest journey. I’m almost there, but I found a small problem with vertex buffers. As we know, indirect drawing pretty much requires us not to rebind buffers if it’s supposed to work. Right now I have interleaved buffers, which works okay, but I had to make some weird decisions around meshes that require 2 more attribs (like skinned meshes), and I heard the trend nowadays is towards non-interleaved buffers. The pros of this that I see would be:
1. Better management of what is enabled for a given draw - so a shadow / depth pass could use a smaller number of attribs (some unknown performance gain from not having to read data that would be discarded anyway?)
2. A better way to upload data if only a small subset of attributes is dynamic and the others are static
3. Easier upload of data from file (most of the formats seem to be non-interleaved, but this can be a non-factor if prepared offline)
4. Easier debugging (some debuggers are not very clever when it comes to interleaved format possibilities, making it impossible to read buffer contents)

Now, this all sounds fun and I could go this way, but I just discovered that it may defeat my indirect drawing, because of how I have to specify offsets into buffers. Remember: I bind the buffers at the beginning and do not change them, so the starting offset is 0 and I abuse baseVertex to offset the index into the vertex buffer, to accommodate the fact that there is a single big buffer holding all the meshes rather than one buffer per mesh, or a buffer bound as a range at specific offsets outside of the draw call. It would work if all buffers advanced at the same pace: if I allocate vertices in the POSITION buffer, I do the same for all other involved buffers. If I don’t, the buffers go out of sync and it’s impossible to calculate a single, common baseVertex to provide in the indirect call (there it is the “baseVertex” field; “firstIndex” is the separate offset into the index buffer).

I tried to visualise this and that’s what I created:

As you can see we end up with non-uniform baseVertex if we mix several buffers that are not in sync.
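
To make the mismatch concrete, here is a minimal C++ sketch of two attribute streams that allocate independently. The `Stream`, `MeshSlots`, and `allocIndependent` names are hypothetical and only illustrate the bookkeeping; no GL calls are involved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-attribute stream: tracks how many vertices were appended.
struct Stream {
    std::uint32_t vertexCount = 0;

    // Appends n vertices and returns the index of the first one.
    std::uint32_t append(std::uint32_t n) {
        assert(n > 0 && "a mesh needs at least one vertex");
        std::uint32_t first = vertexCount;
        vertexCount += n;
        return first;
    }
};

struct MeshSlots {
    std::uint32_t positionFirst;  // first vertex in the position stream
    std::uint32_t boneFirst;      // first vertex in the bone stream (skinned only)
    bool skinned;
};

// Each stream allocates on its own: unskinned meshes skip the bone stream.
inline MeshSlots allocIndependent(Stream& pos, Stream& bone,
                                  std::uint32_t n, bool skinned) {
    MeshSlots s{pos.append(n), 0, skinned};
    if (skinned) s.boneFirst = bone.append(n);
    return s;
}

// One baseVertex only works if every stream the mesh reads starts at the
// same vertex index.
inline bool hasCommonBaseVertex(const MeshSlots& s) {
    return !s.skinned || s.positionFirst == s.boneFirst;
}
```

With one unskinned mesh of 100 vertices followed by a skinned mesh, the skinned mesh starts at vertex 100 in the position stream but at vertex 0 in the bone stream, so no single baseVertex describes it.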

My question is: is there any way to avoid this problem, other than
a) sticking to interleaved buffers, where the problem exists but there is a single source of data, which means I can provide a single baseVertex offset
b) synchronizing buffers so they allocate incrementally and equally among all involved buffers - this pretty much couples all buffers in such a “batch” together, so I can’t use the position buffer for non-skinned meshes anymore - it has to have its own buffer

I see no other solutions that would not defeat the indirect draw, which I think is such a big gain that I’m probably going to give up on non-interleaved unless you tell me this is wrong for some reason I don’t see.

Has anyone thought about it? It seems indirect is not so popular yet (and I can see why - implementing it is not exactly feasible if someone already has a working engine and doesn’t want to make some sacrifices), so it’s hard to find others with such problems.

GClements · January 14, 2019, 12:57pm

Interleaved versus non-interleaved doesn’t matter. Either way, all objects drawn using the same draw call must have the same attribute structure.

There’s no particular reason to keep the same attribute array state (i.e. VAO) for multiple draw calls. If you have different types of object with different sets of attributes, you may as well just switch VAOs between calls.

noizex · January 14, 2019, 1:09pm

Well, I am changing VAO between calls when it matters - it’s about not changing it in the middle of indirect call.

Alfonse_Reinheart · January 14, 2019, 8:59pm

To add to what GClements said, the basic idea is this: the number of VAO changes should not be proportional to how much stuff you draw. That is, it shouldn’t matter if you draw 1 character or 50,000; you should change VAOs the same number of times.

So you should break things down into a small number of well-defined sets of kinds of rendered stuff. Like UI stuff, character models (skinning), static models, dynamic models, particle systems, etc. Or whatever arrangement you want. Each kind of rendered stuff should have a single VAO, with all such models sharing the same arrays, vertex count, etc.
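
As a rough illustration of that grouping, the sketch below buckets draw requests by kind, so that each bucket could be drawn with one VAO bind followed by one multi-draw call. The `Kind` categories, `DrawRequest`, and `bucketByKind` are made-up names for this example, and the actual GL binding and drawing are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical categories; pick whatever split suits the renderer.
enum class Kind { Static, Skinned, UI };

struct DrawRequest {
    Kind kind;
    std::uint32_t meshId;  // illustrative handle, not a GL object name
};

// Buckets draw requests by kind. The number of VAO changes per frame then
// equals the number of non-empty buckets, not the number of objects.
inline std::map<Kind, std::vector<std::uint32_t>>
bucketByKind(const std::vector<DrawRequest>& requests) {
    std::map<Kind, std::vector<std::uint32_t>> buckets;
    std::size_t placed = 0;
    for (const DrawRequest& r : requests) {
        buckets[r.kind].push_back(r.meshId);
        ++placed;
    }
    assert(placed == requests.size());  // every request lands in one bucket
    return buckets;
}
```

Whether 1 or 50,000 skinned characters are submitted, they end up in the same bucket and cost the same single VAO change.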

Oh, and FYI:

I abuse baseVertex to offset the index into vertex buffer to accommodate that there is a single big buffer having all the meshes rather than one buffer / buffer which is bound as a range to a specific offsets outside of draw call

That is not “abuse”; that is exactly why baseVertex exists.
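
For reference, a minimal sketch of how baseVertex and firstIndex accumulate when several meshes share one vertex buffer and one index buffer. The struct mirrors the five 32-bit fields OpenGL reads for indexed indirect draws; `MeshSize` and `buildCommands` are hypothetical helpers, and the sketch assumes meshes are packed back to back with one instance each.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Five tightly packed 32-bit fields, in the order OpenGL reads them.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // value added to every fetched index
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be tightly packed");

struct MeshSize {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Builds one command per mesh, assuming meshes are stored back to back.
inline std::vector<DrawElementsIndirectCommand>
buildCommands(const std::vector<MeshSize>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    std::uint32_t firstIndex = 0;
    std::int32_t baseVertex = 0;
    for (const MeshSize& m : meshes) {
        assert(m.vertexCount <= static_cast<std::uint32_t>(INT32_MAX - baseVertex));
        cmds.push_back({m.indexCount, 1u, firstIndex, baseVertex, 0u});
        firstIndex += m.indexCount;
        baseVertex += static_cast<std::int32_t>(m.vertexCount);
    }
    return cmds;
}
```

Each mesh keeps its indices relative to its own first vertex; baseVertex shifts them to where the mesh actually lives in the shared buffer.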

GClements · January 15, 2019, 9:03am

I’m not entirely sure, but I think that what he’s doing is storing multiple sets of geometry with differing sets of (interleaved) attributes in a single buffer, and specifying a base vertex rather than changing the offsets.

So long as the base offset of each set of data is a multiple of the stride, this would work. But it isn’t a rational approach, IMHO. If you have different sets of attributes, either you’d still need to change the strides, or you’d need to leave space for the unused attributes. And the latter would work just as well with non-interleaved attributes.
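
The "multiple of the stride" condition can be checked with simple arithmetic. The sketch below assumes the usual vertex fetch addressing, binding offset plus relative offset plus stride times (index + baseVertex), with no instancing divisor; `attribAddress` and `baseVertexFor` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>

// Byte position (from the start of the buffer) read for one attribute of
// one vertex, assuming no instancing divisor.
inline std::size_t attribAddress(std::size_t bindingOffset, std::size_t stride,
                                 std::size_t relativeOffset,
                                 std::size_t index, std::size_t baseVertex) {
    return bindingOffset + relativeOffset + stride * (index + baseVertex);
}

// Converts a byte offset into an equivalent baseVertex. Returns false when
// the offset is not a multiple of the stride, in which case no baseVertex
// can reproduce it.
inline bool baseVertexFor(std::size_t byteOffset, std::size_t stride,
                          std::size_t& baseVertexOut) {
    assert(stride > 0);
    if (byteOffset % stride != 0) return false;
    baseVertexOut = byteOffset / stride;
    return true;
}
```

For a 32-byte stride, a mesh stored at byte 3200 is reachable either with a binding offset of 3200 and baseVertex 0, or with a binding offset of 0 and baseVertex 100; a mesh stored at byte 3210 is only reachable by changing the binding offset.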

GClements · January 15, 2019, 9:12am

You don’t need to change it in the middle of an indirect call. Just set the buffer offsets appropriately for each VAO rather than setting them to zero then offsetting the base vertex to compensate.

noizex · January 15, 2019, 12:45pm

Well, that’s my problem and what I’m asking about - I even drew an image of the buffer layout showing where the problem is. Yes, I have one big buffer because with indirect draw I can’t change the buffers, right? But if I use non-interleaved buffers, which may not be uniformly populated (see the diagram), there comes the problem, and I was wondering how pros deal with it. I don’t know where you picked up on these VAOs, which have literally nothing to do with my question.

But yeah, what Alfonse said is exactly what I’m doing - I can’t rebind buffers in the middle of an indirect batch, so I’m counting on the baseVertex offset, and with non-interleaved it’s not exactly easy to use, as you may have buffers that are not on the same “vertex number” -> again, see the diagram.

I guess there is no way really around it - the question was more about non-interleaved, which leaves you with N buffers instead of 1 buffer - and you have no guarantee the offsets will line up across them so that a single baseVertex works. Bah.

GClements · January 15, 2019, 4:19pm

That image really doesn’t do anything to explain what your problem actually is.

There’s no difference between what you can do with interleaved attributes or non-interleaved attributes. The fact that non-interleaved attributes require a different stride for each attribute isn’t an issue because you can just provide a different stride for each attribute.
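
A small sketch of that point: both layouts reduce to a (buffer, offset, stride) triple per attribute, and only the numbers differ. `AttribSource`, `interleaved`, and `separateBuffers` are hypothetical names; alignment padding is ignored, which assumes attribute sizes that are already suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct AttribSource {
    std::size_t buffer;  // illustrative buffer index, not a GL object name
    std::size_t offset;  // byte offset of the attribute in vertex 0
    std::size_t stride;  // bytes between consecutive vertices
};

// sizes[i] is the byte size of attribute i for a single vertex.
// Interleaved: one buffer, shared stride, per-attribute offsets.
inline std::vector<AttribSource> interleaved(const std::vector<std::size_t>& sizes) {
    std::size_t vertexSize = 0;
    for (std::size_t s : sizes) vertexSize += s;
    std::vector<AttribSource> out;
    std::size_t offset = 0;
    for (std::size_t s : sizes) {
        out.push_back({0, offset, vertexSize});
        offset += s;
    }
    return out;
}

// Non-interleaved: one buffer per attribute, each with its own stride.
inline std::vector<AttribSource> separateBuffers(const std::vector<std::size_t>& sizes) {
    std::vector<AttribSource> out;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        assert(sizes[i] > 0);
        out.push_back({i, 0, sizes[i]});
    }
    return out;
}

// Same fetch formula for both layouts.
inline std::size_t byteOffsetOf(const AttribSource& src, std::size_t vertex) {
    return src.offset + src.stride * vertex;
}
```

For a position (12 bytes), normal (12 bytes), and UV (8 bytes), the interleaved layout uses stride 32 with offsets 0, 12, and 24, while the separate layout uses strides 12, 12, and 8 with offset 0 in each buffer.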

Dark_Photon · January 16, 2019, 5:55am

I’m going into indirect land … As we know, indirect pretty much requires us to not rebind buffers … Right now I have interleaved buffers, which works okay but I had to do some weird decisions around meshes that require 2 more attribs (like skinned meshes)

You can avoid this by just not batching skinned and unskinned meshes together. Then you can tightly pack your vertices regardless, avoiding (I think) the problem you’re talking about.

You don’t have to cram all possible batches into 1 Indirect draw call. If it feels like you’re trying to hammer a round peg into a square hole, it’s probably not a good idea. Having a relatively small number of indirect draw calls per frame is not a problem.

I heard the trend is nowadays towards non-interleaved buffers.

Where did you read/hear this?

In the absence of performance tests that indicate otherwise, you should assume that streaming 1 sequential block of data from GPU memory into the vertex shader is likely to be faster than streaming N sequential blocks of data (each from separate addresses), for the case where you are memory transfer bound.

…[for non-interleaved vertices accessed within a single draw call, is there any way to avoid] synchronizing buffers so they allocate incrementally and equally among all involved buffers - this pretty much couples all buffers in such “batch” together

You need to match the length across the vertex attributes, yes (assuming you’re not using vertex attribute divisors of course).
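
One way to keep the lengths matched is to allocate vertex ranges once per batch rather than once per buffer, so every stream in the batch advances together. The `LockstepAllocator` class below is a hypothetical sketch of that idea, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hands out vertex ranges that are reserved in every stream of a batch at
// once, so the returned index is valid as baseVertex for all of them.
class LockstepAllocator {
public:
    explicit LockstepAllocator(std::uint32_t capacityInVertices)
        : capacity_(capacityInVertices), next_(0) {}

    // Reserves n consecutive vertices and returns the first index.
    std::uint32_t allocate(std::uint32_t n) {
        assert(n <= capacity_ - next_ && "batch streams are full");
        std::uint32_t base = next_;
        next_ += n;
        return base;
    }

    std::uint32_t used() const { return next_; }

private:
    std::uint32_t capacity_;
    std::uint32_t next_;
};

// Byte position of a reserved range inside one tightly packed stream.
inline std::size_t streamByteOffset(std::uint32_t baseVertex,
                                    std::size_t elementSize) {
    return static_cast<std::size_t>(baseVertex) * elementSize;
}
```

The same baseVertex maps to different byte offsets in each stream (for example 1200 bytes in a 12-byte position stream and 1600 bytes in a 16-byte stream for baseVertex 100), which is where data would be uploaded. The cost is the coupling described in the question: meshes that do not use a stream still consume space in it.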

However, I’d stop thinking of this in terms of buffer objects. With non-interleaved (or interleaved) vertices, there’s nothing that says that separate sequential blocks of vertex data for each attribute need to be sourced from separate buffer objects. You could have a separate non-interleaved buffer object for each vertex attribute, or you could store some or all of your non-interleaved vertex attribute blocks for attributes 0…N sequentially end-to-end within the same buffer object, or you could do the latter with 20-100 batches stored end-to-end all within the same buffer object. And similarly for interleaved attribute blocks. You can mix-and-match these any way you want: for instance, have 5 vertex attributes sourced interleaved from buffer object A along with 1 or 2 vertex attributes (dynamically generated on the GPU, let’s say) sourced from buffer object B, which may or may not be interleaved. All of these are valid.
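
As an illustration of the end-to-end option, the sketch below computes where each attribute block of each batch would start inside one shared buffer. `packBatches` is a hypothetical helper; it assumes tightly packed blocks with no alignment padding, which a real allocator may need to add.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns offsets[batch][attrib]: the byte offset at which each
// non-interleaved attribute block of each batch starts inside one buffer.
// attribSizes[i] is the per-vertex byte size of attribute i.
inline std::vector<std::vector<std::size_t>>
packBatches(const std::vector<std::size_t>& attribSizes,
            const std::vector<std::size_t>& batchVertexCounts,
            std::size_t& totalBytes) {
    std::vector<std::vector<std::size_t>> offsets;
    std::size_t cursor = 0;
    for (std::size_t count : batchVertexCounts) {
        std::vector<std::size_t> blocks;
        for (std::size_t size : attribSizes) {
            assert(size > 0);
            blocks.push_back(cursor);  // this block starts here
            cursor += size * count;    // the next block follows immediately
        }
        offsets.push_back(blocks);
    }
    totalBytes = cursor;
    return offsets;
}
```

With a 12-byte and an 8-byte attribute and batches of 10 and 5 vertices, the blocks start at bytes 0 and 120 for the first batch and at 200 and 260 for the second, for 300 bytes in total. Each block's offset would then be used as that attribute's buffer offset when drawing its batch.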

Generally though, prefer pulling from the minimum number of sequential blocks per batch in the absence of 1) a compelling reason to do otherwise (e.g. some attributes pull in GPU-generated vertex attribute data), or 2) perf tests on all the GPUs you care about indicating that you don’t have to care.