glDrawElementsIndirect() with ELEMENT_ARRAY_BUFFER offset ??

As we all know, when rendering batches with standard glDrawElements() or glDrawElementsInstanced(), with a VBO bound via glBindBuffer( GL_ELEMENT_ARRAY_BUFFER ) containing the index list, you can provide a byte offset into this VBO via the “const void *indices” argument to these draw calls. This allows you to place the index list for a batch anywhere within the bound buffer.

However, when using glDrawElementsIndirect() or glMultiDrawElementsIndirect(), there is no such argument to provide a byte-offset into the GL_ELEMENT_ARRAY_BUFFER. So, logically, one would think you should just account for the byte offset by binding it, not with ol’ glBindBuffer(), but with glBindBufferRange instead, providing the byte offset into the buffer to the “offset” parameter of this API call…

…except that glBindBufferRange does not accept GL_ELEMENT_ARRAY_BUFFER as a valid buffer target. (???)

How do you do this? Am I missing something?

That is, how do you provide a byte-offset into a buffer object to use for fetching the index list in a glDrawElementsIndirect() call?

…as an aside, it’s interesting to note that if you use NV bindless with glDrawElementsIndirect() or glMultiDrawElementsIndirect(), it doesn’t appear that you have any problem with byte offsets into the index buffer, as you’d provide the byte offset into the glBufferAddressRangeNV() call as usual before launching the draw call:

glBufferAddressRangeNV( GL_ELEMENT_ARRAY_ADDRESS_NV, 0, gpu_addr+offset, size );

The struct for glDrawElementsIndirect has a “firstIndex” field, which is… good enough. You just have to compute it; divide your byte offset by the size of the index type.

Thanks. I thought about that, but it seems like a bit of a hack. You end up requiring that the byte offset into the buffer for the start of every index list be a multiple of the size of an index (uint, ushort, etc.). Now that’s a good idea anyway (and happens to be true now), but when you’re shoveling a lot more than index lists into a single VBO, that isn’t inherently guaranteed without alignment forcing.

Another (more important) con of that approach is that the firstIndex field (and the DrawElementsIndirectCommand data in general) is written on the GPU, which may not otherwise know or care where in the bound ELEMENT_ARRAY_BUFFER the Draw Indirect batch is supposed to be reading indices from. In fact, that index-buffer offset may not even have been determined at that point, if it’s an offset the indices will only be streamed into a buffer object later (which is the case in this instance).

So it seems like you end up having to pre-stream the index list into some known offset beforehand, so you can feed that offset into the GPU program that serializes the DrawElementsIndirectCommand buffer, letting the Draw Indirect read indices from the right place. And if you’re going to do that, it would be simpler to force the index list into a buffer all by itself, so that a firstIndex of 0 can be serialized without the offset-serialization headache. But that precludes VBO streaming of the index list, or at least makes your life needlessly hard if you’re determined to do it.

This doesn’t really help, but it should offer some hope for the generality of what you’re asking for. D3D’s function for setting the index buffer has a built-in offset, which works as you would expect with D3D11’s indirect rendering system. So odds are good that hardware can already handle this. I would expect the GL equivalent to simply provide a glElementArrayOffset as part of VAO state.

That’s interesting. Thanks for the info. Using the glBindBufferRange offset approach might be cleaner though, as it wouldn’t add any new APIs. In practice it might be stored the same though (GL_ELEMENT_ARRAY_BUFFER_OFFSET alongside GL_ELEMENT_ARRAY_BUFFER_BINDING).

The concern with using glBindBufferRange is that it binds to an indexed target. Making GL_ELEMENT_ARRAY_BUFFER an indexed target with a fixed index limit of 1 could be confusing to users. Especially in conjunction with format/buffer separation from vertex_attrib_binding, people might think that the different targets somehow map to different attributes. Thus, they might confuse it with the mythical “draw with multiple index buffers” functionality, when it’s really just about providing a base offset for element buffers.

Then again, if we ever do get that mythical functionality, it probably would use glBindBufferRange. So it would make sense from that perspective.

From a more practical perspective, indexed targets are structurally different from non-indexed targets. Take GL_UNIFORM_BUFFER. Binding to GL_UNIFORM_BUFFER with glBindBuffer does not affect the indexed targets at all. The only way to change an indexed uniform buffer target is to use glBindBufferRange/Base. The concern is how you specify the behavior.

For example, consider the code that works today:

//Do vertex stuff
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, eBuf);
glDrawElements(...);

A naive programmer might think that simply changing the first bind is all they need. After all, there’s only really one indexed target, and they’re just specifying an offset.

//Do vertex stuff
glBindBufferRange(GL_ELEMENT_ARRAY_BUFFER, 0, eBuf, offset, ...);
glDrawElements(...);

This looks like it works now, but it doesn’t. And it raises a new question: what happens if GL_ELEMENT_ARRAY_BUFFER and its indexed target are bound to different buffers?

//Do vertex stuff
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, eBuf2);
glBindBufferRange(GL_ELEMENT_ARRAY_BUFFER, 0, eBuf1, offset, ...);
glDrawElements(...);

eBuf1 is bound to the indexed element array target, and eBuf2 is bound to the non-indexed one. If the non-indexed binding takes priority, then every glBindBufferRange(GL_ELEMENT_ARRAY_BUFFER) call has to look like this:

//Do vertex stuff
glBindBufferRange(GL_ELEMENT_ARRAY_BUFFER, 0, eBuf1, offset, ...);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0); //Clear the non-indexed target.
glDrawElements(...);

So obviously, we want the indexed target to take priority. And users will have to clean up the indexed target properly. It’s not a terrible burden and it certainly can work. It’s just very… different from how indexed targets normally work. Usually, the unindexed target has no functionality, so they don’t conflict.

That being said, looking at how the ARB solved the buffer binding with vertex_attrib_binding, I would guess that they’d probably just add a dedicated glBindElementBuffer call, which would set the GL_ELEMENT_ARRAY_BUFFER and its offset. Calling glBindBuffer(GL_ELEMENT_ARRAY_BUFFER) would clear the offset as well.

Ok. Well, I’ll leave it to the spec experts to decide what makes the most sense. I only suggested [var]glBindBufferRange[/var] because it’s the only bind call that lets you specify a starting offset. Though it also ropes in a “size” for the binding, not just a “start” offset (a la [var]TRANSFORM_FEEDBACK_BUFFER_{BINDING, START, SIZE}[/var]), as well as an “index”, which can just be 0 for this non-indexed bind point.

Incidentally, this method is exactly how NVidia handled the specification of GPU address bases and sizes for vertex attributes, element arrays, and other state bindings (see [var]glBufferAddressRangeNV()[/var] here). Vertex attribs specify a 0…N value for “index”, whereas bind points that aren’t indexed just pass 0 for the index parameter. That saves creating a new API just for non-indexed targets.

From a user perspective, making [var]glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, handle )[/var] an alias for [var]glBindBufferRange(GL_ELEMENT_ARRAY_BUFFER, 0, handle, 0, sizeof(buffer))[/var] seems reasonable, though I’m not a driver engineer. That way there wouldn’t be this duality of indexed and non-indexed state for a non-indexed bind point. That is, in your example:

glBindBufferRange( GL_ELEMENT_ARRAY_BUFFER, 0, handle, offset, size );
glDrawElements( ... );

This’d be just fine. Both calls would update all of [var]GL_ELEMENT_ARRAY_BUFFER_{BINDING,START,SIZE}[/var].

I’d love to see this too; I just tripped over the same issue today.

It would also be nice to see glBufferAddressRangeNV able to bind to all the buffer binding points, rather than just the vertex-related ones.