Setting a vertex base index in a glDrawElements call

evanGLizr · July 26, 2004, 10:29pm

Originally posted by V-man:
You forgot to define what happens when base + index overflows.

Keep in mind the user can use ubyte(8), ushort(16), uint(32) for the indices.

OPTION 1 : let it wrap back to 0
OPTION 2 : flag GL_OVERFLOW
OPTION 3 : promote to higher precision

The missing piece is not what happens when base+index overflows, but what happens when the rebased index falls outside the range. I assume it just does what the current spec does for glDrawRangeElements with out-of-range indices, which is nothing (i.e. implementation dependent):

It is an error for indices to lie outside the range [start, end], but implementations may not check for this situation. Such indices cause implementation-dependent behavior.

( glDrawRangeElements man page )

Doing anything else would be bad performance-wise (returning GL_OVERFLOW would force the CPU to verify the array of indices) or compatibility-wise (it may not match what current graphics cards support, which might again force the CPU to verify the indices).

Re base+index overflow: the size of the indices does not affect how the base works. A straightforward implementation of this array base is to rebase the memory pointer of the vertex array, which is a memory address. Whether the indices are ushort or uint is independent of the memory address the array is mapped to and shouldn’t cause any kind of overflow.
There’s no “array addr” + “base” + “index” sum going on. What happens is that “array addr” is rebased by “base” at vertex array mapping time, and then, when indices are specified, they are added to the new (rebased) array address as usual.
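
A minimal sketch of the rebasing described above, in plain C with no GL calls. The `VertexArray` struct and the `rebase` and `fetch` helpers are hypothetical names made up for illustration, not anything from a driver or the spec. The sketch only shows that the base is folded into the array address once, so the width of the index type never takes part in the base arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one vertex array binding (not a GL type). */
typedef struct {
    const unsigned char *addr;   /* start of the array in memory */
    size_t stride;               /* bytes between consecutive vertices */
} VertexArray;

/* Fold a base index into the array address once, at "mapping" time. */
static VertexArray rebase(VertexArray va, size_t base)
{
    assert(va.addr != NULL);
    va.addr += base * va.stride;
    return va;
}

/* Address of one vertex. The 8-bit index is widened to size_t before the
   multiply, so it cannot wrap regardless of how large the base was. */
static const unsigned char *fetch(VertexArray va, uint8_t index)
{
    return va.addr + (size_t)index * va.stride;
}
```

With a base of 700 and a ubyte index of 255, `fetch` lands on vertex 955 of the original array, even though 955 does not fit in a ubyte.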

Maybe it would be nice to specify how setting the base index affects the range of DrawRangeElements (is the range an absolute value or is it relative to the current array base?).

knackered · July 27, 2004, 2:45am

Does anyone have any info on how this is handled in a d3d driver? Surely all these issues were addressed there…

idr · July 27, 2004, 9:01am

Those are both good issues. I’ll add them to the list. Here are my thoughts.

- The overflow case of (base+i) can’t really happen due to the way the extension is defined. It is defined by modifying the behavior of ArrayElement, which takes an int as a parameter. The value of (base+i) would have to overflow a signed int. In that particular case, some words should be added to paragraph 5 on page 25 saying that the behavior in that case is undefined.
- Perhaps some words should be added to the DrawRangeElements paragraph on page 27 stating that start and end are also offset by base. This seems like the most logical behavior.

I’ll update the extension spec with these two issues. I’ll leave them marked as unresolved until we decide.

system · July 27, 2004, 2:45pm

It is an error for indices to lie outside the range [start, end], but implementations may not check for this situation. Such indices cause implementation-dependent behavior.

That paragraph does not address the issue of overflow. It seems to reference min max range.

Since with this extension one number is added to another, the issue of overflow is raised, and overflow CAN happen with ubyte and ushort indices.

Would it not be a problem with Geforces that prefer ushort?
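
To make the concern concrete, here is a small sketch in plain C (no GL calls). The two helper names are hypothetical. It assumes the worst case, where the base is added in the index’s own 16-bit type, and contrasts it with widening the index first.

```c
#include <assert.h>
#include <stdint.h>

/* Adds the base in the index's own 16-bit type: the result wraps modulo 65536. */
static uint16_t bias_narrow(uint16_t index, uint16_t base)
{
    return (uint16_t)(index + base);
}

/* Widens the index to 32 bits first. Precondition: the sum fits in 32 bits. */
static uint32_t bias_wide(uint16_t index, uint32_t base)
{
    assert(base <= UINT32_MAX - index);
    return (uint32_t)index + base;
}
```

For index 65000 and base 1000, the narrow version yields 464 while the wide version yields 66000, so whether overflow is a real problem depends on where the addition is carried out.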

idr · July 27, 2004, 3:03pm

It depends on the hardware. For example, the R200 has a register to hold a base index value. If you set the base index and call DrawElements, it wouldn’t have to modify any of the element data. It would just modify the value of that one register. In that case, there is no overflow since the chip had better be able to internally add the index and the base.

In any case, all of the DrawElements type functions are defined in terms of calling ArrayElement. This extension changes the behavior of those functions implicitly by changing the behavior of ArrayElement. Therefore, the driver just has to make it work.

For DrawRangeElements, it should be easy enough to tell if you hit this case. If the hardware doesn’t work like that, it may hit a slightly suboptimal path, but it shouldn’t be too bad. If the hardware has to modify the index values on upload, it can look at end+base and decide if it needs to use a large datatype. It would be more painful for DrawElements type calls, however.
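
A rough sketch of the end+base check for the DrawRangeElements case, in plain C. The `IndexWidth` enum and `width_for_range` are hypothetical names, and this is only one way a driver that rewrites indices on upload might pick a datatype.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { INDEX_U8, INDEX_U16, INDEX_U32 } IndexWidth;

/* Smallest unsigned index type that can hold every value in [0, end + base].
   Precondition: end + base itself fits in 32 bits. */
static IndexWidth width_for_range(uint32_t end, uint32_t base)
{
    assert(end <= UINT32_MAX - base);
    uint32_t max_index = end + base;
    if (max_index <= UINT8_MAX)  return INDEX_U8;
    if (max_index <= UINT16_MAX) return INDEX_U16;
    return INDEX_U32;
}
```

Plain DrawElements gives the driver no `end`, which is why it would have to scan the index array to get the same answer.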

idr · July 27, 2004, 3:12pm

I just thought of this after hitting ‘Add Reply’. If the hardware does support a specific base register, it may be faster for the caller to use this extension and make multiple draw calls. For example, suppose a mesh has 150,000 points, but the model can be broken into groups where all the point indices lie within 64k of each other. The app could divide up the mesh so that ushort indices could be used and make multiple DrawRangeElements calls separated by ArrayElementBase calls.
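
A sketch of the application-side half of that idea, in plain C. `rebase_group` is a hypothetical helper; how the mesh gets split into groups is left out, and the call that actually sets the base is whatever the extension ends up providing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rewrites one group of 32-bit indices as 16-bit indices relative to the
   group's smallest index. Returns 1 and stores that smallest index in *base
   on success, or 0 if the group is empty or spans more than 65536 vertices. */
static int rebase_group(const uint32_t *in, size_t count,
                        uint16_t *out, uint32_t *base)
{
    assert(in != NULL && out != NULL && base != NULL);
    if (count == 0) return 0;

    uint32_t lo = in[0], hi = in[0];
    for (size_t i = 1; i < count; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    if (hi - lo > UINT16_MAX) return 0;   /* does not fit in ushort */

    for (size_t i = 0; i < count; i++)
        out[i] = (uint16_t)(in[i] - lo);
    *base = lo;
    return 1;
}
```

Each successful group would then be drawn with ushort indices after setting the base to the returned value.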

l_belev · July 27, 2004, 4:43pm

I don’t see why a new extension is necessary for this matter. Such functionality is already available in the OpenGL (i.e. call glVertexPointer/etc with new pointer parameter).
The implementation could check if the other parameters of these functions are unchanged (i.e. optimize the case when the caller wants just to set a new base index).

If you are concerned about the high-cost of these funcs, it is just the cost of the actual operation performed (i.e. there’s no cost associated with the functions names themselves, of course), so if the operation is optimized, there’s no need to worry about the cost.

If this operation can be faster with a new parameter to the glDraw* functions (or whatever other way to do this thing), I see no reason for the operation not to be as fast with the gl*Pointer functions.

If this is not optimized in some current implementations (probably nvidia or ati), this is their problem, but not a problem of the API.

On the other hand, if you provide a new means of setting the base index, that would be duplicate functionality: we have one base set via the gl*Pointer funcs and a second base set some other way, and neither has independent meaning by itself. Only their sum is meaningful. This does not seem quite elegant to me.

Korval · July 27, 2004, 6:26pm

l_belev has a point. There is no specific reason why driver developers should make gl*Pointer calls heavy-weight under offset-like circumstances (unless they’re only uploading part of the data, in which case, offsetting will also be heavy-weight). The best way to get driver developers to optimize for these circumstances is to use them. They optimize for the path that is used, so if we use this path a lot, they will optimize for it.

This does not seem quite elegant to me.
Actually, I consider the other method more inelegant. After all, what I want to do is offset each pointer by some amount. Rather than having to loop over each bound pointer and changing it (taking the chance that I miss one or bind another later), I just have a one-stop section to do the offsetting from.

To me, the big question is why gl*Pointer operations (in general, not even in an offset-like case) are so heavy-weight to begin with. Is there any specific reason why a change of VBO has to provoke a performance hit? After all, if the VBO(s) are all in their respective memory sections, why would it cause a performance problem? It’s no different from changing textures from one video-resident texture to another.

evanGLizr · July 27, 2004, 9:57pm

Originally posted by V-man:
It is an error for indices to lie outside the range [start, end], but implementations may not check for this situation. Such indices cause implementation-dependent behavior.

That paragraph does not address the issue of overflow. It seems to reference min max range.

Since with this extension one number is added to another, the issue of overflow is raised, and overflow CAN happen with ubyte and ushort indices.

Would it not be a problem with Geforces that prefer ushort?

My point - and what I unsuccessfully tried to explain before - is that the rebasing is not done by adding base+index every time you send an index, it’s done by setting the vertex array address to “old address”+base. And because “old address” is a memory address, the size of the indices being used is irrelevant. There’s no need for a base register in the hardware.

idr · July 27, 2004, 10:07pm

l_belev,

The problem is that to optimize for all the various special-case usages of the gl*Pointer functions, the driver has to have checks and tests to determine when to activate the special-case fast paths. Those tests have cost. We can avoid the need for those tests by having a way to directly tell the driver “I want you to do this” instead of making it try and figure it out for itself.

Korval,

The gl*Pointer calls are expensive because, particularly with VBOs or CVAs, they have to do a fair amount of data validation. The validation costs don’t always show up in the gl*Pointer call, but instead show up in the next draw call. The cost isn’t just the data upload. Also, as you pointed out, making one API call is better than making many API calls.

l_belev · July 28, 2004, 4:10am

Originally posted by idr:
l_belev,

The problem is that to optimize for all the various special-case usages of the gl*Pointer functions, the driver has to have checks and tests to determine when to activate the special-case fast paths. Those tests have cost. We can avoid the need for those tests by having a way to directly tell the driver “I want you to do this” instead of making it try and figure it out for itself.
What various cases are you talking about? All that has to be done is to compare the other function parameters and the bound vertex buffer object with their current values in the OpenGL context. That is just a few compare/test instructions for the CPU, which is exactly <nothing> compared to the actual work that has to be done.

Korval,

The gl*Pointer calls are expensive because, particularly with VBOs or CVAs, they have to do a fair amount of data validation. The validation costs don’t always show up in the gl*Pointer call, but instead show up in the next draw call. The cost isn’t just the data upload. Also, as you pointed out, making one API call is better than making many API calls.

Can you be more specific? What data validations are you talking about that cannot be avoided when the implementation determines that the application just wants to set a new base index? Also, what are these data uploads? The data is supposedly already in the right place (say, video memory), so when the driver sees that the application just means to set a new base index, no more uploads or anything else should take place. Here I speak only about the VBO case. If one uses classic (userspace) vertex arrays, the performance is bad anyway, so there’s no point in putting much effort into optimizing that case. As for compiled vertex arrays, who cares about them anymore now that we have VBOs?

knackered · July 28, 2004, 4:36am

You don’t think there’s a lot of driver mangling with this little lot? Not to mention tempting fate.

glBindBuffer(…)
glColorPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE0)
glTexCoordPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE1)
glTexCoordPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE2)
glTexCoordPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE3)
glTexCoordPointer(…)
glEnableClientState(…)
glVertexAttribPointer(0,…)
glVertexAttribPointer(1,…)
glVertexPointer(…)
glEnableClientState(…)

l_belev · July 28, 2004, 5:18am

Originally posted by knackered:
You don’t think there’s a lot of driver mangling with this little lot? Not to mention tempting fate.

glBindBuffer(…)
glColorPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE0)
glTexCoordPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE1)
glTexCoordPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE2)
glTexCoordPointer(…)
glEnableClientState(…)
glClientActiveTexture(GL_TEXTURE3)
glTexCoordPointer(…)
glEnableClientState(…)
glVertexAttribPointer(0,…)
glVertexAttribPointer(1,…)
glVertexPointer(…)
glEnableClientState(…)
OK, here you are right. This could be enough to justify such a new extension, if people really need to set a new index base often and if the operation is relatively cheap.

skynet · July 28, 2004, 6:04am

I also want to strongly vote for a separate baseindex.
As knackered pointed out, using the various glXXXPointer methods is cumbersome and may be slow. A separate base index also relieves the code that sets it of knowing
a) what arrays are currently enabled or will get enabled before the next draw call
b) what specific data layout those arrays have (stride, offset, interleaved/non-interleaved etc.)
c) what VBO the array (to be offset) is currently bound to

Also, it is very important NOT to interpret the proposed base as an offset in bytes into the arrays, but to actually add the offset to the indices stored in the element array (again, think of different strides in different arrays).

I also want to vote for negative baseindices to be possible. Restricting them to >=0 is rather arbitrary.

Third, I want to vote for naming the function “glBaseElement()” or “glElementOffset()” (or something similar). “ArrayElementBase” sounds a bit misleading, since its primary use is not in conjunction with glArrayElement() (where the programmer can already easily do the offsetting himself), but rather with glDrawElements() or glDrawRangeElements().

All problems with hitting unspecified memory addresses or disallowed indices can simply be made the programmer’s responsibility. The spec should just note that hitting vertices outside the specified range gives undefined results.

Last, a question related to that topic.
What if I specify a negative buffer offset into a VBO in one of those glXXXPointer() calls, but ensure that the indices used actually hit valid addresses inside the VBO? Is it allowed?

idr · July 28, 2004, 6:51am

I also want to vote for negative baseindices to be possible. Restricting them to >=0 is rather arbitrary.
It is somewhat arbitrary, but it does match hardware functionality. Again, think about hardware that has a base-index register. I don’t know for sure that all hardware will handle that smartly. I will add this to the issues list. If this extension ever got promoted to EXT or ARB, this is something that could be discussed again.

what various cases are you talking about? All that has to be done is the comparison of the other function parameters and the bind vertex buffer object with their current values from the OpenGL context.
Again, think about the case of hardware that has a base-index register. The gl*Pointer functions would have to detect that all pointers were advanced in such a way, taking stride into consideration, that the index is adjusted by some fixed value. That sounds like more than a couple of easy comparisons to me.
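
To give a feel for what that detection involves, here is a sketch in plain C. The `ArrayState` struct and `common_index_shift` are hypothetical; a real driver tracks far more state (enables, types, bound buffers). The sketch assumes strides are stored as actual byte distances and that only forward moves count, matching the non-negative base discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of one enabled array. */
typedef struct {
    size_t offset;   /* byte offset of element 0 within its buffer */
    size_t stride;   /* bytes between consecutive elements, never 0 here */
} ArrayState;

/* Returns 1 and stores the shared index shift in *shift if every array moved
   forward by the same whole number of elements; returns 0 otherwise. */
static int common_index_shift(const ArrayState *old_s, const ArrayState *new_s,
                              size_t n, size_t *shift)
{
    assert(old_s != NULL && new_s != NULL && shift != NULL);
    if (n == 0) return 0;

    size_t common = 0;
    for (size_t i = 0; i < n; i++) {
        if (old_s[i].stride == 0 || new_s[i].stride != old_s[i].stride) return 0;
        if (new_s[i].offset < old_s[i].offset) return 0;   /* moved backwards */
        size_t delta = new_s[i].offset - old_s[i].offset;
        if (delta % old_s[i].stride != 0) return 0;        /* not a whole element */
        size_t k = delta / old_s[i].stride;
        if (i > 0 && k != common) return 0;                /* arrays disagree */
        common = k;
    }
    *shift = common;
    return 1;
}
```

This has to run over every enabled array on every pointer change, which is the per-call cost an explicit base index avoids.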

While I’m updating the spec, I’m going to mark issue #3 as resolved. I’m voting in favor of not having a per-array base because that defeats part of the optimization potential in the driver. It also defeats the single API call niceness.

I’m also going to resolve issue #6. It turns out that the wording on page 27 of the spec is such that we don’t need to make any changes. WRT DrawRangeElements it says:

…with the additional constraint that all values in the array indices must lie between start and end inclusive.
Since it’s pretty clear that it’s only the actual values in the array that are constrained, I don’t think we need any special wording changes for ARRAY_ELEMENT_BASE_MESAX.

In any case, I posted version 0.3 of the spec.

l_belev · July 28, 2004, 7:53am

idr,
I agree with you, this extension would probably be a good thing.
But when considering the details about it, please don’t tune it for the currently available hardware possibilities. For example if allowing negative base index is considered a good thing in principle, let it be so. Look at the GL_ARB_texture_non_power_of_two - currently it has no hardware support at all, but that’s going to change in the future. I think that is the OpenGL style - to be far-sighted.
Note that here I’m not arguing that allowing negative base index is a good idea - I’m not convinced that’s the case. I just gave it as an example.

zeckensack · July 28, 2004, 9:19am

I’d like to propose a somewhat different approach. Let’s start with the name: glElementIndexBias. Technically I’d prefer glIndexBias, but that may be confusing with respect to color index functionality.

First of all, I don’t think the new functionality is useful at all to users of glDrawArrays (or glArrayElement). The base index is easily “emulated” while calling these entry points, so there’s no need to add all that nasty interaction with proper vertex array state.

I rather think that what we have here is a tool for indexed geometry. That’s why I’d like to see the functionality restricted to Draw{|Range}Elements. The behaviour change to ArrayElement would need to be removed then, and instead there would only be an “inline” effect to array indices sourced from an array.

I.e. I propose this change (against the 1.5 spec):

DrawElements (mode, count, type, indices);

is the same as the effect of the command sequence
if (mode, count, or type is invalid ) generate appropriate error
else {
  int i;
  Begin(mode);
  for (i=0; i < count ; i++)
    ArrayElement(indices[i] + bias); //<<
  End();
}

… and reverting the ArrayElement spec back to its original form.
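
The proposed wording can be mimicked in plain C to see the effect. Everything here is a stand-in: `array_element` only records the index it receives, and `draw_elements_biased` is a hypothetical name for the ushort case of the loop above, not a GL entry point.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDED 64
static int recorded[MAX_RECORDED];   /* indices seen by the stand-in ArrayElement */
static size_t recorded_count = 0;

/* Stand-in for ArrayElement: records the index instead of emitting a vertex. */
static void array_element(int i)
{
    if (recorded_count < MAX_RECORDED)
        recorded[recorded_count++] = i;
}

/* Stand-in for the proposed DrawElements: the bias is applied inline to each
   index read from the array, and array_element itself is left unchanged. */
static void draw_elements_biased(size_t count, const unsigned short *indices,
                                 int bias)
{
    assert(indices != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        array_element((int)indices[i] + bias);
}
```

Because the index is widened to int before the bias is added, a ushort index of 65535 with a bias of 10 reaches `array_element` as 65545 rather than wrapping.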

With this restricted functionality, I’d prefer “bias” to “base” or “offset”, because I think the latter two are potential sources of confusion. They are easily mistaken for offsets to the indices pointer. “bias”, I think, is a term that’s widely understood to be an in-line modifier.

l_belev · July 28, 2004, 10:40am

First of all, I don’t think the new functionality is useful at all to users of glDrawArrays (or glArrayElement). The base index is easily “emulated” while calling these entry points, so there’s no need to add all that nasty interaction with proper vertex array state.

The fact that an index offset would not be too useful for glArrayElement is not reason enough to cut it off. In fact glArrayElement itself is rarely useful in practice, so this whole issue isn’t of much importance, but what’s more important here is to preserve the clarity and consistency of the specification. I think the original variant is better: it does not introduce unnecessary discrimination between glArrayElement and glDrawElements. Otherwise one would have to remember one more unnecessary rule, namely when the index base is applied and when it is not. For glDrawArrays the base index isn’t applied anyway, so it is out of the question.

Obli · July 28, 2004, 10:53am

Originally posted by SeskaPeel
The weak experience I have on such batching is that you’ll have to switch a lot of parameters between each call, that should hide glVertexPointer() latency.
This is good news for me because I always wondered how to work around it. There are some cases in which I need to call VertexPointer to set some different parameters, such as the number of components in an attribute array. I’m somewhat off topic, however.

Originally posted by l_belev
But when considering the details about it, please don’t tune it for the currently available hardware possibilities. For example if allowing negative base index is considered a good thing in principle, let it be so. Look at the GL_ARB_texture_non_power_of_two…
I’m not sure this could be really useful. I mean, ARB_npot has to do with artists. Indices are usually managed by the app… I’m not really sure I can think of a problem in which a negative index (with respect to $base) would help. Say I have to index vertex $base-1. Then I would review the way I chose $base and recompute some indices accordingly.
I can’t figure out a scenario in which this really helps. Also, considering the kind of functionality, I’m not really sure it’s good to drop old hardware. I would rather like a distinct extension which allows negative offsets.
Obviously, the whole thing holds when $base+@index[i] is actually pointing in the array, i.e. at offset > 0 from base address.
I agree that negative $bases are a bad thing.

By the way, this thing reminds me of ListBase.
Fine, this is probably much more performance critical and I’m not really used to it, so I could be wrong, but I’d like to recall the conventions used. So, I vote for ElementBase. As for the added ‘Array’, as it is in the spec, I’m not aware of what other kinds of ‘elements’ one could think of.
As for the behaviour at overflow, I also agree on having an implementation-specific result when accessing out-of-bounds memory.

l_belev · July 28, 2004, 12:22pm

I’m not sure this could be really useful. I mean, ARB_npot has to do with artists. Indices are usually managed by the app… I’m not really sure I can think of a problem in which a negative index (with respect to $base) would help. Say I have to index vertex $base-1. Then I would review the way I chose $base and recompute some indices accordingly.
Neither can I think of such a problem. As I said, I just gave this as an example and didn’t mean to push for it. But you didn’t read my entire post, did you?