Using indexed vertex arrays?

Duncan_Champney · March 31, 2008, 5:28pm

I’m getting ready to convert my app from using display lists to using vertex arrays. (VBOs, actually, but never mind.)

I have a large rectangular triangle mesh that I want to render.

I want to break it into several VBOs. Each one will consist of a series of fixed-sized triangle strips.

I want to be able to render N strips of COUNT triangles in one call. The data for the triangle strips will be listed sequentially in an index array. I’d like to be able to issue one call that would render all N strips at once.

It looks like glMultiDrawArrays will do what I want, but with a catch. From the way I read it, I need to create an array of pointers to arrays of indexes. Since all my strips are the same size, and they are sequentially listed in my index array, this seems like extra work with no benefit. Is there a call that assumes that the indexes of the primitives are listed sequentially in the array you pass?
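A minimal sketch of the sequential strip layout described above. It assumes a hypothetical column-major grid, where vertex (col, row) is stored at index col * rows + row, with one strip per pair of adjacent columns. The helper names are made up for illustration. Every strip has exactly 2 * rows indices, so strip s starts at index position s * 2 * rows and all the per-strip counts are identical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a cols x rows grid stored column by column,
   so the vertex at (col, row) has index col * rows + row. */

/* Writes the indices of the strip between column c and column c + 1.
   Each strip has 2 * rows indices, i.e. 2 * rows - 2 triangles. */
static size_t emit_column_strip(unsigned rows, unsigned c, unsigned *out)
{
    size_t n = 0;
    for (unsigned r = 0; r < rows; ++r) {
        out[n++] = c * rows + r;        /* vertex in the left column  */
        out[n++] = (c + 1) * rows + r;  /* vertex in the right column */
    }
    return n;
}

/* Fills `out` with all cols - 1 strips stored back to back.
   `out` must hold (cols - 1) * 2 * rows entries.
   Returns the total number of indices written. */
size_t build_strip_indices(unsigned cols, unsigned rows, unsigned *out)
{
    assert(cols >= 2 && rows >= 2);
    size_t n = 0;
    for (unsigned c = 0; c + 1 < cols; ++c)
        n += emit_column_strip(rows, c, out + n);
    return n;
}
```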

If there’s no such call, would there be any penalty for using glDrawElements with GL_TRIANGLES instead, and just listing the indexes of all 3 vertexes of each triangle? That might be simpler and cleaner for me than allocating space for arrays of array pointers so that I can use glMultiDrawArrays. I’d just refactor my index array to list all 3 vertexes for each triangle, and draw them all in one shot. Since the vertex/normal/color data is already in arrays and is only being referenced, would this have any performance penalty compared to using triangle strips and glMultiDrawArrays?
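For comparison, here is a sketch of the same hypothetical column-major grid (vertex (col, row) at col * rows + row) expanded into a plain GL_TRIANGLES index list, with two triangles per grid cell. The function name is illustrative only. In both layouts the vertex/normal/color arrays are shared. The only difference is the number of indices.

```c
#include <assert.h>
#include <stddef.h>

/* Same hypothetical column-major layout: vertex (col, row) -> col * rows + row.
   Emits two triangles per grid cell, 3 indices each, for GL_TRIANGLES.
   `out` must hold (cols - 1) * (rows - 1) * 6 entries. */
size_t build_triangle_indices(unsigned cols, unsigned rows, unsigned *out)
{
    assert(cols >= 2 && rows >= 2);
    size_t n = 0;
    for (unsigned c = 0; c + 1 < cols; ++c) {
        for (unsigned r = 0; r + 1 < rows; ++r) {
            unsigned a = c * rows + r;        /* (c,     r)     */
            unsigned b = (c + 1) * rows + r;  /* (c + 1, r)     */
            unsigned d = a + 1;               /* (c,     r + 1) */
            unsigned e = b + 1;               /* (c + 1, r + 1) */
            /* Same winding the strip a, b, d, e would produce. */
            out[n++] = a; out[n++] = b; out[n++] = d;
            out[n++] = d; out[n++] = b; out[n++] = e;
        }
    }
    return n;
}
```

For an 800x800 grid, the strip layout needs 799 * 1600 = 1,278,400 indices, while the triangle list needs 799 * 799 * 6 = 3,830,406, about three times as many. Whether that extra index traffic is measurable depends on the driver and the post-transform vertex cache, so it is worth timing both.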

As a final alternative, I guess I could use a loop, and call glDrawElements(GL_TRIANGLE_STRIP) for each triangle strip. That would add the overhead of multiple calls to glDrawElements.

Gawd, this gets confusing. Arrays of pointers to arrays of indexes into arrays of vertex/normal/color data. (and the vertex data is, itself, an array of coordinate values.) I think my head would explode if I tried to use the interleaved format.
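The interleaved format is less intimidating than it sounds. Each vertex becomes one struct, and the stride is simply the size of that struct. The sketch below uses a hypothetical record layout: float position and normal, plus an RGBA byte color. The names are illustrative and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex record (hypothetical layout). */
typedef struct {
    float         position[3];  /* would feed glVertexPointer(3, GL_FLOAT, ...)        */
    float         normal[3];    /* would feed glNormalPointer(GL_FLOAT, ...)           */
    unsigned char color[4];     /* would feed glColorPointer(4, GL_UNSIGNED_BYTE, ...) */
} Vertex;

/* All attributes share one stride; each has its own starting offset. */
#define VERTEX_STRIDE   sizeof(Vertex)
#define POSITION_OFFSET offsetof(Vertex, position)
#define NORMAL_OFFSET   offsetof(Vertex, normal)
#define COLOR_OFFSET    offsetof(Vertex, color)

/* Byte position of one attribute of vertex `i`, counted from the start
   of the interleaved buffer. */
static size_t attribute_byte_offset(size_t i, size_t member_offset)
{
    assert(member_offset < VERTEX_STRIDE);
    return i * VERTEX_STRIDE + member_offset;
}
```

With the interleaved buffer bound as GL_ARRAY_BUFFER, each gl*Pointer call would receive (GLsizei)VERTEX_STRIDE as its stride and the matching member offset, expressed as a pointer value, as its pointer argument.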

Relic · April 1, 2008, 6:18am

Since you’re rendering a mesh you should use indexed rendering because inner vertices are reused. In your case that would be glMultiDrawElements.

Or if you’re on an NVIDIA implementation you could use the primitive restart extension.
http://developer.download.nvidia.com/opengl/specs/nvOpenGLspecs.pdf
Use glPrimitiveRestartIndexNV to dedicate one unused index to mean “start a new primitive of the same kind”, insert that index after every strip, and use a single glDraw(Range)Elements call to render all strips at once.
Done. And that’s super fast.
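A sketch of the single index array this produces. It assumes a hypothetical column-major grid (vertex (col, row) at col * rows + row) and picks 0xFFFFFFFF as the restart value, on the assumption that no real vertex uses that index. The restart index only needs to separate strips. Placing it after the last strip as well would be harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical choice: an index value no vertex in the mesh uses. */
#define RESTART_INDEX 0xFFFFFFFFu

/* Column-major layout assumed: vertex (col, row) -> col * rows + row.
   Writes all cols - 1 column strips into one array, separated by
   RESTART_INDEX, so one GL_TRIANGLE_STRIP draw can cover all of them
   once primitive restart is enabled.
   `out` must hold (cols - 1) * (2 * rows + 1) - 1 entries. */
size_t build_restart_indices(unsigned cols, unsigned rows, unsigned *out)
{
    assert(cols >= 2 && rows >= 2);
    size_t n = 0;
    for (unsigned c = 0; c + 1 < cols; ++c) {
        if (c > 0)
            out[n++] = RESTART_INDEX;   /* begin a new strip here */
        for (unsigned r = 0; r < rows; ++r) {
            out[n++] = c * rows + r;
            out[n++] = (c + 1) * rows + r;
        }
    }
    return n;
}
```

The linked NV_primitive_restart spec defines glPrimitiveRestartIndexNV for setting the index. As I read it, it enables the feature through glEnableClientState(GL_PRIMITIVE_RESTART_NV), but check the spec before relying on that.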

Duncan_Champney · April 1, 2008, 7:01am

That’s what I figured. However, I want to use VBOs so my vertex/normal/color data is stored on the card whenever possible.

I see how you use VBOs to do commands like glDrawElements, but how do you use glMultiDrawElements with VBOs? Or can you?

glMultiDrawElements wants an array of sizes in the count parameter, and an array of index arrays in the indices parameter (GLvoid* *indices).

For use with VBOs, what do you pass for these two? I haven’t found any documentation on using this call with VBOs. I would guess that you pass an array of counts as normal for count, and maybe an array of ids to GL_ELEMENT_ARRAY_BUFFER objects for the indices parameter?

Or do you pass an array of array “names”? And where is this documented? I have both the “Red Book” (6th edition) and the OpenGL SuperBible (4th edition).

That sounds very cool, but unfortunately, I can’t assume NVIDIA hardware. Some Macs use Intel video hardware, and for that matter, the desktop machines can use a variety of user-installed video cards.

CatDog · April 1, 2008, 7:43am

The same values that you would pass to a series of glDrawElements calls.


for (i = 0; i < ElementArrayCount; i++)
    glDrawElements( GL_TRIANGLES, 
        ElementArraySizes[i], GL_UNSIGNED_INT, 
        ElementArrayOffsets[i] );

is the same as


glMultiDrawElements( GL_TRIANGLES, 
        ElementArraySizes, GL_UNSIGNED_INT, 
        ElementArrayOffsets, ElementArrayCount );

So ElementArraySizes and ElementArrayOffsets have nothing to do with VBOs. You have to maintain them in main memory.

CatDog
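Here is a sketch of building those two main-memory arrays when the indices themselves also live in client memory, with no GL_ELEMENT_ARRAY_BUFFER bound. In that case, each entry of the pointer array is a real pointer to the first index of one strip. The helper name and the equal-length-strip layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: `indices` holds `strip_count` strips of
   `strip_len` indices stored back to back in client memory.
   Fills the per-strip counts and start pointers that glMultiDrawElements
   takes when no element array buffer is bound. */
void build_multidraw_args(const unsigned *indices, size_t strip_count,
                          size_t strip_len, int *counts,
                          const void **starts)
{
    assert(indices != NULL);
    for (size_t i = 0; i < strip_count; ++i) {
        counts[i] = (int)strip_len;           /* GLsizei is an int */
        starts[i] = indices + i * strip_len;  /* real pointer into client memory */
    }
}
```

The arrays would then be passed as glMultiDrawElements(GL_TRIANGLE_STRIP, counts, GL_UNSIGNED_INT, starts, strip_count). Depending on the GL headers, the starts argument may need a cast to match the declared parameter type.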

Duncan_Champney · April 1, 2008, 8:32am

Oh, I guess that makes sense. Thanks.

Relic · April 1, 2008, 8:51am

>>That sounds very cool, but unfortunately, I can’t assume NVIDIA hardware.<<

It’s an extension, which means that, like all extensions, you can (and need to) check for its presence at runtime and use it when it’s available and you want to.
If not, you have the MultiDraw calls, and failing those you can still loop over glDrawElements. Luckily OpenGL handles small render calls pretty well compared to D3D. (Strike!)

Duncan_Champney · April 1, 2008, 7:00pm

I’m still confused.

If I bind an array of indexes to a VBO using:

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_VBO_id);

then how do I index into that array using either glDrawElements or glMultiDrawElements?

In that case the last parameter to glDrawElements doesn’t make sense. I can’t pass a pointer to the index array which now lives on the graphics hardware.

Similarly, I don’t see how the GLvoid** indices parameter of glMultiDrawElements can point to different arrays of indexes when my array of indexes is bound to a VBO.

Duncan_Champney · April 1, 2008, 7:51pm

I managed to get VBOs working with glMultiDrawElements if my array of indexes is in main memory. I don’t understand how to use this call if my indexes are bound to a VBO and in video memory, though.

I was disappointed to find that this rendering is quite slow.

I decided to see if it was the overhead of reading the index array from main memory.

I wrote a version that bound the index array to a VBO, and just rendered the whole mesh using glDrawElements. This causes some artifacts when my triangle strips wrap from the bottom of my mesh back to the top, but I wanted to see what performance was like when all drawing was done with one simple call.
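One common way to avoid those wrap-around triangles while still drawing with a single glDrawElements call is to stitch the strips together with degenerate triangles. This technique is swapped in here and was not proposed in the thread. The sketch assumes a hypothetical column-major grid (vertex (col, row) at col * rows + row) and uses a made-up function name.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major layout assumed: vertex (col, row) -> col * rows + row.
   Joins all cols - 1 column strips into one long strip by repeating the
   last index of each strip and the first index of the next. Triangles
   spanning a join have two identical vertices, so they have zero area
   and produce no pixels. Every strip has an even length here, so each
   new strip still starts at an even position and keeps its winding.
   `out` must hold (cols - 1) * (2 * rows + 2) - 2 entries. */
size_t build_stitched_strip(unsigned cols, unsigned rows, unsigned *out)
{
    assert(cols >= 2 && rows >= 2);
    size_t n = 0;
    for (unsigned c = 0; c + 1 < cols; ++c) {
        if (c > 0) {
            out[n] = out[n - 1];   /* repeat the previous strip's last index */
            ++n;
            out[n++] = c * rows;   /* repeat the next strip's first index */
        }
        for (unsigned r = 0; r < rows; ++r) {
            out[n++] = c * rows + r;
            out[n++] = (c + 1) * rows + r;
        }
    }
    return n;
}
```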

Even when I draw the whole mesh using one call to glDrawElements and all the data structures (vertexes, normals, colors, and indexes) are bound to VBOs (and theoretically on the card) performance is quite a bit slower than with a simple display list. Based on what I read, I expected VBO based drawing to be faster because of vertex sharing and pre-transformation of the vertex data. (In a display list, the vertexes that should be shared between triangle strips are instead repeated.)

VBOs are noticeably slower even for fairly modest sized meshes. (I’m using a mesh with 800x800 vertexes for most testing, and with display lists I’m able to rotate the object in my view in real time with no detectable lag. With VBOs rotating my mesh is sluggish.)

Based on the reading I did, I expected VBOs to be faster than display lists. What am I missing?

I am currently using one large VBO each for my vertex, normal, color, and array index data. I haven’t yet broken my mesh into a set of smaller VBOs. Would using a series of smaller VBOs make drawing faster, even though I’d have to issue more OpenGL commands in order to render each “chunk” of my mesh?

Lord_crc · April 1, 2008, 9:10pm

Normally, glMultiDrawElements accepts an array of integer arrays. From my understanding, if you have an element array buffer bound, then you should pass an array of integer offsets to glMultiDrawElements instead. So, all of the integer arrays (from the normal operation) must lie within the same VBO, and each offset indicates where each integer array starts.

As for the speed… I’ve heard that NVIDIA has put quite some effort into optimizing their display list code, and the driver does a lot to speed up their rendering. I’ve also noticed that the NVIDIA driver does its own tuning when using VBOs, so I sometimes get a substantial speed-up after 5-10 seconds of rendering.

Duncan_Champney · April 2, 2008, 3:43am

Ok, thank you.

The array apparently contains integer byte offsets from the beginning of the bound array. I was trying to pass an array of starting indexes. I had to cast the pointer to GLvoid** to get the compiler to take it, but it works.
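Here is a sketch of building that offset array. It assumes equal-length strips of GL_UNSIGNED_INT (32-bit) indices stored back to back in one bound element buffer, and the helper name is hypothetical. The key step is converting an index position to a byte offset: strip i starts i * strip_len indices, or i * strip_len * 4 bytes, into the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With a GL_ELEMENT_ARRAY_BUFFER bound, each entry of the indices array
   passed to glMultiDrawElements is read as a byte offset into that buffer.
   Fills counts and offsets for `strip_count` strips of `strip_len`
   32-bit indices stored back to back. */
void build_vbo_multidraw_offsets(size_t strip_count, size_t strip_len,
                                 int *counts, const void **offsets)
{
    assert(strip_len > 0);
    for (size_t i = 0; i < strip_count; ++i) {
        size_t first_index = i * strip_len;                   /* position in the index array */
        size_t byte_offset = first_index * sizeof(uint32_t);  /* what GL actually reads      */
        counts[i]  = (int)strip_len;
        offsets[i] = (const void *)(uintptr_t)byte_offset;
    }
}
```

With the element buffer bound, counts and offsets go to glMultiDrawElements in the same positions that the client-memory counts and pointers would.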

Where is that documented? I pored over several references and example apps, and could not find any information on how to use that call with a bound (VBO-based) index array.

Even so, the glMultiDrawElements call with everything bound to VBOs is quite a bit slower than a simple display list.

Would I get better performance if I cut my mesh into a large number of smaller VBOs, even though that would require a large number of calls to draw? (for each VBO, bind its vertexes, normals, colors, and indexes, then call glMultiDrawElements)

I would think that VBO based drawing would be faster than display lists. Quite a bit faster, even. All the vertexes are in arrays that can be transformed all at once, and the rendering is a single call instead of a whole bunch of individual calls. I’m disappointed to find that it’s actually a great deal slower. Maybe there’s something I’m doing that’s slowing it down. Any suggestions would be greatly appreciated.

I’m now thinking of going back to display lists and breaking my drawing up into several display lists when the mesh gets large, in order to support larger meshes.

Lord_crc · April 2, 2008, 9:57am

I must admit I was in the same boat as you, except I gave up on the call and simply emulated it myself. In general the documentation regarding VBO usage is rather spotty IMHO.

Well see, a display list is a single, static object. The driver can be smart about what it does with the data in it. It can convert some of the data to a format which suits the graphics card better. It can lay out the data in the best way (interleaved vs non-interleaved), or it can create batches out of it, optimized for the vertex cache on that card (I’m pulling some of this out of thin air, just trying to convey a point).

As I see it, VBOs were primarily meant to improve speed for dynamic meshes, for instance an animated character, where some of the information stays the same (indexes, texture coords), and some of it changes every frame (position, normals). In this case, VBOs would allow you to store the static info on the GPU, and access the video memory directly for the dynamic parts, thus saving a lot of copying by the driver.

Duncan_Champney · April 2, 2008, 11:44am

Which specific call did you give up on?
Agreed about the documentation. I’ve come to realize that most of the draw calls that take a pointer to vertex data seem to want a byte offset when there is a bound VBO active. I wish the docs would SAY that.

My next tack is going to be to use glDrawRangeElements, and check to see how big a chunk of vertexes the driver wants for optimum performance.

Regarding VBOs being meant mainly for dynamic meshes: I thought that’s what the usage hints were for. There is a GL_STATIC_DRAW hint that tells the driver you don’t plan to update the data very often. That’s what I’m using, but it doesn’t help.

Lord_crc · April 2, 2008, 2:23pm

I gave up on the glMultiDrawElements call.

As for offsets, I thought they were element offsets, not byte offsets, so I’ve probably made the same mistake as you, which would explain a lot.

As for the usage flags, yes that’s indeed what they are for, however NVIDIA’s driver does treat them as hints and will do what it thinks is best after a period of analysis.

However the driver can do so much more with a display list, since it has all the needed data in one place. It can compute a bounding box around the geometry contained, and perform automatic frustum culling (which I hear the NVIDIA driver does). Your data is presented in separate arrays, yet perhaps the card can perform better if the data is interleaved. The driver can then rearrange the data in the display list to make it interleaved, and with optimal alignment.
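If the mesh ends up split into chunks, you can get some of that culling benefit without relying on the driver. Compute a bounding box per chunk once, then skip chunks whose box lies outside the view. Below is a minimal sketch of the box computation, assuming tightly packed xyz float positions. The type and function names are illustrative, and the frustum test itself is not shown.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

typedef struct { float min[3], max[3]; } Aabb;

/* Axis-aligned bounding box of `count` tightly packed xyz positions. */
Aabb compute_aabb(const float *xyz, size_t count)
{
    assert(xyz != NULL && count > 0);
    Aabb box;
    for (int k = 0; k < 3; ++k) {
        box.min[k] = FLT_MAX;
        box.max[k] = -FLT_MAX;
    }
    for (size_t i = 0; i < count; ++i) {
        for (int k = 0; k < 3; ++k) {
            float v = xyz[3 * i + k];
            if (v < box.min[k]) box.min[k] = v;
            if (v > box.max[k]) box.max[k] = v;
        }
    }
    return box;
}
```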

I’ve also heard that the NVIDIA driver may convert the data into optimized triangle lists (which can be fine-tuned based on vertex cache size etc). It can convert the triangle indexes to 16-bit if possible, thus cutting the bandwidth needed for that in half.
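The application can do the same 16-bit conversion itself. Below is a sketch (the function name is hypothetical) that narrows a 32-bit index array when every index fits, so the chunk can be drawn with GL_UNSIGNED_SHORT instead of GL_UNSIGNED_INT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies 32-bit indices into a 16-bit array if every index fits.
   Returns 1 on success. Returns 0 (leaving `out` partially written)
   as soon as an index above 65535 is found. */
int narrow_indices(const uint32_t *in, size_t n, uint16_t *out)
{
    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (in[i] > UINT16_MAX)
            return 0;
        out[i] = (uint16_t)in[i];
    }
    return 1;
}
```

An 800x800 mesh has 640,000 vertices, which is too many for 16-bit indices in a single buffer. A chunk of 80 columns of 800 vertices (64,000 vertices) stays under the 65,536 limit.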

Most importantly, you don’t have to do the expensive glVertexPointer call, which validates all the VBOs and prepares for rendering. With a display list, the driver knows everything is “go for launch”.

With a VBO, basically all it can do is decide if the data should be uploaded to video memory or not.

Duncan_Champney · April 2, 2008, 2:34pm

Ok, this is very odd. I changed my code to use glDrawRangeElements, and set up my start and end values so the driver could limit the number of vertexes it needed to worry about for each call. For my initial tests, the DrawElements calls I was making did not exceed the values I got from glGet(GL_MAX_ELEMENTS_VERTICES) and glGet(GL_MAX_ELEMENTS_INDICES).

Thus I didn’t spend the time and effort to break up my drawing any finer than a triangle strip for every column in my mesh.
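With one strip per column, the start and end arguments of glDrawRangeElements follow directly from which columns a strip touches. The sketch assumes a hypothetical column-major grid (vertex (col, row) at index col * rows + row), and the type and function names are made up. The values queried for GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES are recommended maxima for performance, not hard limits, so the check below is only advisory.

```c
#include <assert.h>

/* start and end as glDrawRangeElements takes them (end is inclusive). */
typedef struct { unsigned start, end; } IndexRange;

/* Column-major layout assumed: vertex (col, row) -> col * rows + row.
   The strip between columns c and c + 1 only references vertices
   c * rows through (c + 2) * rows - 1. */
IndexRange column_strip_range(unsigned rows, unsigned c)
{
    assert(rows >= 2);
    IndexRange r = { c * rows, (c + 2) * rows - 1 };
    return r;
}

/* Advisory check against the recommended limits queried from GL. */
int within_recommended_limits(IndexRange r, unsigned index_count,
                              unsigned max_vertices, unsigned max_indices)
{
    return (r.end - r.start + 1) <= max_vertices && index_count <= max_indices;
}
```

For rows = 800, the strip for column 3 would be drawn as glDrawRangeElements(GL_TRIANGLE_STRIP, 2400, 3999, 1600, GL_UNSIGNED_INT, offset). Here offset is that strip's byte offset in the bound index buffer.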

After changing my code to use glDrawRangeElements, drawing is no faster.

On a lark, I decided to put display lists back in. Now I start a display list, and the only calls I write to the display lists are my sequence of glDrawRangeElements calls.

My draw routine now just invokes the display list.

THE DRAWING IS NOW BACK TO BEING VERY FAST. I don’t know if it’s faster than before I implemented VBOs, but it’s back to being quite useable again.

This seems really odd. I don’t get it at all. Why does moving my calls to glDrawRangeElements into a display list make them so much faster? In my testing I’m only making about 800 calls to glDrawRangeElements in the display list. The call overhead should not be significant. The vertexes/normals/colors/indexes are all on the graphics hardware with or without using a display list.

Can anybody shed some light on why display lists make such a dramatic difference in this case?

Anyway, I’m back on track. Now I need to teach my program to break up my mesh into multiple VBOs when it gets large so I can draw a much larger mesh. That’s going to involve a fair amount of re-factoring, but it’s not rocket science.

Duncan_Champney · April 2, 2008, 2:44pm

crc,

It looks like when you have VBOs bound, all the parameters that normally take a pointer switch to wanting a byte offset. I got glDrawElements to draw starting in the middle of my index array by passing it a byte offset. This is TOTALLY NOT CLEAR from the documentation.

That’s another odd thing. I am binding my VBOs and doing my glVertexPointer/glNormalPointer/glColorPointer calls before I draw at all, and leaving them set up. Thus that overhead should not affect my rendering loop at all.

See my next post. I changed my code to use glDrawRangeElements, thinking that maybe it was doing transformations on every vertex in my VBO object for each draw call, but it didn’t make any significant difference. However, when I put my calls to glDrawRangeElements into a (very small) display list, it got fast again! Go figure. I would think the memory footprint of my VBO based drawing would be smaller, because the display list had to buffer all the drawing commands (set vertex. Set normal. Set color. Set vertex. Set normal. Set color. Repeat ad nauseam.) With the VBO approach, I’m only storing the vertexes/normals/colors/indexes on the card. Dunno. I’ll have to do some testing.

Lord_crc · April 2, 2008, 3:23pm

I agree, the VBO stuff is not very clear. Hopefully OpenGL 3 will be easier to decipher, once it’s released sometime next century…

As for the speed difference, I think it boils down to
a) glVertexPointer is supposedly very expensive, and even calling it once per frame is more than the zero calls a display list needs.
b) Display lists can be optimized heavily by the driver; VBOs not so much.

If you’re testing with small buffers etc so you get very high framerates, remember that overhead will potentially play a MUCH greater role than with “real world” data.

Komat · April 2, 2008, 3:40pm

The specification explicitly states that:

When an array is sourced from a buffer object, the pointer value of that array is used to compute an offset, in basic machine units, into the data store of the buffer object. This offset is computed by subtracting a null pointer from the pointer value, where both pointers are treated as pointers to basic machine units.
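In other words, the pointer argument is just the byte offset dressed up as a pointer. A common idiom is a small macro for the conversion. The sketch below uses the hypothetical name BUFFER_OFFSET and goes through uintptr_t rather than literally doing arithmetic on a null pointer, which C leaves undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: a byte offset expressed as a pointer value. */
#define BUFFER_OFFSET(bytes) ((const void *)(uintptr_t)(bytes))

/* Pointer argument for glDrawElements when drawing should begin at index
   position `first` of the bound element buffer. `index_size` is 1, 2 or 4
   for GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT or GL_UNSIGNED_INT. */
static const void *first_index_pointer(size_t first, size_t index_size)
{
    assert(index_size == 1 || index_size == 2 || index_size == 4);
    return BUFFER_OFFSET(first * index_size);
}
```

For example, glDrawElements(GL_TRIANGLE_STRIP, count, GL_UNSIGNED_INT, first_index_pointer(first, 4)) would start drawing at index position first of the bound element buffer. Likewise, glVertexPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0)) sources positions from the start of the bound array buffer.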