vertex buffer points and element arrays

thinks · July 6, 2007, 6:31am

Apparently, when rendering primitives from a vertex buffer object (VBO) using glDrawElements, glDrawRangeElements, or glDrawRangeElementsEXT, an index array must be supplied. As I understand it, this can be done in two ways (using glDrawRangeElements in the examples):

The first way may be the most straightforward.

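// Bind vertex data and enable the vertex array.
glBindBufferARB( GL_ARRAY_BUFFER_ARB, VertexBufferID );
glVertexPointer( 3, GL_FLOAT, 0, 0 );
glEnableClientState( GL_VERTEX_ARRAY );

// Create index array: 0, 1, 2, ..., NumPoints-1 (malloc/free need <stdlib.h>).
GLuint* indexArray = (GLuint*)malloc( NumPoints * sizeof(GLuint) );
for( GLuint i = 0; i < NumPoints; ++i )
    indexArray[i] = i;

// Render all the points. Pass the index array in the glDrawRangeElements call.
glDrawRangeElements( GL_POINTS, 0, NumPoints, NumPoints, GL_UNSIGNED_INT, indexArray );

// Reset.
glDisableClientState( GL_VERTEX_ARRAY );
glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );
free( indexArray );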

The downside of this is that the index array is sent to the GPU on every call.

To avoid passing the index array to the GPU on every call, we can store it in a buffer object too. The call is then:

 
// Bind vertex and element data, and enable the vertex array.
// (IndexBufferID must already hold the indices, uploaded once with glBufferDataARB.)
glBindBufferARB( GL_ARRAY_BUFFER_ARB, VertexBufferID );
glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, IndexBufferID );
glVertexPointer( 3, GL_FLOAT, 0, 0 );
glEnableClientState( GL_VERTEX_ARRAY );

// Render all the points. The indices are now read from the bound index buffer,
// so the last argument is a byte offset into that buffer (0 = its start).
glDrawRangeElements( GL_POINTS, 0, NumPoints, NumPoints, GL_UNSIGNED_INT, 0 );

// Reset.
glDisableClientState( GL_VERTEX_ARRAY );
glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, 0 );
glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );

Now to my question. For rendering points, it often doesn’t make much sense to have an index array at all. I would like to be able to specify a range of indices to render, rather than having to store/transfer every single index. For large collections of points, the memory required becomes quite substantial (one third of the vertex memory, assuming 3 floats per vertex and an unsigned int per index). One solution is to revert to immediate mode and go:

glBegin( GL_POINTS );
for( unsigned int i = 0; i < NumPoints; ++i )
{
    glVertex3f( VertexData[i].x, VertexData[i].y, VertexData[i].z );
}
glEnd();
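The one-third figure follows from simple arithmetic. A minimal sketch, assuming 4-byte floats and 32-bit unsigned indices (the sizes of GLfloat and GLuint); the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes for num_points positions stored as 3 floats (x, y, z) each. */
size_t vertex_bytes(size_t num_points)
{
    return num_points * 3 * sizeof(float);
}

/* Bytes for a trivial index array 0 .. num_points-1 of 32-bit indices. */
size_t index_bytes(size_t num_points)
{
    return num_points * sizeof(uint32_t);
}
```

For 10 million points that is 120 MB of positions plus 40 MB of indices that carry no information.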

Is it possible to use glDrawRangeElements without an index array? I suspect not, but I am hoping to confirm this.

Also, I suspect the call:

glDrawRangeElements( GL_POINTS, 0, NumPoints, NumPoints, GL_UNSIGNED_INT, indexArray );

should be

// "end" is size-1 because of zero-based indexing?
glDrawRangeElements( GL_POINTS, 0, NumPoints-1, NumPoints, GL_UNSIGNED_INT, indexArray );

I hope this is clear enough.

best,

T

Relic · July 6, 2007, 8:14am

Rendering GL_POINTS with glDrawElements where the element array is 0 to NumPoints is a waste of time.
Just use glDrawArrays in that case; it lets you specify a contiguous range with its first and count parameters.

glDrawElements would only make sense for non-contiguous sets of points or if the glDrawArrays would be too many (count too small).
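For a non-contiguous but mostly sequential set of points, the indices can be collapsed into contiguous runs, each of which maps to one glDrawArrays( GL_POINTS, first, count ) call (or to one entry of the first/count arrays passed to glMultiDrawArrays, available since OpenGL 1.4). A sketch; DrawRange and indices_to_ranges are hypothetical names, and the input is assumed to be sorted with no duplicates:

```c
#include <stddef.h>

/* One contiguous run of vertices: the (first, count) pair glDrawArrays takes. */
typedef struct {
    int first;
    int count;
} DrawRange;

/* Splits a sorted, duplicate-free list of vertex indices into contiguous
 * runs. Writes at most max_ranges runs to out and returns the total number
 * of runs found (which may exceed max_ranges if out is too small). */
size_t indices_to_ranges(const int *indices, size_t n,
                         DrawRange *out, size_t max_ranges)
{
    size_t num_ranges = 0;
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;
        /* Extend the run while each index follows the previous one. */
        while (j < n && indices[j] == indices[j - 1] + 1)
            ++j;
        if (num_ranges < max_ranges) {
            out[num_ranges].first = indices[i];
            out[num_ranges].count = (int)(j - i);
        }
        ++num_ranges;
        i = j;
    }
    return num_ranges;
}
```

If most runs end up with a count of 1 or 2, the per-call overhead dominates; that is the case where a single glDrawElements call is the better choice.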

Correct: the start and end parameters of glDrawRangeElements are the minimum and maximum index values you will actually access, so NumPoints - 1 is the right end parameter.
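If the index array is built on the CPU, start and end can be computed directly from it. A minimal sketch (index_range is a hypothetical helper); for the trivial array 0 .. NumPoints-1 it yields start = 0 and end = NumPoints - 1:

```c
#include <stddef.h>

/* Computes the start and end arguments for glDrawRangeElements: the
 * smallest and largest index value present in the array. Assumes n > 0. */
void index_range(const unsigned int *indices, size_t n,
                 unsigned int *start, unsigned int *end)
{
    unsigned int lo = indices[0];
    unsigned int hi = indices[0];
    for (size_t i = 1; i < n; ++i) {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    *start = lo;
    *end = hi;
}
```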

songho · July 6, 2007, 8:42am

thinks,
If you reference all the vertices in the VBO every frame, try glDrawArrays(). glDrawArrays() does not need an index array.

However, note that glDrawArrays() does not reduce memory usage in the general case. It actually requires more memory than indexed drawing, because shared vertices have to be duplicated.

Let’s say there is a mesh with 100 triangles built from only 70 unique vertices, because neighbouring triangles share vertices. Now, compare the memory usage of glDrawArrays() and glDrawElements().

glDrawArrays()
The size of the vertex array is 300x3x4 = 3600 bytes.
  300: # of vertices (100 tris x 3 vertices each; nothing can be shared)
  3: coords of each vertex (x,y,z)
  4: size of float variable

glDrawElements()
The size of the vertex array is 70x3x4 = 840 bytes.
  70: # of unique vertices
  3: coords of each vertex (x,y,z)
  4: size of float variable

The size of the index array with GLubyte is 100x3x1 = 300 bytes (GLubyte is enough because there are fewer than 256 vertices).
  100: # of tris
  3: each tri needs 3 indices
  1: size of GLubyte

So, the total memory requirement for glDrawElements() is 840+300 = 1140 bytes, which is 2460 bytes less than glDrawArrays(). If you also count colour, normal and uv arrays, the difference gets bigger.
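The comparison can be written as two small functions. This sketch assumes GL_TRIANGLES (three vertex slots per triangle, no strips), positions only (3 floats of 4 bytes each), and hypothetical function names. For glDrawArrays every triangle carries its own three vertices, so 100 triangles need 300 vertices:

```c
#include <stddef.h>

/* glDrawArrays( GL_TRIANGLES, ... ): every triangle stores its own three
 * vertices, each with 3 floats (x, y, z). */
size_t draw_arrays_bytes(size_t num_tris)
{
    return num_tris * 3 * 3 * sizeof(float);
}

/* glDrawElements( GL_TRIANGLES, ... ): each unique vertex is stored once,
 * plus three indices of index_size bytes per triangle. */
size_t draw_elements_bytes(size_t num_tris, size_t unique_verts,
                           size_t index_size)
{
    return unique_verts * 3 * sizeof(float) + num_tris * 3 * index_size;
}
```

With 100 triangles and 70 unique vertices this gives 3600 bytes for glDrawArrays() versus 1140 bytes for glDrawElements() with GLubyte indices.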

thinks · July 6, 2007, 8:59am

Relic:

glDrawElements would only make sense for non-contiguous sets of points or if the glDrawArrays would be too many (count too small).

What do you mean by the glDrawArrays being too many?

songho:
Your discussion is well informed. However, in the case of rendering points there are (by definition) no shared vertices.

Thanks guys, I suspected there was no use for indexing since there are no shared vertices. I might implement some LOD functionality. Still, if I order the points in the vertex array correctly, I might be able to specify certain ranges without having to use indices. When there are millions of points, indices are a bit of a pain…
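One way to get such ranges, sketched here under assumptions: give every point a cell id (for example from a coarse grid or an octree leaf, computed elsewhere), then reorder the vertex array so each cell's points are contiguous. This is a counting sort; Point, sort_by_cell and the parameter names are hypothetical:

```c
#include <stddef.h>

typedef struct { float x, y, z; } Point;

/* Reorders points so that all points of a cell are stored contiguously.
 * cell_of[i] is the cell id of points[i], in [0, num_cells). Afterwards,
 * cell c occupies sorted[cell_first[c]] .. sorted[cell_first[c] + cell_count[c] - 1]. */
void sort_by_cell(const Point *points, const int *cell_of, size_t n,
                  int num_cells, Point *sorted,
                  int *cell_first, int *cell_count)
{
    /* Count the points in each cell. */
    for (int c = 0; c < num_cells; ++c)
        cell_count[c] = 0;
    for (size_t i = 0; i < n; ++i)
        cell_count[cell_of[i]]++;

    /* Exclusive prefix sum: first slot of each cell in the sorted array. */
    int offset = 0;
    for (int c = 0; c < num_cells; ++c) {
        cell_first[c] = offset;
        offset += cell_count[c];
    }

    /* Scatter each point into its cell, using cell_first as a cursor. */
    for (size_t i = 0; i < n; ++i)
        sorted[cell_first[cell_of[i]]++] = points[i];

    /* Each cursor moved forward by its cell's count; move it back. */
    for (int c = 0; c < num_cells; ++c)
        cell_first[c] -= cell_count[c];
}
```

Each occupied cell can then be drawn with glDrawArrays( GL_POINTS, cell_first[c], cell_count[c] ), with no index array at all.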

dorbie · July 15, 2007, 11:48pm

Yup, the idea behind indexed rendering with DrawElements is data compression through vertex reuse. Drawing points implies a 1:1 vertex-to-primitive ratio where each vertex is sent only once, so indexed rendering is not a win and may in fact be slower.

This assumes that your point data is unique; you should ensure it is.

With millions of points you could probably do a lot with multiple draw calls; there’s no need to stick them all in one call.

Visibility culling with aggregate patches of points, frustum culling and LOD culling with interleaved arrays should all be possible.
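For the patch-level frustum culling, a common approach is to test each patch's axis-aligned bounding box against the six frustum planes and skip the glDrawArrays call for patches that are entirely outside. A sketch, assuming the planes have already been extracted (for example from the combined projection-modelview matrix) with normals pointing into the frustum; the types and names are hypothetical:

```c
/* Plane a*x + b*y + c*z + d = 0 with (a, b, c) pointing into the frustum,
 * so points inside satisfy a*x + b*y + c*z + d >= 0. */
typedef struct { float a, b, c, d; } Plane;

/* Axis-aligned bounding box of one patch of points. */
typedef struct { float min[3], max[3]; } AABB;

/* Returns 0 if the box is completely outside at least one plane, 1 otherwise
 * (inside or intersecting, so the test is conservative). For each plane it
 * checks the "positive vertex": the box corner furthest along the normal. */
int aabb_visible(const AABB *box, const Plane *planes, int num_planes)
{
    for (int i = 0; i < num_planes; ++i) {
        const Plane *p = &planes[i];
        float x = p->a >= 0.0f ? box->max[0] : box->min[0];
        float y = p->b >= 0.0f ? box->max[1] : box->min[1];
        float z = p->c >= 0.0f ? box->max[2] : box->min[2];
        if (p->a * x + p->b * y + p->c * z + p->d < 0.0f)
            return 0; /* even the most favourable corner is outside */
    }
    return 1;
}
```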

It sounds like you have some sort of surfel splatting going on.

thinks · July 16, 2007, 1:40am

Dorbie: I am not doing surfels yet. I am processing LiDAR data (think radar, but with laser light instead of radio waves). Data sets are huge and visualization is a bit of a problem. I just want to avoid common pitfalls here; every slip-up matters with that many points.

Does anyone know of any LOD methods for point clouds that have been tried and tested? My idea would be to scramble the vertex array (randomize the ordering of the points) and then simply draw larger and larger chunks of the vertex array as the user moves closer. What I don’t like about this is the randomness. I think it would be better if the sampling were more controlled.
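The scramble-then-draw-a-prefix idea can be sketched as follows. The shuffle is a standard Fisher-Yates shuffle run once offline; the xorshift generator, the inverse-square falloff in lod_count (chosen because a patch's screen area shrinks roughly with the square of its distance) and all names are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, z; } Point;

/* Small xorshift generator so the shuffle is reproducible. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fisher-Yates shuffle of the point array (done once, offline). */
void shuffle_points(Point *pts, size_t n, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u; /* xorshift must not start at 0 */
    for (size_t i = n; i > 1; --i) {
        size_t j = xorshift32(&state) % i; /* j in [0, i-1] */
        Point tmp = pts[i - 1];
        pts[i - 1] = pts[j];
        pts[j] = tmp;
    }
}

/* Number of points to draw from the front of the shuffled array: all of
 * them within near_dist, falling off with 1/distance^2 beyond it. */
size_t lod_count(size_t n, float distance, float near_dist)
{
    if (distance <= near_dist)
        return n;
    float ratio = near_dist / distance;
    return (size_t)((float)n * ratio * ratio);
}
```

The draw call is then glDrawArrays( GL_POINTS, 0, lod_count( NumPoints, distance, nearDist ) ). The modulo in the shuffle introduces a slight bias, which is harmless for this purpose.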

ZbuffeR · July 16, 2007, 3:21am

As an off-line step, what about an octree-like subdivision of world space, with the smallest voxels about the size of a screen pixel (adjust according to performance), and only a single point per voxel?
Then dump each level of detail to an array for optimal real-time performance.
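A single level of this can be sketched with a uniform voxel grid instead of a full octree (a simplification): quantise each point to a voxel, sort by voxel key, and keep the first point in each voxel. Coarser levels would repeat this with a larger voxel size. All names are hypothetical, and points are assumed to lie at or above origin on every axis, within 2^21 voxels per axis:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { float x, y, z; } Point;
typedef struct { uint64_t key; size_t index; } KeyedPoint;

/* Integer voxel coordinates packed into one key, 21 bits per axis.
 * Assumes no coordinate lies below origin (truncation then equals floor). */
static uint64_t voxel_key(const Point *p, const Point *origin, float size)
{
    uint64_t ix = (uint64_t)((p->x - origin->x) / size) & 0x1FFFFFu;
    uint64_t iy = (uint64_t)((p->y - origin->y) / size) & 0x1FFFFFu;
    uint64_t iz = (uint64_t)((p->z - origin->z) / size) & 0x1FFFFFu;
    return (ix << 42) | (iy << 21) | iz;
}

/* Orders by voxel key, then by original index so "first" is deterministic. */
static int cmp_keyed(const void *a, const void *b)
{
    const KeyedPoint *ka = a;
    const KeyedPoint *kb = b;
    if (ka->key != kb->key) return ka->key < kb->key ? -1 : 1;
    if (ka->index != kb->index) return ka->index < kb->index ? -1 : 1;
    return 0;
}

/* Keeps one point (the lowest-index one) per occupied voxel of edge length
 * size. out needs room for n points. Returns the number of points kept,
 * or 0 for empty input or allocation failure. */
size_t voxel_downsample(const Point *pts, size_t n, Point origin,
                        float size, Point *out)
{
    if (n == 0) return 0;
    KeyedPoint *kp = malloc(n * sizeof *kp);
    if (kp == NULL) return 0;
    for (size_t i = 0; i < n; ++i) {
        kp[i].key = voxel_key(&pts[i], &origin, size);
        kp[i].index = i;
    }
    qsort(kp, n, sizeof *kp, cmp_keyed);
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || kp[i].key != kp[i - 1].key)
            out[kept++] = pts[kp[i].index];
    }
    free(kp);
    return kept;
}
```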

thinks · July 16, 2007, 3:32am

Zbuffer: I also think that spatial subdivision is the way to go. I was thinking I might subdivide until the point count in the leaf nodes drops below some threshold. I am not sure what you mean by having voxels the size of a pixel. I understand that a voxel projected to screen space may occupy one pixel, but since the octree cells have fixed dimensions in world space, their projected size will not be the same for all voxels, right? So there is no single size that guarantees a voxel covers exactly one pixel on screen?
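That reasoning holds: a voxel's projected size shrinks in proportion to its distance from the eye. A sketch for a symmetric perspective projection such as the one gluPerspective builds, whose [1][1] matrix element is f = cot(fovy/2); the names are hypothetical and the result is approximate (measured along the view axis):

```c
/* Approximate on-screen height, in pixels, of something world_size units
 * tall at eye-space distance dist, for a symmetric perspective projection
 * whose matrix element [1][1] is proj_11 (= cot(fovy/2) for gluPerspective)
 * and a viewport viewport_h pixels tall. */
float projected_pixels(float world_size, float dist, float proj_11,
                       int viewport_h)
{
    return world_size * proj_11 * (0.5f * (float)viewport_h) / dist;
}

/* World-space voxel size that projects to about one pixel at distance dist. */
float voxel_size_for_one_pixel(float dist, float proj_11, int viewport_h)
{
    return dist / (proj_11 * 0.5f * (float)viewport_h);
}
```

A fixed voxel size therefore matches one pixel at a single distance only; doubling the distance halves its projected size.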

I think there might be time to save by allowing several points to “live” in a voxel though. In my experience, clipping of points is very fast, so passing a few points that end up being clipped is not too bad, and it shifts computations from CPU to GPU, which in my case is good.

Thanks for input!

ZbuffeR · July 16, 2007, 4:11am

I am not sure what you mean by have voxels the size of a pixel.
I.e. the voxel size is computed according to the viewer’s main point of interest (center of view, up to distance Z, a targeted/selected point, something like that).

I was a bit quick on this, but it is only a guideline, not an absolute need, and you are right about balancing CPU/GPU computation. Very fine grained LOD is bad for GPUs.

thinks · July 16, 2007, 4:11pm

Very fine grained LOD is bad for GPUs.
Why is this? I thought this would be bad for the CPU, since there would be a lot of tree traversal involved in deciding what gets passed to the GPU.

dorbie · September 17, 2007, 5:51pm

I know what lidar is and about the data acquisition for aerial survey etc.

Some kind of local clustering with dropout would work best IMHO, although you’re still taking a memory hit, mostly on the fetch, but not on the T&L.

Random order may hurt unexpectedly unless you retain some locality. Graphics cards rely heavily on framebuffer cache coherency at different levels and in a variety of ways across architectures, and randomizing would be about the worst thing you could do. FB access coherency is one of the biggest issues for graphics, and clustering your points spatially is about the best thing you could do. For points, other factors may make it a non-issue, but on some platforms you’re proposing a pathological case that you really should avoid.
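One common way to cluster points spatially, named here as a swapped-in technique rather than something proposed in the thread, is to sort them along a Morton (Z-order) curve, so that consecutive points in the array are mostly close in space. A sketch with 10 bits per axis (30-bit codes) and a cubic bounding box [lo, hi]; all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { float x, y, z; } Point;
typedef struct { uint32_t code; Point p; } CodedPoint;

/* Spreads the low 10 bits of v so that bit k moves to bit 3k. */
static uint32_t spread_bits_10(uint32_t v)
{
    v &= 0x000003FFu;
    v = (v | (v << 16)) & 0xFF0000FFu;
    v = (v | (v << 8))  & 0x0300F00Fu;
    v = (v | (v << 4))  & 0x030C30C3u;
    v = (v | (v << 2))  & 0x09249249u;
    return v;
}

/* 30-bit Morton (Z-order) code of three integers in [0, 1023]. */
uint32_t morton3(uint32_t x, uint32_t y, uint32_t z)
{
    return (spread_bits_10(x) << 2) | (spread_bits_10(y) << 1) | spread_bits_10(z);
}

/* Maps v in [lo, hi] to an integer in [0, 1023], clamping outliers. */
static uint32_t quantise10(float v, float lo, float hi)
{
    float t = (v - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return (uint32_t)(t * 1023.0f);
}

static int cmp_code(const void *a, const void *b)
{
    uint32_t ca = ((const CodedPoint *)a)->code;
    uint32_t cb = ((const CodedPoint *)b)->code;
    return (ca > cb) - (ca < cb);
}

/* Reorders points along a Z-order curve through the cube [lo, hi]^3. */
void sort_morton(CodedPoint *pts, size_t n, float lo, float hi)
{
    for (size_t i = 0; i < n; ++i)
        pts[i].code = morton3(quantise10(pts[i].p.x, lo, hi),
                              quantise10(pts[i].p.y, lo, hi),
                              quantise10(pts[i].p.z, lo, hi));
    qsort(pts, n, sizeof *pts, cmp_code);
}
```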

Hmm, some kind of spatially coherent surfel MIP maps would be a good idea but they would cost more memory, and memory is the main motivation for the index dropout suggestion.

For level of detail, if you can afford the memory, build patches of LOD-culled/averaged surfels.