VBOs: Drawing vertices of same color

Hi everyone,

I’m currently working on some CUDA-GL interop stuff and got to a point where I’m kind of stuck. This is pretty much the first thing I’ve done with OpenGL, by the way, so there may be some stupid mistakes on my side.

Basically, what I’m trying to do is draw a large number of single-color cubes. I update the colors and add and remove a few cubes every frame via CUDA, so I would like to store vertices and colors in VBOs in GPU memory.
However, I haven’t been able to find a way to draw the 8 vertices of a cube with e.g. glDrawRangeElements() while using the same color from the color buffer for all of them. So what I lack is a way to advance to the next entry in the color buffer only once per cube, i.e. after every 8 vertices that glDrawRangeElements() processes …
I could obviously insert one color value into my color buffer for every vertex, but that would just be a lot of wasted memory.
Is there a way to do this that I’ve overlooked? Or is the performance hit maybe so small that I’d be better off putting the colors into a normal array, copying it back to host memory and setting the color manually before drawing each cube?

Thanks a lot in advance for any replies

Short answer: You have to specify the color for each vertex.

In general, each attribute (normal, color, …) has to be either per-vertex or per-draw-call in OpenGL, as far as I know.

The rendering performance will barely suffer when using a per-vertex color.
From CUDA’s point of view, just update the 8 color values (per cube) in the color VBO (which should be pretty fast since the VBO is in device memory). Avoid the host memory transfer.
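
A minimal sketch of that update, written as plain C++ so it runs on the CPU. The `Rgba8` struct and the function names are made up for illustration; in the real setup the body of `writeCubeColor` would presumably become a CUDA kernel (one thread per cube) writing into the mapped color VBO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed color, one byte per channel (4 bytes per vertex).
struct Rgba8 {
    std::uint8_t r, g, b, a;
};

constexpr std::size_t kVerticesPerCube = 8;

// Copies one cube's color into the 8 consecutive per-vertex slots that
// belong to that cube. In CUDA, `cube` would come from the thread index.
inline void writeCubeColor(Rgba8* vertexColors, std::size_t cube, Rgba8 color) {
    assert(vertexColors != nullptr);
    for (std::size_t v = 0; v < kVerticesPerCube; ++v) {
        vertexColors[cube * kVerticesPerCube + v] = color;
    }
}

// CPU stand-in for launching the kernel over all cubes.
inline std::vector<Rgba8> expandCubeColors(const std::vector<Rgba8>& cubeColors) {
    std::vector<Rgba8> vertexColors(cubeColors.size() * kVerticesPerCube);
    for (std::size_t cube = 0; cube < cubeColors.size(); ++cube) {
        writeCubeColor(vertexColors.data(), cube, cubeColors[cube]);
    }
    return vertexColors;
}
```

The point is only the index mapping: vertex slot `cube * 8 + v` gets the color of `cube`, so the color VBO lines up with a position VBO that stores 8 vertices per cube in the same order.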

An alternative approach might be to use a single vertex per cube and use a geometry shader to generate a cube from that vertex. But I doubt that it would be much faster.

Alright, I figured it would be like this with the per-vertex / per-draw attributes.

As I said, storing and updating 8 colors instead of 1 seems like an awful waste of space and a lot of unnecessary memory accesses, so I came up with the approach I briefly mentioned in the last paragraph of my first post:

I could create a color VBO in VRAM and update it via CUDA. When rendering, I copy it back into host memory, and instead of running over the full length of the VBO, I use glColor* to set the color for the next cube to the corresponding value from the color VBO, and then draw just this one cube.
So basically, I’d split up the glDrawRangeElements call into one call per cube.

This is in contrast to my initial approach, where I would have to store 8 colors for each cube, but would be able to keep the VBOs completely in VRAM and run over the full length of the VBO in one call.
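
For reference, the single-call variant needs one index buffer covering all cubes. Here is a rough C++ sketch of how such indices could be generated, assuming 8 vertices per cube stored consecutively and cubes drawn as 6 quads. The corner numbering (bit 0 = +x, bit 1 = +y, bit 2 = +z) and all names are assumptions for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kVerticesPerCube = 8;
constexpr std::size_t kIndicesPerCube = 24;  // 6 quads * 4 corners

// Quad corners per face, wound counter-clockwise when seen from outside,
// for the assumed corner numbering (bit 0 = +x, bit 1 = +y, bit 2 = +z).
constexpr std::array<std::uint32_t, kIndicesPerCube> kCubeQuads = {
    0, 2, 3, 1,  // -z face
    4, 5, 7, 6,  // +z face
    0, 1, 5, 4,  // -y face
    2, 6, 7, 3,  // +y face
    0, 4, 6, 2,  // -x face
    1, 3, 7, 5   // +x face
};

// Builds the index list for `cubeCount` cubes by offsetting the per-cube
// pattern by 8 vertices for each cube.
inline std::vector<std::uint32_t> buildCubeIndices(std::size_t cubeCount) {
    assert(cubeCount * kVerticesPerCube <= 0xFFFFFFFFu);  // must fit 32-bit indices
    std::vector<std::uint32_t> indices;
    indices.reserve(cubeCount * kIndicesPerCube);
    for (std::size_t cube = 0; cube < cubeCount; ++cube) {
        const std::uint32_t base = static_cast<std::uint32_t>(cube * kVerticesPerCube);
        for (std::uint32_t local : kCubeQuads) {
            indices.push_back(base + local);
        }
    }
    return indices;
}
```

With n cubes this yields 24 * n indices referencing vertices 0 to 8 * n - 1, which would be the count and range passed to a single glDrawRangeElements(GL_QUADS, ...) call.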

That’s pretty much where I don’t yet know what would perform better. Considering I have at the very least 50k+ cubes in my scenes, the single-call approach would cut the number of draw calls by a huge amount, but on the other hand it adds an overhead of color data eating up my VRAM …

Copying the colors back and drawing cube by cube will definitely result in worse performance (each draw call has high CPU overhead, the host memory transfer from CUDA is slow, and glColor() adds overhead too).

Go for the other approach with one big draw call.
If you are concerned about memory, be sure to use a 4-byte color format (RGBA, one byte per channel). But my guess would be that even with 4-float RGBA, performance won’t be much lower.
How many bytes are used for the position of each cube vertex in your setup?

3 floats, I figure, so 12 bytes. Are there even alternatives to this?

I mainly asked to show that the color data is small compared to position data :wink:
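
To put rough numbers on that, here is a small sketch of the arithmetic, assuming the 50k cubes mentioned earlier, 8 vertices per cube, 3 floats per position and 4-byte floats. The function and constant names are only illustrative.

```cpp
#include <cstddef>

constexpr std::size_t kVerticesPerCube = 8;

// Size in bytes of a per-vertex buffer for `cubeCount` cubes.
constexpr std::size_t bufferBytes(std::size_t cubeCount, std::size_t bytesPerVertex) {
    return cubeCount * kVerticesPerCube * bytesPerVertex;
}

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

constexpr std::size_t kCubes = 50000;  // lower bound given in the thread

constexpr std::size_t kPositionBytes     = bufferBytes(kCubes, 3 * sizeof(float));  // xyz as floats
constexpr std::size_t kColorBytesRgba8   = bufferBytes(kCubes, 4);                  // 4 x 1 byte
constexpr std::size_t kColorBytesRgba32f = bufferBytes(kCubes, 4 * sizeof(float));  // 4 floats

// One color per cube instead of per vertex, for comparison.
constexpr std::size_t kColorBytesPerCubeOnly = kCubes * 4;
```

Under these assumptions the positions take 4.8 MB, byte RGBA colors 1.6 MB and float RGBA colors 6.4 MB, while one byte RGBA color per cube would be 0.2 MB. So replicating byte colors per vertex costs about 1.4 MB extra at 50k cubes.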

On a more serious note, it is possible to use 16-bit values for positions when the precision loss is not a problem. In this thread, someone tried it (with Direct3D though):
=> Little/no gain in performance, but saves memory

In OpenGL it should be possible by changing the data type passed to glVertexPointer() from GL_FLOAT to GL_SHORT, for example. I’ve never tried it though (my stuff is always fillrate/shader limited).
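
An untested sketch of what the quantization could look like on the data side, in plain C++. The `Quantizer` struct and its range are hypothetical. As far as I know, GL_SHORT positions passed via glVertexPointer() are read as plain integer values (not normalized), so the scale and offset from `decode` would have to be applied through the modelview matrix or a shader.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: maps [minValue, maxValue] linearly onto the
// signed 16-bit range [-32767, 32767].
struct Quantizer {
    float minValue;
    float maxValue;

    // Float coordinate -> 16-bit value; inputs outside the range are clamped.
    std::int16_t encode(float x) const {
        assert(maxValue > minValue);
        float t = (x - minValue) / (maxValue - minValue);  // 0..1 inside the range
        t = std::fmin(1.0f, std::fmax(0.0f, t));
        return static_cast<std::int16_t>(std::lround(t * 65534.0f - 32767.0f));
    }

    // 16-bit value -> approximate float coordinate.
    float decode(std::int16_t q) const {
        const float t = (static_cast<float>(q) + 32767.0f) / 65534.0f;
        return minValue + t * (maxValue - minValue);
    }

    // Spacing between two adjacent representable positions.
    float step() const {
        return (maxValue - minValue) / 65534.0f;
    }
};
```

For a scene spanning -100 to 100 units, `step()` is roughly 0.003 units, which gives a feel for the precision loss. A position then takes 6 bytes instead of 12.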

Alright, thanks a lot so far :slight_smile: I’ll stick with using 2 separate VBOs for now to be able to draw all the cubes in a single call. It’s ugly in some respects, but I hope it pays off by drastically reducing the number of draw calls. And if I find the time, maybe I’ll try the other way as well, just for a performance comparison.