So I’ve been struggling to figure out the best way to draw many cubes. I basically have a scene that needs to draw anywhere between 10,000 and 500,000 cubes. These cubes are identical in size and shape, but will have different colors, and later on I’d like them to have different textures on their surfaces.
I tried vertex arrays, but that gets slow once I get past 50,000 cubes.
I also used VBOs, but the same thing happens.
Currently I’m trying to figure out this geometry instancing business, but I can’t figure out why nothing will draw. I haven’t been able to find a simple example of geometry instancing, and I would like to know whether this is the right approach.
Also, please don’t ask me why I’m drawing so many cubes: the application in question is used to display some algorithm results for debugging purposes, and drawing many cubes is exactly what I need.
I have a few suggestions that would speed up the drawing of many cubes, especially if the scene is static, so that a little pre-processing is worth the time.
Are your cubes organized on a grid such that they never overlap? You tell us that they are regular in size and shape, which leads me to think that they are perhaps organized on a grid as well?
(1) Use shared vertices. If two cubes share a corner they could re-use that vertex.
(2) Remove internal faces. For cubes that are aligned face-to-face it is not necessary to render that face, since it is internal.
(3) Multiple cube faces that are connected in the same plane can be merged into larger quads, thus reducing the number of primitives. This might not work so well if you want to texture the cubes later.
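To make suggestion (2) concrete, here is a minimal CPU-side sketch of internal face removal, assuming the cubes sit on an integer grid and are identified by their (x, y, z) cell. The names FACE_DIRECTIONS and exposed_faces are made up for this illustration; the idea is simply to keep a face only when the neighbouring cell in that direction is empty.

```python
# Hypothetical helper names; cells are integer (x, y, z) grid coordinates.
FACE_DIRECTIONS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]

def exposed_faces(occupied):
    """Return (cell, direction) pairs for faces not hidden by a neighbouring cube."""
    occupied = set(occupied)
    faces = []
    for (x, y, z) in occupied:
        for (dx, dy, dz) in FACE_DIRECTIONS:
            # A face is internal when the adjacent cell also holds a cube.
            if (x + dx, y + dy, z + dz) not in occupied:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces

# A solid 10x10x10 block: 1000 cubes, 6000 faces in total.
block = [(x, y, z) for x in range(10) for y in range(10) for z in range(10)]
visible = exposed_faces(block)
print(len(visible), "of", 6 * len(block), "faces need to be drawn")
```

How much this saves depends entirely on how densely the grid is filled; for sparse, scattered cubes almost every face stays.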
Finally, if your program is for debugging, why are you worried about speed? Is it the case that it is too slow to even be usable?
They are organized on a grid and they’ll never overlap. Even though it is for debugging, we use the viewer to demo some capabilities, and having a slow viewer seems to give people the wrong impression of our algorithms.
If you’re using an orthographic projection, you might also try storing the faces in 6 separate VBOs: all the left faces in one, bottom faces in another, and so on. Then, on each frame, see which faces could actually be visible. At most 3 of the VBOs would need to be drawn, since you can never see more than 3 faces of a cube. If a face’s normal dotted with the direction toward the viewer is greater than 0, draw that VBO.
This technique precludes shared vertices, but cubes don’t handle shared vertices well anyway, since the faces have different normals.
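The per-frame test for the six face groups could look roughly like this. It is only a sketch: FACE_NORMALS and visible_face_groups are invented names, and it assumes the vector passed in points from the scene toward the viewer, which under an orthographic projection is the same for every cube.

```python
# Hypothetical names; one entry per face group (one VBO each in the scheme above).
FACE_NORMALS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}

def visible_face_groups(to_viewer):
    """Return the face groups whose normal faces the viewer.

    to_viewer is assumed to point from the scene toward the eye.
    """
    vx, vy, vz = to_viewer
    return [
        name
        for name, (nx, ny, nz) in FACE_NORMALS.items()
        if nx * vx + ny * vy + nz * vz > 0
    ]

# Looking at the scene from the (+x, +y, +z) octant.
groups_to_draw = visible_face_groups((1.0, 1.0, 1.0))
```

Because opposite faces have opposite normals, at most one of each pair can pass the test, which is where the "at most 3 VBOs" limit comes from.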
If you get slow when you go past a certain number of cubes, it might be the case that you’re running up against hardware limits and falling back to software emulation. Maybe try drawing them in multiple batches of 50,000 instead of all in one big batch.
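Splitting into batches is mostly bookkeeping. A small sketch, assuming the cubes are stored contiguously so that each batch is a (first cube, cube count) range that would feed one draw call; batch_ranges is a made-up name and 50,000 is just the figure from this thread, not a known hardware limit.

```python
def batch_ranges(total_cubes, batch_size=50_000):
    """Split total_cubes into (first, count) ranges of at most batch_size cubes."""
    return [
        (first, min(batch_size, total_cubes - first))
        for first in range(0, total_cubes, batch_size)
    ]

# Each tuple would correspond to one draw call over that range of cubes.
ranges = batch_ranges(120_000)
```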
You don’t say how you’re transforming the cubes. Doing a glPushMatrix/glRotatef/glTranslatef/glDraw*/glPopMatrix for each individual cube is definitely not the fastest way to handle this. You’ll get much better performance if you’re able to draw many cubes in a single call.
If your cubes are static objects you can try loading them all into one big VBO as a one-time-only operation then draw directly from that. If they’re dynamic objects then constructing a client-side vertex array on the fly each frame, with transforms performed in software, might be the fastest approach.
If you’re not running any scaling or rotation on the cubes then the transform can be collapsed from a full matrix multiply to 3 simple additions; more performance.
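As a rough illustration of the "3 additions" point: when every cube has the same size and no rotation, the corner offsets can be computed once and each vertex is just the cube position plus an offset. This is a sketch with invented names (cube_corner_offsets, translated_cube_vertices), producing a flat x, y, z list of the kind you would upload as vertex data.

```python
def cube_corner_offsets(size=1.0):
    """Eight corner offsets of an axis-aligned cube, computed once for all cubes."""
    return [
        (x * size, y * size, z * size)
        for z in (0.0, 1.0)
        for y in (0.0, 1.0)
        for x in (0.0, 1.0)
    ]

def translated_cube_vertices(positions, size=1.0):
    """Return a flat [x, y, z, x, y, z, ...] list with 8 vertices per cube."""
    corners = cube_corner_offsets(size)
    flat = []
    for (px, py, pz) in positions:
        for (cx, cy, cz) in corners:
            # No matrix multiply: three additions per vertex.
            flat.extend((px + cx, py + cy, pz + cz))
    return flat

vertices = translated_cube_vertices([(2.0, 3.0, 4.0), (10.0, 0.0, 0.0)])
```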
If you’re drawing using quads, strips or fans, you should stop doing that right now. GL_TRIANGLES combined with indexes is the most efficient and flexible way to draw, and you’ll be able to reduce the number of vertexes you send to the hardware from 24 per cube to 8. kaerimasu makes a good point about normals, but I guess that’s not so important for the type of rendering requirements you have (you may not even be using OpenGL lighting; getting the things drawn as fast as possible would be more important here).
With this method, a nice thing is that your index buffer can be completely static - it doesn’t need to change each frame. The same applies to other properties of each vertex, such as texcoords or colours. Only position needs to update dynamically (and that’s assuming that you even need to change the position at all).
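Here is a sketch of what that static index buffer could contain, assuming 8 vertices per cube stored consecutively and corner i of a cube sitting at offset (i & 1, (i >> 1) & 1, (i >> 2) & 1). CUBE_QUADS and cube_indices are made-up names; each quad is listed counter-clockwise as seen from outside and split into two triangles.

```python
# Corner numbering assumed here: index = x + 2*y + 4*z for x, y, z in {0, 1}.
# Each quad is listed counter-clockwise when viewed from outside the cube.
CUBE_QUADS = [
    (1, 3, 7, 5),  # +x
    (0, 4, 6, 2),  # -x
    (2, 6, 7, 3),  # +y
    (0, 1, 5, 4),  # -y
    (4, 5, 7, 6),  # +z
    (0, 2, 3, 1),  # -z
]

def cube_indices(cube_count):
    """Return GL_TRIANGLES-style indices: 36 per cube, 8 vertices per cube."""
    one_cube = []
    for a, b, c, d in CUBE_QUADS:
        one_cube.extend((a, b, c, a, c, d))  # two triangles per quad
    indices = []
    for k in range(cube_count):
        base = 8 * k  # first vertex of cube k
        indices.extend(base + i for i in one_cube)
    return indices

indices = cube_indices(3)
```

Note that at the upper end of the range in this thread (500,000 cubes, so 4,000,000 vertices) the indices no longer fit in 16 bits, so they would have to be uploaded as 32-bit values.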
You might even be able to split the drawing over multiple frames. You’ll probably need a single-buffered context for this to work best (otherwise you should draw each frame’s batch twice in 2 consecutive frames) but the trick is to only clear the color and depth buffers every fourth frame, and every frame draw just 25% of the cubes.
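The scheduling part of this trick can be sketched as follows. frame_plan is a hypothetical helper that, for a given frame number, says whether to clear the buffers and which contiguous range of cubes to draw; it assumes the single-buffered setup described above and does not cover the double-buffered variant.

```python
def frame_plan(frame_number, cube_count, slices=4):
    """Return (clear_buffers, first_cube, cube_count_to_draw) for this frame."""
    slice_index = frame_number % slices
    # Clear colour and depth only at the start of each cycle of `slices` frames.
    clear_buffers = slice_index == 0
    first = cube_count * slice_index // slices
    end = cube_count * (slice_index + 1) // slices
    return clear_buffers, first, end - first

# Frames 0..3 together cover all cubes; frame 4 starts the next cycle.
plans = [frame_plan(f, 100_000) for f in range(5)]
```

The obvious cost is that the picture only becomes complete every fourth frame, so this is better suited to a mostly static camera.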
I’d recommend against texturing them, as it’s only going to slow things down further. For this kind of drawing you need to strip your renderer back to the bare minimum possible. Adding colors should be OK though.
Funny. I have exactly the same problem here. I must draw thousands of cubes at random positions on a regular grid. First I tried the slow immediate mode. Then I switched to display lists, which gave me the best framerate, much better than a single big VBO. Last, I tried geometry instancing. I stored the vertexes for one cube in a VBO and transferred the translation data for all cubes to the GPU in one big texture buffer object. The rendering used gl_InstanceID in the vertex shader. It was still slower than the display lists. But I’m still not very familiar with OpenGL, so maybe there is something wrong with my setup.
At the moment I’m looking for optimizations on the CPU side.
There are lots of possibilities for removing vertex data before drawing (removing shared vertexes, culling, etc.).
I’m curious to hear how you solved the problem.
Well, I found this page: http://sol.gfxile.net/cubes.html, but there isn’t much in the way of source code. I’d like to see how he uses ARB_draw_instanced.