I’m concerned with speeding up my application. I’m rendering a large number of cubes and I’m wondering if someone can point me in the right direction for doing it as fast as possible. Right now I’m just making a bunch of calls to glVertex3f() between glBegin(GL_QUADS) and glEnd(). I doubt this is the most efficient way. Using display lists with the same code didn’t help much. I get around 4 FPS with close to 70,000 cubes. I did some calculations and got that to be around 3.2 million triangles per second. I know that modern video cards are capable of a lot more than that. I don’t need anything fancy, just plain white cubes. Disabling lighting got me a few more frames per second, but without lighting you can’t really see what’s going on; it’s just a bunch of white pixels smeared together. So if anyone can help me out I’d appreciate it. Thanks!
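The throughput estimate is easy to check, assuming 12 triangles per cube (6 quad faces, each drawn as 2 triangles). For the 41^3 = 68,921-cube test used later in the thread, 4 FPS works out to roughly 3.3 million triangles per second, the same range as the figure above. triangles_per_second is a hypothetical helper, not part of any API:

```c
/* Triangles per second = cubes x triangles per cube x frames per second.
   Assumes every cube is fully drawn each frame (no culling). */
long long triangles_per_second(long long cubes, long long tris_per_cube,
                               long long fps)
{
    return cubes * tris_per_cube * fps;
}
```

At the 10 FPS reported later in the thread, the same formula gives about 8.3 million triangles per second.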

Use indexed quads/primitives:
try to avoid sending down the same vertices for different faces of a cube.
E.g. the point (0,0,0) might occur in 3 faces of a cube, but instead of specifying it for each face,
you could send all the distinct vertices to OpenGL once and then send an array of indices describing your cube.

-> Vertex arrays are a good thing to look into.
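A minimal sketch of the indexed approach, assuming a unit cube with one corner at the origin: the 8 shared corners are stored once and 24 byte-sized indices describe the six quads, so each corner is referenced by 3 faces instead of being sent 3 times. cube_vertices, cube_quad_indices and count_uses are hypothetical names; the GL calls in the trailing comment are the standard OpenGL 1.1 vertex-array entry points.

```c
#include <stddef.h>

/* The 8 unique corners of a unit cube, 3 floats (x, y, z) each. */
static const float cube_vertices[8][3] = {
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},
    {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}
};

/* 6 faces x 4 corners = 24 indices into cube_vertices; each face is wound
   counter-clockwise when seen from outside the cube. */
static const unsigned char cube_quad_indices[24] = {
    4, 5, 6, 7,   /* front  (z = 1) */
    0, 3, 2, 1,   /* back   (z = 0) */
    0, 4, 7, 3,   /* left   (x = 0) */
    1, 2, 6, 5,   /* right  (x = 1) */
    0, 1, 5, 4,   /* bottom (y = 0) */
    3, 7, 6, 2    /* top    (y = 1) */
};

/* How many faces reference a given corner (3 for every corner of a cube). */
int count_uses(unsigned char vertex)
{
    int n = 0;
    for (size_t i = 0; i < sizeof cube_quad_indices; i++)
        if (cube_quad_indices[i] == vertex)
            n++;
    return n;
}

/* With a GL context, the whole cube is then a single call:
 *   glEnableClientState(GL_VERTEX_ARRAY);
 *   glVertexPointer(3, GL_FLOAT, 0, cube_vertices);
 *   glDrawElements(GL_QUADS, 24, GL_UNSIGNED_BYTE, cube_quad_indices);
 */
```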

Use transforms to place the children of a single “mother cube”.
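One reading of the “mother cube” suggestion, sketched under the assumption that the children sit on an n×n×n grid: the cube geometry is built once (for example in a display list) and each child is just a translated copy of it. grid_position and motherCube are hypothetical names.

```c
/* Position of child cube `index` in an n*n*n grid, with `spacing` units
   between neighbouring cube origins. x varies fastest, then y, then z. */
void grid_position(int index, int n, float spacing, float out[3])
{
    out[0] = (float)(index % n) * spacing;
    out[1] = (float)((index / n) % n) * spacing;
    out[2] = (float)(index / (n * n)) * spacing;
}

/* Drawing every child from the one mother cube would then look like:
 *   for (int i = 0; i < n * n * n; i++) {
 *       float p[3];
 *       grid_position(i, n, spacing, p);
 *       glPushMatrix();
 *       glTranslatef(p[0], p[1], p[2]);
 *       glCallList(motherCube);   // display list holding one cube
 *       glPopMatrix();
 *   }
 */
```

Note that this still issues one matrix change and one draw per cube, which is exactly the per-cube cost questioned later in the thread.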

Thank you very much for replying to my post.

I’ve had some success with the things you suggested. I’m now using indexed quads and vertex arrays to define my cubes from just 8 vertices instead of 24, and I’m rendering them with a display list. I think my code is a lot cleaner and more efficient now.

I’m using 68921 (41^3) cubes as a test and have gone from 4 fps to 10. A decent improvement I guess, but I’m wondering if I can do more?

I’m not quite sure what you meant by the parent/child cube thing. I AM transforming around 3D space to render the cubes where I need them, if that’s what you meant by that…

Any suggestions from anyone would be appreciated, thanks!

Use vertex arrays with the vertex buffer object extension (GL_ARB_vertex_buffer_object) if it is available, and use glDrawRangeElements() instead of glDrawElements(). Avoid glFinish(), and disabling vsync could also be a good thing (if you’re not using an LCD screen…).
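glDrawRangeElements() takes the same arguments as glDrawElements() plus the smallest and largest vertex index that the index array references (start and end), so the driver knows up front which slice of the vertex data a call touches. A small sketch of computing that range; index_range, ibo and count are hypothetical names, and the buffer-object calls in the comment assume GL_ARB_vertex_buffer_object is available.

```c
#include <stddef.h>

/* Finds the smallest (start) and largest (end) index in `indices`.
   Returns 0 when count is 0 (no valid range), 1 otherwise. */
int index_range(const unsigned short *indices, size_t count,
                unsigned short *start, unsigned short *end)
{
    if (count == 0)
        return 0;
    unsigned short lo = indices[0], hi = indices[0];
    for (size_t i = 1; i < count; i++) {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    *start = lo;
    *end = hi;
    return 1;
}

/* With the indices stored in a buffer object, the draw might look like:
 *   glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, ibo);
 *   glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, count * sizeof *indices,
 *                   indices, GL_STATIC_DRAW_ARB);
 *   index_range(indices, count, &start, &end);
 *   glDrawRangeElements(GL_QUADS, start, end, count, GL_UNSIGNED_SHORT, 0);
 * (the last argument is an offset into the bound element buffer)
 */
```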

Originally posted by ete:
glDrawRangeElements() instead of glDrawElements()
Does this really give a speedup? I didn’t observe any difference in my tests. I would be glad to hear about your experience, just to know whether I should put it in my system or not. Thank you!

Originally posted by ete:
and disabling vsync could also be a good thing.
Considering we are at 10 FPS, I hardly believe this will help.

Originally posted by Doughsay:
I’m using 68921 (41^3) cubes
I would seriously consider that you may be CPU-limited or batch-limited. Try sending fewer cubes and check whether the performance scales accordingly. I would suggest trying to reduce the number of batches, maybe by pre-transforming “clusters” of cubes.
I think 68k batches are overkill.
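A rough sketch of what pre-transforming a cluster could look like, assuming the cubes only differ by a translation: every cube’s 8 corners are copied into one shared vertex array with its offset already applied, and its 24 quad indices are shifted to point at its own block of vertices, so a whole cluster is one draw call instead of one per cube. With 16-bit indices a cluster can address at most 65,536 vertices (8,192 cubes), so the 68,921-cube test would need 9 clusters. build_cluster and the other names are hypothetical.

```c
#include <stddef.h>

#define CUBE_VERTS 8
#define CUBE_INDICES 24
/* GL_UNSIGNED_SHORT indices can address 65536 vertices = 8192 cubes. */
#define MAX_CUBES_PER_CLUSTER (65536 / CUBE_VERTS)

static const float base_cube[CUBE_VERTS][3] = {
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},
    {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}
};

/* Six quads, counter-clockwise when seen from outside. */
static const unsigned short base_quads[CUBE_INDICES] = {
    4, 5, 6, 7,  0, 3, 2, 1,  0, 4, 7, 3,
    1, 2, 6, 5,  0, 1, 5, 4,  3, 7, 6, 2
};

/* Pre-transforms `count` cubes into one vertex array and one index array.
   offsets holds 3 floats (x, y, z) per cube; verts needs room for
   count * 8 * 3 floats and indices for count * 24 entries. Returns the
   number of indices written, or 0 if count is 0 or too large. */
size_t build_cluster(const float *offsets, size_t count,
                     float *verts, unsigned short *indices)
{
    if (count == 0 || count > MAX_CUBES_PER_CLUSTER)
        return 0;
    for (size_t c = 0; c < count; c++) {
        /* Copy the base corners, translated to this cube's position. */
        for (int v = 0; v < CUBE_VERTS; v++)
            for (int k = 0; k < 3; k++)
                verts[(c * CUBE_VERTS + v) * 3 + k] =
                    base_cube[v][k] + offsets[c * 3 + k];
        /* Same face list, shifted to this cube's block of 8 vertices. */
        for (int i = 0; i < CUBE_INDICES; i++)
            indices[c * CUBE_INDICES + i] =
                (unsigned short)(c * CUBE_VERTS + base_quads[i]);
    }
    return count * CUBE_INDICES;
}

/* The whole cluster is then a single draw call, e.g.:
 *   glVertexPointer(3, GL_FLOAT, 0, verts);
 *   glDrawElements(GL_QUADS, (GLsizei)n, GL_UNSIGNED_SHORT, indices);
 */
```

The cost moves to memory (every cube now has its own copy of the vertices), but the per-cube matrix changes and draw calls disappear.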

glDrawRangeElements() instead of glDrawElements()
I think it’s a recommendation from NVidia, and it does make a very small difference in some situations on the NVidia cards I tested (GeForce 6600/6800)…

I think 68k batches are overkill
More than 64k IS overkill…

I think you really should use occlusion queries, and just draw what’s “visible”…
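Occlusion queries run on the GPU (glBeginQueryARB/glEndQueryARB and glGetQueryObjectuivARB in GL_ARB_occlusion_query). The sketch below is a different, CPU-side technique that only works if the cubes sit on a regular grid and touch their neighbours: a cube whose six neighbours are all present can never be seen from outside, so it can be skipped. Assuming the 41^3 test is a solid block with no gaps, only 41^3 - 39^3 = 9,602 of the 68,921 cubes are exposed. The grid layout and function names are hypothetical.

```c
#include <stddef.h>

/* occ is an n*n*n grid of 0/1 flags indexed x + n*(y + n*z).
   Cells outside the grid count as empty. */
static int cell(const unsigned char *occ, int n, int x, int y, int z)
{
    if (x < 0 || y < 0 || z < 0 || x >= n || y >= n || z >= n)
        return 0;
    return occ[x + n * (y + n * z)];
}

/* 1 if the cell holds a cube and at least one of its six neighbours is
   empty (so at least one face could be visible), else 0. */
int is_exposed(const unsigned char *occ, int n, int x, int y, int z)
{
    if (!cell(occ, n, x, y, z))
        return 0;
    return !cell(occ, n, x - 1, y, z) || !cell(occ, n, x + 1, y, z) ||
           !cell(occ, n, x, y - 1, z) || !cell(occ, n, x, y + 1, z) ||
           !cell(occ, n, x, y, z - 1) || !cell(occ, n, x, y, z + 1);
}

/* Number of cubes that would still need drawing after this culling. */
size_t count_exposed(const unsigned char *occ, int n)
{
    size_t count = 0;
    for (int z = 0; z < n; z++)
        for (int y = 0; y < n; y++)
            for (int x = 0; x < n; x++)
                count += (size_t)is_exposed(occ, n, x, y, z);
    return count;
}
```

The check only needs to be redone when cubes are added or removed, not every frame.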

Hi there
Why don’t you try using triangle strips instead of quads with vertex arrays? I think this will definitely give you some more FPS: quads (and other polygons) are converted into triangles before rasterization anyway, so do try it out. I’m sure it will boost the speed.
Here is the code I used to draw the cube.

#include <GL/gl.h>

typedef GLfloat Point[3];

// Vertex array: the 8 corners of a unit cube centred on the origin
Point pt[] = {
    { -0.5, -0.5,  0.5 },
    { -0.5,  0.5,  0.5 },
    {  0.5,  0.5,  0.5 },
    {  0.5, -0.5,  0.5 },
    { -0.5, -0.5, -0.5 },
    { -0.5,  0.5, -0.5 },
    {  0.5,  0.5, -0.5 },
    {  0.5, -0.5, -0.5 }
};

/* Draws a cube without normals (no lighting) as one triangle strip.
   Note: the vertex order given here is clockwise, while the default
   order for front-facing polygons is counter-clockwise, so you have to
   tell OpenGL about the clockwise orientation of the polygons with
   glFrontFace(GL_CW). */
void DrawCubeTris(void)
{
    /* 14 strip indices into pt[] produce the cube's 12 triangles */
    static const int indices[14] = {6,2,5,1,0,2,3,6,7,5,4,0,7,3};

    glBegin(GL_TRIANGLE_STRIP);
    for (int i = 0; i < 14; i++)
        glVertex3fv(pt[indices[i]]);
    glEnd();
}


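To see why 14 indices are enough and why one glFrontFace() setting covers the whole strip, here is a sketch that expands a strip the way OpenGL interprets it: triangle t uses indices t, t+1 and t+2, and on every odd t the first two are swapped so all triangles keep the winding of the first one. strip_to_triangles is a hypothetical helper.

```c
#include <stddef.h>

/* Expands a triangle strip of n indices into n - 2 separate triangles
   (out receives 3 * (n - 2) indices), swapping the first two indices of
   every odd triangle to undo the strip's alternating winding.
   Returns the number of triangles written. */
size_t strip_to_triangles(const int *strip, size_t n, int *out)
{
    if (n < 3)
        return 0;
    for (size_t t = 0; t + 2 < n; t++) {
        int a = strip[t], b = strip[t + 1], c = strip[t + 2];
        if (t % 2 == 1) {
            int tmp = a;
            a = b;
            b = tmp;
        }
        out[3 * t] = a;
        out[3 * t + 1] = b;
        out[3 * t + 2] = c;
    }
    return n - 2;
}
```

For the 14-index cube strip this yields 12 triangles, the same as 6 quads split in two, but with 14 vertices sent instead of 24 (or 36 for separate triangles).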
Hi, thanks again for all the help. I did switch to using triangle strips instead of quads and gained another 5 FPS to get a total of 15.

I may be missing some major concepts. How should the basic structure of my rendering function be set up?

Originally posted by Obli:
I would seriously consider that you may be CPU-limited or batch-limited. Try sending fewer cubes and check whether the performance scales accordingly. I would suggest trying to reduce the number of batches, maybe by pre-transforming “clusters” of cubes.
I think 68k batches are overkill.
My function basically steps through a linked list of structures holding data that determines where and how many of the cubes should be drawn. It does this one cube at a time, basically, so I end up making many calls to glDrawElements(): as many calls as there are cubes being drawn. Is this bad logic? How else can I do this? I don’t quite understand what Obli said about “batches” or “clusters” of cubes. Should I be compiling everything into one or several large vertex lists and then calling glDrawElements() once, or a smaller number of times? This doesn’t work if I’m using triangle strips, but I guess it could work with quads.
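Strips can in fact be merged into one glDrawElements() call with GL_TRIANGLE_STRIP by joining them with degenerate (zero-area) triangles, which produce no pixels. This is a common trick rather than something proposed earlier in the thread. A sketch, assuming each cube’s strip indices have already been offset to that cube’s own vertices; append_strip and count_nondegenerate are hypothetical helpers.

```c
#include <stddef.h>

/* Appends strip src (n indices) to dst (currently len indices) and returns
   the new length. Between strips it repeats the previous strip's last index
   and the next strip's first index, creating degenerate triangles; one extra
   repeat is added when len is odd so the appended strip keeps its original
   winding. dst must have room for len + 3 + n indices. */
size_t append_strip(unsigned int *dst, size_t len,
                    const unsigned int *src, size_t n)
{
    if (n == 0)
        return len;
    size_t start = len;
    if (start > 0) {
        dst[len++] = dst[start - 1];
        if (start % 2 != 0)
            dst[len++] = dst[start - 1];
        dst[len++] = src[0];
    }
    for (size_t i = 0; i < n; i++)
        dst[len++] = src[i];
    return len;
}

/* Counts the strip's triangles that have three distinct indices. */
size_t count_nondegenerate(const unsigned int *strip, size_t n)
{
    size_t count = 0;
    for (size_t t = 0; t + 2 < n; t++) {
        unsigned int a = strip[t], b = strip[t + 1], c = strip[t + 2];
        if (a != b && b != c && a != c)
            count++;
    }
    return count;
}
```

Two 14-index cube strips joined this way become one 30-index strip that still contains exactly 24 real triangles.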

Should I try posting some of my code? Thanks again for all the help.

If you make 64k state changes to the modelview matrix in order to set your cubes’ coordinates, that is really overkill. If you have vertex shaders, you may try a technique used in games for rendering grass and tree leaves: do not call glTranslate, glRotate or glScale, but send the modelview matrix as multitexture coordinates to a vertex shader.
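A sketch of the idea, assuming legacy GLSL (1.10/1.20) built-ins and that only the per-cube part of the transform travels in texture coordinates while the camera stays in the regular modelview matrix. The shader string, translation_rows and apply_rows are hypothetical; apply_rows is a CPU reference of what the shader computes, so the math can be checked without a GL context.

```c
/* Vertex shader: rows 0..2 of the per-cube affine matrix arrive in texture
   coordinate sets 1..3. mat4() takes columns, so gl_Vertex * inst applies
   the rows to the vertex, i.e. computes M * v. */
static const char *instance_vs =
    "void main()\n"
    "{\n"
    "    mat4 inst = mat4(gl_MultiTexCoord1, gl_MultiTexCoord2,\n"
    "                     gl_MultiTexCoord3, vec4(0.0, 0.0, 0.0, 1.0));\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * (gl_Vertex * inst);\n"
    "}\n";

/* Fills rows (3 rows x 4 floats) with a translation by (tx, ty, tz).
   Per cube these could be set with, e.g.,
   glMultiTexCoord4fARB(GL_TEXTURE1_ARB, rows[0], rows[1], rows[2], rows[3])
   and likewise for GL_TEXTURE2_ARB / GL_TEXTURE3_ARB, which changes only
   the current texture coordinates, not the matrix stack. */
void translation_rows(float tx, float ty, float tz, float rows[12])
{
    const float r[12] = { 1, 0, 0, tx,
                          0, 1, 0, ty,
                          0, 0, 1, tz };
    for (int i = 0; i < 12; i++)
        rows[i] = r[i];
}

/* CPU reference of the shader's per-vertex math: out = rows * (x, y, z, 1). */
void apply_rows(const float rows[12], const float v[3], float out[3])
{
    for (int r = 0; r < 3; r++)
        out[r] = rows[4 * r] * v[0] + rows[4 * r + 1] * v[1] +
                 rows[4 * r + 2] * v[2] + rows[4 * r + 3];
}
```

In a batched vertex array the same three rows would instead be stored per vertex and supplied through glClientActiveTextureARB plus glTexCoordPointer.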