Speed of glBindVertexArray


I have been working on a voxel engine for terrain, similar to Minecraft, and there are many tutorials about this online. One frequently discussed optimization is to split the VBOs into groups of faces that all point in the same direction. You then draw only the VBOs whose faces could actually be visible.
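To make the grouping step concrete, here is a minimal C++ sketch. The `FaceQuad` record and the `Dir` ordering are hypothetical. The sketch assumes the mesher has already produced a list of visible block faces, each tagged with the direction its outward normal points:

```cpp
#include <array>
#include <cassert>
#include <vector>

// The six axis-aligned face directions. This ordering is an assumption
// of the sketch; any fixed order works.
enum Dir : int { NegX = 0, PosX, NegY, PosY, NegZ, PosZ, DirCount };

// Hypothetical record for one block face emitted by the mesher:
// the block it belongs to and the direction its outward normal points.
struct FaceQuad {
    int x, y, z;
    Dir dir;
};

// Splits a chunk's faces into six lists, one per direction, keeping the
// original order inside each list. In the six-VBO scheme, each list
// would then be uploaded into its own buffer.
std::array<std::vector<FaceQuad>, DirCount> groupByDirection(
        const std::vector<FaceQuad>& faces) {
    std::array<std::vector<FaceQuad>, DirCount> groups;
    for (const FaceQuad& f : faces) {
        assert(f.dir >= NegX && f.dir < DirCount);
        groups[f.dir].push_back(f);
    }
    return groups;
}
```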

So, a cube has 6 faces. If the camera is to the left of the cube, there is no need to draw the right-facing face at all. With thousands of cubes, it would seem to make sense to have 6 different VBOs, one for each direction. If the camera is to the left, you draw the VBO containing the left-facing faces and skip the right-facing VBO.
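The "camera is to the left" test generalizes from a single cube to a whole chunk by comparing the camera position against the chunk's bounding box. Below is a minimal C++ sketch, assuming hypothetical `Vec3`/`Aabb` types and axis-aligned chunks. It returns a bitmask of the face directions that might face the camera:

```cpp
#include <cassert>
#include <cstdint>

// Bit positions for the six face directions (an assumed ordering).
enum Face : int { NegX = 0, PosX, NegY, PosY, NegZ, PosZ };

struct Vec3 { float x, y, z; };

// Hypothetical axis-aligned bounding box of one chunk, in world units.
struct Aabb { Vec3 min, max; };

// Returns a 6-bit mask whose bit f is set when faces pointing in
// direction f might face the camera. A +X face lies on some plane
// x = c with c > box.min.x and is only front-facing when the camera is
// on its +X side, so every +X face is hidden when cam.x <= box.min.x.
// The other five directions follow the same rule. The test is
// conservative: it never rejects a group that could be visible.
std::uint8_t visibleFaceMask(const Vec3& cam, const Aabb& box) {
    assert(box.min.x <= box.max.x && box.min.y <= box.max.y &&
           box.min.z <= box.max.z);
    unsigned mask = 0;
    if (cam.x < box.max.x) mask |= 1u << NegX;
    if (cam.x > box.min.x) mask |= 1u << PosX;
    if (cam.y < box.max.y) mask |= 1u << NegY;
    if (cam.y > box.min.y) mask |= 1u << PosY;
    if (cam.z < box.max.z) mask |= 1u << NegZ;
    if (cam.z > box.min.z) mask |= 1u << PosZ;
    return static_cast<std::uint8_t>(mask);
}
```

Exactly 3 of the 6 bits are set only when the camera lies outside the chunk's extent on all three axes. On any axis where the camera lies between the box's min and max, both directions for that axis must be drawn.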

In practice, however, I am finding that my engine is much faster if I do NOT do this. My engine runs at 48 fps if I put all the faces into one VBO, draw the whole thing every frame, and let OpenGL do the back-face culling. If I use 6 separate VBOs, each with its own VAO, and call glBindVertexArray to switch between the ones that need drawing, my frame rate drops to around 38.

Why would that be? In fact, with my polygons split into 6 different VBOs, the frame rate stays around 38 whether or not I do the visibility test. With the test, I only need to draw 3 of the 6 VBOs, yet the frame rate is 38. If I comment out the test and draw all 6 VBOs, the rate is still around 38. I don't seem to save any time by not sending half my polygons! Does anyone know why I might not be seeing any savings there?


It's probably because you're switching between 6 buffer objects (and their VAOs). Why not put all of the faces in the same buffer object and adjust your glDrawArrays/glDrawElements calls accordingly? That is, put all the faces in one buffer, organized by the side they face. Then draw only the parts of the buffer that hold the faces you need.
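Here is a minimal C++ sketch of the single-buffer layout. It assumes non-indexed triangles with 6 vertices per face and the same hypothetical direction ordering as above. It computes each direction's `first`/`count` inside the shared buffer, then collects the visible ranges and merges any that happen to be adjacent:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kDirCount = 6;     // -X, +X, -Y, +Y, -Z, +Z (assumed order)
constexpr int kVertsPerFace = 6; // two non-indexed triangles per face (assumption)

// A contiguous run of vertices in the shared buffer, in the same terms
// as glDrawArrays(GL_TRIANGLES, first, count).
struct DrawRange {
    int first;
    int count;
};

// Given how many faces a chunk has in each direction, computes where
// each direction's vertices start when the single buffer is filled
// direction by direction (all -X faces, then all +X faces, ...).
std::array<DrawRange, kDirCount> layoutByDirection(
        const std::array<int, kDirCount>& faceCounts) {
    std::array<DrawRange, kDirCount> ranges{};
    int next = 0;
    for (int d = 0; d < kDirCount; ++d) {
        assert(faceCounts[d] >= 0);
        ranges[d] = DrawRange{next, faceCounts[d] * kVertsPerFace};
        next += ranges[d].count;
    }
    return ranges;
}

// Picks the ranges whose direction bit is set in visibleMask and merges
// ranges that are adjacent in the buffer, so fewer draw calls are needed.
std::vector<DrawRange> visibleRanges(
        const std::array<DrawRange, kDirCount>& layout,
        std::uint8_t visibleMask) {
    std::vector<DrawRange> out;
    for (int d = 0; d < kDirCount; ++d) {
        if ((visibleMask & (1u << d)) == 0 || layout[d].count == 0) continue;
        if (!out.empty() &&
            out.back().first + out.back().count == layout[d].first) {
            out.back().count += layout[d].count;  // extend the previous run
        } else {
            out.push_back(layout[d]);
        }
    }
    return out;
}
```

With this layout, the chunk's VAO is bound once. Each returned range then becomes one `glDrawArrays(GL_TRIANGLES, r.first, r.count)` call, or all ranges can be submitted together through `glMultiDrawArrays`. With glDrawElements, the same offsets would index into the element buffer instead.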

In general, switching VAOs seems to be a costly operation. I once did some profiling on Windows, and IIRC glBindVertexArray() took more time than most other calls. Thinking about what it does, this makes sense. First, the state belonging to the VAO itself has to be made current, which includes its buffer bindings and vertex array information. If the VAO has n vertex attribute arrays set up, you can assume roughly n times the same work, unless the driver does some clever (and unfortunately opaque) optimizations. In addition, all the state belonging to one or two buffer objects needs to be made current.

Incidentally, how many triangles are we talking about here? How complex is your fragment shading? Do you do occlusion culling?