I am in the process of optimizing my renderer, and I am wondering if I should change its caching behavior a bit. Right now it caches the smallest cullable pieces of geometry on the video card as display lists and draws each one individually. This results in about 1000 calls to glCallList() per pass, and there are many passes.
I am planning to switch to VBOs instead of display lists, and that raises some interesting questions. Should I minimize OpenGL calls as much as possible by culling my scene, sorting the geometry by material, transforming it on the CPU, and then filling a single VBO per material, either through glMapBuffer or by re-uploading with the GL_STREAM_DRAW usage hint? These VBOs would change just about every frame, but given how small my geometry is relative to the bandwidth of the video card, I wonder whether it would be a win.
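To make the batching idea above concrete, here is a rough CPU-side sketch of what I have in mind. Everything in it is hypothetical: the Vertex and Chunk types, the buildBatches name, and the translation-only "transform" are stand-ins for illustration, not my actual renderer code. The OpenGL upload itself is only indicated in the trailing comment, since it needs a live context.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical types for illustration; a real renderer's structures will differ.
struct Vertex { float x, y, z; };

struct Chunk {                      // smallest cullable piece of geometry
    int materialId;                 // batches are keyed on this
    bool visible;                   // result of this frame's culling
    float tx, ty, tz;               // translation only, standing in for a full transform
    std::vector<Vertex> triangles;  // 3 vertices per triangle, object space
};

// Build one contiguous, pre-transformed vertex array per material.
// std::map keeps the batches ordered by material id.
std::map<int, std::vector<Vertex>> buildBatches(const std::vector<Chunk>& scene) {
    std::map<int, std::vector<Vertex>> batches;
    for (const Chunk& c : scene) {
        if (!c.visible) continue;                // culled this frame
        assert(c.triangles.size() % 3 == 0);     // whole triangles only
        std::vector<Vertex>& out = batches[c.materialId];
        for (const Vertex& v : c.triangles)
            out.push_back({v.x + c.tx, v.y + c.ty, v.z + c.tz});
    }
    return batches;
}

// Each batch would then be streamed to its VBO and drawn once, roughly:
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferData(GL_ARRAY_BUFFER, bytes, data, GL_STREAM_DRAW);
//   glDrawArrays(GL_TRIANGLES, 0, vertexCount);
```

With this layout the number of draw calls per pass drops from one per chunk to one per material, at the cost of re-transforming and re-uploading the visible geometry every frame.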
What do you think? If I just migrate my renderer to VBOs as-is, the average VBO will hold only about 50 triangles. Of course there are a few tricks I can play to get that number up, but maybe someone experienced has some insight on how to strike the right balance between draw-call count and per-frame upload cost?