I am in the process of converting an old OpenGL game to modern, vertex buffer based style but continue to run into performance issues.
The game mostly consists of low-poly geometry and often makes dynamic changes to some of its parts - think Doom with levels 5-6 times as large and the same basic mechanics of altering the level for opening doors, etc.
This means using a static vertex buffer for the basic geometry is not an option, because there’s no simple way to detect which parts of the level are static and which are not.
I have written a basic renderer that manages to handle this with vertex buffers, but I am getting performance issues. The individual batches are simply too small for the GPU to show any significant speedup, while creating and managing the buffers costs significantly more time than just calling the immediate mode functions. Another problem is that, due to the lighting model, I end up transferring 2-3 times as much data to the GPU, because I have to set certain attributes for every single vertex that I previously only had to re-send when they changed. It also hardly matters which method I use to send the buffer data: neither glBufferData nor mapping the buffer and writing to it directly ever matches the performance of immediate mode.
I also tried some immediate mode replacement libraries, but they fare even worse in terms of performance than anything I attempted myself. From the looks of it, I lose all the internal optimization the GL driver does for these cases.
So, does anybody have some tips on how such a scenario can be optimized for better performance with vertex buffers? With high-vertex-count geometry I can clearly see how it performs better, but in these cases, with an average of fewer than 20 vertices per draw call, it’s absolutely killing me.
Vertex buffers will not necessarily gain you speed if you are constantly changing your data - in fact they could easily be worse. Their benefit comes mostly from having static data. You should look at splitting your objects into dynamic and static. If you are only drawing 20 vertices per call you have a fundamental design problem.
Use asynchronous buffer mapping (for example glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT) to avoid stalls. Read the wiki.
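To make the "avoid stalls" idea a bit more concrete, here is a minimal sketch of the bookkeeping side of buffer streaming: append each small batch at a moving offset in one large buffer and only invalidate (orphan) the buffer when it wraps. `StreamCursor`, `StreamSlot` and `reserve` are names made up for this illustration, not part of OpenGL, and the GL calls are only indicated in comments so the snippet compiles without a context.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: write cursor for one large streaming vertex buffer.
struct StreamCursor {
    std::size_t capacity; // total size of the GL buffer in bytes
    std::size_t offset;   // next free byte
};

struct StreamSlot {
    std::size_t offset; // byte offset to map and to draw from
    bool orphan;        // true: the buffer wrapped, invalidate it before writing
};

// Reserve `bytes` at the current cursor; wrap to offset 0 when it does not fit.
inline StreamSlot reserve(StreamCursor& c, std::size_t bytes) {
    assert(bytes <= c.capacity); // a single batch must fit into the buffer
    const bool wrap = c.offset + bytes > c.capacity;
    if (wrap) c.offset = 0;
    const StreamSlot slot{c.offset, wrap};
    c.offset += bytes;
    return slot;
}

// Intended use with a bound GL_ARRAY_BUFFER (not executed here):
//   GLbitfield flags = GL_MAP_WRITE_BIT |
//       (slot.orphan ? GL_MAP_INVALIDATE_BUFFER_BIT : GL_MAP_UNSYNCHRONIZED_BIT);
//   void* dst = glMapBufferRange(GL_ARRAY_BUFFER, slot.offset, bytes, flags);
//   ... copy vertices to dst ...
//   glUnmapBuffer(GL_ARRAY_BUFFER);
```

The unsynchronized map is only safe because each slot is written once before the wrap; after the wrap, invalidating the whole buffer lets the driver hand out fresh storage instead of waiting for draws that are still in flight.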
A naive port, where you just take each glBegin/glEnd pair and write that to a dynamic buffer, possibly even creating the dynamic buffer at the same time (and destroying it after the draw call), is not going to work well at all and it sounds as though you’ve got something similar to that. If that’s what you’re doing then it’s expected that you’re just not going to get the performance.
In order to really get things running fast you need to examine your data more closely. Which data can remain static? Which data absolutely must be dynamic? You’ll probably be surprised to find that much more of your data can remain static than you initially thought; for example you mention “the same basic mechanics of altering the level for opening doors” - well, why do you think this must be dynamic data? It’s not - it’s static data but drawn with a different matrix.
Vertex shaders can help a lot with turning previously dynamic data into static data. Say you’re doing some keyframe interpolation, for example: you can do this in a vertex shader instead of on the CPU and suddenly you have static data again. You’re also talking about lighting - but can your lighting model calculations not also move to the GPU? Chances are that the GPU can run them much faster anyway, and doing so may help make your data static as well as reduce the amount of data you transfer. You’ll probably also be able to get higher quality by doing some of the lighting per-fragment instead of per-vertex, yet still with better performance than the old CPU-based model.
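As a rough sketch of the door case, assuming a door that simply slides upwards: the door's vertices are uploaded once, and the only per-frame state is a small struct from which a model matrix is built. `Door`, `doorModelMatrix` and `transformPoint` are hypothetical names for this example; `transformPoint` is just a CPU reference for what the vertex shader would compute.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>; // column-major, the layout OpenGL expects

// Translation matrix; in column-major order the offset sits in elements 12..14.
inline Mat4 translation(float x, float y, float z) {
    return Mat4{{1.0f, 0.0f, 0.0f, 0.0f,
                 0.0f, 1.0f, 0.0f, 0.0f,
                 0.0f, 0.0f, 1.0f, 0.0f,
                 x,    y,    z,    1.0f}};
}

// Hypothetical door state: the geometry never changes, only `open` does.
struct Door {
    float open;       // 0 = closed, 1 = fully open
    float liftHeight; // how far the door travels upwards when fully open
};

inline Mat4 doorModelMatrix(const Door& d) {
    assert(d.open >= 0.0f && d.open <= 1.0f);
    return translation(0.0f, d.open * d.liftHeight, 0.0f);
}

// CPU reference of the vertex shader's M * (x, y, z, 1).
inline std::array<float, 3> transformPoint(const Mat4& m, float x, float y, float z) {
    return {{m[0] * x + m[4] * y + m[8]  * z + m[12],
             m[1] * x + m[5] * y + m[9]  * z + m[13],
             m[2] * x + m[6] * y + m[10] * z + m[14]}};
}
```

Per frame you would then upload only the 16 floats, e.g. with glUniformMatrix4fv(location, 1, GL_FALSE, m.data()), and draw the door from the same static buffer as the rest of the level.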
OpenGL has a rich set of drawing commands, so use them. Draw calls have less overhead than in D3D but they’re still not free, so getting as much data into a single draw call as you can will help. Look at the glMultiDraw* commands (glMultiDrawArrays, glMultiDrawElements) to help batch up multiple polygons with the same texture(s) and other properties.
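A minimal sketch of how that batching could look on the CPU side, assuming all polygons live in one shared vertex buffer and are convex (so each can be drawn as a triangle fan). `LevelPolygon`, `Batch` and `buildBatches` are illustrative names, and using a single texture id as the batch key is a simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical description of one level polygon inside a shared vertex buffer.
struct LevelPolygon {
    std::uint32_t texture; // batch key; real code may also include other state
    int firstVertex;       // index of the polygon's first vertex in the buffer
    int vertexCount;       // number of vertices in the polygon
};

// Parallel arrays in the shape glMultiDrawArrays expects.
struct Batch {
    std::vector<int> first;
    std::vector<int> count;
};

// Group polygons by texture so each texture needs only one multi-draw call.
inline std::map<std::uint32_t, Batch> buildBatches(const std::vector<LevelPolygon>& polys) {
    std::map<std::uint32_t, Batch> batches;
    for (const LevelPolygon& p : polys) {
        assert(p.vertexCount >= 3); // anything smaller is not a polygon
        Batch& b = batches[p.texture];
        b.first.push_back(p.firstVertex);
        b.count.push_back(p.vertexCount);
    }
    return batches;
}

// Intended use per batch, after binding its texture (not executed here):
//   glMultiDrawArrays(GL_TRIANGLE_FAN, b.first.data(), b.count.data(),
//                     static_cast<GLsizei>(b.first.size()));
```

With an average of under 20 vertices per polygon, this turns one draw call per polygon into one call per texture, and the batch lists only need rebuilding when the set of visible polygons changes.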
Don’t be afraid to jump back to older-style OpenGL if it makes sense to do so. By this I mean that it’s OK to use client-side vertex arrays and it may even be OK to use immediate mode under certain limited conditions. You don’t need to be religious about “everything in vertex buffers”; be pragmatic.
Finally, be realistic about it. You’re not going to see large performance improvements if you’re dealing with small workloads - the simple reason is that with small workloads vertex submission is just not a bottleneck in your program. As the polygon counts go up, however, you’ll begin to see a modern GL implementation pull ahead.