Hi friends, today I ran my test on an RTX2070s and an MX330, and display lists are still faster than VBOs.
But if I never bind the buffer a second time, the VBO is still faster than the display list.
I don’t know why unbinding and rebinding VBOs results in a 10x performance difference.
You’ve provided very few details here. You don’t describe what you’re rendering, how you’re rendering it, whether you’re updating VBOs dynamically, or even how you’re timing. And there’s no source code shown.
Also, don’t post in old threads (see Guideline #6). The threads you appended your original posts to were very, very old. I’ve moved them here into this new thread, linking to the older threads for reference.
To your question: That VBOs can be slower than NVIDIA’s GL display lists, particularly with lots of tiny draw calls, has been well known for a long time.
There are lots of threads in the forum archives on this, and techniques you can use to get around this.
Batching your content better (into fewer draw calls and fewer buffer objects) can reduce the cost considerably. Using VAOs can also reduce it. Alternatively, using NVIDIA bindless graphics extensions and/or display lists can bypass much of this overhead, even for pathologically bad use cases (lots of tiny draw calls, spread out across many small buffer objects).
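To make the batching advice concrete, here is a minimal CPU-side sketch that packs many small line strips into one shared vertex array and records where each one starts. The `Vertex` and `PackedLanes` types and the `packLanes` function are hypothetical names invented for this example; only the `glMultiDrawArrays` call mentioned afterwards is real OpenGL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D vertex for this sketch; a real map may carry more attributes.
struct Vertex { float x, y; };

// Hypothetical result type: one shared vertex array plus per-lane ranges.
struct PackedLanes {
    std::vector<Vertex> vertices; // contents of the single shared VBO
    std::vector<int> first;       // start vertex of each lane (GLint in GL code)
    std::vector<int> count;       // vertex count of each lane (GLsizei in GL code)
};

// Concatenates all lanes into one array and remembers each lane's range.
PackedLanes packLanes(const std::vector<std::vector<Vertex>>& lanes) {
    PackedLanes out;
    out.first.reserve(lanes.size());
    out.count.reserve(lanes.size());
    for (const auto& lane : lanes) {
        out.first.push_back(static_cast<int>(out.vertices.size()));
        out.count.push_back(static_cast<int>(lane.size()));
        out.vertices.insert(out.vertices.end(), lane.begin(), lane.end());
    }
    assert(out.first.size() == out.count.size());
    return out;
}
```

With a GL context, `vertices` would be uploaded once into a single buffer object, and all lanes could then be drawn with one call such as `glMultiDrawArrays(GL_LINE_STRIP, first.data(), count.data(), drawcount)`, with no buffer rebinding between lanes.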
However, whether this is your problem is totally unclear given the lack of details in your original post.
Its performance is very poor.
Each lane is drawn with a glBegin command.
I tried using a single display list to render all the lanes, or using VBOs in place of glBegin.
But with a single display list, modifying the display list is slow, and the performance does not improve.
If I use VBOs, then I may need to build hundreds of thousands of VAOs and VBOs, which will make the program very complex.
In short, I want to build a map software that can edit maps and render a large number of elements.
I wonder whether, from a business perspective, we should simply set the performance issue aside.
VBOs are more suitable for my software.
Should I have a vertex buffer for each element, or should the entire map share a buffer?
There is nothing in the scene except these maps.
No lighting model, no PBR, no material textures.
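On the question of one buffer per element versus one shared buffer: a rough sketch of the shared-buffer option for an editable map is below. It keeps a CPU-side mirror of the buffer and, when an element is edited, reports only the byte range that needs re-uploading. `MapBuffer`, `ByteRange`, and their methods are hypothetical names for this example, and it assumes an edit keeps the element's vertex count unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D vertex for this sketch.
struct Vertex { float x, y; };

// Byte range inside the shared buffer that must be re-uploaded after an edit.
struct ByteRange { std::size_t offset; std::size_t size; };

// Hypothetical CPU-side mirror of one shared VBO holding every map element.
class MapBuffer {
public:
    // Appends an element to the end of the shared array and returns its id.
    std::size_t addElement(const std::vector<Vertex>& v) {
        first_.push_back(vertices_.size());
        count_.push_back(v.size());
        vertices_.insert(vertices_.end(), v.begin(), v.end());
        return first_.size() - 1;
    }

    // Overwrites an element in place (same vertex count assumed) and
    // returns the byte range that changed.
    ByteRange moveVertices(std::size_t id, const std::vector<Vertex>& v) {
        assert(id < first_.size());
        assert(v.size() == count_[id]);
        std::copy(v.begin(), v.end(),
                  vertices_.begin() + static_cast<std::ptrdiff_t>(first_[id]));
        return { first_[id] * sizeof(Vertex), v.size() * sizeof(Vertex) };
    }

    const std::vector<Vertex>& vertices() const { return vertices_; }

private:
    std::vector<Vertex> vertices_;   // mirror of the shared VBO contents
    std::vector<std::size_t> first_; // start vertex of each element
    std::vector<std::size_t> count_; // vertex count of each element
};
```

The returned range maps directly onto `glBufferSubData(GL_ARRAY_BUFFER, offset, size, data)`, so editing one lane touches only that lane's bytes. Edits that change the vertex count would need reallocation or compaction, which this sketch leaves out.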
Instancing will only help if the map consists largely of copies of prefabricated objects. From the picture, it doesn’t look as if that’s the case. Instances need to be non-trivial for instancing to be useful, as implementations don’t pack multiple instances into a workgroup. Instances with few vertices will result in wasted GPU capacity in the vertex processing stage.
To what GClements mentioned (referencing the glDraw*Instanced*() draw calls, and similar with indirect draws), many drivers/GPUs actually can pack different instances in an instanced draw call into shared thread groups. So that’s probably not a concern here.
Regardless, this is a GPU-side perf issue. The CPU time needed to dispatch all of these otherwise separate object draw calls will be significantly reduced. And if you’re currently making many draw calls per frame and are CPU-side frame time limited, switching to instanced draw calls nets you a huge perf++.
Now with MDI rendering (i.e. the glMultiDraw*Indirect*() draw calls, where we’re talking about putting different objects in different GL_DRAW_INDIRECT_BUFFER subdraw records), that falls squarely in the category GClements is referring to. That said, again this is completely a GPU-side perf issue. The amount of CPU time needed to queue a few MDI draw calls (or a few instanced draw calls for that matter) is almost zero. This is a huge CPU-side perf++, if you’re currently massively CPU-side frame rate limited. And there are GPU-side techniques to reduce the GPU-side perf cost if/when that becomes an issue.
A big part of the win with using instanced draw calls and/or MDI draw calls is the data and state reorg that you have to do to use them. Namely: 1) pack multiple objects in shared VBOs/IBOs, and 2) get rid of all of the often-needless GL state changes that you are doing between each of those original draw calls … so that it’s even possible to launch a bunch of object draws with a single draw call.
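As a sketch of what the MDI reorganization looks like on the CPU side, the code below builds one subdraw record per object for objects that live back to back in a shared vertex buffer. The struct layout follows the `DrawArraysIndirectCommand` layout described in the OpenGL specification; the function name `buildLaneCommands` and the choice to store the object index in `baseInstance` are assumptions made for this example. Each object may have a different vertex count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Four tightly packed 32-bit unsigned integers, matching the layout the
// OpenGL spec gives for non-indexed indirect draw commands.
struct DrawArraysIndirectCommand {
    std::uint32_t count;         // number of vertices in this subdraw
    std::uint32_t instanceCount; // 1 = draw the object once
    std::uint32_t first;         // start vertex within the shared VBO
    std::uint32_t baseInstance;  // used here to carry the object index
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "command records must be tightly packed");

// Builds one record per object, assuming the objects' vertices are stored
// consecutively in a single shared vertex buffer.
std::vector<DrawArraysIndirectCommand>
buildLaneCommands(const std::vector<std::uint32_t>& laneVertexCounts) {
    std::vector<DrawArraysIndirectCommand> cmds;
    cmds.reserve(laneVertexCounts.size());
    std::uint32_t first = 0;
    for (std::uint32_t i = 0; i < laneVertexCounts.size(); ++i) {
        cmds.push_back({laneVertexCounts[i], 1u, first, i});
        first += laneVertexCounts[i];
    }
    return cmds;
}
```

With a GL 4.3 context (or ARB_multi_draw_indirect), these records would be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and submitted with a single `glMultiDrawArraysIndirect(GL_LINE_STRIP, nullptr, drawcount, 0)` call, where a stride of 0 means the records are tightly packed.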
Thank you. The CPU is probably the bigger limit: it is a classic scene graph traversal with about 100,000 nodes, that is, 100,000 lane lines.
The trouble is that the number of vertices differs from one lane line to the next, so there is no way to use geometry instancing.
I’ve switched to using a display list for rendering, but because of my work environment I am using an Intel HD630 video card. On the HD630, every method performs about the same: for 100,000 elements, the frame rate stays at about 8 frames per second. I don’t know why…
At the same time, I am working inside a virtual machine (VMware)…
But I already have the feeling that using one VBO to render one map is the best choice. In order to extend functionality,
Unlikely. High-level engines usually support multiple rendering APIs (Unity supports DirectX, Vulkan, and OpenGL core profile on Windows, Metal on MacOS). Display lists are specific to OpenGL compatibility profile.