Re: VBO & DisplayList, which is faster

Hi friends, I ran my test today on an RTX2070s and an MX330, and the display list is still faster than the VBO.
However, if I never bind the buffer a second time, the VBO is faster than the display list.
I don’t know why unbinding and rebinding VBOs causes a tenfold performance difference.

[Image: vbovsdiaplaylist]

Where can we learn how the graphics card handles OpenGL commands?
Are there any relevant articles?

You’ve provided very few details here. You don’t describe what you’re rendering, how you’re rendering it, whether you’re updating VBOs dynamically, or even how you’re timing. And there’s no source code shown.

Please see The Forum Posting Guidelines for tips in composing a post that’s more likely to net you the answers you want.

Also, don’t post in old threads (see Guideline #6). The threads you appended your original posts to were very, very old. I’ve moved them here into this new thread, linking to the older threads for reference.

To your question: That VBOs can be slower than NVIDIA’s GL display lists, particularly with lots of tiny draw calls, has been well known for a long time:

There are lots of threads in the forum archives on this, and techniques you can use to get around this.

Batching your content better (into fewer draw calls and fewer buffer objects) can reduce the cost considerably. Using VAOs can also reduce it. Alternatively, using NVIDIA bindless graphics extensions and/or display lists can bypass much of this overhead, even for pathologically bad use cases (lots of tiny draw calls, spread out across many small buffer objects).

However, whether this is your problem is totally unclear given the lack of details in your original post.

Thank you very much. I can’t upload multiple files.

The samples I use were downloaded from this website.
They use the fixed-function pipeline.
I am testing performance because I have software that draws a lot of line elements,
similar to map software, like this.


Its performance is very poor.
Each lane is drawn with its own glBegin command.
I tried using one display list to render all the lanes, and I tried using VBOs in place of glBegin.
But with a single display list, modifying the display list is slow, and the performance does not improve.
With VBOs, I might need to create hundreds of thousands of VAOs and VBOs, which would make the program very complex.

In short, I want to build map software that can edit maps and render a large number of elements.

I wonder whether I should set the performance issue aside. From a business perspective, VBOs are more suitable for my software.
Should I have a vertex buffer for each element, or should the entire map share one buffer?

There is nothing in the scene except these maps:
no lighting model, no PBR, no material textures.

You should probably have a VBO (or a handful of VBOs) for the entire map. You’ll need to handle the allocation of vertices within the VBO yourself.
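
As a rough illustration of what "handling the allocation of vertices within the VBO yourself" could look like, here is a minimal CPU-side sketch of a first-fit range allocator. The names `VertexRangeAllocator` and `kNoSpace` are hypothetical and not part of OpenGL; the class only does bookkeeping, and the actual upload (for example with glBufferSubData at the returned offset) is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical helper: hands out contiguous vertex ranges inside one large VBO.
// It performs no GL calls; it only tracks which vertex ranges are free.
class VertexRangeAllocator {
public:
    // Sentinel returned when no free block is large enough.
    static constexpr std::uint32_t kNoSpace = 0xFFFFFFFFu;

    explicit VertexRangeAllocator(std::uint32_t capacity) {
        assert(capacity > 0);
        free_[0] = capacity;  // one free block covering the whole buffer
    }

    // First-fit allocation. Returns the first vertex index of the range,
    // or kNoSpace if the request cannot be satisfied.
    std::uint32_t allocate(std::uint32_t count) {
        assert(count > 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < count) continue;
            const std::uint32_t first = it->first;
            const std::uint32_t remaining = it->second - count;
            free_.erase(it);
            if (remaining > 0) free_[first + count] = remaining;
            return first;
        }
        return kNoSpace;
    }

    // Returns a range to the free list and merges it with adjacent free blocks.
    void release(std::uint32_t first, std::uint32_t count) {
        assert(count > 0);
        auto next = free_.lower_bound(first);
        // Merge with the previous block if it ends where this range starts.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == first) {
                first = prev->first;
                count += prev->second;
                free_.erase(prev);
            }
        }
        // Merge with the following block if it starts where this range ends.
        if (next != free_.end() && first + count == next->first) {
            count += next->second;
            free_.erase(next);
        }
        free_[first] = count;
    }

    std::size_t freeBlockCount() const { return free_.size(); }

private:
    std::map<std::uint32_t, std::uint32_t> free_;  // first vertex -> block length
};
```

When a lane is edited and its vertex count changes, one option is to release its old range, allocate a new one, and upload only that lane's vertices. First-fit is used here purely for simplicity; other strategies may fragment less.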

The performance of updating the VBO shouldn’t matter if it’s only being updated in response to user input; the user isn’t going to be clicking the mouse or pressing keys hundreds of times per second.

Can I use geometry instancing for this?

Instancing will only help if the map consists largely of copies of prefabricated objects. From the picture, it doesn’t look as if that’s the case. Instances need to be non-trivial for instancing to be useful, as implementations don’t pack multiple instances into a workgroup. Instances with few vertices will result in wasted GPU capacity in the vertex processing stage.

To what GClements mentioned (referencing the glDraw*Instanced*() draw calls, and similar with indirect draws), many drivers/GPUs actually can pack different instances in an instanced draw call into shared thread groups. So that’s probably not a concern here.

Regardless, this is a GPU-side perf issue. The CPU time needed to dispatch all of these otherwise separate object draw calls will be significantly reduced. And if you’re currently making many draw calls per frame and are CPU-side frame-time limited, switching to instanced draw calls nets you a huge perf++.

Now with MDI rendering (i.e. the glMultiDraw*Indirect*() draw calls, where we’re talking about putting different objects in different GL_DRAW_INDIRECT_BUFFER subdraw records), that falls squarely in the category GClements is referring to. That said, again this is completely a GPU-side perf issue. The amount of CPU time needed to queue a few MDI draw calls (or a few instanced draw calls for that matter) is almost zero. This is a huge CPU-side perf++, if you’re currently massively CPU-side frame rate limited. And there are GPU-side techniques to reduce the GPU-side perf cost if/when that becomes an issue.
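
To make the "subdraw records" idea concrete, here is a small CPU-side sketch that builds one record per lane for lanes with differing vertex counts. The `DrawArraysIndirectCommand` layout (four 32-bit unsigned integers) is the one OpenGL defines for glMultiDrawArraysIndirect; the function name `buildLaneCommands` is hypothetical, and the sketch assumes all lanes are stored back to back in one shared VBO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Record layout OpenGL expects in GL_DRAW_INDIRECT_BUFFER for
// glMultiDrawArraysIndirect: four tightly packed 32-bit unsigned integers.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // vertices in this sub-draw
    std::uint32_t instanceCount;  // 1 = draw the lane once
    std::uint32_t first;          // first vertex within the shared VBO
    std::uint32_t baseInstance;   // unused in this sketch
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "records must be tightly packed");

// Hypothetical helper: one record per lane, assuming the lanes' vertices are
// stored consecutively in the shared VBO in the same order as the counts.
std::vector<DrawArraysIndirectCommand>
buildLaneCommands(const std::vector<std::uint32_t>& laneVertexCounts) {
    std::vector<DrawArraysIndirectCommand> commands;
    commands.reserve(laneVertexCounts.size());
    std::uint32_t first = 0;
    for (std::size_t i = 0; i < laneVertexCounts.size(); ++i) {
        const std::uint32_t count = laneVertexCounts[i];
        commands.push_back(DrawArraysIndirectCommand{count, 1u, first, 0u});
        first += count;  // next lane starts right after this one
    }
    return commands;
}
```

The resulting array would be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and drawn with a single glMultiDrawArraysIndirect(GL_LINE_STRIP, nullptr, drawcount, 0) call, which requires OpenGL 4.3 or ARB_multi_draw_indirect.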

A big part of the win with using instanced draw calls and/or MDI draw calls is the data and state reorg that you have to do to use them. Namely: 1) pack multiple objects in shared VBOs/IBOs, and 2) get rid of all of the often-needless GL state changes that you are doing between each of those original draw calls … so that it’s even possible to launch a bunch of object draws with a single draw call.
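
One way the first step (packing multiple objects into a shared VBO/IBO) might look for polylines of differing lengths is sketched below. Each line strip is converted to GL_LINES index pairs so that every lane can be drawn by one glDrawElements call, without instancing. The types `Vec2` and `PackedLanes` and the function `packLanes` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// CPU-side staging data for one shared vertex buffer and one shared index buffer.
struct PackedLanes {
    std::vector<Vec2> vertices;          // all lanes, stored back to back
    std::vector<std::uint32_t> indices;  // GL_LINES pairs: (i, i+1) per segment
};

// Hypothetical helper: appends every lane to the shared arrays and emits two
// indices per line segment, so lanes never connect to each other.
PackedLanes packLanes(const std::vector<std::vector<Vec2>>& lanes) {
    PackedLanes out;
    for (const auto& lane : lanes) {
        const std::uint32_t base = static_cast<std::uint32_t>(out.vertices.size());
        out.vertices.insert(out.vertices.end(), lane.begin(), lane.end());
        // A lane with fewer than two vertices contributes no segments.
        for (std::uint32_t i = 0; i + 1 < lane.size(); ++i) {
            out.indices.push_back(base + i);
            out.indices.push_back(base + i + 1);
        }
    }
    return out;
}
```

After uploading both arrays once, the whole map could be drawn with glDrawElements(GL_LINES, indexCount, GL_UNSIGNED_INT, 0). This costs extra index memory compared with line strips, but it works on older OpenGL versions and in the compatibility profile.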

Related thread:

Thank you. I am most likely CPU-limited: it is a classic scene graph traversal with about 100,000 nodes, that is, 100,000 lane lines.
The trouble is that the lane lines do not all have the same number of vertices, so there is no way to use geometry instancing.
I’ve switched to using a display list for rendering, but because of my work environment I am using an Intel HD630 GPU. On the HD630, every method performs about the same: for 100,000 elements, the frame rate stays at about 8 frames per second. I don’t know why…
At the same time, I am working inside a virtual machine (VMware)…

But I already feel that using one VBO to render one map is the best choice, since it leaves room for extending functionality.

By the way, do large engines like Unreal not consider using techniques like display lists?

Unlikely. High-level engines usually support multiple rendering APIs (Unity supports DirectX, Vulkan, and OpenGL core profile on Windows, Metal on macOS). Display lists are specific to the OpenGL compatibility profile.