GL3 and matrices

Display lists are probably the best way to get high performance, but I agree they are an issue for driver writers (is there anyone who hasn’t complained about drivers?) and they also complicate things for the programmer. For a small sample display lists are just great, but with a large amount of source code… it gets more complicated. A solution would have been to limit their use to geometry, but what could they do that VBOs couldn’t in that case?

Maybe in the future display lists will make their comeback, but right now I just want an OpenGL 3 spec in hand for the future!

Any bets on the GL3 spec for SIGGRAPH 2008?

If you have them, each function has to check whether it should be recorded into a display list. Then the display-list data has to be stored somewhere, and the driver also needs to be able to play it back. Life for the driver writer would be much easier if they didn’t exist.
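
Just to illustrate the point (this is a made-up sketch, not real driver code; the names are invented), every entry point ends up having to do something like:

    #include <stdio.h>

    /* Hypothetical per-context display-list state. */
    typedef struct {
        int compiling;               /* between glNewList and glEndList? */
        int compile_and_execute;     /* list opened with GL_COMPILE_AND_EXECUTE? */
    } ListState;

    /* Stand-ins for the real recording and execution paths. */
    static void record_vertex(ListState *ls, float x, float y, float z)
    {
        (void)ls;
        printf("recorded vertex (%f, %f, %f) into the current list\n", x, y, z);
    }

    static void execute_vertex(float x, float y, float z)
    {
        printf("executed vertex (%f, %f, %f)\n", x, y, z);
    }

    /* What an entry point like glVertex3f has to do once display lists exist:
       check the list state first, record if compiling, and only fall through
       to immediate execution when the mode allows it. */
    static void driver_Vertex3f(ListState *ls, float x, float y, float z)
    {
        if (ls->compiling) {
            record_vertex(ls, x, y, z);
            if (!ls->compile_and_execute)
                return;              /* GL_COMPILE: store only, draw nothing now */
        }
        execute_vertex(x, y, z);
    }

    int main(void)
    {
        ListState ls = { 1, 0 };     /* pretend we are inside glNewList(..., GL_COMPILE) */
        driver_Vertex3f(&ls, 0.0f, 1.0f, 2.0f);
        return 0;
    }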

Please pay attention to the thread. Geometry-only display lists do not work that way. They do not record anything. You do not build them by pretending to run the rendering system.

I’m confused. If you create some GL object and just feed it your vertices and the driver optimizes it, how is this different from a static VBO?

Someone said that the NVIDIA driver optimizes display lists (because it does scene culling).
It should be able to do the same for a static VBO.

Geometry-only display lists seem useless to me.

If you create some GL object and just feed it your vertices and the driver optimizes it, how is this different from a static VBO?

Because, while a static VBO will likely be in video memory, the arrangement of vertices and elements is exactly and only what you asked for it to be. Which means it may not conform to what the hardware would like it to be.
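
For reference, this is all a static VBO is from the application’s side (a minimal sketch, assuming the GL 1.5 buffer-object entry points are exposed by your headers or fetched through an extension loader):

    #include <GL/gl.h>

    GLuint create_static_vbo(const GLfloat *vertices, GLsizeiptr size_in_bytes)
    {
        GLuint vbo = 0;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        /* GL_STATIC_DRAW is only a usage hint: the driver will usually place the
           data in video memory, but the byte layout stays exactly what the
           application specified -- nothing is rearranged for the hardware. */
        glBufferData(GL_ARRAY_BUFFER, size_in_bytes, vertices, GL_STATIC_DRAW);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
        return vbo;
    }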

Someone said that the NVIDIA driver optimizes display lists (because it does scene culling).

No, nVidia’s display list optimizations include culling; they aren’t only culling. They also include proper stripping (for the specific hardware caches of the chip) and so forth.

@korval… you’re saying NVIDIA drivers will tristrip and index calls sent to a display list for vertex cache optimization?! I don’t believe you; where did you read that? It wouldn’t even have a rational vertex set to begin with. At a minimum, known weaknesses in nvtristrip make this a nasty proposition. If you’re saying it will regurgitate anything indexed & stripped you send it, but from fast memory, that’s a no-brainer and misses my point.

@Groovounet… best by what standard? The best way is to write good dispatch code from well-ordered data in fast memory. Building a display list has a cost in terms of time, support, and most of all driver complexity. It is also an abused feature.

Culling of display lists does make a lot of sense for idiotic apps in a fixed-function pipeline, BUT with vertex shaders it’s a different ballgame. You cannot cull until at least the vertex positional transformation for affine transformations, and for any decent app even the legacy functionality should be redundant. So this would need all sorts of analysis just in your shader compiler to see if it’s even possible, THEN you have to split your shader to realize a partial win. Good luck with that.

If you use ftransform in your shader, the driver can flag that shader as “using ftransform” upon creation. When you have bound that shader to your pipeline and you render some display list, the driver knows that it can cull the object, as long as it also knows the modelview and projection matrices.
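
For example (an illustration only, not something drivers necessarily do today), a shader like this is trivially recognisable, because the output position comes straight from ftransform():

    /* A GLSL vertex shader a driver could flag as "uses ftransform": the
       clip-space position is exactly what the fixed-function transform would
       produce, so the driver knows how positions will be transformed. */
    static const char *ftransform_vs =
        "void main(void)\n"
        "{\n"
        "    gl_FrontColor = gl_Color;\n"
        "    gl_Position   = ftransform();\n"
        "}\n";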

This is what is necessary to enable such a feature. I don’t say it should be done, but I always see it as a good thing if an API at least exposes such possibilities for drivers to optimize things.

Even apps that use many fancy shaders will very often still do some “standard” rendering. So why remove the modelview and projection matrices and ftransform, which are not only matrices/functions but also carry some very fundamental semantics? Sure, I like a lean and mean API, too. But I really don’t like restricting my own and driver writers’ possibilities for optimizations, just for the sake of removing redundancy. You are NOT only removing redundancy by this, you are also removing some meta-information that can be used reasonably.

Jan.

And IMO that’s not a bad thing. Culling in display lists is a heresy to me. It benefits the “lazy” programmers who create their whole scene with display lists without any form of scene graph, but any moderately advanced engine will perform its own visibility/culling processing, so the work is done twice and you actually waste performance.

Y.

Incidentally, in one of the GDC 2007 presentations, one of the features mentioned for d3d10’s future is a command-buffer object, which smacks to me of a DL of sorts… “Commands stored as replayable macros” … “Fast resubmission of common command set.”

Hmmm… seems reasonable.

I agree. There are also side effects: if you do any form of profiling with hardware counters, it will register less geometry submitted than was actually issued, because the driver threw some of it away.

So what? A renderer with early-Z will count far fewer shaded fragments than one that exactly follows the theoretical OpenGL pipeline. Yet you will certainly agree that it’s a perfectly valid optimization.

I agree it’s a valid optimization (unlike the abusive example I gave), but it helps the worst developers and only adds overhead for the best, since for them it should never succeed in culling anything; it would be redundant. If your app relies on this, you have issues as a developer. I have also pointed out the potential for problems doing this with more advanced vertex shaders.

Some developers may not have the luxury of time to fully optimize their application. With more advanced vertex shaders culling can just be disabled.

But back to the topic: you need neither ftransform nor gl_Vertex and gl_ModelViewProjectionMatrix to have the compiler check that gl_Position is the result of a uniform mat4 * attribute vec4 operation.
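
Something like this, for instance (u_ModelViewProjection and a_Position are just names made up for the example):

    /* The pattern the compiler would have to recognise instead: gl_Position is
       a plain uniform mat4 * attribute vec4 product, with no ftransform,
       gl_Vertex or gl_ModelViewProjectionMatrix anywhere in sight. */
    static const char *generic_vs =
        "uniform mat4 u_ModelViewProjection;\n"
        "attribute vec4 a_Position;\n"
        "void main(void)\n"
        "{\n"
        "    gl_Position = u_ModelViewProjection * a_Position;\n"
        "}\n";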

Sure, but the hardware counts culled fragments as well, so the developer will have the full picture of what’s going on.

There’s nothing preventing the driver from counting the culled triangles.

It seems the general opinion in this thread is to remove API opportunities for the hardware to perform optimisations simply because they could be performed by the application and therefore on the CPU. This contradicts the mantras of recent times that more graphics tasks should be off-loaded to the GPU. With geometry display lists the hardware has the opportunity to perform both frustum and occlusion culling. Without geometry display lists it is virtually impossible for the hardware to do this.

There are many small programs or tools that are just hacked together to get something done, especially in the academic field. Sure, we CAN optimize all our programs. But being forced to optimize every single pi**-program just because there is NO optimization whatsoever done by the driver would be a major pain. For example, I have an editor that displays many small 3D GUI objects to manipulate selected items, each consisting of a few lines or triangles. With display lists, I can make sure that I can render each of them with only one draw call, instead of a bunch of glBegin/… calls. When I render, say, 50 of them, because the user has currently selected 50 objects, it would be nice if the driver would at least do basic frustum culling.
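
Something along these lines, for instance (the little axis gizmo and the positions are just made up for the example):

    #include <GL/gl.h>

    static GLuint gizmo_list;

    /* Compile the little axis gizmo into a display list once. */
    void build_gizmo_list(void)
    {
        gizmo_list = glGenLists(1);
        glNewList(gizmo_list, GL_COMPILE);
        glBegin(GL_LINES);
        glVertex3f(0.0f, 0.0f, 0.0f); glVertex3f(1.0f, 0.0f, 0.0f);  /* x axis */
        glVertex3f(0.0f, 0.0f, 0.0f); glVertex3f(0.0f, 1.0f, 0.0f);  /* y axis */
        glVertex3f(0.0f, 0.0f, 0.0f); glVertex3f(0.0f, 0.0f, 1.0f);  /* z axis */
        glEnd();
        glEndList();
    }

    /* Drawing the 50 selected objects is then one glCallList each, instead of
       a pile of glBegin/glEnd traffic per object. */
    void draw_selection_gizmos(const GLfloat positions[][3], int count)
    {
        int i;
        for (i = 0; i < count; ++i) {
            glPushMatrix();
            glTranslatef(positions[i][0], positions[i][1], positions[i][2]);
            glCallList(gizmo_list);
            glPopMatrix();
        }
    }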

Tools are often written by people who are not so much into the details of OpenGL and optimizations. Having at least basic optimizations, especially for such small stuff, would IMO be a good thing. One should not forget that even OpenGL 3 is not ONLY intended for game programming, where a company needs to go the extra mile of optimization; it is also there for many academic and other semi-professional purposes.

With the few state objects that D3D10 (and OpenGL 3) uses, I really don’t see the point of a “command buffer” anymore.

Jan.

Yes, but unfortunately that culling isn’t performed by the GPU but by the CPU, hence it’s done twice, wasting CPU cycles. Maybe you don’t care? Fine, but I do.

I could live with a hint you could set, specifying whether display lists should do culling or not. But doing culling when I already do it… just no.

Y.

I thought the point of OpenGL 3 was to reduce the complexity of the drivers. As such, I find it strange that display lists should be included, as they’ve always appeared to be a rather complex beast. Not the concept, but the implementation. They modify the behavior of a rather large bunch of calls, which makes them more error-prone.

Why couldn’t display lists be implemented in something like GLU? Apart from culling, from what I can see, most other optimizations could be done by such a library. It would keep the drivers clean, and people who would like the ease of display lists would still be able to use them. It should also make the performance of display lists more consistent between systems.

With geometry display lists the hardware has the opportunity to perform both frustum and occlusion culling.

Actually, no.

Geometry display lists imply that all that is being stored is the pre-T&L geometry itself. That is, it could be usable with any vertex shader that accepts the inputs that the pre-T&L geometry provides.

Frustum culling requires the implementation to know how the vertex shader will transform the vertices, which by GL 3.0 standards is now entirely arbitrary. Thus, no frustum culling is possible.

By “occlusion culling,” I assume that you mean performing occlusion queries on some bounding region and then checking later to see if that object was visible before rendering the actual geometry. The problem there, once again, is the arbitrary T&L. The implementation cannot even tell what the input positional data is, let alone build a bounding volume that is guaranteed to encompass the post-T&L region.
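
For reference, that mechanism looks roughly like this when the application does it itself with the GL 1.5 occlusion-query entry points (draw_bounding_box and draw_real_object are placeholders for the app’s own code):

    #include <GL/gl.h>

    extern void draw_bounding_box(void);   /* placeholder: cheap proxy geometry */
    extern void draw_real_object(void);    /* placeholder: the full mesh */

    void draw_with_occlusion_test(void)
    {
        GLuint query, samples = 0;
        glGenQueries(1, &query);

        /* Render only the bounding volume, with colour and depth writes disabled. */
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        glBeginQuery(GL_SAMPLES_PASSED, query);
        draw_bounding_box();
        glEndQuery(GL_SAMPLES_PASSED);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);

        /* Fetching the result right away stalls; a real application would come
           back to it later or check GL_QUERY_RESULT_AVAILABLE first. */
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
        if (samples > 0)
            draw_real_object();

        glDeleteQueries(1, &query);
    }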

No, the advantage of geometry display lists is in giving the driver the opportunity to rearrange your vertex data into a form most appropriate for rendering. For example, Humus mentioned in a previous thread the idea of making some vertex attributes accessible from a texture rather than a buffer object, to more effectively use parallelism. Well, the driver knows best when to do this, and the only information it needs to know is covered by the GL 3.0 Vertex Array Object (the shader can be patched to get its vertex data from a different place. It’s a quick patch). Thus, a geometry display list from well-built drivers will be able to parse your vertex data and split it into textures and so forth and more optimally use the hardware for maximum vertex throughput.
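
As a sketch of what such a vertex array object captures (the exact GL3 API wasn’t final at the time, so the glGenVertexArrays/glVertexAttribPointer names, attribute indices and interleaved layout here are only illustrative):

    #include <GL/gl.h>

    /* Interleaved position (xyz) + normal (xyz) for a single triangle; made-up
       data just to show the layout the vertex array object records. */
    static const GLfloat vertex_data[] = {
        0.0f, 0.0f, 0.0f,   0.0f, 0.0f, 1.0f,
        1.0f, 0.0f, 0.0f,   0.0f, 0.0f, 1.0f,
        0.0f, 1.0f, 0.0f,   0.0f, 0.0f, 1.0f,
    };

    GLuint create_triangle_vao(void)
    {
        GLuint vao = 0, vbo = 0;
        glGenVertexArrays(1, &vao);
        glBindVertexArray(vao);

        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, sizeof(vertex_data), vertex_data, GL_STATIC_DRAW);

        /* This is exactly the information the object captures: which buffer
           feeds which attribute, with what format, stride and offset. */
        glEnableVertexAttribArray(0);                          /* position */
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(GLfloat), (const void *)0);
        glEnableVertexAttribArray(1);                          /* normal */
        glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(GLfloat), (const void *)(3 * sizeof(GLfloat)));

        glBindVertexArray(0);
        return vao;
    }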

The only way for geometry display lists to be able to perform any kind of culling would require the reinstatement of fixed-function T&L.

I thought the point of OpenGL 3 was to reduce the complexity of the drivers.

Does nobody read the thread? How many times has it been mentioned how stupidly simple it is to implement geometry display lists? Let me make it abundantly clear for you all:

Driver complexity is not a valid issue here!

Why couldn’t display lists be implemented in something like GLU?

Because the purpose of geometry display lists is to get optimal performance for a particular piece of hardware. You cannot achieve that with hardware-independent code. And GL implementations cannot alter GLU.

With “simplifying driver development” being one of the stated goals for OpenGL 3, how can it not be a valid issue?

So a display list MIGHT give you optimal performance. Or should the driver just not create the display list object if it cannot deliver a (more) optimal version?

Personally I would prefer the core API to give me predictable performance between platforms. As it is now (pre-GL3), that is not the case.