Siggraph Asia 2008 slides suggestions for OpenGL

With immediate mode gone the main uses of display lists are:
1/ Setup of rendering state for each pass (often requiring large state changes such as when switching from shadow-map rendering to the main render pass)
2/ Switching to a new material (Change of textures, shaders, uniforms)
3/ Combining multiple draw calls.
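As a rough illustration of uses 1/ and 2/, here is a toy model that only counts API calls. `ToyContext`, its methods, and the state names are all made up for this sketch; it is not OpenGL and says nothing about what a driver does internally.

```python
class ToyContext:
    """Stand-in for a GL context: counts API calls and tracks state. Not real OpenGL."""

    def __init__(self):
        self.state = {}
        self.api_calls = 0
        self.lists = {}

    def set_state(self, key, value):
        """One API call per individual state change."""
        self.api_calls += 1
        self.state[key] = value

    def compile_list(self, list_id, changes):
        """Record a set of state changes once, up front (like compiling a display list)."""
        self.api_calls += 1
        self.lists[list_id] = dict(changes)

    def call_list(self, list_id):
        """Replay the whole recording with a single API call."""
        self.api_calls += 1
        self.state.update(self.lists[list_id])


# Hypothetical state needed to switch to a shadow-map pass.
shadow_pass = {
    "framebuffer": "shadow_fbo",
    "program": "depth_only",
    "color_writes": False,
    "cull_face": "front",
    "depth_bias": 2.0,
}

# Path A: issue every state change individually on each pass switch.
direct = ToyContext()
for key, value in shadow_pass.items():
    direct.set_state(key, value)

# Path B: record once at startup, then replay with one call per pass switch.
listed = ToyContext()
listed.compile_list(1, shadow_pass)
setup_calls = listed.api_calls
listed.call_list(1)

print("direct calls per switch:", direct.api_calls)
print("list calls per switch:  ", listed.api_calls - setup_calls)
print("same resulting state:   ", direct.state == listed.state)
```

In this toy model the replay path needs one call per switch instead of five, at the price of a one-time recording step.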

If we don’t get immutable state objects in 3.1, then we will still need display lists to make major state changes as efficient as possible.
But if we do get state objects then we only need a small number of calls to bind different objects to the context to change to a different rendering pass, and only one call to bind a program object to change the material being rendered.

If something like the proposed lpDrawArrays/lpDrawElements in pipeline 004 is implemented, then we may not need display lists to reduce the number of draw calls either.
Likewise, multithreaded geometry generation would only need to write VBOs in parallel and then pass a few values to the main rendering thread.

The other main argument for display lists is for conditional state changes, but this is only because the spec for BeginConditionalRender only mentions rendering commands.
It would have made much more sense if BeginConditionalRender simply ignored EVERY OpenGL command until the EndConditionalRender.
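The difference between the two behaviours can be sketched with a toy model. The command stream below is a list of made-up (kind, name) pairs, not real OpenGL calls, and the split into "state" and "draw" commands is a simplifying assumption.

```python
# Toy model only: commands are (kind, name) pairs, not real OpenGL calls.
STATE = "state"
DRAW = "draw"


def run_conditional_block(commands, condition_passed, skip_everything):
    """Return the names of the commands that take effect inside the block.

    skip_everything=False models the behaviour described above: only
    rendering commands are discarded when the condition fails.
    skip_everything=True models the proposed alternative, where every
    command in the block is discarded.
    """
    executed = []
    for kind, name in commands:
        if condition_passed:
            executed.append(name)
        elif kind == STATE and not skip_everything:
            executed.append(name)  # state change still takes effect
        # otherwise the command is discarded
    return executed


block = [
    (STATE, "bind expensive shader"),
    (STATE, "bind textures"),
    (DRAW, "draw occluded object"),
]

# Condition failed, rendering commands only are skipped: state changes still happen.
print(run_conditional_block(block, condition_passed=False, skip_everything=False))

# Condition failed, everything is skipped: nothing happens at all.
print(run_conditional_block(block, condition_passed=False, skip_everything=True))
```

With the first behaviour the state changes are still paid for even though nothing is drawn, which is why conditional state changes currently end up in display lists.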

Whether or not display lists are still needed really depends on what is going to be in OpenGL 3.1.
They can only be removed if all uses for them are replaced by something that has the same or better performance.
I currently use display lists for all my major state changes, but I don’t mind changing if it’s to something better.
The deprecation of display lists does, however, seem premature: OpenGL 3.0 needs them, and until we actually write some programs for OpenGL 3.1 we won’t know if we still need them.

Is there anything else display lists are useful for?

I would prefer that they not be removed until it has been shown that they are no longer useful.
Most of the complexity will be removed with the immediate mode commands, and problems with selectors can be fixed by requiring direct-state-access commands.

…and…

4/ <u>Hyper-efficient single draw call submission</u>.

I’m just telling you, even with super-optimized triangle list orderings, interleaved VBOs, carefully tuned attribute formats, VAOs, etc., I still get a significant speed-up on NVidia from geometry-only display lists, each one wrapping a single batch submission.

Dunno what they’re doing … maybe squashing multiple batches into shared VBOs. But whatever black magic that is, <u>vendor-specific batch optimization</u> is another great use for display lists beyond the state/state/combine batches use cases you listed above.

It’s only a shame it can’t be used for run-time loaded batches (like a re-entrant vendor-specific CPU-only GLU function could).

“It’s only a shame it can’t be used for run-time loaded batches”

?? How exactly do you mean that?

agree that display lists are useful (still). i use them like crazy on many renderstate changes especially with shaders or simply switching to ortho projection mode and back. reduces API overhead and is cached on GPU? many advantages imho.

afaik, i even heard/read/seen/not_made_up that Direct3D 11 is planning to introduce similar functionality as display lists.

By that I mean it takes too long. And by run-time, I mean after startup while we’re rendering.

We have a hard 60Hz requirement (everything else is secondary to that). That means 16.67 ms for everything, including resolve/swap overhead. If you time display list creation, it’s completely incompatible with that. Because it requires the GL context, it can’t be off-loaded to a background thread like loading up pre-mapped PBOs can. Also, you can’t “query” the compiled display list out of the driver and store it in a blob file for fast pre-compiled display list loading later, otherwise you could do that in a preprocess tool. Sooo… we can’t use display lists for run-time loaded batches. And that’s most of them.

Could you create a second OpenGL context in a background thread and share it with the context of the primary thread? Then you could build lists in the second thread but use them in the first thread, no?

Are you saying that you wrap one single draw call into a display list and measure significant performance improvements?

CatDog

I haven’t tried this with display lists but I have with shader compilation. Unfortunately, there is some kind of global lock which blocks rendering in the first thread when a shader is being compiled in a second thread.

Yes. More precisely, we take a whole frame of interleaved VBO draw calls (measure performance), then compile each draw call into its own display list and render with that (measure performance). There is a nice 20% performance improvement from the VBO path. One VBO batch becomes one display list.

However, display list compile times were as much as 1.4 ms per batch, so with a boatload of batches and no “background” compilation support, run-time compilation is a non-starter.
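To put those numbers side by side, here is a rough back-of-the-envelope sketch. The 60 Hz target and the 1.4 ms worst-case compile time come from the posts above; the 2 ms per-frame slice left over for compilation and the 500-batch count are made-up values for illustration only.

```python
import math

FRAME_RATE_HZ = 60.0  # hard requirement quoted above
COMPILE_MS = 1.4      # worst-case display list compile time per batch, quoted above


def frame_budget_ms(rate_hz):
    """Total time available per frame, in milliseconds."""
    return 1000.0 / rate_hz


def compiles_per_frame(slice_ms, compile_ms=COMPILE_MS):
    """How many whole display list compiles fit in a per-frame time slice."""
    return int(slice_ms // compile_ms)


def frames_to_compile(batches, slice_ms, compile_ms=COMPILE_MS):
    """Frames needed to compile `batches` lists when only `slice_ms` per frame is spare."""
    per_frame = compiles_per_frame(slice_ms, compile_ms)
    if per_frame == 0:
        raise ValueError("slice is too small for even one compile")
    return math.ceil(batches / per_frame)


budget = frame_budget_ms(FRAME_RATE_HZ)
print(f"frame budget: {budget:.2f} ms")
print("compiles if the whole frame were free:", compiles_per_frame(budget))

# Hypothetical: 2 ms of each frame is spare, and 500 batches need loading.
SPARE_MS = 2.0
BATCHES = 500
frames = frames_to_compile(BATCHES, SPARE_MS)
print(f"{BATCHES} batches take {frames} frames (~{frames / FRAME_RATE_HZ:.1f} s)")
```

Under these assumptions, even a frame that did nothing else would fit only 11 worst-case compiles, and with a 2 ms slice the 500 batches would trickle in over roughly 8 seconds.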

In a previous thread, it was suggested that if you combine multiple batches into VBOs (in a draw-coherent way, presumably), you probably could get close to NVidia display list speed. Haven’t tried this yet on a large scale.

“I haven’t tried this with display lists but I have with shader compilation. Unfortunately, there is some kind of global lock which blocks rendering in the first thread when a shader is being compiled in a second thread.”
Yeah, that’s one big reason why we’re moving away from GLSL. No support for background compilation.

OTOH, with Cg, you can do this… (feature request hint)

Also, re the multiple context approach (thanks for the suggestion BTW), it’s definitely worth checking, but from everything I’ve read, having multiple GL contexts for the same GPU is never a performance-winning way to go. Does anybody know differently?

Wouldn’t it be nice if some nVidia guy jumped in here to explain what’s going on?

Ain’t gonna happen, so here’s my guess: They are doing their own batch optimization (1.4ms per batch…) that’s much better than your optimization. Or, your own optimization is very good preliminary work for their optimization.

Also, I’d like to know details about the memory overhead, when you put one draw call in a display list. I would have lots of them.

CatDog

They could probably tell us, but then they’d have to kill us.

Outside of ordering/indexing for the caches, what’s left but the format, state sort and submission size? Only the Shadow knows…

Hey, I don’t ask how they do it. I just want to know what they are doing and which hardware is affected. If this leads to 20% performance win on nVidia, it would be good for nVidia as well.

Oh well. Maybe they didn’t know that either.

Uhm wait! Maybe Dark Photon is an nVidia guy!! :wink: :slight_smile:

CatDog

Yeah, planting “golden goose” performance eggs for the driver guys to make good on :slight_smile: Muhaha. Uh – what’s this pink slip for?

No, just a third-party developer like you, trying to get the most out of a black box.

Hi,

I noticed the forum entry about the slides, only today.

The slides are quite interesting, and provide lots of useful information.

For me the most interesting part is at the end, when they talk about the issues of porting to OpenGL from other APIs.

At least in the gaming industry, I think that OpenGL is becoming less and less relevant.

Thanks to the proliferation of graphics engines, developers have the opportunity to program against a common API that will then take advantage of the best 3D API supported on the target system.

I also found it interesting that NVidia was so supportive of OpenGL on the desktop, especially since their developer forum tends to be full of DirectX information nowadays.

Still, I know that they are quite supportive of OpenGL 3.0, and their OpenGL driver is also quite good.

Anyway, the slides were quite a good read.