GL3 and matrices

To quote your own words: “Please pay attention to the thread.”

It is entirely possible for the compiler to check whether the vertex shader calculates gl_Position as uniform mat4 * attribute vec4 (or similar). If that’s the case, it can mark the uniform as “MVP matrix” and the attribute as “vertex position”. No fixed-function matrix support required at all.
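As a sketch, the pattern such a check would look for might read like this (the uniform and attribute names are hypothetical; shader source shown as a C string):

/* A compiler could pattern-match "gl_Position = <uniform mat4> * <attribute vec4>"
 * in source like this and tag the operands accordingly. */
static const char *vs_source =
    "uniform mat4 mvp;\n"            /* would be marked as the MVP matrix      */
    "attribute vec4 position;\n"     /* would be marked as the vertex position */
    "void main(void)\n"
    "{\n"
    "    gl_Position = mvp * position;\n"
    "}\n";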

so you’re only arguing against frustum culling in geometry display lists, and not arguing against geometry display lists themselves?
if so, and if you’ve been using display lists in recent years on nvidia hardware, you’ve already been paying the price twice - and yet nvidia display lists have delivered maximum hardware-spec throughput for years. We’re not talking about a proposed feature; this feature has been in successful use for some considerable time.
Also, the fact that you recognise the cost of frustum culling in terms of cycles should indicate to you how much your application would gain from offloading the task to the GPU. Some engineering scenes I’ve dealt with have spent 40% of frame time in the cull traversal alone.

With “simplifying driver development” being one of the stated goals for OpenGL 3, how can it not be a valid issue?

It’s not a valid issue because it has been shown several times that implementing geometry display lists doesn’t make drivers more complex.

So a display list MIGHT give you optimal performance. Or should the driver just not create the display list object if it cannot deliver a more optimal version?

The bare minimum I would expect of a geometry display list implementation is to store the VAO and draw call(s?) internally, and regurgitate them upon command. That is, if the driver can’t/won’t do better than your VAO and buffers, then it will simply use your stuff directly.
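A minimal sketch of that fallback path, assuming GL 3-style VAOs, indexed draws, and a GL 3 header/loader (the struct and function names are hypothetical, not a real driver interface):

/* Record the application's objects verbatim; a cleverer driver could
 * re-format buffers here instead. */
typedef struct {
    GLuint  vao;    /* the caller's vertex array object, reused as-is */
    GLenum  mode;   /* e.g. GL_TRIANGLES */
    GLsizei count;  /* index count */
    GLenum  type;   /* e.g. GL_UNSIGNED_INT */
} GeomDisplayList;

static GeomDisplayList gdl_create(GLuint vao, GLenum mode, GLsizei count, GLenum type)
{
    GeomDisplayList gdl = { vao, mode, count, type };
    return gdl;
}

/* "Calling" the list just regurgitates the recorded state and draws. */
static void gdl_call(const GeomDisplayList *gdl)
{
    glBindVertexArray(gdl->vao);
    glDrawElements(gdl->mode, gdl->count, gdl->type, (const void *)0);
}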

It takes maybe 4 hours to code.

Personally I would prefer the core API to give me predictable performance between platforms.

Nobody’s forcing you to use display lists.

I guess we’re talking about different kinds of complexity then, so never mind.

If it is guaranteed that GDLs would be the fastest alternative (as in, no other alternative, including extensions, would be faster), then I guess it would be nice to have them. However, if that isn’t the case, then imho I just don’t quite see the point of having them in the core API.

It takes maybe 4 hours to code.

I estimate more like 3.5 hours. At least, that’s the rate at which an nVidia developer would do it in the G8x GL3.0 driver. ATI I estimate 3.7 hours. It would be more like 3.4 hours without the AMD merger.

Sorry, I can’t help myself. Simplicity or speed of coding the implementation plays little part in deciding an architecture. It’s definitely not a reason to leave in a ‘feature’.

That said, I have no opinion on this ‘feature’, only on subjective coding times.

You have no guarantee of anything in OpenGL, just as you don’t in D3D. You have only your own benchmarking to go on. If you thought otherwise, then you’re naive. What you would have is a guarantee that they would be no slower than if you manually set up the objects and called the draw commands yourself.
If you seriously can’t see the benefit of having this light-weight semantic in the API after all the points that have been made in this thread, then there’s not much else to say.

But the driver is not the right place to put this. It’s better put in a middleware layer. I’m sure there are plenty of open source libraries that you could use.

They need to be in the ICD to benefit from any hardware acceleration or hardware-specific optimisation.

If you want hardware assisted culling, I think we should come up with a proper API for that instead of expecting automagic action under the hood for display lists under limited conditions. Predicated rendering is an example of a proper hardware assisted form of culling. I’m open to other ideas.
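For comparison, a rough sketch of GL 3.0-style conditional rendering: issue an occlusion query over a cheap proxy, then predicate the real draw on its result (draw_bounding_box and draw_mesh are hypothetical helpers):

GLuint query;
glGenQueries(1, &query);

/* Pass 1: count the samples an invisible bounding-box proxy would produce. */
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);
glBeginQuery(GL_SAMPLES_PASSED, query);
draw_bounding_box();                 /* hypothetical helper */
glEndQuery(GL_SAMPLES_PASSED);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_TRUE);

/* Pass 2: the GPU skips the expensive draw if no proxy samples passed. */
glBeginConditionalRender(query, GL_QUERY_NO_WAIT);
draw_mesh();                         /* hypothetical helper */
glEndConditionalRender();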

aside from not wanting the hardware to be given the opportunity to cull, have you any objections to the other reason for geometry display lists? i.e. giving the IHV the opportunity to optimise the mesh for that specific piece of hardware, automagically so to speak?

From an older GL3 thread:

Come on, nobody said the hardware shouldn’t be able to cull. We are arguing that display lists isn’t the right place for that.

I think the culling has become a bit of a distraction for you all. Forget about the culling - it’s just something you could get as an added bonus to geometry display lists, not their main reason for existing.

Well, why not? Every implementation does a lot of optimizations behind your back, why is this one particularly harmful?

For me there just isn’t an argument against them. It’s such a simple thing to add, has so many possible benefits for prototyping (culling) and full-blown apps (buffer re-formatting), and has already been proven to provide incredible performance on nvidia hardware. If the API is supposed to be a true abstraction of current and future hardware, then you have to accept that renderers use buffer objects primarily to render ‘meshes’, which in turn should be given their own level of abstraction - so long as it doesn’t add unnecessary complexity to an implementation, which with the new object API it simply won’t.

Well, why not? Every implementation does a lot of optimizations behind your back, why is this one particularly harmful?

Because it’s not optimizing anything; in my case it’s actually slowing me down (spending CPU cycles on something I already do).

Y.

But, but, but that’s an implementation detail. You have no control over what an implementation does; you just benchmark. If something slows you down on your test hardware, don’t use it - if you get a boost, do use it. The same goes for every other OpenGL feature. Nobody’s forcing you to use it - whether it’s there or not, using it is your choice.
Why should everyone else pay the price of fewer hardware acceleration opportunities just because you have some vague paranoia about the driver possibly expending some CPU cycles on some implementation detail?

If you do your own culling you’re probably not using display lists. It’s an optimization for the common case. And it’s likely that any culling used for display lists is inexpensive.

Could anyone please explain to me what can be culled in display lists? Because at display list creation time, the modelview matrix is usually unknown, so how could the driver possibly know what to cull?

The driver can generate bounding volumes at display list creation time. Then at draw time it can transform the bounding volumes and check whether they are completely outside the view frustum.
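A sketch of that draw-time test, assuming bounding spheres, frustum planes expressed in the same space as the transformed sphere centre, and inward-pointing plane normals (all names hypothetical):

typedef struct { float x, y, z; }    Vec3;
typedef struct { float a, b, c, d; } Plane;  /* ax + by + cz + d >= 0 means inside */

/* Returns 1 if the sphere lies entirely outside at least one frustum plane,
 * in which case the whole display list can be skipped. */
static int sphere_outside_frustum(Vec3 c, float radius, const Plane p[6])
{
    for (int i = 0; i < 6; ++i) {
        float dist = p[i].a * c.x + p[i].b * c.y + p[i].c * c.z + p[i].d;
        if (dist < -radius)
            return 1;   /* fully outside this plane: cull */
    }
    return 0;           /* inside or straddling: draw */
}

The sphere centre here would be the stored bounding volume’s centre transformed by the current modelview matrix, which is exactly the cheap per-draw step described above.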