So the fact that vendor X does a sad job on render path Y is a really dumb reason to say that path Y doesn’t matter.
It’s all a matter of effort vs. reward.
NVIDIA’s graphics card division is… well… things aren’t going well for them. They’re 6 months late with a DX11 card; the card they eventually released is not exactly shipping in quantity; it runs fantastically hot; etc.
ATI, by contrast, was able to ship 4 DX11 chips in 6 months, and they’re able to meet demand for those chips. They’re selling DX11 hardware to the mainstream market, while NVIDIA can’t even produce mainstream (sub-$200) DX11 cards after a 6-month delay.
One company is winning, and the other is losing.
The simple economic reality is this: development resources are not infinite. It’d be great if we could optimize everything, everywhere, for every piece of hardware. But what matters most is doing the greatest good for the greatest number. Adding a rendering path for display lists only helps NVIDIA card users, which for most developers means some fraction of their customer base well short of 100%. And that extra rendering path requires its own testing, debugging, and other care and feeding.
Or, one could spend those development resources tweaking shaders to make them faster and gain a performance benefit there. Alternatively, since that performance is being given up anyway, one could make the game look better at the same performance: make the shaders more complex, add HDR or bloom, or whatever. Unlike the display-list optimization, both of these will be useful for 100% of the customer base.
Where are the development resources better spent? On the slowly dwindling population of NVIDIA card owners? Or on all of the potential customers? Yes, it’d be nice if development resources could be spent on both. And some developers can afford to do both; more power to them.
The rest of the developers would rather have a single path that both NVIDIA and ATI are willing to optimize as much as possible. Right now, that path is VBOs.
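For concreteness, here is roughly what the two paths look like side by side — the extra display-list path versus the single VBO path everyone already has to maintain. This is just a minimal sketch; the buffer name, attribute layout, vertex data, and counts are placeholders, not anything from this thread.

    /* Display-list path: compiled once, replayed with glCallList().
     * This is the extra path that would exist only to help one vendor. */
    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);
        glBegin(GL_TRIANGLES);
        glVertex3f(0.0f, 0.0f, 0.0f);
        glVertex3f(1.0f, 0.0f, 0.0f);
        glVertex3f(0.0f, 1.0f, 0.0f);
        glEnd();
    glEndList();
    glCallList(list);                                /* per frame */

    /* VBO path: the single cross-vendor path. */
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, bufferSize, vertexData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void *)0);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);      /* per frame */

Every one of those VBO calls gets optimized by both vendors, because everyone’s renderer goes through them.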
and he proposed CBOs. …after which this degenerated into a food-fight over what the “cause” of the speed-up is, and (in some cases) how you could kludge around those causes rather than fix the underlying problem(s).
That’s how you see it, but that’s not what the actual discussion is about.
First, identifying the cause of the performance increase from display lists or bindless is vital to determining how to actually achieve it. If the cause of the increase is not what was identified in the original post, then CBOs will not help! And proposing something that will not actually solve the problem is a waste of everyone’s time.
If you want to consider any discussion of whether CBOs will actually solve the problem to be missing the point, well, that’s something you’ll have to deal with yourself.
Second, “kludging” around the problem is more likely to solve it than inventing an entire new API. Bindless is nothing if not a gigantic kludge, yet you seem totally happy with it.
Something is needed (API support) to fill this performance gap in a simple, cross-vendor way.
This thread is not about “something” that solves the problem. It is not a thread for discussing arbitrary solutions to the problem. It is about a specific solution. A solution whose efficacy is far from settled.
using 64-bit buffer handles, which just so happen to be GPU addresses on some hardware
Those are not 64-bit handles; they are actual GPU addresses. Even if you completely ignore the fact that the function is called glBufferAddressRangeNV and the fact that the spec constantly refers to them as “addresses”, glBufferAddressRangeNV doesn’t take an offset. So that 64-bit value must be referring to an address: either the address returned from querying the buffer, or some offset from that queried address.
If it looks like an address, acts like an address, and everyone calls it an address, then it is an address. So please don’t act like bindless is something that could be trivially adopted by the ARB or something that doesn’t break the buffer object abstraction.
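For reference, here is roughly what the bindless vertex path looks like under NV_shader_buffer_load and NV_vertex_buffer_unified_memory; the buffer name, attribute layout, and sizes below are placeholders. Look at what glBufferAddressRangeNV actually takes:

    GLuint64EXT addr = 0;

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* Pin the buffer so its GPU address stays valid, then query that address. */
    glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);
    glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &addr);

    glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
    glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float));

    /* An attribute index, a 64-bit address, and a length.
     * No buffer name and no offset parameter anywhere. */
    glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, addr, bufferSize);

    glDrawArrays(GL_TRIANGLES, 0, vertexCount);

You hand the driver a raw GPU address and a length; the buffer object abstraction is bypassed entirely at draw time.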