But the difference will be negligible.
Have any specific evidence of that? Once again, have you seen a demo that compares the best OpenGL can do in instancing situations to the best D3D can do?
The onus isn’t on us to provide a reason for this extension; we already have one (worst case, it does nothing; best case, non-trivial performance gains. Ergo, worthwhile). The onus is on you to provide some specific evidence that shows how it would not be useful.
Not sure what you’re saying “not really” about, as the data passing over the AGP bus is still the same regardless of pre-T&L cache utilization.
I don’t think I realized that you could replicate vertex data from indices, thus avoiding replicating the actual vertex data and thus potentially blowing your cache.
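To make that point concrete, here is a minimal sketch of what indexing buys you. The `deindex` helper is purely illustrative (not any API’s function): it expands an indexed mesh into the replicated vertex list that non-indexed drawing would need, which is exactly the duplication indices let you avoid.

```cpp
#include <vector>

struct Vertex { float x, y, z; };

// De-indexing: expands an indexed mesh into the replicated vertex list
// that non-indexed drawing would require. With an index buffer, only the
// small index array and the unique vertices need to be stored and sent.
std::vector<Vertex> deindex(const std::vector<Vertex>& unique,
                            const std::vector<unsigned>& indices) {
    std::vector<Vertex> out;
    for (unsigned i : indices)
        out.push_back(unique[i]);  // each shared vertex is copied per reference
    return out;
}
```

For a quad drawn as two triangles, four unique vertices plus six indices replace six replicated vertices, and the savings grow with how often vertices are shared.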
The other is that instancing already screws up the pre-T&L cache, as it requires two vertex streams.
It isn’t that bad (or bad at all). It simply requires a different kind of per-vertex fetch operation. It doesn’t screw up the cache unless the hardware has some horrible limitation.
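The fetch pattern in question is easy to picture. Here is a plain-C++ model (no real API calls; `instancedFetches` is an illustrative stand-in for a D3D9-style stream frequency divider): the geometry stream advances once per vertex and wraps at each instance boundary, while the per-instance stream advances only once per instance.

```cpp
#include <cstddef>
#include <vector>

// Models the fetch pattern of two-stream instancing: stream 0 holds
// per-vertex geometry (index repeats each instance), stream 1 holds
// per-instance data (index steps once per instance, not per vertex).
struct Fetch {
    std::size_t geomIndex;  // element fetched from the geometry stream
    std::size_t instIndex;  // element fetched from the per-instance stream
};

std::vector<Fetch> instancedFetches(std::size_t vertsPerInstance,
                                    std::size_t instances) {
    std::vector<Fetch> out;
    for (std::size_t inst = 0; inst < instances; ++inst)
        for (std::size_t v = 0; v < vertsPerInstance; ++v)
            out.push_back({v, inst});  // geometry wraps, instance holds steady
    return out;
}
```

Nothing in this pattern forces redundant vertex traffic; it is just a second, lower-frequency fetch alongside the normal one.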
The third reason, I’m afraid, involves some non-public information that I can’t disclose. I believe it’s a good argument for why what you say isn’t the case in practice, but of course it’s hard to make that convincing without going into details.
Considering that you work for ATi, this means that it’s ATi’s problem, not a problem with the concept as a whole or hardware in general. They should have made a real graphics card this go around with real features, rather than a simple knockoff of the R300.
It has clogged up the API since all extensions typically are made to be orthogonal with immediate mode calls. There’s a good reason why both display lists and immediate mode were ditched in OpenGL ES.
And yet, it is immediate mode which gives OpenGL some semblance of instanced rendering.
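That “semblance” is the pseudo-instancing trick: set a small per-instance constant through an immediate-mode attribute call between draws of one shared mesh, so the vertex data itself is never replicated. Below is a plain-C++ model of that loop (illustrative only; in real GL the per-instance step would be an immediate-mode call such as glVertexAttrib4fv followed by glDrawElements, and here the “shader” just adds the per-instance value as a translation):

```cpp
#include <array>
#include <vector>

using Vec4 = std::array<float, 4>;

// Pseudo-instancing, modeled: one shared mesh is "drawn" once per instance,
// each pass combining the shared vertices with one per-instance constant
// (the value an immediate-mode attribute call would supply in real GL).
std::vector<Vec4> drawInstances(const std::vector<Vec4>& mesh,
                                const std::vector<Vec4>& perInstance) {
    std::vector<Vec4> out;
    for (const Vec4& inst : perInstance) {      // glVertexAttrib4fv(slot, inst)
        for (const Vec4& v : mesh)              // glDrawElements(...) over shared VBO
            out.push_back({v[0] + inst[0], v[1] + inst[1],
                           v[2] + inst[2], v[3] + inst[3]});
    }
    return out;
}
```

The mesh stays in one buffer regardless of instance count; only one tiny attribute changes per draw, which is the whole appeal of the technique.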
I think there are other priorities that are more important right now.
Such as? Performance should always be priority #1. Just because ATi doesn’t see it as a priority doesn’t mean that it isn’t a priority. And, looking at what the ARB has cooking, it ain’t much. This is not a highly complex spec that requires 2 years (including a failed year) to make progress on. It is a spec of already known behavior that can, in good hardware, potentially improve performance.