Nutty: He’s saying that when they use the “plain” OpenGL pipeline on the card, which is apparently NOT entirely composed of shader opcodes that we have access to, the number of clock cycles or opcode equivalents that a vertex transform takes is one less than if you were executing the semantically equivalent program, as defined by the vertex shader language available to us.
Opla:
Yes, and it’s much better than VAR: you don’t have to manage and synchronize the AGP memory yourself.
Except I can manage it more efficiently than they can, because I know the write and access pattern. If I allocate a VAR and split it in two, I only need to test a fence when I pass the end of a chunk, a la double-buffering. That happens maybe once every 10 or 20 meshes I render, depending on the size of the meshes.
Meanwhile, the ATI driver has to set a fence for each buffer I upload and render, as it can’t know whether I will soon ask to re-upload to that buffer or not. Or it will have to do some mumbo-jumbo switcheroo behind the scenes, which degrades to very much the same thing in the end. They cannot do the double-buffering thing, because they don’t know the lifetime of each individual object upload I make.
Then there’s the problem of having to upload geometry to the buffer in the first place. If I’m dynamically generating the geometry, they impose an extra copy pass on me. I have no idea what the implementation is, but they may even blow my L1 cache when they upload the data, even if I’m conscientious about writing to memory with un-cached stores.
I’ve heard several times that the ATI people are amenable to adding an extension so you can get access to the buffer. If they do that, and relax synchronization so that I don’t have to synchronize per mesh, then they’ll be equivalent. Until then, the extension may appear simpler to use, but it’s simpler to use in the same way that glVertex3f() is simpler to use than glDrawRangeElements().
[This message has been edited by jwatte (edited 02-12-2002).]