Sept. Meeting notes

Nice thoughts Cab for sure. :slight_smile:
I completely agree about the unified interface stuff…
Anyhow, I don't know that much about streaming data, but… today VAR is the most performant, reliable and handy mechanism if you need to use a highly optimised & customised data format. I mean, there are no HINTs or such things… :wink: For all these reasons VAO doesn't make it.

I suppose you refer to things like discard / preserve hints when updating the VAO. That's true. On the other hand, VAR has priorities when allocating memory. And you don't have to worry about synchronization with VAO. All in all, both have their nice and bad sides. VAR is more powerful, but it's also easier to mess up with. How much time did you spend debugging because you were writing to a zone of memory the GPU was still reading? I find VAOs a lot cleaner… VAR always seemed to me to be a big "hack" to grant access to video memory… but maybe that's just me :slight_smile:
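
(For reference, roughly what those update hints look like on the VAO side. Just a sketch with placeholder sizes and data pointers, assuming the ATI_vertex_array_object entry points have already been fetched with wglGetProcAddress:)

GLuint create_and_update(const void *initialData, GLsizei bufSize,
                         const void *newData, GLsizei updateSize)
{
    /* create a driver-managed buffer object */
    GLuint buf = glNewObjectBufferATI(bufSize, initialData, GL_DYNAMIC_ATI);

    /* later, overwrite the first updateSize bytes:
       GL_PRESERVE_ATI keeps the untouched rest of the buffer valid,
       GL_DISCARD_ATI lets the driver drop the old contents and avoid a stall */
    glUpdateObjectBufferATI(buf, 0, updateSize, newData, GL_DISCARD_ATI);
    return buf;
}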

Y.

Then, once you've stored your vertices in VRAM, could you explain for what good reason they would be sent back to system memory through the bus again?

You didn’t specify the AGP bus. The video card has its own bus to its video RAM.

Bravo, but it is non-sense.

Yeah, it’s “non-sense.” Except how it completely defeats your argument that the VAO extension is somehow worse than VAR.

The 8500 implementation of VAO may not perform as well as the GeForce implementation of VAR. That isn't the same as saying that VAO is worse than VAR.

There is nothing intrinsic to the VAO extension that makes it slower or more CPU-reliant than VAR. The only thing that is really different (besides having direct access to memory) is the lack of synchronization events in VAR. VAR requires manual synchronization, while VAO does the syncing automatically.
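
(To make that difference concrete, here is a sketch of the manual synchronization VAR typically needs through the companion NV_fence extension; the vertex count and surrounding code are placeholders, and the extension entry points are assumed to be loaded already. None of this exists on the VAO side, where the driver handles it for you.)

void draw_and_mark(GLuint fence, GLsizei vertexCount)
{
    /* draw from the memory obtained with wglAllocateMemoryNV ... */
    glDrawArrays(GL_TRIANGLE_STRIP, 0, vertexCount);

    /* ... then drop a fence behind the draw call
       (the fence name comes from glGenFencesNV) */
    glSetFenceNV(fence, GL_ALL_COMPLETED_NV);
}

void wait_before_rewrite(GLuint fence)
{
    /* before overwriting that region of VAR memory, block until the GPU
       has finished reading it; skip this and you write into memory the
       GPU is still using, which is exactly the bug described above */
    glFinishFenceNV(fence);
}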

It is a well-known fact that ATi’s drivers are not as well developed as nVidia’s. So, when you see an 8500’s VAO implementation losing to a GeForce’s VAR implementation, I fail to understand why you are surprised. It may even be some hardware problem that’s getting in the way. In any case, it has nothing to do with the VAO extension itself.

And if you think this is “non-sense”, then you don’t understand the difference between a specification and its implementation.

Y.
As I said, I don't use dynamically modified or streaming data. Thus, I don't have to worry that much in terms of implementation… Maybe the only worry I had was optimising my vertex structure to fit all the needed geometry into 16 MB. :wink:
Moreover, we're all coders here, and I can understand that the VAR design looks a bit like an alien mechanism inside GL… okay…
But I haven't been able to do what I want & need to do with VAO, regardless of performance…
Still, globally I agree with what you're saying :slight_smile:

Originally posted by Korval:
if you think this is “non-sense”, then you don’t understand the difference between a specification and its implementation.

Ok Korval, I don't want a big long battle here… :slight_smile: Let's say that for the bus stuff there was a misunderstanding. The fact is that I suspect VAO stores the data in AGP memory & then sends it to the board when you need to display (but I can't prove that right now).

Now, talking about T&L implementations. It looks obvious that if you plan to write & run a T&L-based engine, you'll have to face different types of vertex-processing implementations, as there is no unified interface under GL at the moment.
So what do you expect then? The two current well-known challengers are NV & ATI.
And respectively they've implemented VAR & VAO. Thus, for the same program on the same computer, running both video card families, I would expect, let's say, similar results. But it's not like that. Even a GF2MX beats an 8500 using the ideal configuration for a T&L chip
->
all geometry stored onboard, 1 prim = 1 strip (16 MB);
single texturing & Gouraud shading with a max of 16 MB of textures;
lighting enabled, of course; it is supposed to be done by the GPU.

I can remember that months ago it was worse than it is now with the ATI drivers. Now it runs… and I could add: it is running within the frame!!
But… when CPU occupancy is 89% with an 8500LE, it is half that on the same machine running a crappy GF.

That's all I've got to say. Take it easy.

Originally posted by Ozzy:
The fact is that I suspect VAO stores the data in AGP memory & then sends it to the board when you need to display (but I can't prove that right now).

To repeat Korval: maybe the current ATI implementation does. But the spec does not specify that.

Ok… and in ten years I will be 42. :slight_smile:
No problem, let's wait for better drivers.

Ozzy, just curious: what did you want / need to do with VAR that you couldn't do with VAO? Except for very specific and unusual usage of fences, I don't really see… they should be more or less "functionally" equivalent.

I don't think a GF2MX beats an R8500 at T&L, when using VAR on one side and VAO on the other. What makes you think that?

CPU usage: I wouldn't trust that one. It's well known that some functions, like SwapBuffers when vsync is on, make the CPU go crazy; but that doesn't mean rendering is slow… god only knows (and maybe ATI's driver team too…) what happens in the driver :slight_smile:

Finally, VAO storing data in AGP memory: I have no idea. I'd expect it to store the data in AGP memory when using GL_DYNAMIC_ATI and in video memory (hence no bus transfer) when using GL_STATIC_ATI. It might not be the case, I honestly don't know, but that's what I'd expect logically. After all, don't forget that before the 40.41 Detonators, VAR memory allocations were limited to 32 MB, even when you had a 128 MB video card. Talk about driver limitations, heh…
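
(If that's how it behaves, the only difference on the application side would be the usage flag passed at creation time. A rough sketch, with placeholder sizes, strides and data pointers, and the ATI entry points assumed loaded:)

GLuint make_static_buffer(const void *verts, GLsizei size, GLsizei stride)
{
    /* GL_STATIC_ATI: hopefully video memory, so no bus traffic at draw time;
       GL_DYNAMIC_ATI would be the flag for data rewritten every frame
       (presumably AGP memory in that case) */
    GLuint buf = glNewObjectBufferATI(size, verts, GL_STATIC_ATI);

    /* bind the positions: 3 floats at offset 0, interleaved with 'stride' */
    glArrayObjectATI(GL_VERTEX_ARRAY, 3, GL_FLOAT, stride, buf, 0);
    return buf;
}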

Y.

Basically my structs using VAR look like this ->

typedef struct
{
    VR_SHORT x, y, z, rienz;                 // coords            $0,$2,$4,$6
    VR_SHORT nx, ny, nz, rien;               // normal            $8,$0a,$0c,$0e
    VR_BYTE  r, g, b, a;                     // colors            $10,$11,$12,$13
    VR_UV    texCoord[VR_MAX_TEXTURE_UNITS]; // texture coords    $14,$18,$1c,$20,
                                             //                   $24,$28,$2c,$30
    VR_DWORD pad;                            // padding :(
} NV_VERTEX;


Of course the size varies depending on the texture channels used by the prim.

Moreover, with VAR I explicitly store the vertices in VRAM, while with VAO I can only pray for it to store my custom vertices as they are defined.
Also, customised vertex formats speed up GPU processing on GF hardware, so it's really nice to get performance and size advantages at the same time.
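
(Roughly what that explicit placement looks like on the VAR side. A sketch only, with the pool size as a parameter, assuming VR_SHORT / VR_BYTE map to GLshort / GLubyte and the NV entry points are loaded:)

void setup_var(GLsizei poolSize)
{
    /* ask for video memory explicitly: read/write frequency 0, priority 1.0 */
    NV_VERTEX *vram = (NV_VERTEX *)wglAllocateMemoryNV(poolSize, 0.0f, 0.0f, 1.0f);
    if (!vram)
        return; /* allocation can fail; a real app would fall back to AGP */

    /* hand the whole block to GL as the vertex array range */
    glVertexArrayRangeNV(poolSize, vram);
    glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

    /* point the standard arrays into the interleaved NV_VERTEX layout */
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glVertexPointer(3, GL_SHORT, sizeof(NV_VERTEX), &vram[0].x);
    glNormalPointer(GL_SHORT, sizeof(NV_VERTEX), &vram[0].nx);
    glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(NV_VERTEX), &vram[0].r);
    /* texcoord pointers omitted */
}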

As my geometry is 100% static, there is no need to sync. Thus, while rendering frame N-1 in these circumstances, I enjoy my 2 VBLs in one.
Read: parallelisation. While the GPU is rendering frame N-1, the CPU is cooking the next lists, managing the game, playing the music and smoking a joint.

This is definitely not a vsync problem while swapping etc… the VAO CPU overhead occurs only while drawing primitives.
So if the CPU is too heavily involved, just say bye-bye to parallelism. Frankly, I don't enjoy this kind of heavy penalty, and this explains why it's smooth like… you know… :wink: even on an InnoGF2MX200.

got to go…

I fail to see why you wouldn't be able to use the same structure with VAO. In addition, the priorities in VAR are just hints. Just because you request video memory (priority 1.0) doesn't mean you won't get AGP memory. To be convinced, try allocating 200 MB of priority-1.0 memory; with the latest Detonators it will succeed, and it certainly doesn't fit in video memory.
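
(Which is why the usual pattern treats the priority as nothing more than a request and falls back. A sketch; the frequency/priority values are the commonly suggested ones, not something the spec guarantees:)

void *alloc_var_memory(GLsizei size)
{
    /* ask for video memory first (priority 1.0) ... */
    void *mem = wglAllocateMemoryNV(size, 0.0f, 0.0f, 1.0f);

    /* ... then AGP memory (around 0.5) ... */
    if (!mem)
        mem = wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f);

    /* ... and plain system memory as a last resort (<stdlib.h> malloc);
       either way the driver remains free to place the block as it likes */
    if (!mem)
        mem = malloc(size);
    return mem;
}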

I was not suggesting that the CPU usage was a vsync problem. It was merely an example of why I think CPU usage is meaningless on its own. I'd be more interested in a test where you do some CPU work (physics, AI, whatever) and, thanks to parallelization, it still runs at full framerate with VAR but slows down with VAO. Can you demonstrate that? If not, I for one wouldn't rely on CPU usage :slight_smile:

Y.

Originally posted by Ysaneya:
I fail to see why you wouldn't be able to use the same structure with VAO. In addition, the priorities in VAR are just hints. Just because you request video memory (priority 1.0) doesn't mean you won't get AGP memory. To be convinced, try allocating 200 MB of priority-1.0 memory; with the latest Detonators it will succeed, and it certainly doesn't fit in video memory.

Since March 2002 (as far as I know), the VAO implementation has been buggy for data types other than GL_FLOAT (try GL_SHORT, GL_BYTE, etc.).
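
(For clarity, the kind of call in question: pointing a vertex array at GL_SHORT data inside a buffer object. Stride and offset are placeholders:)

void bind_short_positions(GLuint buf, GLsizei stride)
{
    /* 3 GL_SHORT positions at byte offset 0, interleaved with 'stride':
       the non-GL_FLOAT path reported as broken in the drivers of the time */
    glArrayObjectATI(GL_VERTEX_ARRAY, 3, GL_SHORT, stride, buf, 0);
}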

And with VAR, 16 MB is more than enough for our project. :wink: Thus we don't have to worry about the new memory management introduced in the current NV drivers. :slight_smile:

Originally posted by Ysaneya:
I was not suggesting that the CPU usage was a vsync problem. It was merely an example of why I think CPU usage is meaningless on its own. I'd be more interested in a test where you do some CPU work (physics, AI, whatever) and, thanks to parallelization, it still runs at full framerate with VAR but slows down with VAO. Can you demonstrate that? If not, I for one wouldn't rely on CPU usage :slight_smile:

Y.

I'm afraid you'll have to wait for the first episode of Orky's adventures! :wink:

CPU usage is always 100% if your thread does something; it doesn't really matter how much it actually does… But yeah, it shouldn't need the CPU to convert. Possibly it has to, due to some hardware restrictions…
But VAO is cool anyway. Otherwise we might as well get a TAR, Texture Array Range, and a PAR, and all that. Or a MAR, Memory Array Range, and just allocate all the memory and do the whole memory management ourselves. This is a) not the way OpenGL works, and b) stupid, as the people who write the drivers know better how to optimize for the particular GPU. VAO on GeForces would surely rock…