glLockArraysEXT\glUnlockArraysEXT?

XBCT · December 30, 2000, 3:33am

Hi!
Yesterday I implemented glLockArraysEXT/glUnlockArraysEXT in my Q3 clone…
As you know, because of the shaders a lot of different passes are done on one and the same geometry, so I expected a good performance increase from locking the vertex array before the multiple passes and unlocking it afterwards… Even in the standard case (base texture + lightmap), where with glLockArraysEXT the vertices should only get transformed once instead of twice, I get nearly no speed-up (+0 to 1 FPS)… I have an ATI Rage128 16MB.
What am I doing wrong?
Is there anything special to know about CVAs?
Do I have to use a special format?
Or is there a certain limit how many elements should be locked at once?

Thanx in advance, XBTC!

system · December 30, 2000, 7:31pm

Perhaps the ATI driver doesn’t optimize your
vertex array much when you lock it?
There’s no requirement they do that…

XBCT · December 31, 2000, 3:21am

Hi!
I think the drivers are in some way optimized for glLockArraysEXT/glUnlockArraysEXT, ’cause they are special-purpose drivers for Q3, so I’d expect them to at least be optimized for CVAs…
Yesterday I changed my vertex format to the same one Q3 uses (four floats for every vertex) and that gave +2-4 frames, but I think this could just as well be a speed-up through SIMD… Another question: does the fourth float mean anything?
’Cause if I don’t set all those fourth floats to 1.0f I get weird rendering errors…

Lev · December 31, 2000, 6:02am

These 4 coordinates are homogeneous coordinates (x, y, z, w), and if w != 0 they represent the same vertex as (x/w, y/w, z/w). So if w != 1.0, the vertex is actually not the one you think.

-Lev
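
A minimal sketch of the divide Lev describes, in case it helps to see it spelled out. The names Vec4, Vec3 and homogeneous_to_point are made up for this illustration and do not come from OpenGL or from anyone's engine; OpenGL performs the equivalent divide itself after the projection transform.

```c
#include <assert.h>

/* Hypothetical types for illustration: a 4-float vertex (x, y, z, w)
   as discussed in the thread, and the 3D point it represents. */
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float x, y, z; } Vec3;

/* Returns the 3D point represented by the homogeneous vertex v.
   Only defined for w != 0. */
Vec3 homogeneous_to_point(Vec4 v)
{
    assert(v.w != 0.0f);
    Vec3 p = { v.x / v.w, v.y / v.w, v.z / v.w };
    return p;
}
```

With w = 1.0f the point is simply (x, y, z). With any other non-zero w the same x, y, z values land somewhere else, which would be consistent with the rendering errors seen when the fourth float is left unset.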

XBCT · January 1, 2001, 5:26am

Thanx for replying…This makes sense…
The errors I encountered were probably completely unrelated to the value of the w-coord…

Greets, XBTC!

beavis · January 10, 2001, 2:41am

Hi,
I’ve just done the same thing, XBCT, i.e. I’ve added CVAs to my Q3 clone (gotta have one yourself these days). I tested it on TNT 64 Pro and got about a 2-3 fps increase at best. I’m also wondering if there is something I’m doing wrong, or if the whole CVA thing is overhyped…

Will test it at home on GeForce 2 GTS, some say these babies are doing a good job with CVAs…

PS: XBCT mail me if you want (bsekura@impaq.com.pl), I’d love to discuss this in detail without taking everyone’s time…

jesusgumbau · January 10, 2001, 4:27am

I have the same problem: I implemented Compiled Vertex Arrays in my program (not a Q3 clone! ;D) and I don’t see any increase in performance.
I have a Creative Voodoo Banshee with 16 MB, and I know the driver supports CVAs because the extension shows up when I enumerate the GL extensions.

What’s happening?
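
For reference, a sketch of how the extension check could be done safely. The helper has_extension is a hypothetical name for this illustration; it matches whole space-separated tokens, since a plain strstr can also match a longer extension name that merely starts with the one being looked for. As noted earlier in the thread, finding the extension in the list only means the calls are available, not that the driver speeds anything up when arrays are locked.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
   space-separated token in `ext_list`, 0 otherwise. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL)
        return 0;                      /* not a valid extension name */

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;                  /* exact token match */
        p += len;                      /* partial match, keep scanning */
    }
    return 0;
}

/* In a program with a current GL context, the list would come from
   glGetString(GL_EXTENSIONS), and the name to look for is
   "GL_EXT_compiled_vertex_array". */
```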

Eric · January 10, 2001, 6:14am

Just a question: everybody seems to have his Q3A clone so I started mine this afternoon (got bored and wanted some change!).

I have found the BSP specs quite easily but I do not quite understand the different types of faces (1:polygon, 2:patch, 3:mesh, 4:billboard). Does anyone know of a good site for understanding this?

Regards

Eric

coco · January 10, 2001, 10:21am

I couldn’t see any significant performance increase using CVAs either. I have tried it on a TNT, a GeForce2, and SGI’s software OpenGL for Windows. I guess glVertexPointer and glDrawElements are already so optimized that CVAs won’t make much difference? Or maybe one should use glLockArraysEXT to lock only as many vertices as fit in the card’s vertex cache or something?

beavis · January 12, 2001, 6:55am

Actually, after testing at home on a GeForce 2 GTS I get a 5-10 FPS increase when using CVAs… As I thought, this card makes it worthwhile… although it’s fast enough without it, so why bother in the first place… I was hoping for some increase on the crappy TNT, but I think I’m fillrate bound there, so optimizing geometry doesn’t really help…
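
The fillrate point can be put into rough numbers. This is only a back-of-the-envelope model (Amdahl's law applied to a frame), not something measured in this thread: it assumes a frame's time splits cleanly into a geometry share and everything else, and that CVAs speed up only the geometry share. The function name and its parameters are made up for the illustration.

```c
#include <assert.h>

/* Estimated frame-rate multiplier when only the geometry share of a
   frame gets faster (Amdahl's law).
   geometry_fraction: share of frame time spent on transforms, 0..1.
   geometry_speedup:  how much faster that share becomes, >= 1. */
double estimated_frame_speedup(double geometry_fraction,
                               double geometry_speedup)
{
    assert(geometry_fraction >= 0.0 && geometry_fraction <= 1.0);
    assert(geometry_speedup >= 1.0);
    return 1.0 / ((1.0 - geometry_fraction)
                  + geometry_fraction / geometry_speedup);
}
```

With made-up inputs: if transforms were 10% of the frame and locking halved their cost, estimated_frame_speedup(0.1, 2.0) comes out at roughly 1.05, i.e. about 2 fps at 40 fps. If transforms were the whole frame, the same call with 1.0 would give 2.0.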

BwB · January 12, 2001, 8:57am

<begin minor rant>
Do any of you actually plan to make a game out of your Q3 clones or do you just want render engines?

ANY increase in speed is HIGHLY desirable if you’re actually going to make a game, because that means you have a FRACTION of a second more to implement AI, path finding, collision detection, and the rest of the general game logic each frame. id certainly didn’t say “ahh, that’s only a 2 frame increase in speed, so we won’t implement it, and in the process we will make the bots dumber and easier to kill, or maybe we will make collision less accurate so you don’t always hit them when you’re firing point blank at their heads”.
<end minor rant>

That being said, has anyone tested compiled vertex arrays on a slower bus? Maybe for some configurations on some hardware there is a noticeable increase… Meaning your game could potentially reach a larger market. On most systems it might not be an incredible benefit, but on some… Just a thought…
Remember… in the game world it’s all about marketing (and please don’t flame me for that!)

Ysaneya · January 12, 2001, 9:33am

Guess this is the day. Today I tested the speed of CVAs with a… Voodoo2! Since I wanted the geometry to be the bottleneck, I created a mesh of ~10000 triangles, with no textures (only vertex colors), no special filter, no blending, no lighting. I also reduced the resolution to 640x480.
Then I rendered the mesh in 6 passes. That should make a huge difference: 10000 transforms versus 10000*6 = 60000 transforms.
Guess what ? 1 fps increase.
Hic.

Y.

BwB · January 12, 2001, 11:25am

So who the heck created the compiled vertex array extension in the first place? Does it give an advantage on their hardware?

Ysaneya:
And what was your vertex format? Was it entirely 32-bit aligned? Maybe we need to check various byte alignments. On the PS2 (which is not the subject, but the only thing I have this kind of experience with) verts work best 32- or 64-byte aligned (the size of a block transfer from the Emotion Engine to the Vector Unit).

[This message has been edited by BwB (edited 01-12-2001).]

zed · January 12, 2001, 12:44pm

Ysaneya, with the Voodoo try not actually drawing the triangles on the screen, i.e. point the camera the other way, and test the fps then.

XBCT · January 12, 2001, 2:50pm

Hi!
Thanx for all your ideas…
I still haven’t found a way to use CVAs that gives a real speed-up…
And I tried everything:
Reducing the res to 320*200 and scaling the texture size to one eighth to make sure it’s not fillrate-limited…
I tried the different vertex formats and found that the one Carmack uses (4 floats per vertex) is the fastest, but it still gives only 1 or 2 additional FPS…

P.s.:
>Do any of you actually plan to make a game out of your Q3 clones or do you just want render engines?<

First of all I do it for learning purposes, and once I even had a team + story + level designers and Activision was interested in our ideas, but I had to give it up ’cause school eats up too much time…

P.P.s.: Beavis, I would be interested in discussing the subject, but I’m not of much help here ’cause I still haven’t found where the problem is… If you learn something new please mail me. Perhaps we should ask the NVIDIA guys here?

Greets, XBTC!

Julio · January 12, 2001, 4:14pm

You are sorting by texture like they do in Q3, aren’t you? That may help somewhat performance-wise.

Gorg · January 12, 2001, 4:24pm

If you blow the vertex cache you won’t notice any difference!!

If you read the Q3 engine specs, they say it uses 1k vertex buffers, meaning they only put 1024 vertices in a vertex buffer. If you look at the NVIDIA paper on Direct3D vertex buffers, they say that around 2000 vertices can be optimal.

[This message has been edited by Gorg (edited 01-12-2001).]
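
To make the batching idea concrete, here is a minimal sketch of greedy packing into buffers of at most 1024 vertices. It is not taken from the Q3 source: count_batches and MAX_BATCH_VERTS are names invented for this illustration, and the sketch assumes every surface fits into one buffer on its own. It only counts batches; in a renderer each flush is where one would lock the filled range with glLockArraysEXT(first, count), draw all passes, and call glUnlockArraysEXT().

```c
#include <assert.h>

#define MAX_BATCH_VERTS 1024  /* buffer size quoted in the thread */

/* Hypothetical helper: greedily packs surfaces, given by their vertex
   counts, into batches of at most MAX_BATCH_VERTS vertices and returns
   how many batches are needed. Surfaces are never split. */
int count_batches(const int *surface_verts, int num_surfaces)
{
    int batches = 0;
    int used = 0;   /* vertices in the batch currently being filled */

    for (int i = 0; i < num_surfaces; i++) {
        assert(surface_verts[i] > 0 &&
               surface_verts[i] <= MAX_BATCH_VERTS);
        if (used + surface_verts[i] > MAX_BATCH_VERTS) {
            batches++;      /* flush: lock, draw all passes, unlock */
            used = 0;
        }
        used += surface_verts[i];
    }
    if (used > 0)
        batches++;          /* flush whatever is left at the end */
    return batches;
}
```

For surfaces of 600, 400, 100, 1024 and 10 vertices this gives 4 batches: 600+400, then 100, then 1024, then 10. Whether 1024 is the right limit for a given card is exactly the open question here, so the constant is worth making tunable when benchmarking.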

XBCT · January 13, 2001, 12:55am

Hi!
Thanx for your answers…
Julio: Yes everything is sorted by shader…
Gorg: Sounds interesting I´ll try it…

Greets, XBTC!

XBCT · January 13, 2001, 2:31am

Gorg I checked my code now and my vertex buffers are already limited to 1024 Vertices…

Greets, XBTC!

[This message has been edited by XBCT (edited 01-13-2001).]

Gorg · January 13, 2001, 8:05am

Have you tried running Quake 3 with CVAs turned off to see what the slowdown is?