VAO probs :)

Ysaneya · September 19, 2002, 5:49am

Do you have the latest drivers? Because, hmm… ATI and drivers… The last time I worked with the beta drivers, I found that their OpenGL support was outdated. I now prefer to work with the “official” Catalyst drivers.

Y.

Ozzy · September 19, 2002, 7:07am

Muahahaha! :) (Nice thought, thx.)

I was using the latest Catalyst drivers (the September release).

Btw, it’s too bad that, once again, a driver implementation problem is ruining the performance of what I think is, at last, a pretty nice piece of hardware.
In fact, we should ask a D3D expert for an opinion on that 8500LE board when using static data with the Microsoft API
(if static geometry is possible with D3D).

[This message has been edited by Ozzy (edited 09-19-2002).]

Ysaneya · September 19, 2002, 8:06am

When I got my Radeon 8500, that’s the first thing I tested. I’m no D3D expert, but I can still compile and run some samples from the SDK.

Anyway, the results were quite incredible… I didn’t get more than 10 M tris/sec (all static) with OpenGL, but the optimized mesh sample from the D3D SDK ran at a smooth 40 M tris/sec. Talk about a difference!

I’m convinced that the Radeon is one of the most powerful cards here; it’s just that you don’t see it in OpenGL…

Y.

Korval · September 19, 2002, 10:06am

For maximum hardware performance, all vertex arrays except for color and secondary color should always be specified to use float as the component type. Color and secondary color arrays may be specified to use either float or 4-component unsigned byte as the component type.

I would imagine (read: hope) that this has changed for the 9700.

Humus · September 19, 2002, 3:04pm

Originally posted by Ysaneya:
I’m convinced that the Radeon is one of the most powerful cards here; it’s just that you don’t see it in OpenGL…

I’m not sure what you guys do to keep your Radeons’ performance down, but I have never had any problems with polygon throughput. In fact, I once wrote a demo doing vertex skinning in Direct3D, and I did the same thing in OpenGL: same model, same effect. There was no significant performance difference between the OpenGL and D3D versions. It was about 430 fps (OpenGL) vs. 425 fps (D3D) when looking away from the model and 270 fps (OpenGL) vs. 280 fps (D3D) when looking straight at the model.

Ozzy · September 19, 2002, 9:42pm

Humus, the point is not to keep that 8500 board’s performance down!! :))
I used a Celeron 300 for the tests, OK…
I know that it has a direct impact on dynamic vertex throughput! But you surely also know that it should have no impact on performance when using resident geometry…

When I overclocked that Celeron to 450 MHz, I noticed an interesting boost, so we can definitely say that the data are not stored on board and that the CPU is too heavily involved in the VAO implementation.

Concerning D3D vs GL on the same system, it sounds pretty logical to get approximately the same results. Moreover, for GL only, I think public 3D benchmarks are quite good for evaluating performance with standard/classic mechanisms (without dynamic lighting).
What I mean is that there shouldn’t be much difference using CVA (without lighting) on a Radeon or on a GeForce, because sending the data over the bus is the bottleneck.
I’m insisting on dynamic lighting because using the fixed-function pipeline has always been a problem (in terms of performance) with ATI drivers!
It is really expensive and, again, far from the results of NV implementations.

In fact, the trouble is that the game we’re working on was designed with T&L in mind. While it runs smoothly at a stable 60 FPS on a low-end CPU config with a GF256, it’s not the same story with the previous generation of ATI boards, which were supposed to be much more powerful than the first GF256 series!

Conclusion: if you can’t store (statically) and process the data in VRAM, you lose all the advantages of what is called a T&L board.

And finally, ehehe Korval, it would be fun to test that Radeon 9700!! Can anyone lend me a 9700? Contact me! ;)) Come on, ATI dudes…

tchOo

Humus · September 19, 2002, 10:47pm

Well, Ysaneya stated that the situation would be significantly better in D3D, and I’m showing that, in my experience, that’s not true. I’ve been able to reach very similar results in both APIs. I’m also pretty sure that the data IS stored locally. I went back and fired up that old OpenGL app I talked about in my previous post. It still ran at roughly 430 fps when looking away and 270 fps when looking at the model. Then I disabled VAO and let it run through the standard vertex array path, and performance dropped to a constant 82 fps regardless of viewing angle. So it seems pretty much obvious to me that the vertices are stored onboard; otherwise I can’t see how I could get that huge performance boost by using VAO. Especially when the performance also matches the performance of D3D at a similar task.

Ysaneya · September 19, 2002, 11:34pm

A clarification: at the time I did that test, I had a PIII 500 MHz.

Since then I’ve completely changed my system. I now have an Athlon 1.4 GHz with 512 MB of RAM. Here’s a test application that benchmarks Radeons with 4 lights (note: I didn’t write it; credit goes to whoever did…):
http://www.fl-tw.com/opengl/texbug.exe

My results are:

Arrays : 4.5 FPS, 1228560 TPS, 2.18+2.11 Mb/s
CVA : 9.8 FPS, 2539024 TPS, 4.76+4.61 Mb/s
Lists : 29.5 FPS, 7412312 TPS, 14.25+13.83 Mb/s
Lists N/L : 103.9 FPS, 25595000 TPS, 50.20+48.71 Mb/s
Streaming VAO : 28.0 FPS, 7043744 TPS, 13.53+13.13 Mb/s
Static VAO : 34.4 FPS, 8558968 TPS, 16.61+16.12 Mb/s

Y.

harsman · September 19, 2002, 11:55pm

Wow, that’s weird. My paltry GeForce 2MX beats your Radeon when using vanilla vertex arrays or CVAs. My results:

AMD K7/MMX/3DNOW 899 Mhz
OpenGL ICD on NVIDIA Corporation GeForce2 MX/AGP/3DNOW! [1.3.1]

Arrays : 23.2 FPS, 5815184 TPS, 11.23+10.89 Mb/s
CVA : 23.1 FPS, 5815184 TPS, 11.18+10.84 Mb/s
Lists : 24.1 FPS, 6019944 TPS, 11.66+11.32 Mb/s
Lists N/L : 40.9 FPS, 10156096 TPS, 19.74+19.15 Mb/s
Streaming VAO : unable to initialize
Static VAO : unable to initialize

Ozzy · September 20, 2002, 1:26am

Originally posted by Humus:
So it seems pretty much obvious to me that the vertices are stored onboard; otherwise I can’t see how I could get that huge performance boost by using VAO. Especially when the performance also matches the performance of D3D at a similar task.

Well, I’m not convinced of that.
Look, Humus: when you use VAR with data in AGP memory, it’s faster than CVA, but it’s also slower than using VRAM.
This could explain VAO performance vs. CVA.

I really think that the VAO implementation loads the data into AGP memory instead of VRAM. The GPU would then only deal with a kind of cache (in VRAM) to process the vertices.

Moreover, if the data really are stored in VRAM, then performance is incredibly slow for such a mechanism.

Again, this is pure speculation; only the guys at ATI could answer. Anyhow, I think it’s a bit ridiculous for a T&L board to have its performance vary drastically depending on the CPU and/or bus frequencies. Something is wrong in the design somewhere :))

Would someone be nice enough to test the little app with a 9700 and the latest drivers? Please.

Ozzy · September 20, 2002, 1:39am

Originally posted by harsman:
Wow, that’s weird. My paltry GeForce 2MX beats your Radeon when using vanilla vertex arrays or CVAs

I think you’ve got it, man! Too bad VAR isn’t tested in this app.

Humus · September 20, 2002, 5:10am

Not sure what to make of this little app (why is it called texbug, btw?). Results from a Radeon 8500:

Arrays : 34.1 FPS, 8436112 TPS, 16.48+15.99 Mb/s
CVA : 33.3 FPS, 8231352 TPS, 16.11+15.63 Mb/s
Lists : 34.4 FPS, 8640872 TPS, 16.63+16.14 Mb/s
Lists N/L : 126.0 FPS, 31000664 TPS, 60.86+59.05 Mb/s
Streaming VAO : 33.6 FPS, 8395160 TPS, 16.21+15.73 Mb/s
Static VAO : 34.8 FPS, 8640872 TPS, 16.83+16.33 Mb/s

Pretty much the same performance regardless of mode, except without lighting.
It would be interesting to see the source of this app, but anyway, there should be no difference in the number of bytes pulled from VRAM/AGP (or wherever it’s stored) between the Lists and Lists N/L modes. Since there’s a huge performance difference, the bottleneck is obviously something else; it may be as simple as the T&L unit not being any faster. The fact that performance is similar across all the other modes supports that theory.
I’m also not surprised that a GF2MX beats the Radeon in some tests; the GF2MX has the T&L engine of the GF2. The GF2 beat the Radeon in pretty much every T&L throughput comparison, even though on paper the Radeon should be faster.

[This message has been edited by Humus (edited 09-20-2002).]

harsman · September 20, 2002, 5:34am

I thought Ysaneya had a Radeon 8500, not a regular Radeon 1. You seem to get much better scores, but still low performance from VAO, even with static data.

Ysaneya · September 20, 2002, 5:38am

No, I really have an 8500; I just didn’t mention it again since I already did a few times. No really, no joke, it’s really an 8500.

Anyway, I’ve uploaded the main source file if you want to see it: http://www.fl-tw.com/opengl/texbug.cpp

Why is it called texbug? I can’t remember. I thought I had named it that to demonstrate a texturing bug in the first driver release, but since there are no textures, I dunno…

Y.

Ysaneya · September 20, 2002, 7:24am

I updated my driver from Catalyst 2.2 to 2.3, and I got a 5x (!!) improvement with regular vertex arrays… sigh!!

Here are my results now:
Arrays : 20.3 FPS, 5119000 TPS, 9.79+9.49 Mb/s
CVA : 26.4 FPS, 6634224 TPS, 12.78+12.40 Mb/s
Lists : 34.6 FPS, 8640872 TPS, 16.73+16.23 Mb/s
Lists N/L : 128.2 FPS, 31492088 TPS, 61.92+60.07 Mb/s
Streaming VAO : 33.1 FPS, 8149448 TPS, 15.98+15.51 Mb/s
Static VAO : 35.1 FPS, 8640872 TPS, 16.94+16.44 Mb/s

Y.

Ozzy · September 20, 2002, 9:46am

Interesting small difference between streaming and static VAO.
eheheh

Any cool 9700 owner out there?

josip · September 20, 2002, 11:20am

I’ve got a 9700, but I’m really not thaat cool (really).

Here are my results (using Catalyst 2.3 drivers):

GenuineIntel processor/MMX/SSE/SSE2 1794 Mhz
OpenGL ICD on ATI Technologies Inc. Radeon 9700 x86/SSE2 [1.3.3302 WinXP Release]

       Arrays : 40.0 FPS, 9869432 TPS, 19.35+18.77 Mb/s
          CVA : 12.4 FPS, 3153304 TPS, 5.98+5.80 Mb/s
        Lists : 58.4 FPS, 14456056 TPS, 28.20+27.36 Mb/s
    Lists N/L : 237.0 FPS, 58315648 TPS, 114.50+111.09 Mb/s
Streaming VAO : 56.0 FPS, 13800824 TPS, 27.05+26.24 Mb/s
   Static VAO : 58.4 FPS, 14456056 TPS, 28.19+27.35 Mb/s

Ysaneya · September 20, 2002, 12:22pm

CVA three times slower than regular vertex arrays? Hmm. No comment.

Y.

PH1 · September 20, 2002, 12:59pm

What about alignment? It might be a problem in the code. ATI should write a document about proper use of VAO (and element arrays).

Ozzy · September 20, 2002, 9:08pm

Nice config, Josip! Thx!
But don’t hide; we can guess who you are. ;))
Hey Ysaneya, could you go to the retailer on the corner and implement VAR in your little app? It would be really interesting to check performance now on the same kind of system. :)