VAO probs :)

Do you have the latest drivers? Because, hum… ATI and drivers… Last time I worked with the beta drivers, I found that their OpenGL implementation was outdated. I now prefer to work with the “official” Catalyst drivers :)

Y.

Muahahaha! :) (nice thought, thx)

I was using the latest Catalyst drivers (September release).

Btw, it’s too bad that once again a driver implementation is ruining the performance of what I think is, after all, a pretty nice piece of hardware.
In fact, we should ask a D3D expert for his opinion on how that 8500LE board performs with static data under the Microsoft API
(if static geometry is possible with D3D).


[This message has been edited by Ozzy (edited 09-19-2002).]

When I got my Radeon 8500, that was the first thing I tested. I’m no D3D expert, but I can still compile and run some samples from the SDK :)

Anyway, the results were quite incredible… I couldn’t get more than 10 M tri/sec (all static) with OpenGL, but the optimized mesh sample from D3D ran at a smooth 40 M tri/sec. Talk about a difference :P

I’m convinced that the Radeon is one of the most powerful cards here, it’s just that you don’t see it in OpenGL …

Y.

For maximum hardware performance, all vertex arrays except for color and secondary color should always be specified to use float
as the component type. Color and secondary color arrays may be specified to use either float or 4-component unsigned byte as the
component type.
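
To make that guideline concrete, here is a minimal sketch of an interleaved vertex layout that follows it: float components for position, normal and texture coordinates, and 4 unsigned bytes for color. The names `PackedVertex`, `toByte` and `vertexBufferBytes` are made up for illustration; nothing here comes from an ATI sample or from the benchmark app discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical interleaved vertex: every array uses float components,
// except color, which uses 4 unsigned bytes (RGBA).
struct PackedVertex {
    float        position[3];  // 12 bytes
    float        normal[3];    // 12 bytes
    std::uint8_t color[4];     //  4 bytes
    float        texcoord[2];  //  8 bytes
};

// With 4-byte floats every member size is a multiple of 4, so no padding
// is expected and each attribute starts on a 4-byte boundary.
static_assert(sizeof(PackedVertex) == 36, "unexpected padding in PackedVertex");
static_assert(offsetof(PackedVertex, color) == 24, "color should follow the normal");
static_assert(offsetof(PackedVertex, texcoord) == 28, "texcoord should follow the color");

// Convert a color channel in [0, 1] to an unsigned byte, rounding to nearest.
inline std::uint8_t toByte(float channel) {
    assert(channel >= 0.0f && channel <= 1.0f);
    return static_cast<std::uint8_t>(channel * 255.0f + 0.5f);
}

// Total bytes needed to hold `vertexCount` vertices in one buffer.
constexpr std::size_t vertexBufferBytes(std::size_t vertexCount) {
    return vertexCount * sizeof(PackedVertex);
}
```

The stride you would hand to the array pointer calls is simply `sizeof(PackedVertex)`, and the byte color saves 12 bytes per vertex compared with four floats.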

I would imagine (read: hope) that this has changed for the 9700.

Originally posted by Ysaneya:
I’m convinced that the Radeon is one of the most powerful cards here, it’s just that you don’t see it in OpenGL …

I’m not sure what you guys do to keep your Radeons’ performance down, but I have never had any performance problems with polygon throughput. In fact, I once wrote a demo doing vertex skinning in Direct3D, and I did the same thing in OpenGL: same model, same effect. There was no significant performance difference between the OpenGL and D3D versions. It was like 430 fps (OpenGL) vs. 425 fps (D3D) when looking away from the model and 270 fps (OpenGL) vs. 280 fps (D3D) when looking straight at the model.

Humus, the point is not to keep that 8500 board’s performance down!! :))
I used a Celeron 300 for the tests, OK…
I know that it has a direct impact on dynamic vertex throughput! But you certainly also know that it should have no impact on performance when using resident geometry…

When overclocking that Celeron to 450 MHz I noticed an interesting boost, so we can definitely say that the data is not stored onboard and that the CPU is too heavily involved in the VAO implementation.

Concerning D3D vs GL on the same system, it sounds pretty logical to get approximately the same results ;) Moreover, and for GL only, I think that public 3D benchmarks are quite good for evaluating performance with standard/classic mechanisms (without dynamic lighting).
What I mean is that there shouldn’t be much difference using CVA (without lighting) on a Radeon or on a GeForce, because sending the data through the bus is the bottleneck.
I’m insisting on the dynamic lighting because using the fixed functions has always been a problem (in terms of performance) with ATI drivers!
It is really expensive and, again, far from the results with NV implementations.

In fact, the trouble is that the game we’re working on has been designed with T&L in mind. And while it runs smoothly at a stable 60 FPS with a low-end CPU on a GF256, it will not be the same story with the previous generation of ATI boards, which were supposed to be much more powerful than the first GF256 series!

Conclusion: if you can’t store (statically) and process the data in VRAM, you are losing all the advantages of what is called a T&L board.

And finally, ehehe Korval, it could be fun to test that Radeon 9700!! Can anyone lend me a 9700? Contact me! ;)) Come on, ATI dudes… :)

tchOo

Well, Ysaneya stated that the situation would be significantly better in D3D, and I showed that in my experience that’s not true. I’ve been able to reach very similar results in both APIs. I’m also pretty sure that the data IS stored locally. I went back and fired up that old OpenGL app I talked about in the previous post. It still ran at roughly 430 fps when looking away and 270 fps when looking at the model. Then I disabled VAO and let it run through the standard vertex array path, and performance dropped to a constant 82 fps regardless of viewing angle. So it seems pretty obvious to me that the vertices are stored onboard, otherwise I can’t see how I could get that huge performance boost by using VAO. Especially when the performance also matches the performance of D3D at a similar task.
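
To put numbers on that comparison, here is a small sketch that turns the frame rates quoted above into speedup factors. The helper name `speedup` and the constant names are made up for illustration; only the three frame rates come from the post.

```cpp
#include <cassert>

// How many times faster `fps` is than `baselineFps`.
inline double speedup(double fps, double baselineFps) {
    assert(baselineFps > 0.0);
    return fps / baselineFps;
}

// Frame rates quoted in the post (OpenGL, same model, same effect).
constexpr double kVaoLookingAway    = 430.0;
constexpr double kVaoLookingAtModel = 270.0;
constexpr double kPlainVertexArrays = 82.0;  // constant regardless of viewing angle
```

`speedup(kVaoLookingAtModel, kPlainVertexArrays)` comes out at about 3.3, and `speedup(kVaoLookingAway, kPlainVertexArrays)` at about 5.2. That shows VAO is much faster than the plain path, though the numbers alone don't say whether the data sits in VRAM or AGP memory.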

A clarification: at the time I did that test, I had a PIII 500 MHz.

Since then I have completely changed my system. I now have an Athlon 1.4 GHz with 512 MB of RAM. Here’s a test application to benchmark Radeons with 4 lights (note: it was not written by me, credit goes to whoever did it…):
http://www.fl-tw.com/opengl/texbug.exe

My results are:

Arrays : 4.5 FPS, 1228560 TPS, 2.18+2.11 Mb/s
CVA : 9.8 FPS, 2539024 TPS, 4.76+4.61 Mb/s
Lists : 29.5 FPS, 7412312 TPS, 14.25+13.83 Mb/s
Lists N/L : 103.9 FPS, 25595000 TPS, 50.20+48.71 Mb/s
Streaming VAO : 28.0 FPS, 7043744 TPS, 13.53+13.13 Mb/s
Static VAO : 34.4 FPS, 8558968 TPS, 16.61+16.12 Mb/s
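
A side note on reading these numbers: every TPS figure above is a whole multiple of 40952, which hints that the app draws a fixed number of triangles per pass. That is only an inference from the posted results, not something confirmed against the source. A quick sketch to check it (the function names are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Euclid's algorithm on unsigned 64-bit values.
inline std::uint64_t gcdU64(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Greatest common divisor of all triangles-per-second readings.
inline std::uint64_t commonTriangleUnit(const std::vector<std::uint64_t>& tps) {
    assert(!tps.empty());
    std::uint64_t g = 0;  // gcd(0, x) == x, so 0 is a neutral start value
    for (std::uint64_t v : tps) {
        g = gcdU64(g, v);
    }
    return g;
}

// TPS column of the results posted above, in the same order.
inline std::vector<std::uint64_t> postedTps() {
    return {1228560, 2539024, 7412312, 25595000, 7043744, 8558968};
}
```

`commonTriangleUnit(postedTps())` returns 40952, so the readings work out to 30, 62, 181, 625, 172 and 209 of those units.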

Y.

Wow, that’s weird. My paltry GeForce 2MX beats your Radeon when using vanilla vertex arrays or CVAs. My results:

AMD K7/MMX/3DNOW 899 Mhz
OpenGL ICD on NVIDIA Corporation GeForce2 MX/AGP/3DNOW! [1.3.1]

Arrays : 23.2 FPS, 5815184 TPS, 11.23+10.89 Mb/s
CVA : 23.1 FPS, 5815184 TPS, 11.18+10.84 Mb/s
Lists : 24.1 FPS, 6019944 TPS, 11.66+11.32 Mb/s
Lists N/L : 40.9 FPS, 10156096 TPS, 19.74+19.15 Mb/s
Streaming VAO : unable to initialize
Static VAO : unable to initialize

Originally posted by Humus:
So it seems pretty obvious to me that the vertices are stored onboard, otherwise I can’t see how I could get that huge performance boost by using VAO. Especially when the performance also matches the performance of D3D at a similar task.

Well, I’m not convinced of that.
Look Humus, when you are using VAR with data in AGP memory, it’s faster than CVA. But it’s also slower than using VRAM. :)
This could explain VAO performance vs CVA.

I really think that the VAO implementation loads the data into AGP memory instead of VRAM. The GPU would then only deal with a kind of cache (in VRAM) to process the vertices.

Moreover, if the data really is stored in VRAM after all, then performance is incredibly slow for such a mechanism.

Again, this is pure speculation; only the guys at ATI could answer. Anyhow, I think it’s a bit ridiculous for a T&L board to have its performance vary drastically depending on the CPU and/or bus frequencies. Something is wrong in the design somewhere :))

Would someone be nice enough to test the little app with a 9700 and the latest drivers? Please. ;)

Originally posted by harsman:
Wow, that’s weird. My paltry GeForce 2MX beats your Radeon when using vanilla vertex arrays or CVAs

I think you get it, man! ;) Too bad that VAR is not tested in this app.

Not sure what to make of this little app (why is it called texbug, btw?). Results from a Radeon 8500:

Arrays : 34.1 FPS, 8436112 TPS, 16.48+15.99 Mb/s
CVA : 33.3 FPS, 8231352 TPS, 16.11+15.63 Mb/s
Lists : 34.4 FPS, 8640872 TPS, 16.63+16.14 Mb/s
Lists N/L : 126.0 FPS, 31000664 TPS, 60.86+59.05 Mb/s
Streaming VAO : 33.6 FPS, 8395160 TPS, 16.21+15.73 Mb/s
Static VAO : 34.8 FPS, 8640872 TPS, 16.83+16.33 Mb/s

Pretty much the same performance regardless of mode, except without lighting.
It would be interesting to see the source of this app, but anyway, there should be no difference in the number of bytes pulled from VRAM/AGP (or wherever it’s stored) between lists and lists N/L mode. As there’s a huge performance difference, the bottleneck is obviously something else; it may be as simple as the T&L unit not being faster. That the performance is similar between all the other modes supports that theory.
I’m also not surprised that a GF2MX beats the Radeon in some tests; the GF2MX has the T&L engine of the GF2. In pretty much everything the GF2 beat the Radeon when it came to T&L throughput, even though the Radeon should be faster on paper.
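
One way to look at the gap between lists and lists N/L is in milliseconds per frame rather than FPS. The sketch below converts two frame rates and subtracts them. If the vertex data fetched is the same in both modes, the difference is roughly what the 4 lights cost per frame. The helper names `frameTimeMs` and `lightingCostMs` are made up for illustration.

```cpp
#include <cassert>

// Time spent on one frame, in milliseconds.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame when lighting is on, given the frame rate
// with lighting (`litFps`) and without it (`unlitFps`).
inline double lightingCostMs(double litFps, double unlitFps) {
    return frameTimeMs(litFps) - frameTimeMs(unlitFps);
}
```

With the 34.4 FPS (Lists) and 126.0 FPS (Lists N/L) posted above, that is roughly 29.1 ms vs 7.9 ms per frame, so about 21 ms of every lit frame goes to something other than fetching vertices.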

[This message has been edited by Humus (edited 09-20-2002).]

I thought Ysaneya had a Radeon 8500, not a regular Radeon 1. You seem to get much better scores, but still low performance from VAO, even with static data.

No, I really have an 8500, I just didn’t mention it anymore since I already had a few times :) No really, no joke, it’s really an 8500 :)

Anyway, I have uploaded the main source file if you wanna see it: http://www.fl-tw.com/opengl/texbug.cpp

Why is it called texbug? Can’t remember. I think I renamed it to demonstrate a texturing bug with the first driver release, but since there is no texture, I dunno…

Y.

I updated my driver from Catalyst 2.2 to 2.3, and I got an improvement of roughly 5x (!!) with regular vertex arrays (from 4.5 to 20.3 FPS)… sigh!!

Here are my results now:
Arrays : 20.3 FPS, 5119000 TPS, 9.79+9.49 Mb/s
CVA : 26.4 FPS, 6634224 TPS, 12.78+12.40 Mb/s
Lists : 34.6 FPS, 8640872 TPS, 16.73+16.23 Mb/s
Lists N/L : 128.2 FPS, 31492088 TPS, 61.92+60.07 Mb/s
Streaming VAO : 33.1 FPS, 8149448 TPS, 15.98+15.51 Mb/s
Static VAO : 35.1 FPS, 8640872 TPS, 16.94+16.44 Mb/s

Y.

Interesting small difference between streaming and static VAO. ;)
eheheh

Any cool 9700 owner out there? :)

I’ve got a 9700, but I’m really not thaat cool (really).

Here are my results (using Catalyst 2.3 drivers):

GenuineIntel processor/MMX/SSE/SSE2 1794 Mhz
OpenGL ICD on ATI Technologies Inc. Radeon 9700 x86/SSE2 [1.3.3302 WinXP Release]

Arrays : 40.0 FPS, 9869432 TPS, 19.35+18.77 Mb/s
CVA : 12.4 FPS, 3153304 TPS, 5.98+5.80 Mb/s
Lists : 58.4 FPS, 14456056 TPS, 28.20+27.36 Mb/s
Lists N/L : 237.0 FPS, 58315648 TPS, 114.50+111.09 Mb/s
Streaming VAO : 56.0 FPS, 13800824 TPS, 27.05+26.24 Mb/s
Static VAO : 58.4 FPS, 14456056 TPS, 28.19+27.35 Mb/s

CVA 3 times slower than regular vertex arrays? Hem. No Comment.

Y.

What about alignment? It might be a problem in the code. ATI should write a document about the proper use of VAO (and element arrays).

Nice config, Josip! Thx!
But don’t hide, we can guess who you are. ;))
Hey Ysaneya, could you go to your retailer at the corner and implement VAR in your little app? It could be really interesting to check performance now with the same kind of system. :)=