OpenGL geometry benchmark released.

http://www.fl-tw.com/opengl/GeomBench/index.htm

This is a pure OpenGL geometry (T&L) benchmark. Binaries and source code are available. It should answer quite a few questions, such as: does it make a difference whether a display list is compiled from immediate mode or from vertex arrays? What kind of difference should you expect between video-memory VAR, AGP VAR and VAO? Are compiled vertex arrays of any use today? Are short indices faster than long indices when using vertex arrays? Now you can look at the results yourself…

Y.

Are you sure it’s correct? I get similar values for the immediate/CVA pairs. I would have thought CVA would be quicker in all cases.

Hmm, I get much better results with my engine and with the NVIDIA vertex array range demo. With this benchmark I get about 6 MPoly/s, versus 18 MPoly/s with NVIDIA’s demo.

Arath

Yes, I’m pretty sure it’s correct. I get a (small) speed improvement with low tessellation on my Radeon 8500, which means it works. See the recent VAO thread and you’ll even notice that the Radeon 9700 is a lot slower with CVAs… ah well. I suppose you’ve got an NVIDIA card?

Y.

I got 21 million tris per sec max. That’s a lot more than I typically get, but it seems like a reasonable (even pretty high) max since I have a GeForce2 MX. That was with triangle-strip geometry in video memory. You’ll probably need to run the tests for longer periods of time to get consistent results. I was surprised at how fast display lists were, btw. Something else that seems strange is that integer indices are faster than shorts in most of the tests.

768k is very slow. Is it one array?

I hope that’s not the case, but I can’t really see why it would be slow as long as the arrays are no bigger than the implementation’s maximum size.

65535 on my Radeon 5800.

Ok, I had a first look at the log. Arath, I suppose you’re the one with an Athlon 1800+ and a GF3? If that’s right, I noticed that the test where you got 6 MTris/sec with VAR used the 44k-triangle scene. I don’t think that’s enough… you should try again with the 768k tessellation. Other people’s configs are showing what’s expected, so I don’t think it’s a bug in the code…

Harsman: longer tests, that’s an idea… unless you want to try everything… it already takes 10 minutes with all the options enabled :)

Ingenu: no, in the 768k case it’s a 64k vertex array which is shared by all the spheres. It’s slow because… well… 768k is a lot of triangles :) You won’t see anything smooth unless you run at more than 20 MTris/sec.
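
The arithmetic behind that remark can be sketched as follows. This is only an illustration: it assumes "768k" means 768,000 triangles per frame (it could equally mean 768 × 1024), and the function name `frames_per_second` is made up for the example.

```c
#include <assert.h>

/* Frames per second sustained when every frame draws tris_per_frame
   triangles at a throughput of mtris_per_sec million triangles/sec.
   Hypothetical helper, not part of the benchmark. */
double frames_per_second(double tris_per_frame, double mtris_per_sec)
{
    assert(tris_per_frame > 0.0);
    return mtris_per_sec * 1.0e6 / tris_per_frame;
}
```

Under that assumption, 20 MTris/sec works out to roughly 26 frames per second for the 768k scene, while 6 MTris/sec would be under 8.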

Y.

Ysaneya, you’re right, my apologies. I ran the benchmark again and got good performance.
By the way, sometimes when I run the program I get no test and no result, it just goes back to Windows. It happens when I run only the VAR test (in VRAM), although it works sometimes. Maybe the memory allocation fails?

Arath

Not sure. Since it tests all the combinations of the options you’ve checked, you need to have at least one option checked per group, and at least one transfer method. If that’s what you did, it’s a bug and I’ll look into it (if you can tell me exactly which options you had checked when it happens…). Thanks :)

Btw, I have added the first logs to the site (beware: it’s already big…)

Y.

The program crashes right after the last test. Even if I select only a few options so that a single test is performed (I tried different types), it crashes after that test.
It never asks about a connection or anything else.

PIII 933, 256 MB, WinXP, GeForce2 Go 16 MB, Detonator 40.41

Lars

Ok thanks, I’ll try to debug it tomorrow. I’ve been working on this program for 2 days now, and I’m starting to get a bad headache :)

Y.

You should use the swap_interval extension to override the user’s vsync selection in the driver. Anyway, all my results seem to make sense so far. Nice program.

EDIT: (deleted a couple of statements about the driver settings for vsync which I realized were incorrect once I thought about them)

[This message has been edited by Nakoruru (edited 09-23-2002).]

Nice and useful app! Good work!

Some time ago I tried, in the same way, to understand how geometry (T&L) performance changes when varying the transfer method and the vertex format parameters.

I posted on this subject but got no answer (http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/007347.html). I probably should have added something to download and test :)

Just to add my 5 cents to the discussion, I noticed a few things that could be useful:

  1. When rendering high triangle-count scenes, it affects performance a lot whether we are rendering a really big scene or just rendering the same array ten times (and not only when the scene does not fit entirely in AGP or video memory).

  2. The size of the vertex array blocks matters (best is from 1k to 7k vertices).

  3. The length of the strips changes the speed only a little (it becomes important above 30M/sec).

  4. I could degrade performance to 30M/sec by defeating the vertex cache (100vertex/turn).

  5. If I reorder the vertices in the block, performance drops considerably, depending on how much (and especially how locally) I permute them.
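
Points 4 and 5 both come down to how often an index hits the post-T&L vertex cache. A rough way to estimate that offline is to replay the index list through a simulated cache and count the misses. The sketch below assumes a simple FIFO cache of at most 64 entries; the real replacement policy and size of any given card are not stated in this thread, and `count_cache_misses` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Replays an index list through a simulated FIFO vertex cache and
   returns how many indices missed, i.e. how many vertices would have
   to be transformed again. FIFO policy and the 64-entry upper bound
   are assumptions made for illustration. */
size_t count_cache_misses(const unsigned *indices, size_t count,
                          size_t cache_size)
{
    unsigned cache[64];
    size_t used = 0, next = 0, misses = 0;

    assert(cache_size > 0 && cache_size <= 64);
    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            cache[next] = indices[i];        /* overwrite the oldest slot */
            next = (next + 1) % cache_size;
            if (used < cache_size) ++used;
            ++misses;
        }
    }
    return misses;
}
```

Dividing the miss count by the number of triangles gives an average number of transformed vertices per triangle, which can be compared before and after reordering the same index list.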

thanks

funes

Nakoruru: that makes sense, I’ll disable vsync for the tests.

Funes: I didn’t know your 5th cent was worth your 3rd :) Anyway, you’ve got excellent ideas, but I’m hesitant to add too many options. Every option I add doubles the full-test time (since it tries all the possible combinations)… with everything implemented, the full test would take many hours! Ideally I should restrict the tests to the important combinations, but how do I make sure no important one is missed? Any ideas are welcome…
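
To put numbers on that growth, here is a small sketch. It assumes the run count is simply the product of the number of checked options in each group, which matches the "all combinations" description given earlier in the thread; the function names and the idea of six extra options are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of runs when every combination of checked options is tested:
   the product of the number of checked options in each group. */
unsigned long count_runs(const unsigned *checked_per_group, size_t n_groups)
{
    unsigned long runs = 1;
    for (size_t g = 0; g < n_groups; ++g) {
        assert(checked_per_group[g] >= 1);  /* at least one per group */
        runs *= checked_per_group[g];
    }
    return runs;
}

/* Full-test time after adding extra two-valued options, each of which
   doubles the number of combinations and therefore the time. */
double minutes_after(double current_minutes, unsigned extra_binary_options)
{
    double minutes = current_minutes;
    for (unsigned i = 0; i < extra_binary_options; ++i)
        minutes *= 2.0;
    return minutes;
}
```

Starting from the 10 minutes mentioned above, six more on/off options would already give 640 minutes, a bit under eleven hours.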

Y.

When I implemented this kind of benchmark a while ago, I made an ‘interactive’ program. It was a dialog box where the user could switch options on and off on the fly. For example, you had a slider for the tessellation level, radio buttons to choose between immediate / VA / CVA… This way the user can test only the things that interest him and get the results immediately (by displaying FPS and MTris/sec info). The problem is that it’s not well suited to generating nice reports like your program does.

Another feature for your wishlist: it would be nice to have ATI_map_object_buffer support. This extension has been around for months but the specs are still not publicly available. Maybe some registered ATI developers out there could send you the specs (or maybe the extension could be documented ^^).

Ysaneya: I know, the number of tests is exponential, but in my humble opinion the main problem is not running the tests (the night is long… :) ), but browsing the whole mess of results. I think that just looking at the fastest combination is not very useful. What I would like to know is which factors are the most discriminating when rendering. In other words, let’s assume that you have 2^n timings, each one corresponding to enabling or disabling each of <n> different “things” (strips, CVA, short/long indices, cache-aware primitives, video/AGP… etc). The most discriminating factor should be the one that minimizes the ratio between the sum of all timings with that feature enabled and the sum of all timings with that feature disabled.
This would probably be a good hint for everyone about which path to choose when starting to optimize your code…
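
A minimal sketch of that ratio, assuming the 2^n timings are stored in an array indexed by a bitmask in which bit f is set when feature f was enabled for that run. The layout and the function names are choices made for this example, not something the benchmark provides.

```c
#include <assert.h>
#include <stddef.h>

/* timings[mask] is the time of the run whose enabled features are the
   set bits of mask. Returns (sum of timings with `feature` enabled)
   divided by (sum of timings with it disabled). Timings must be > 0. */
double feature_ratio(const double *timings, unsigned n_features,
                     unsigned feature)
{
    double on = 0.0, off = 0.0;

    assert(feature < n_features && n_features < 32);
    for (unsigned mask = 0; mask < (1u << n_features); ++mask) {
        if (mask & (1u << feature))
            on += timings[mask];
        else
            off += timings[mask];
    }
    return on / off;
}

/* Index of the feature with the smallest enabled/disabled ratio. */
unsigned most_discriminating(const double *timings, unsigned n_features)
{
    unsigned best = 0;
    double best_ratio = feature_ratio(timings, n_features, 0);

    for (unsigned f = 1; f < n_features; ++f) {
        double r = feature_ratio(timings, n_features, f);
        if (r < best_ratio) { best_ratio = r; best = f; }
    }
    return best;
}
```

A ratio well below 1 means enabling the feature shortens the runs on aggregate; a ratio near 1 means it hardly matters. Sorting the features by this ratio gives the ranking described above.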

funes.

Originally posted by kehziah:
Another feature for your wishlist: it would be nice to have ATI_map_object_buffer support. This extension has been around for months but the specs are still not publicly available. Maybe some registered ATI developers out there could send you the specs (or maybe the extension could be documented ^^).

If you look in ATi’s glATi.h file it’s pretty obvious what the extension does and how to use it. It exports these functions:
void *glMapObjectBufferATI(GLuint buffer);
void glUnmapObjectBufferATI(GLuint buffer);

It’s like Lock()/Unlock() on D3D vertex buffers.

Usually not all combinations are interesting. If I have found out from testing that display lists are faster than immediate mode, I might not care whether immediate mode is faster with strips, or with short indices, etc. Instead of ranking all possible combinations, you might rank each group separately and then combine the results. For example: video-memory VAR is fastest, shorts are faster than longs, strips are faster than lists. Then you can guess what the fastest mode is and do more exhaustive testing on that.
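
One way to sketch that per-group ranking: average the measured throughput for each option within a group, ignoring the other groups, and keep the option with the best mean. The `Result` layout, the three groups and the limit of eight options per group are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define N_GROUPS    3   /* e.g. transfer method, index type, primitive */
#define MAX_OPTIONS 8   /* assumed upper bound on options per group */

typedef struct {
    unsigned option[N_GROUPS]; /* option picked in each group for this run */
    double   mtris;            /* measured throughput in MTris/sec */
} Result;

/* Returns the option of `group` with the highest mean throughput,
   averaged over every run that used it. */
unsigned best_option(const Result *runs, size_t n_runs, unsigned group)
{
    double sum[MAX_OPTIONS] = {0};
    size_t cnt[MAX_OPTIONS] = {0};
    unsigned best = 0;
    double best_mean = -1.0;

    assert(group < N_GROUPS);
    for (size_t i = 0; i < n_runs; ++i) {
        unsigned o = runs[i].option[group];
        assert(o < MAX_OPTIONS);
        sum[o] += runs[i].mtris;
        cnt[o]++;
    }
    for (unsigned o = 0; o < MAX_OPTIONS; ++o) {
        if (cnt[o] == 0)
            continue;               /* option never tested */
        double mean = sum[o] / (double)cnt[o];
        if (mean > best_mean) { best_mean = mean; best = o; }
    }
    return best;
}
```

Calling `best_option` once per group yields a candidate "fastest mode", which can then be benchmarked exhaustively against its nearest neighbours. Note that this treats the groups as independent, so it can miss cases where two options only help in combination.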

Nice app, thanks.

I would like to see options to benchmark with texturing, and also with only 1 light instead of 3. I know this could make for a lot more combinations, but I think most apps use texture mapping and either 0 or 1 GL light.

I agree, great app.

Display Lists actually just pip Video VAR on my machine (although the difference is tiny…)… both about 22MTris (GF3, athlon 1800+)