Performance numbers for various hardware combos

Does anyone have some performance numbers (tris/sec primarily) for various combinations of hardware (CPU, video card, m/b) and rendering methods (lit, textured …), or a pointer to a website or perhaps an Excel spreadsheet?

I’ve been reading this forum for quite some time and have yet to see anyone discuss actual numbers. I want to know if I’m even in the ballpark as far as performance in my engine.

Yes, I understand that apples-to-apples comparisons with numbers like these are very tough, but something is better than nothing. I’ve tried various web searches and keep coming up empty.

I see approx. 300-400 Ktri/sec (450 MHz Intel P2, Voodoo3 video card), which intuitively I think is quite low. The Glide 3 docs seem to imply that the Voodoo3 hardware should be capable of 1-10 Mtri/sec. I am a far cry from this, and just want to know if it’s worth my time to try improving performance. (Yeah, I know my hardware is getting a bit old … what is the current realistic expectation of tri/sec on modern h/w … not the advertised box numbers?)

Thanks for any leads,

Dunno if you mean in an actual engine, but as far as synthetic benchmarks, there’s an interesting one at nvidia called “benmark”.

It simply draws triangle strips to see what your card can do. I imagine it would work on voodoos…

The best I can get with my system on this demo (trying various settings) is 18M tris/sec. This is with a Geforce3 on a PIII-450.

I believe the advertised GF3 performance is 40 Million tris/sec.

Why the disparity? My guesses are:

  1. The benchmark is actually DRAWING these triangles (see previous post by Matt)
  2. My computer only has AGP 2x
  3. Only a 100 MHz front side bus.
  4. This app is not 100% tuned.

I hope this was somewhat relevant to what you were asking

– Zeno

1-10 million tris a sec on a voodoo3.
tell me what fps does quake3 run at?
i believe an average q3 screen has ±10000 tris on it, if that.
what fps do u get in q3? 20fps say.
20x10000 = 200000!!
one of the fastest engines around is nowhere near that 1-10million figure

You should benchmark using benmark as Zeno said; also try the balls demo. The balls demo will give you a lower tri/sec figure because lighting is on. Together they will give you an idea of the maximum performance you can achieve on your hardware.

Maximum performance on non geforce cards is dependent on the CPU. I would say roughly 2.5M lit Tris/sec on a PIII 700.

On Geforce II, it’s about 12M lit Tris/sec and 18M non lit Tris/sec.

The GF3, from what I can gather, is not much faster than the GF2 in terms of Tris/sec.

The above figures are just a rough guide and assume no game code or fill rate bottlenecks. The geforce numbers also assume you use VAR or display lists to avoid the AGP bottleneck.

I’m curious to know the Radeon’s T&L performance. There is no VAR equivalent for the Radeon, or is there?

zed, 20 fps? Remember that you’re measuring tris/sec, not fill rate, so you shouldn’t take 1280x1024 as the resolution you’re testing at. A Pentium 4 could probably reach the 1+ million figure (100+ fps) for that Voodoo 3 (at low resolutions).

painterb, I’m sure that the triangle figure is the maximum number of triangles that the Voodoo3 can set up per second. The actual number of transformed vertices depends mainly on the CPU for chips that don’t have hardware T&L.

Obviously you haven’t been reading this board carefully enough. Humus wrote a small benchmark that measures what you want (and other things). Search this board for the thread. You’ll be able to find some figures on the thread and at Beyond3D.

Thanks guys,

Zeno, “benmark” was nice … but didn’t offer quite what I was looking for (it failed to run in some combinations … I think with depth buffer bits > 0).

Your figure of 18 Mtri/sec vs. the advertised 40 Mtri/sec was helpful. It gives me a data point to go on. The Voodoo3 advertises 7 Mtri/sec, so I am still a long way from that. But I could try not drawing the tris, and see if that helps!

zed, unfortunately I do not have Quake3, so I can’t run those demos. But again it’s interesting to see they don’t get near that 1-10Mtri/sec figure.

I tried looking for the benchmark Humus put together, but the web page is down??? I get a 404 error, I think, but it is written in a language other than mine, so I don’t know for sure.

However, looking through that old thread I did see a reference to 3DMark, which looks like exactly what I was hoping for! It tests my machine with “realistic” graphics code and lets me compare my results against many others’ results! It looks good so far, so I’ll get more in depth with it.

On a side note, given my P2 450MHz, 100MHz bus, currently with Voodoo3 video card, do people feel that I can get better 3D performance by upgrading my video card? Or am I mainly limited by CPU/mb at this point?
I always heard that cpu was the limiting factor at this point, but Zeno … your 18Mtri/sec figure is pretty good with a roughly comparable configuration (but better video card)


Bah, don’t listen to the people that tell you it’s your CPU. Previously, I had a TNT2, and just upgraded to the GF3. I have seen huge performance increases in everything I have tried. My friend had a similar experience going from an old Permedia card to a GF2 MX (on a Celeron 366).

If I recall, when I ran benmark on my TNT2, I got about 5-6 Million tris/sec. I’m guessing the voodoo3 is comparable to the tnt2.

I think upgrading to anything with T&L will help you a lot. Which one to choose depends on the features you want and display resolutions you are shooting for. For every vertex you draw, a T&L enabled card will save your CPU from having to do a matrix-vector multiply for transformation and some dot products for lighting. This can be a huge savings if you’re drawing a lot of tris.

– Zeno
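
To make the per-vertex cost mentioned above concrete, here is a minimal sketch of what a CPU has to do for each vertex when the card has no hardware T&L: one 4x4 matrix-vector multiply for the transform and one dot product for a single diffuse light. The function names and the example matrix are made up for illustration; they are not taken from any driver or engine, and real software T&L does more than this (clipping, multiple lights, projection divide).

```python
# Illustrative only: a toy version of software transform and lighting.
# All names are hypothetical, not from any real driver or engine.

def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def diffuse(normal, light_dir):
    """Lambert term: dot product of unit normal and unit light direction, clamped at 0."""
    d = sum(n * l for n, l in zip(normal, light_dir))
    return max(d, 0.0)

# Example matrix: translate by (1, 2, 3).
translate = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]

position = transform(translate, [0.5, 0.5, 0.5, 1.0])
intensity = diffuse([0.0, 0.0, 1.0], [0.0, 0.0, 1.0])

# Rough operation count per vertex for the two steps above.
MULS_PER_VERTEX = 16 + 3   # 4x4 transform + one 3-component dot product
ADDS_PER_VERTEX = 12 + 2

print(position, intensity)
```

At a few hundred thousand vertices per second those few dozen floating-point operations per vertex add up, which is the work a T&L card takes off the CPU.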

If you don’t have HT&L, it’s most definitely your CPU in a lot of ways (until you hit the fill rate limit).

Running GLTrace on American McGee’s Alice shows me that they push 18,000-20,000 triangles per frame – they’re doing multiple passes over some of the geometry. On a P3/733 with a GF2GTS that game averages something around 40 fps, which means less than a million triangles per second. It’s using the Q3 engine.
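
The in-game estimates in this thread all come from the same multiplication, triangles per frame times frames per second. A small sketch using only the figures quoted above (the function name is just for illustration):

```python
def tris_per_second(tris_per_frame, fps):
    """Sustained triangle throughput implied by a per-frame count and a frame rate."""
    return tris_per_frame * fps

# zed's rough Quake 3 guess: about 10,000 tris per frame at 20 fps.
q3_guess = tris_per_second(10_000, 20)

# The GLTrace figures for Alice: 18,000 to 20,000 tris per frame at about 40 fps.
alice_low = tris_per_second(18_000, 40)
alice_high = tris_per_second(20_000, 40)

print(q3_guess, alice_low, alice_high)
```

Both estimates land well under one million triangles per second, far below the synthetic benchmark and box numbers discussed earlier.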

>If you don’t have HT&L, it’s most definitely your CPU in a lot of ways (until you hit the fill rate limit).

OK. Interestingly, I also thought I had read that, due to the time it takes to transfer vertices over the bus, the CPU was essentially stalled between vertices, and therefore it should have plenty of time to perform T&L. Not true?

In a vaguely related topic … I ran some calculations on my Dell laptop video performance :

If we assume approx. 20,000 tri/frame at 40 fps, as you (jwatte) suggest, that yields 800 Ktri/sec for Quake (which I don’t dispute). If we further assume an average of 2(?) vertices required per triangle, 10 float values per vertex, and 4 bytes/float, I compute 64 MB/sec of bandwidth required to send geometry to the video card (unless I did this wrong???).

This is nowhere near the AGP bandwidth max, and just a small fraction of 4x AGP. So on my M4 Mobility 4x AGP laptop, that 4x isn’t buying me anything, right (at least as far as 3D goes)?
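
The bandwidth estimate above can be checked with a few lines. The triangle rate, vertices per triangle, floats per vertex and bytes per float are the poster's assumptions; the AGP constants are the commonly quoted nominal peak rates (about 266 MB/sec for 1x and 1066 MB/sec for 4x), which are not from this thread and which real transfers will not fully reach.

```python
def geometry_bandwidth(tris_per_sec, verts_per_tri, floats_per_vert, bytes_per_float=4):
    """Bytes per second needed to send the given geometry stream to the card."""
    return tris_per_sec * verts_per_tri * floats_per_vert * bytes_per_float

# Assumption (not from the thread): nominal peak AGP rates in bytes per second.
AGP_1X_BYTES_PER_SEC = 266_000_000
AGP_4X_BYTES_PER_SEC = 1_066_000_000

# The poster's numbers: 800 Ktri/sec, 2 vertices per triangle, 10 floats per vertex.
needed = geometry_bandwidth(800_000, 2, 10)

# Worst case with no vertex sharing at all: 3 vertices per triangle.
needed_unshared = geometry_bandwidth(800_000, 3, 10)

print(needed / 1e6, "MB/sec")                   # 64.0 MB/sec
print(needed / AGP_1X_BYTES_PER_SEC)            # roughly a quarter of nominal AGP 1x
print(needed / AGP_4X_BYTES_PER_SEC)            # roughly 6% of nominal AGP 4x
```

Under these assumptions the 64 MB/sec figure holds, and even with unshared vertices the stream stays below 100 MB/sec, so geometry transfer at this triangle rate sits well under the nominal AGP limits.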

If the CPU is actually significantly involved on a per-vertex basis with vertex transfer (assuming a hardware T&L card), then the drivers are at fault. Unless you are using OpenGL immediate mode, the CPU should simply tell the card where the vertices are and it should handle access.

For example, if you set up a vertex array and call glDrawElements, at that moment the drivers will likely copy your vertex arrays out into DMA-able memory. After that, the driver queues up the primitives you requested to draw and returns control to your program. It will then render these primitives in parallel with your CPU.

With VAR, the copy step is omitted entirely, which is why you should use fences to make sure you aren’t writing to memory that the card is DMA-ing from.

Note that the preceding is simply an educated guess. I have not written any OpenGL drivers for any graphics card, nor have I written drivers for anything.

No, the copy itself can take significant time as well, although copying is usually limited more by memory and bus speed than by CPU speed.

– Matt

Humus’ benchmark is at

The Beyond3D thread with results that you may find helpful is at

Even games that support HW T&L do lots of stuff per vertex/triangle so the CPU often does make a difference. For example I think Q3 does all lighting on the CPU, so even with HW T&L you’re still limited by the CPU.