On Benchmarking

I’ve been doing some thinking about this whole nvidia/ati/Futuremark fiasco that’s been going on lately, and I was toying with the idea of trying to write a fair benchmark for modern graphics cards. I think, if nothing else, it would be a good learning experience.

Before I decide whether this is worth spending some time on, I’d like to get feedback from the developer community (you guys).

Here are some topics for discussion:

  1. Is this worth doing? Are there already enough benchmarks?

  2. Why is 3DMark so popular? Eye candy? The online comparison feature? Results in a single number? Do people really base hardware purchasing decisions on it?

  3. Expanding on number 2, is it a good idea to try to distill a fundamentally n-dimensional test into a single number so that people can compare easily? i.e. “My liquid cooled uber-computer gets 235 ZenoMarks!”
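If a single number is wanted anyway, one common way to collapse many sub-test results is a geometric mean, which keeps one outlier test from dominating the total the way an arithmetic mean of raw FPS would. A minimal sketch — the sub-test names and FPS figures here are made up for illustration:

```python
import math

def composite_score(fps_results):
    """Collapse per-test FPS numbers into one score via the geometric mean.

    Uses sums of logs rather than a raw product, so many sub-tests
    can't overflow a float.
    """
    log_sum = sum(math.log(fps) for fps in fps_results.values())
    return math.exp(log_sum / len(fps_results))

# Hypothetical sub-test results (frames per second):
results = {"fillrate": 120.0, "vertex": 85.0, "shader": 40.0}
print(round(composite_score(results), 1))
```

The tradeoff is exactly the one the question raises: the composite is easy to compare but hides which dimension a card is weak in, so the individual sub-scores would still need to be reported alongside it.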

  4. Is it better to write a code path for each popular card, or is it better to have them all run the same code? The former would show how different cards would perform under hand-optimization and vendor-specific extensions (which most game companies probably do) while the latter would give as much of an equal comparison as possible. I’d lean towards the latter, at least as a first pass.

  5. Would people pay for it? $5? Or would trying to charge anything just make it unpopular?

  6. What qualities/features would it have to have to get review sites to use it?

  7. Would it be worthwhile to try to make it difficult to “cheat” on? I know it will always be possible to cheat, but I had the following thoughts: keep everything that’s being rendered on-screen and moving; optimize the shaders as much as possible; use some randomness (which might necessitate a statistical score); and create a software renderer and compare pixel results (or make a few screenshots before the benchmark is released and store them for comparison, raising a flag if the stored and rendered images do not match).
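The “compare against stored screenshots” idea could look something like the sketch below. Everything here is hypothetical — the frames are flat lists of 8-bit channel values, and the tolerance numbers are placeholders that would need tuning, since different cards legitimately round differently while a driver that skips work should push many pixels past any small tolerance:

```python
def frames_match(rendered, reference, per_channel_tol=8, max_bad_fraction=0.001):
    """Flag a rendered frame that strays too far from a stored reference.

    A small per-channel tolerance absorbs legitimate precision differences
    between cards; the fraction threshold decides when to raise the flag.
    """
    if len(rendered) != len(reference):
        return False
    bad = sum(1 for r, e in zip(rendered, reference)
              if abs(r - e) > per_channel_tol)
    return bad / len(rendered) <= max_bad_fraction

# Identical frames pass; a frame with a large altered region does not.
ref = [128] * 3000
print(frames_match(list(ref), ref))            # True
tampered = [0] * 300 + [128] * 2700
print(frames_match(tampered, ref))             # False
```

The randomness idea complicates this: with randomized scenes the reference images would have to be regenerated per-seed (e.g. by the software renderer), since fixed pre-release screenshots would no longer match anything.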

  8. I would like some sort of peer review for the project, but would not want to make it open source for a couple of reasons. Suggestions on this? I know there are a few people on this board with more experience than me whose input would be extremely valuable.

– Zeno


  1. Eye candy & ‘Mine is bigger than yours’ syndrome.

  2. IMO a single number is particularly useless for developers. I would rather have a huge grid of smaller, individual tests, so I can learn how things are best done (or at the very least how they are best not done).

  3. Just like for CPUs, use hardware-specific paths and compare them against an old-style or “simple” path. That comparison shows where there is potential to gain, and so where an application should spend its optimization time.

  4. Eye candy. Generate graphs automatically so the reviewers do not have to work too hard. Generate review text automatically in v2. “Card %s does well in the %s test, %.2f%% better than %s in %dx FSAA %dx aniso, but will that lead hold in the next test?”

  5. A reference software renderer would be good IMO. It would take care of the “looks somewhat washed out” or “crisper images” comments, which are highly subjective.
    One site (hardware.fr) compares the quality of trilinear filtering numerically; it makes the quality/speed tradeoffs very obvious, and the subjective comments of other sites very suspicious.
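A numeric quality comparison against a reference renderer could be as simple as a peak signal-to-noise ratio. This is only a sketch of the general technique (PSNR over flat 8-bit channel lists), not what hardware.fr actually computes:

```python
import math

def psnr(rendered, reference, peak=255.0):
    """Peak signal-to-noise ratio between a frame and its reference.

    Higher means closer to the reference; identical frames give infinity.
    One such number per filtering mode makes quality/speed tradeoffs
    reportable without "washed out" style subjective comments.
    """
    mse = sum((r - e) ** 2 for r, e in zip(rendered, reference)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / mse)
```

Plotting PSNR against FPS for each card and filtering mode would make the tradeoff each driver has chosen directly visible.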

  6. If it’s not open source, you’re opening yourself up to (justified) accusations of bias. Not that open-sourcing would shield you completely (most people wouldn’t look at the code), but claiming to have the “fairest benchmark in the world” while keeping the test opaque wouldn’t be very credible.