Poor Performance of NVidia Cards

After all the bad things said about the gfFX, I’ll add some good things about this card. For my current work I had to choose the FX because the R3XX just didn’t have the features; shader speed wasn’t my primary concern.

Nice FX features:

  • Very long shaders
  • Full 32 bit support through the pipeline for pixel shaders
  • No limit on dependent texture lookups
  • 128-bit floating point textures/render targets (ATI has this too, no?)
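Purely as an illustration (not from roffe’s post): none of these features should be assumed blindly; the usual approach is to check the extension string at run time. The extension names below are the real 2003-era ones, while the has_ext() helper and the report function are made up for this sketch.

    /* Hypothetical sketch: check at run time whether the features listed
     * above are actually exposed, instead of assuming them. */
    #include <stdio.h>
    #include <string.h>
    #include <GL/gl.h>

    static int has_ext(const char *name)
    {
        const char *all = (const char *)glGetString(GL_EXTENSIONS);
        return all && strstr(all, name) != NULL;   /* crude substring test */
    }

    void report_shader_features(void)
    {
        /* long shaders, fp32 through the pipe, unlimited dependent reads (NV3x) */
        int nv_fp     = has_ext("GL_NV_fragment_program");
        /* float render targets / float RECT textures (NV3x) */
        int nv_float  = has_ext("GL_NV_float_buffer");
        /* float texture formats on R3xx */
        int ati_float = has_ext("GL_ATI_texture_float");

        printf("NV_fragment_program=%d NV_float_buffer=%d ATI_texture_float=%d\n",
               nv_fp, nv_float, ati_float);
    }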

Originally posted by davepermen:
I prefer to “optimize for ATI”, as at the same time it means “optimize for DX9 or ARB GL”, and with that, “optimize for a safe future”.

I prefer that too. The only vendor-specific code I ever used was VAR, which got replaced by VBO.
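For anyone who never touched either path: VAR (NV_vertex_array_range) needed NVIDIA-specific memory allocation, while the ARB_vertex_buffer_object replacement looks roughly like this. A minimal sketch only; the entry points and enums are the real ARB ones, the upload_vertices() wrapper is made up.

    #include <stddef.h>
    #include <GL/gl.h>
    #include <GL/glext.h>

    /* assume these were fetched with wglGetProcAddress / glXGetProcAddressARB */
    extern PFNGLGENBUFFERSARBPROC glGenBuffersARB;
    extern PFNGLBINDBUFFERARBPROC glBindBufferARB;
    extern PFNGLBUFFERDATAARBPROC glBufferDataARB;

    GLuint upload_vertices(const float *verts, size_t bytes)
    {
        GLuint vbo;
        glGenBuffersARB(1, &vbo);
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
        glBufferDataARB(GL_ARRAY_BUFFER_ARB, bytes, verts, GL_STATIC_DRAW_ARB);
        /* from here on, glVertexPointer(..., (char *)0) sources from the VBO,
         * on any vendor's driver that exposes ARB_vertex_buffer_object */
        return vbo;
    }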

What sucks the most is not whether NV cards are terrible or not, but the fact that many people bought NV cards, even the slow, low-cost 5200. So we have to write code that runs decently on these cards too (at least for the next few years), or our programs/games won’t sell much.

Originally posted by roffe:
Nice FX features:

  • Very long shaders
  • Full 32 bit support through the pipeline for pixel shaders
  • No limit on dependent texture lookups
  • 128-bit floating point textures/render targets (ATI has this too, no?)

Very long shaders are cool. The 9800 can supposedly run unlimited-length shaders, but nobody knows how to enable that, or does somebody?

Full 32-bit support is essentially dead with the Det50 drivers, as the general shader optimizer will decide by itself, in the driver, how precise a certain shader has to be. Hopefully we can choose in the drivers to disable the lowering of quality; otherwise THE main feature of the gfFX got essentially killed…
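Worth noting in this context: ARB_fragment_program already lets the program itself state a precision preference through its precision-hint options; the worry above is precisely whether a driver-side optimizer will keep honouring it. A small sketch (the surrounding C wrapper is made up, the OPTION line and the GL calls are real):

    #include <string.h>
    #include <GL/gl.h>
    #include <GL/glext.h>

    static const char fp_full_precision[] =
        "!!ARBfp1.0\n"
        "OPTION ARB_precision_hint_nicest;\n"   /* ask for full precision */
        "TEMP c;\n"
        "TEX c, fragment.texcoord[0], texture[0], 2D;\n"
        "MUL result.color, c, fragment.color;\n"
        "END\n";

    /* assumes a bound fragment program object and a fetched entry point */
    extern PFNGLPROGRAMSTRINGARBPROC glProgramStringARB;

    void load_full_precision_fp(void)
    {
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(fp_full_precision), fp_full_precision);
    }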

Floating-point textures are much better done on the R3XX, as we can have floating-point 1D, 2D, 3D, RECT and CUBE textures. NVIDIA can only have RECT. I love the float cubemaps… HDR environment maps, that means…
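To make the cube-map point concrete, here is a hypothetical sketch of uploading one HDR face via the R3xx path (ATI_texture_float); the enums are real, the wrapper function is invented. The NV3x float formats only exist for GL_TEXTURE_RECTANGLE_NV, which cannot be a cube-map face.

    #include <GL/gl.h>
    #include <GL/glext.h>

    /* upload one face of a 16-bit-float HDR environment cube map (R3xx path) */
    void upload_hdr_face(GLenum face, GLsizei size, const float *rgba)
    {
        /* face is e.g. GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB, with the cube map
         * bound to GL_TEXTURE_CUBE_MAP_ARB */
        glTexImage2D(face, 0, GL_RGBA_FLOAT16_ATI,   /* ATI_texture_float format */
                     size, size, 0, GL_RGBA, GL_FLOAT, rgba);

        /* the NV3x equivalent would be GL_FLOAT_RGBA16_NV, but only on
         * GL_TEXTURE_RECTANGLE_NV, so no float cube maps there */
    }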

But yes, never forget that the gfFX does have some good features. Its strengths are just far from what any normal GL or DX app will need, and its hardware is designed far from what should be fast and what not. That’s bad.

davepermen: UHHHHH, AAAAAHHHHHHH NICE AND SLOW
http://www.tech-report.com/etc/2003q3/hl2bench/index.x?pg=1 What’s all the security about?

I wanna see 3dfx + HL2

Originally posted by tfpsly:
I prefer that too. The only vendor-specific code I ever used was VAR, which got replaced by VBO.

I had some specific paths for register combiners on my GF2… I cannot watch my stunning per-pixel lighting demo anymore… and I don’t have the source anymore due to an HD crash.

Just as we had to code proprietary paths for all GeForces to gain access to features that were standard in DX8, to get fast speed (VAR), etc. Just as we had to work around their hardware bugs, etc…

But what sucks the most is that NVIDIA cannot stand up in front of everyone and say “okay, we agree, our current generation has some big faults; we’ll try our best now, and stop doing stupid false propaganda”. Now they bitch about Valve, as they did about Futuremark. Who do they want to bitch about next?

Originally posted by M/\dm/:
davepermen: UHHHHH, AAAAAHHHHHHH NICE AND SLOW
http://www.tech-report.com/etc/2003q3/hl2bench/index.x?pg=1 What’s all the security about?

I wanna see 3dfx + HL2

What do you want to say? This is all nonsense. Thanks for the link, though.

Me wana zay thzat zyztemz are f**d up

BTW, preliminary tests show the Dets 51 are about 15% faster in vp/fp, in the usual benches where the gap between the 9800 & 5900 is around 15%.

Learn to speak English. Why are the systems ****ed up? Or what, exactly?

And it looks like the Det50 doesn’t guarantee 32-bit floating-point math anymore. That’s rather disappointing. We’ll see.

Ouch. Davepermen criticizing someone else’s English.

I do at least try to speak English… he works with English the way NVIDIA works with standards.

Originally posted by davepermen:
Just as we had to code proprietary paths for all GeForces to gain access to features that were standard in DX8, to get fast speed (VAR), etc. Just as we had to work around their hardware bugs, etc…
I see a significant difference here, and it bothers me a lot:
Using an NVIDIA proprietary feature is an optimization. You detect an extension, you use it, program runs faster, fine.

Using ARB_fragment_program on GeForce FX cards is just nuts. You must include an off switch for an otherwise completely automatic feature, or users will send bags full of disrespect.
“My FX5200 is very slow, why?”
snickers
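A sketch of the difference zeckensack is describing, under made-up names (has_ext() as in the earlier sketch, the PATH_* constants and the config flag are invented): a vendor extension is a silent opt-in, while ARB_fragment_program on an FX ends up needing a user-visible off switch.

    enum { PATH_FIXED_FUNCTION, PATH_ARB_FRAGMENT_PROGRAM };

    int choose_fragment_path(int user_disabled_fragment_programs)
    {
        /* old hardware: the fallback is automatic, nobody has to know */
        if (!has_ext("GL_ARB_fragment_program"))
            return PATH_FIXED_FUNCTION;

        /* the awkward part: the standard extension is present and correct,
         * but on an FX 5200/5600/5800 it can be slow enough that the user
         * must be able to refuse it explicitly */
        if (user_disabled_fragment_programs)
            return PATH_FIXED_FUNCTION;

        return PATH_ARB_FRAGMENT_PROGRAM;
    }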

Originally posted by roffe:
After all the bad things said about the gfFX, I’ll add some good things about this card. For my current work I had to choose the FX because the R3XX just didn’t have the features; shader speed wasn’t my primary concern.

Nice FX features:

  • Very long shaders
  • Full 32 bit support through the pipeline for pixel shaders
  • No limit on dependent texture lookups
  • 128-bit floating point textures/render targets (ATI has this too, no?)

Just curious: is your work more experimental, or meant for pre-rendered scenes? Using very long 32-bit shaders and lots of dependent texture lookups, resulting in framerates too low to be practical in a realtime environment, doesn’t seem too good…

Dorbie, I did not take the term ‘nvidiot’ personally. I am not a fanboy, so why would I take it personally? Especially something which was posted BEFORE I posted and which did not mention my name. You must think I’m a regular idiot.

My response about being ‘offended’ was just a response to the stupid analogy about being teased by one’s mom.

The post may be a legitimate news item, but the way it was presented was trollish. The response to my suggestion that it was trollish WAS a personal attack.

I simply do not think that Valve’s benchmark results translate into a general evaluation of NV30’s performance. I think that it only reflects a single developer’s experience developing a specific engine for a specific game. I could call anyone who thinks otherwise a ‘fanATIc’, but that fails to explain anything, so why bother.

Maybe this will all become moot once nVidia releases its new drivers.

Interesting how this post has really got the nvidia zealots out of the woodwork. NVidiot really hits the nail on the head. The OP was pretty objective, as was the article, yet it’s amazing to see so many people taking it personally.

Nobody wants to rewrite their shaders to work for one specific platform. Shaders are supposed to be cross-platform. If we wanted to optimise for one particular card then there would be commercial games using the register combiners.
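For readers who never saw the pre-shader days, here is an illustrative comparison (not real game code, entry points assumed fetched via wglGetProcAddress): the same trivial “texture modulated by vertex colour” effect as a portable ARB fragment program string, next to the NVIDIA-only register-combiner setup it replaces.

    #include <GL/gl.h>
    #include <GL/glext.h>

    /* portable: runs on any ARB_fragment_program implementation */
    static const char portable_fp[] =
        "!!ARBfp1.0\n"
        "TEMP t;\n"
        "TEX t, fragment.texcoord[0], texture[0], 2D;\n"
        "MUL result.color, t, fragment.color;\n"
        "END\n";

    /* GeForce-only: NV_register_combiners, and this is just the RGB half */
    void setup_combiners_nv(void)
    {
        glEnable(GL_REGISTER_COMBINERS_NV);
        glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 1);
        glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                          GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
        glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                          GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
        glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB, GL_SPARE0_NV,
                           GL_DISCARD_NV, GL_DISCARD_NV,
                           GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE);
        glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_SPARE0_NV,
                               GL_UNSIGNED_IDENTITY_NV, GL_RGB);
        glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_ZERO,
                               GL_UNSIGNED_INVERT_NV, GL_RGB);
        /* ...plus the alpha portion and the remaining final-combiner inputs */
    }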

If nvidia really expect commercial developers to write shaders specifically for their platform then they’re making the same mistake 3dfx made with glide, and they may well suffer the same fate. Real developers simply have better things to do, as Valve has pointed out quite nicely. NVidia is simply smoking crack if they think commercial developers are going to jump through hoops to get our code to work on their platform.

Hopefully their next generation of cards will be better. It would be a shame to see such a great supporter of OpenGL go under because of some bad design decisions.


Originally posted by bunny:
Interesting how this post has really got the nvidia zealots out of the woodwork. NVidiot really hits the nail on the head. The OP was pretty objective, as was the article, yet it’s amazing to see so many people taking it personally.

Nobody wants to rewrite their shaders to work for one specific platform. Shaders are supposed to be cross-platform. If we wanted to optimise for one particular card then there would be commercial games using the register combiners.

If nvidia really expect commercial developers to write shaders specifically for their platform then they’re making the same mistake 3dfx made with glide, and they may well suffer the same fate. Real developers simply have better things to do, as Valve has pointed out quite nicely. NVidia is simply smoking crack if they think commercial developers are going to jump through hoops to get our code to work on their platform.

Hopefully their next generation of cards will be better. It would be a shame to see such a great supporter of OpenGL go under because of some bad design decisions.

Well said… When they bought 3dfx’s core assets and got some of their engineers, I think they got the wrong parts/people: their hardware design is not good, they have lost their six-month cycle (they were really late with the GF FX), and they focus their driver team on including ‘optimizations’ for specific games/benchmarks instead of adding new features (in the past, something like glslang would have been available the same day it was announced; some D3D9 functionality is still missing) and optimizing the current ones. And they are defending things with incredible arguments. Read their response: http://www.gamersdepot.com/hardware/video_cards/ati_vs_nvidia/dx9_desktop/HL2_benchmarks/003.htm
I think they should think before saying this kind of thing: “Regarding the Half Life2 performance numbers that were published on the web, we believe these performance numbers are invalid because they do not use our Rel. 50 drivers. Engineering efforts on our Rel. 45 drivers stopped months ago in anticipation of Rel. 50. NVIDIA’s optimizations for Half Life 2 and other new games are included in our Rel.50 drivers”.
It seems they have not taken notice of all the people saying they don’t want these kinds of application-specific optimizations/cheats.

I’m sorry for them, but now that I’ve seen my OpenGL applications working perfectly, with all the extensions I use, on the Radeon 9800 (for the first time in ATI’s life), I will swap my noisy GF FX 5800 for one of those cards.

I simply do not think that Valve’s benchmark results translate into a general evaluation of NV30’s performance. I think that it only reflects a single developer’s experience developing a specific engine for a specific game.

But we’re talking about a known problem with FX hardware. Any game that attempts to use D3D 9 shaders or ARB_fp will experience slower performance on an FX than a Radeon.

The Valve benchmark is only a symptom of a well-known problem.

It seems they have not taken notice of all the people saying they don’t want these kinds of application-specific optimizations/cheats.

That. Or that they have found a “solution” to the whole fragment program precision problem. Which means that they are probably going to try to dynamically determine the necessary precision of each register and allocate it accordingly. Which is a non-trivial undertaking.

What I have suggested appears to be what nVidia has done, and they imply as much in the language of their reply to the benchmarks.

Originally posted by Korval:
That. Or that they have found a “solution” to the whole fragment program precision problem. Which means that they are probably going to try to dynamically determine the necessary precision of each register and allocate it accordingly. Which is a non-trivial undertaking.

Definitely hard work, especially as no compiler until now has ever optimized for constraints like these, but rather for exactly the opposite ones… on the gfFX, doing a calculation several times to save registers, instead of storing intermediate values, can gain speed… urgh.

And, independent of the hard work, it’s NOT what we want.
THE most powerful feature of the gfFX is the 32-bit float fragment program; it’s THE reason why people bought it for scientific work. And it now looks like they have dropped that and determine in the drivers whether only partial precision is needed in some parts of shaders. That makes the math inconsistent, and less deterministic than ever: unusable for any scientific calculation (with fp16’s 10-bit mantissa, 2048.0 + 1.0 already rounds back to 2048.0). Better to use the lower-precision ATI then…

I’m not sure about this, though. We’ll see WHAT NVIDIA mixed together for the Det50… but they cannot get around the fact that they are fighting against a one-year-old GPU, and still don’t really beat it…

Originally posted by Zak McKrakem:

I’m sorry for them, but now that I’ve seen my OpenGL applications working perfectly, with all the extensions I use, on the Radeon 9800 (for the first time in ATI’s life), I will swap my noisy GF FX 5800 for one of those cards.

Just wondering: you’re using those Cat 3.7s and you see no issues with OpenGL apps? This would be a refreshing change!

P.S. I’ll take your 5800 off your hands

Originally posted by davepermen:
And, independent of the hard work, it’s NOT what we want.

Sure it is. If nVidia could correctly determine 100% of the time which registers could be fixed, half, or float, then you wouldn’t mind. Apps that need float precision get it, because nVidia correctly determined that they need it. Apps that only really need fixed get that.

Now, the unfortunate fact is that there is no way to determine with 100% certainty when you need which precision. If it is based on the incoming data from a texture, you would have to scan the texture to determine the application’s needs. If it is based on a vertex program, you can never really know for certain.

Perhaps the driver will have a slider that lets you set which side the shader compiler should err on: performance or quality. Quality would mean that, unless the driver can absolutely determine that the computation can get away with less than 32-bit precision, it will use 32-bit precision. Performance would mean that, unless the driver finds 100% proof that a computation needs half or float precision, it uses fixed.
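To make that concrete, here is a toy sketch of such a conservative rule, in no way NVIDIA’s actual compiler; every name and threshold is illustrative (NV3x fixed point covers roughly [-2, 2), and fp16 keeps exact integers only up to 2048).

    /* toy precision chooser -- purely illustrative, all names made up */
    typedef enum { PREC_FIXED, PREC_HALF, PREC_FLOAT } precision_t;
    typedef enum { ERR_ON_QUALITY, ERR_ON_PERFORMANCE } slider_t;

    /* provable_max: largest magnitude static analysis could prove for the
     * value, or a negative number if nothing could be proven (for example
     * because it depends on texture contents or vertex program results) */
    precision_t pick_precision(double provable_max, slider_t slider)
    {
        if (provable_max < 0.0)
            /* unproven: "quality" falls back to full precision,
             * "performance" gambles on the cheapest format */
            return (slider == ERR_ON_QUALITY) ? PREC_FLOAT : PREC_FIXED;

        if (provable_max < 2.0)        /* fits NV3x fx12 range of [-2, 2) */
            return PREC_FIXED;

        if (provable_max < 2048.0)     /* fp16 still exact for integers here */
            return PREC_HALF;

        return PREC_FLOAT;
    }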

Originally posted by Elixer:
Just wondering: you’re using those Cat 3.7s and you see no issues with OpenGL apps? This would be a refreshing change!

That was going to be my post.

Arguments are made that this card is faster than that one (like in all those dumb benchmarks you find on the net), that this card can compute more precisely than that one, and that this card is noisier than that one, but somehow people always forget to throw the bug list into the mix.

Without good drivers, any product can look like crap.