Extremely weird performance issue

Marcus,

The new control panel has an option called “Threaded optimization”. Disabling it has the same effect as changing the registry key manually.

Not on my computer/card. :(

Dell 490/WinXP (4 cores) with a GF 7800 GTX & driver 162.18.

Is it somehow missing because it’s showing in Swedish? (Well, it’s not listed in the help, which is in English, and the other options are, so I guess I’m out of luck somehow.)

I found the setting on another system we have here, but that one was running a Quadro card.

And I still don’t like having to poke around in the NVIDIA registry to make our apps run well, when they could’ve provided some sort of API for this.

They have. Join the NV developer programme.

I’ve tried several times, but apparently a medical simulation company of 60 ppl that only does about 15 MUSD/year isn’t worth their time.

Sigh. I need to stop bitching. I just have a bad mood day. :-/

I have the same problem: poor performance on a GF7 when NVidia’s multithreading support is enabled. The current version of nHancer has a checkbox for toggling this feature (Compatibility -> OpenGL Control -> Multithreading).

With threading enabled I get 40 FPS; with threading turned off it is 150 FPS.

Here’s an observation I made, using Sysinternals Process Explorer (PE), on a Dual Core Pentium D 3.2GHz and Geforce 7950GX2.

When multithreading is disabled (via nHancer), PE displays only one active thread: the application exe itself. There is also nvoglInt.dll loaded, but it doesn’t consume any CPU time. Running the app under heavy load results in 50% CPU load for the app thread. Since there’s no multithreading in my app, that’s what I expected.

But: by default, NVidia enables its multithreading feature. And here, nvoglInt.dll obviously has its own thread. When running my app, this thread consumes about 15-25% CPU and my app also consumes only 15-25%. The total load NEVER exceeds 50%!

It looks like the two threads are running on the same core. I would expect CPU load to rise above 50% when the driver has its own thread, but that is not the case here!
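In case anyone wants to check the same thing without Process Explorer, here is a rough Win32 sketch of the measurement - just a Toolhelp snapshot of the current process’s threads and their accumulated CPU times. dump_thread_times and the output format are only my own illustration, not anything from the driver or from my app:

#include <windows.h>
#include <tlhelp32.h>
#include <stdio.h>

/* 100-ns FILETIME units -> milliseconds */
static double filetime_to_ms(const FILETIME *ft)
{
    ULARGE_INTEGER u;
    u.LowPart  = ft->dwLowDateTime;
    u.HighPart = ft->dwHighDateTime;
    return (double)u.QuadPart / 10000.0;
}

/* Print every thread of the current process with its accumulated CPU time. */
static void dump_thread_times(void)
{
    DWORD pid = GetCurrentProcessId();
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    THREADENTRY32 te;

    if (snap == INVALID_HANDLE_VALUE)
        return;

    te.dwSize = sizeof(te);
    if (Thread32First(snap, &te)) {
        do {
            if (te.th32OwnerProcessID != pid)
                continue;
            HANDLE th = OpenThread(THREAD_QUERY_INFORMATION, FALSE, te.th32ThreadID);
            if (th) {
                FILETIME created, exited, kernel, user;
                if (GetThreadTimes(th, &created, &exited, &kernel, &user))
                    printf("thread %5lu: kernel %8.1f ms, user %8.1f ms\n",
                           te.th32ThreadID,
                           filetime_to_ms(&kernel), filetime_to_ms(&user));
                CloseHandle(th);
            }
        } while (Thread32Next(snap, &te));
    }
    CloseHandle(snap);
}

Calling that once a second from the render loop shows the same picture as PE: with threading enabled, a second thread appears and both threads together never get past the equivalent of one core.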

Any ideas are VERY appreciated!

CatDog

150 fps to 40 fps? You should be happy. On my 2 machines, I go from 80 fps down to 2-5 fps when multithreading is enabled. I just don’t get what NVidia’s doing with it, because it’s been enabled in their driver for months (if not years) now, and I don’t see any sign of improvement.

I would love to get more details about this multi-threading optimization from NVidia itself. Does nobody know of a PDF with some recommendations/explanations about it?

Y.

Hmm, so is this an issue for all OpenGL apps running on dual core/GF7 hardware?

I spent days and weeks trying to find out what I was doing wrong. I rewrote half of the startup code, but nothing helped.

Oh, btw, during testing I made another interesting observation. In one very specific situation, GLIntercept logged an error after wglSetPixelFormat(). That situation was:

- create a context (ChoosePixelFormat, SetPixelFormat, wglCreateContext)
- make it current
- do not deactivate that context by calling wglMakeCurrent(0,0)
- try to create a second context just as above

At the last step, GLIntercept logs something like this:

wglSetPixelFormat() failed, glGetError() = GL_INVALID_OPERATION

(Note that I called GDI32.SetPixelFormat, so that call to the wgl routine seems to come from there. Also, SetPixelFormat does return a valid pixel format! As I see it, that failure indicates an internal error in the OpenGL driver.)

That GL_INVALID_OPERATION only occurs when multithreading is enabled, but it is not directly related to the performance issue, because creating just a single context is slow too. It’s just a weird symptom.

And if wglMakeCurrent(0,0) is called before SetPixelFormat, the error vanishes as well, but that doesn’t fix the performance problem either.
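For clarity, the sequence above in code looks roughly like this. It is a bare sketch with all error handling omitted; CreateDummyWindow is a hypothetical stand-in for whatever window creation the real app does:

#include <windows.h>
#include <GL/gl.h>

HWND CreateDummyWindow(void);   /* hypothetical helper - some ordinary window */

static HGLRC create_context(HDC dc)
{
    PIXELFORMATDESCRIPTOR pfd;
    int fmt;

    ZeroMemory(&pfd, sizeof(pfd));
    pfd.nSize      = sizeof(pfd);
    pfd.nVersion   = 1;
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;

    fmt = ChoosePixelFormat(dc, &pfd);
    SetPixelFormat(dc, fmt, &pfd);       /* GDI32 call; the logged          */
                                         /* wglSetPixelFormat apparently    */
                                         /* comes from inside here          */
    return wglCreateContext(dc);
}

void repro(void)
{
    HDC   dc1 = GetDC(CreateDummyWindow());
    HDC   dc2 = GetDC(CreateDummyWindow());
    HGLRC rc1 = create_context(dc1);     /* first context                   */

    wglMakeCurrent(dc1, rc1);            /* make it current                 */

    /* intentionally NOT calling wglMakeCurrent(NULL, NULL) here */

    create_context(dc2);                 /* second context; with threaded   */
                                         /* optimization enabled, this is   */
                                         /* where the GL_INVALID_OPERATION  */
                                         /* shows up in the GLIntercept log */
}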

CatDog

Here are the two GLIntercept logs:

Multithreading DISABLED using nHancer

Multithreading ENABLED, the default driver setting

Any comments?

I would not rule out a bug in GLIntercept in this case, but it does seem weird.

Have you tried running with the GLIntercept FullDebug profile to see if you get any different results?

Hm… maybe you are right. Here is the FullDebug log: XML TXT

It doesn’t show any error. The previous logs were done with the “Authors Profile”, and I can reproduce them perfectly.

Note this call,
glGetIntegerv(GL_MAX_DRAW_BUFFERS, …);
which is missing in the full debug version. (I’m not making this call from my app!)

Anyway, even if it’s a bug in GLIntercept, it is a bug in the driver too, because I can only see it when multithreading is enabled. I’m pretty sure that the driver messes things up, and GLIntercept just doesn’t know how to handle that mess. :)

After all, I only mentioned this to demonstrate that there is a harmful impact on the application, and that it’s related to the multithreading option. It does something evil!

CatDog

That call to
glGetIntegerv(GL_MAX_DRAW_BUFFERS,…)
is an internal GLIntercept call; I fixed this about a week after the 0.5 release.

(I really should do another release with all the bug fixes)

I’d appreciate a new release, to see if it changes anything. (Btw, thanks for GLIntercept! It’s a very useful tool!)

Still, I’m curious about the reason for the reported errors, because after all this seems to be a way to expose the driver bug from within the application.

CatDog

Not sure if GLIntercept does this already, but could it be useful to have a switch that, when turned on, displays the calling thread ID?
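(Just to illustrate the idea - something along these lines, where log_gl_call is a made-up stand-in for whatever logging routine GLIntercept uses internally:)

#include <windows.h>
#include <stdio.h>

/* Hypothetical logging helper: prefix every logged GL call with the ID of
   the thread that made it, so driver-created threads stand out in the log. */
static void log_gl_call(const char *text)
{
    fprintf(stderr, "[tid %5lu] %s\n",
            (unsigned long)GetCurrentThreadId(), text);
}

/* e.g.  log_gl_call("glDrawArrays(GL_TRIANGLES, 0, 3)"); */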

On a tangent, I have myself been bitten by both of the two largest vendors’ implementations on SMP systems in the past (one loves kernel-mode spinlocks, the other unsuccessfully tries to implement its own version in user mode), and performance and power consumption of the whole system often go straight to hell (I tried to find a more polite way to say it, but failed).

I think it’s (way over) time they realized the only way to improve overall system performance on SMP systems is to use system-provided locking primitives - mutexes and semaphores come to mind. Burning insane amounts of CPU cycles (and therefore power) just so their power-sucking drivers get a 0.1% edge in a 3D benchmark is absurd, especially when their busy-waiting spinlocks make the remainder of the application run like on a C64.
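To make that concrete, here is a toy comparison (my own sketch, nothing vendor-specific): the spinning wait burns a full core for as long as it waits, while the blocking wait lets the scheduler free the core until another thread signals.

#include <windows.h>

static volatile LONG g_ready = 0;   /* flag set by a producer thread       */
static HANDLE g_ready_event;        /* manual-reset event, same purpose    */

void init_sync(void)
{
    g_ready_event = CreateEvent(NULL, TRUE, FALSE, NULL);
}

/* Busy-wait: burns ~100% of one core (and power) until the flag flips. */
void wait_spinning(void)
{
    while (g_ready == 0)
        ;   /* spin */
}

/* Blocking wait on a system primitive: the thread sleeps and the core is
   free for the rest of the application until the event is signalled. */
void wait_blocking(void)
{
    WaitForSingleObject(g_ready_event, INFINITE);
}

/* Producer side: wakes either kind of waiter. */
void signal_ready(void)
{
    InterlockedExchange(&g_ready, 1);
    SetEvent(g_ready_event);
}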

It seems that I accidentally found a solution to the problem!

Here is what I did.

I don’t know which change in particular solved it, or if it was the combination of all of them. While messing with the triangles, I had “threaded optimization” turned off the whole time. After finishing all the changes, I was curious and switched it back on - and couldn’t believe my eyes! Absolutely no lagging anymore, and Process Explorer reports both cores working while rendering. And I’m getting a significant speed increase in scenes that contain many batches.

Comments appreciated!

CatDog