Backstabbed by skanky IHVs

Well, I’ve experienced “the lag” even when drawing nothing, so it’s not just related to how you pass the data…

Please, why do you have to be insulting?

I was expressing my profound disbelief of the numbers given - reasonable disbelief, as it seems now. As you felt it to be just insulting instead, I clearly failed at that - sorry.

Please, why do you have to be insulting?

This coming from someone who titled the thread
‘Backstabbed by skanky IHVs’
you’ve gotta laugh :slight_smile:

Uh, I’ve always taken “skanky” to be quite a light way of saying “knavish”, that’s how people use it where I come from. Apparently the dictionary quite disagrees with me. Meeeeh! Wrong again.

Well, I’ll add that to my other apology. Until someone deals the final blow and proves me wrong about ATI too, I hope you at least see where I was coming from. It was a rather unfortunate post either way, I guess I just panicked.

At 8Mtris/s, I’d panic too. :slight_smile:

Hmm. I just looked through some code and I am already doing all these things.

Yet threaded optimizations still seem quirky. When set to “Auto/On”, the OS reports 58-61% busy (dual-core CPU). When set to “Off”, it reports 8-11% busy. It looks like the driver is hogging all of one core (50%) with threaded optimizations enabled. In other words, the driver seems to be running a thread implemented with busy-waiting/polling. This happens regardless of whether anything is rendered; as far as I can tell, only the front and back buffers are swapped.
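For reference, the loop I’m measuring is essentially just this - a sketch, assuming the window and a current GL context on hdc already exist:

```cpp
// Render loop that draws nothing - only swaps buffers.
MSG msg;
bool running = true;
while (running) {
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) running = false;
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    SwapBuffers(hdc);  // with threaded optimization "Auto/On", one core stays busy here
}
```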

What API have you used to create and manage your GL context, CatDog?

Your description fits exactly what’s happening when threaded optimization goes wrong. (Lord crc, maybe you are talking about some other problem?)

Hm, if you really do everything on the list, there must be something else. What a pity, I thought I had tracked it down. But somehow I managed to get rid of the lagging.

Ok wait, there are two more things:

  • Don’t use GL_XXX_STRIP (use GL_TRIANGLES and GL_LINES instead - see the sketch after this list)
  • Optimize batches using the Forsyth method
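For the strip item, the conversion is mechanical - a sketch of unrolling a triangle-strip index list into independent GL_TRIANGLES indices (stripToTriangles is a made-up name; degenerate triangles are not filtered out):

```cpp
#include <vector>
#include <cstdint>

std::vector<uint32_t> stripToTriangles(const std::vector<uint32_t>& strip)
{
    std::vector<uint32_t> tris;
    for (size_t i = 2; i < strip.size(); ++i) {
        if (i % 2 == 0) {             // even triangle: keep vertex order
            tris.push_back(strip[i - 2]);
            tris.push_back(strip[i - 1]);
        } else {                      // odd triangle: swap to preserve winding
            tris.push_back(strip[i - 1]);
            tris.push_back(strip[i - 2]);
        }
        tris.push_back(strip[i]);
    }
    return tris;
}
```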

In fact, the problem was gone after I made some HUGE changes to the entire VBO layout, as described in this thread. So I don’t know exactly what helped.

I’m using plain Windows API for the setup and everything. No other libs involved here.
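For completeness, it’s essentially this kind of setup - a minimal Win32/WGL sketch (createContext is just a made-up helper name; error checking omitted):

```cpp
#include <windows.h>
#include <GL/gl.h>

// Create a legacy GL context on an existing window - sketch only.
HGLRC createContext(HWND hwnd, HDC* outDc)
{
    PIXELFORMATDESCRIPTOR pfd = {};
    pfd.nSize      = sizeof(pfd);
    pfd.nVersion   = 1;
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;

    HDC dc = GetDC(hwnd);
    SetPixelFormat(dc, ChoosePixelFormat(dc, &pfd), &pfd);

    HGLRC rc = wglCreateContext(dc);
    wglMakeCurrent(dc, rc);
    *outDc = dc;
    return rc;
}
```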

CatDog

Edit: Btw, the current version of Google Earth suffers from this too. Here, after some varying period of time, one of the cores goes to 100% and stays there - it’s nvogl.dll running for life. So we are not alone.

It sounds ridiculous that you have to avoid so many things and conform to so many rules just to get it to work right… is it safe to say that it’s best to just disable it?

I’ve noticed something similar (on Linux anyway) when using glFinish() + vsync. The driver spinlocks in glFinish until it is ready to perform the swap. If I use glFlush instead, it passes right through to glXSwapBuffers (which performs an implicit flush anyway…), and that path does NOT spinlock.
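That is, roughly the difference between these two paths (display and window being the usual GLX handles):

```cpp
// glFinish();                    // spins a core until the GPU catches up
glFlush();                        // just submits the command queue and returns
glXSwapBuffers(display, window);  // performs an implicit flush anyway
```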

But isn’t that what glFinish() is supposed to do (not returning until all GL commands are finished)?

CatDog

You’re right. But shouldn’t it block or at least sleep instead of spinlocking on the CPU? I’ve found that by preloading my own custom sched_yield(), which performs a very small sleep, the CPU usage drops to 0%. :smiley: I suppose the tight loop they use is there to ensure MAXIMAL performance…
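In case anyone wants to try it, the override is essentially this - a sketch of the LD_PRELOAD trick, assuming the driver’s spin loop really does call sched_yield(); the 0.5 ms value is just an example:

```cpp
// Build: g++ -shared -fPIC -o yield.so yield.cpp
// Run:   LD_PRELOAD=./yield.so ./yourapp
#include <time.h>

extern "C" int sched_yield(void)
{
    // Sleep briefly instead of returning immediately, so the driver's
    // busy-wait loop no longer burns a whole core.
    timespec ts = { 0, 500000 };  // 0.5 ms
    nanosleep(&ts, NULL);
    return 0;
}
```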

Well, I don’t know if anybody is interested, but I found another one: two-sided lighting is evil.

  • Use VBOs for vertex arrays
  • Always interleave vertex attributes in a single VBO; never split them across VBOs (e.g. normals in one VBO, texcoords in another) - see the sketch after this list
  • Use indexed primitives
  • Regardless of which glDraw*() command is used, never exceed GL_MAX_ELEMENTS_INDICES or GL_MAX_ELEMENTS_VERTICES
  • Don’t use GL_DOUBLE
  • Don’t use immediate mode
  • Don’t use GL_XXX_STRIP (use GL_TRIANGLES and GL_LINES instead)
  • Optimize batches using the Forsyth method
  • Don’t use two-sided lighting
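In code, the list boils down to something like this - a sketch only (vertices, indices and the counts are hypothetical names; assumes the GL 1.5 buffer entry points are loaded and <cstddef> for offsetof):

```cpp
// One interleaved VBO + index buffer, plain GL_TRIANGLES, floats only,
// drawn with glDrawRangeElements.
struct Vertex {              // all attributes interleaved in one buffer
    GLfloat pos[3];
    GLfloat normal[3];
    GLfloat uv[2];           // GL_FLOAT everywhere - no GL_DOUBLE
};

GLuint vbo, ibo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(Vertex), vertices, GL_STATIC_DRAW);

glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexCount * sizeof(GLuint), indices, GL_STATIC_DRAW);

// Keep each batch below the driver's preferred limits
GLint maxVerts, maxInds;
glGetIntegerv(GL_MAX_ELEMENTS_VERTICES, &maxVerts);
glGetIntegerv(GL_MAX_ELEMENTS_INDICES,  &maxInds);

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), (void*)offsetof(Vertex, pos));
glNormalPointer(GL_FLOAT, sizeof(Vertex), (void*)offsetof(Vertex, normal));
glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), (void*)offsetof(Vertex, uv));

// Telling GL the vertex range up front lets the driver validate less
glDrawRangeElements(GL_TRIANGLES, 0, vertexCount - 1, indexCount,
                    GL_UNSIGNED_INT, 0);
```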

To make it short:

  • Don’t use OpenGL at all, except for pushing your preformatted and optimized vertex arrays over the bus. Do all the driver’s work yourself.

CatDog

In other words, “know your hardware”. That doesn’t look like shocking news to me… Want it to be fast? Know the hardware and know what happens in your code (and in the code you call into). Sure, it’s somewhat messy in OpenGL, with a myriad of ways of doing the same thing (where usually only one of them is optimal)…

Or, disable that threading stuff in the driver. :slight_smile:

I’ve also found two-sided lighting to be very slow, so instead I branch on gl_FrontFacing in the fragment shader which runs much more efficiently. Various other things are unusable on my card as well, such as changing the polygon mode to lines…
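The replacement looks roughly like this, with the GLSL embedded as a C++ string - a sketch only; the varying name and the single directional light are just for illustration:

```cpp
// Fragment shader branching on gl_FrontFacing instead of enabling
// two-sided lighting. 'normal' is an assumed varying fed from the
// vertex shader; light 0 is treated as directional.
const char* fragSrc = R"(
    varying vec3 normal;

    void main()
    {
        // Flip the interpolated normal for back faces ourselves
        vec3 n = normalize(gl_FrontFacing ? normal : -normal);
        vec3 l = normalize(vec3(gl_LightSource[0].position));
        float diff = max(dot(n, l), 0.0);
        gl_FragColor = gl_FrontLightProduct[0].diffuse * diff;
    }
)";
```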