ATI, GL_NV_fence, GL_Sync, and Threads…

I find myself in a difficult situation.
The short version: NVidia cards work, ATI does not; surely we must be doing something wrong?

We have quite a bit of code that uses OpenGL for offscreen rendering: it renders multiple frames to renderbuffers, and we read them back off the card. Not an uncommon situation these days.

Now, because we like to overlap workloads, and the rendering can quite often involve layers of sub-buffers, we need a LOT of synchronisation in render stages. We use threads quite heavily and work hard to overlap work.

On NVidia’s OpenGL implementations, we can use GL_NV_fence to check the status of submitted work and therefore avoid a lot of stalls and busy-waits on the CPU (which are fatal to us).
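For illustration, the non-blocking pattern looks roughly like this (a minimal sketch of the GL_NV_fence calls; it assumes the extension entry points have already been loaded, and error handling is omitted):

GLuint fence;
glGenFencesNV(1, &fence);

// ... issue the rendering commands for this stage ...

// Drop a fence into the command stream after the work.
glSetFenceNV(fence, GL_ALL_COMPLETED_NV);

// Later: a non-blocking status check. glTestFenceNV returns
// immediately, so the thread can go and do other work instead
// of stalling.
if (glTestFenceNV(fence))
{
    // The GPU has passed the fence; buffers can be read back or reused.
}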

On ATI there does not seem to be ANY way to do this without blocking either the GPU or the main CPU. This is a critical failing, and I suspect that I am simply missing something.

We have an implementation that ‘Works’ on ATI, but the overhead is about 30%, something we cannot afford.

I would love to hear input from others on workarounds for ATI; or are their cards really unusable in such applications?

Once upon a time there was a proposal for a better, generic fence-type extension (GL_Sync), but it seems to have died. Apple have an extension, as do NVidia; it seems that ATI do not, even though this is so critical in some areas.

Am I an idiot? There MUST be an answer to this…

There MUST be an answer to this…

Why “must” there? The second sentence of your post was a statement of truth: “NVidia cards work, ATI does not.”

That’s pretty much the beginning and the ending of the OpenGL story right there. If you are on Windows, GL only works consistently on nVidia implementations. ATi treats OpenGL as a second-class citizen, if they bother to treat it as anything at all.

Oh yes, ATi could implement NV_fence or whatever. But they’re not going to. They have a policy of implementing nothing that isn’t actually a core feature (with the exception of a few select ARB extensions).

ATi supports OpenGL like this. They look at what applications they need to support without honking off their customers (ie: gamers). They support just enough of OpenGL to keep those games working. And that’s pretty much it. High-profile upcoming games can push ATi’s GL support, but that’s about it.

It’s best to accept this and move on.

While I agree with much of what you say, Korval (and certainly it matches my experience exactly), I do like to try and hope for the best…

We have several ‘real life’ issues.

Firstly, we have clients whose systems run ATI cards for several reasons, including the need to run other ATI-only software (yes, it exists, don’t ask me why…).

Secondly, it is always good to have a second source, should NVidia have problems of their own (we use a LOT of 9-series cards; none have failed, but I wonder).

Thirdly, I like to at least appear to keep an open mind - if ATI has a way to address this, I would love to know, learn from it, and make our applications better.

We certainly have our share of problems with NVidia’s OpenGL as well (their thread optimisation causes SERIOUS issues at times, and their buffer management can be ‘touchy’), and to be quite frank the support we have received from them on such issues is a joke, again because we are not game developers: NVidia’s reply is ‘you get support if you buy Quadro’. We don’t use ANY Quadro features, nor need them, so we get no developer support at all?

Anyway, I was just figuring that someone must have hit similar problems; or are games really written in such simple ways that they never hit these issues? (Yes, I know there are few GL games…)

Incidentally, DX isn’t even going to see “true” multithreading support until 11 (360 aside). This is becoming a much bigger issue in games, so if you’re a trickle-down theorist you’ll see the writing on the wall there (for what that’s worth to you now).

I see that ATI has released phase 1 of their OpenGL 3.0 extensions in the latest Catalyst driver.

http://www.beyond3d.com/content/news/692

So I guess this is a 2.x driver that includes the extensions below - from their release notes:

OpenGL™ 3.0 support - Phase 1

This release of Catalyst™ introduces OpenGL™ 3.0 extension support. In upcoming Catalyst™ releases AMD will continue to expand its support for OpenGL 3.0 extensions. The following is a list of supported extensions in Catalyst 8.9:

· ARB_half_float_pixel
· ARB_draw_instanced
· ARB_instanced_arrays
· EXT_texture_compression_3dc
· EXT_texture_compression_rgtc
· EXT_texture_compression_latc
· EXT_texture_shared_exponent
· EXT_depth_buffer_float
· EXT_gpu_shader4
· ARB_map_buffer_range

Glad to see AMD coming along with GL3 support.

This is good to see - and sooner than expected! :smiley:

Use a query to implement a fence.

GLuint oq;
glGenQueries(1, &oq);

// An empty begin/end pair acts as a marker in the command stream:
// its result becomes available once the GPU has processed everything
// issued before it.
glBeginQuery(GL_SAMPLES_PASSED_ARB, oq);
glEndQuery(GL_SAMPLES_PASSED_ARB);
glFlush();

Poll to synchronize.

for (;;)
{
    GLuint i;

    glGetQueryObjectuiv(oq, GL_QUERY_RESULT_AVAILABLE_ARB, &i);
    if (i) break;

    // short sleep here to preserve CPU cycles
}

With Windows XP waitable timers, sleeps of one microsecond or less can be requested (the due time is specified in 100-nanosecond units), though the resolution actually achieved depends on the scheduler’s timer granularity.
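For what it’s worth, a rough sketch of such a timed wait (Win32 waitable timer; a negative due time means a relative wait, and the helper name shortSleep is just illustrative):

#include <windows.h>

/* Sleep for roughly 'usec' microseconds using a waitable timer.
   Note that the wait actually achieved is limited by the
   scheduler's timer resolution, whatever due time is requested. */
static void shortSleep(LONGLONG usec)
{
    HANDLE timer = CreateWaitableTimer(NULL, TRUE, NULL);
    LARGE_INTEGER due;

    due.QuadPart = -(usec * 10);  /* 100 ns units; negative = relative */
    SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE);
    WaitForSingleObject(timer, INFINITE);
    CloseHandle(timer);
}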

Remove the glFlush(). You wouldn’t need a fence in the first place if you were flushing the pipeline anyway.
Also, instead of polling for the result’s availability, just query the result itself. It will automatically wait until the result is available.
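In other words, something like the following; whether that wait is a sleep or a busy-wait is up to the driver, as it turns out below:

GLuint result;

// Asking for GL_QUERY_RESULT (rather than GL_QUERY_RESULT_AVAILABLE)
// makes the call wait inside the driver until the result is ready,
// so no polling loop is needed.
glGetQueryObjectuiv(oq, GL_QUERY_RESULT_ARB, &result);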

I have been living in the past. :o

You are correct with respect to the glFlush. It should not be necessary since glGetQueryObjectuiv() has an implicit flush.

Querying the results instead of polling has generally been implemented as a busy-wait. Qzm stated this was fatal to his application. I just tested the 8800 GT with the 175.16 driver and found it is not a busy wait. :smiley: I wonder what ATI is doing…

While the example above is not mine (and our issues tend to be a little more complex…), one of the interesting things we have found is that very often, situations where GL needs to ‘wait’ turn into very CPU-intensive busy-waits; this is one of the issues.
Even on a fast quad-core, we can quite easily push 4 cores over 70% utilisation continuously with work (even though only one thread is ‘allowed’ to actually render, we can often find other allowable GL commands to issue in the wait period, if only we were released to do it…).

It is disappointing to see CPU and GPU performance left sitting on the table with no way to access it. Things have got better, but are still a long way short.

Can anyone comment on why the GL_Sync ideas we were hoping for in GL 2.0 never arrived? It would not seem that fence-type operations should be a difficult addition (these things must be at least semi-managed in the drivers anyway, or all hell would break loose).

These days it is crazy that an API such as OpenGL ever busy-waits, but it is quite easy to cause that in most drivers.

It is interesting to test the latest NVidia beta drivers: in our tight-timing offscreen rendering tests we are seeing 20%+ performance gains in important places, most of it, it seems, from better scheduling and fewer blocking situations.

Come on ATI (and NVidia to a lesser extent), there is value here; without these types of abilities (fences, better multi-thread operation and stability) it is next to impossible to fully utilise your hardware!

Haste makes waste. My test was in error. glGetQueryObjectuiv() is implemented as a busy-wait on the 8800 GT 175.16 driver. :o

So poll with a sleep in a loop to avoid burning CPU cycles.
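Putting the thread together, the workaround amounts to something like this (a sketch only; shortSleep stands in for whatever timed wait your platform provides, such as the waitable-timer example above):

/* Wait for the GPU to reach the query-based 'fence' without
   burning a CPU core: poll availability, sleeping between polls. */
static void waitQueryFence(GLuint oq)
{
    GLuint available = 0;

    glGetQueryObjectuiv(oq, GL_QUERY_RESULT_AVAILABLE_ARB, &available);
    while (!available)
    {
        shortSleep(50);  /* ~50 us requested; tune to your latency budget */
        glGetQueryObjectuiv(oq, GL_QUERY_RESULT_AVAILABLE_ARB, &available);
    }
}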