ATI support for fragment.position and copy to depth texture

Hi –

It seems that I’ve run into a couple of cases in which the ATI driver falls into a software path, and I wanted to ask if anyone else had similar experiences and/or any suggestions for working around the following two cases:

  1. Using the fragment.position register in an ARB fragment program. Drawing an object that uses this register in its shader causes the frame rate to drop to a crawl, and then the object isn’t even rendered. I’ve been able to work around this by computing the screen coordinates in the vertex program and passing them through a texcoord interpolator (a rough sketch follows this list), but that shouldn’t be necessary.

  2. Copying the depth buffer (or part of it) to a DEPTH_COMPONENT texture using glCopyTexSubImage2D() (also sketched below). Doing this also causes the frame rate to drop to a crawl, but it does produce the right result. I’m assuming this has something to do with HyperZ compression being expanded in software, but I don’t have a workaround. I need a copy of the depth buffer in a texture for some post-processing effects.
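
For case 1, here’s a rough sketch of the kind of workaround I mean, trimmed down to just the position math. The texcoord index and the viewport scale/bias packed into program.env[0] are arbitrary choices on my part, and the divide by w happens per-fragment so the interpolation stays correct:

```c
/* Sketch only: pass the clip-space position through a texcoord and rebuild
 * the window position in the fragment program instead of reading
 * fragment.position.  program.env[0] is assumed to hold the viewport
 * scale/bias { w/2, h/2, x + w/2, y + h/2 }. */

static const char workaroundVP[] =
    "!!ARBvp1.0\n"
    "OPTION ARB_position_invariant;\n"
    "TEMP clip;\n"
    "DP4 clip.x, state.matrix.mvp.row[0], vertex.position;\n"
    "DP4 clip.y, state.matrix.mvp.row[1], vertex.position;\n"
    "DP4 clip.z, state.matrix.mvp.row[2], vertex.position;\n"
    "DP4 clip.w, state.matrix.mvp.row[3], vertex.position;\n"
    "MOV result.texcoord[1], clip;\n"             /* clip position out through a texcoord */
    "END\n";

static const char workaroundFP[] =
    "!!ARBfp1.0\n"
    "PARAM vp = program.env[0];\n"                /* viewport scale/bias */
    "TEMP ndc, win;\n"
    "RCP ndc.w, fragment.texcoord[1].w;\n"
    "MUL ndc.xy, fragment.texcoord[1], ndc.w;\n"
    "MAD win.xy, ndc, vp.xyxy, vp.zwzw;\n"        /* win.xy replaces fragment.position.xy */
    "MOV result.color, {0.0, 0.0, 0.0, 1.0};\n"   /* placeholder; real shading goes here */
    "END\n";
```

Both strings get loaded with glProgramStringARB() and GL_PROGRAM_FORMAT_ASCII_ARB just like any other ARB program.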
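
And case 2 is nothing fancier than the usual depth-texture copy; here’s a trimmed-down sketch of what I’m doing, with the texture size and internal format as placeholders:

```c
/* Create a depth texture once (width/height stand in for the real size). */
GLuint depthTex;
glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0,
             GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);

/* ... render the scene ... */

/* Copy the current depth buffer into the texture; this is the call that
 * falls off the fast path. */
glBindTexture(GL_TEXTURE_2D, depthTex);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);
```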

I’m using a Radeon X800 with Catalyst 5.6 drivers. Both of the above usages work great on all Nvidia hardware that I’ve tested on.

I don’t know about 1).

But for 2), I had the same problem once, and I know the ATI drivers will fall back to software if you use glCopyTex* from the depth buffer with antialiasing turned on. If you disable antialiasing in the driver, the glCopyTex* from the depth buffer should be fine and run fast. What I mean is that you cannot copy from an antialiased depth buffer.

I’ve experienced another couple of situations in which the ATI drivers fall back to software, mainly when you turn polygon offset on and fetch fragment.position from within a fragment program. This happens on both the 9x00 and Xx00 series – actually, the Xx00 series falls back to software even when polygon offset is disabled.
More than that, some time ago I wrote to devrel telling them that their fragment programs were fetching state matrices in a transposed fashion and light positions in object space. They have now fixed the matrix bug, but the light positions are still buggy.
Another issue I’d like to discuss is ATI’s low precision for depth maps: doing depth peeling is almost impossible on ATI boards, and Cass’ demo (order independent transparency 2) in the NV SDK shows every sort of glitch.
My feeling is that ATI has definitely stopped working on fragment programs; I guess they’re concentrating on the shading language, though I haven’t had time to check whether these issues are still present with GLSL.
Now let’s face the truth: working with ATI boards/drivers is extremely annoying for developers, to say the least. It’s a shame that we’ve had fragment programs for more than two years now and they’re still full of bugs. I’m no nvidiot, really, but it’s obvious that NV drivers are top notch (compared with ATI’s, I mean…).
By the way, Eric, wasn’t it you who had lots of trouble with ATI over the C4 engine?
I guess you’ve been experiencing some funny crashes :slight_smile:
Oh, and congrats on your book on math and 3D programming – I’ve found it very useful.

Another cause of a fallback to software rendering seems to be when point sprites are used, the point size is set to more than 1, and fragment.position is accessed in a fragment program (actually, I think it even happens without point sprites, just with simple points at a size greater than 1)… I guess this register just isn’t available in all possible rendering setups :slight_smile:

Oh, and I did notice the state.matrix.* and light problems in fragment programs as well…

By the way, Eric, wasn’t it you who had lots of trouble with ATI over the C4 engine?
I guess you’ve been experiencing some funny crashes :slight_smile:
Yes. I sent ATI a report about the driver hangs that I was experiencing, and they actually got back to me uncharacteristically fast with an explanation of the problem. They even sent me a fixed driver. It turns out that the fragment program compiler was assigning textures to the wrong units when point sprites were enabled, even if I wasn’t actually rendering points with a particular shader. (There have been problems with point sprites on ATI hardware for at least two years – now I just turn point sprites off when I detect that I’m running on ATI cards and take the speed hit.) Apparently, the driver was doing something bad enough to lock up the VPU, and that awesome (yeah, right) VPU Recover mechanism couldn’t handle it, so you had to reboot.
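
For what it’s worth, the detection I’m talking about is nothing more sophisticated than a vendor-string check along these lines (the exact match is my own heuristic):

```c
/* Roughly the vendor check in question; strstr() comes from <string.h>. */
const char *vendor = (const char *) glGetString(GL_VENDOR);
int runningOnATI = (vendor != NULL) && (strstr(vendor, "ATI") != NULL);

if (!runningOnATI)
{
    /* Point sprites stay off on ATI hardware; everyone else gets them. */
    glEnable(GL_POINT_SPRITE_ARB);
    glTexEnvi(GL_POINT_SPRITE_ARB, GL_COORD_REPLACE_ARB, GL_TRUE);
}
```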

I have to agree with you that Nvidia’s drivers have consistently been much higher quality. Nvidia obviously puts a lot more effort into their OpenGL implementation than ATI does, and they also make an effort to expose functionality at different levels (e.g., fragment programs in addition to GLSL). Every time I try some new technique on Nvidia hardware, it just works, even if I’m using something in a way for which it may not have originally been designed (but which is still perfectly valid). Nvidia hardware also handles boundary cases extremely well, doing the right thing with whatever you throw at it. Then when I try to run the exact same thing on ATI hardware, stuff breaks all over the place.

My intent isn’t to start a flame war here, so I’ll get back to my original issue. I currently have multisampling disabled when using the fragment.position register or copying to a depth texture, and I never use polygon offset. There shouldn’t be anything special enabled that could cause the driver to drop into software.

The only alternative that I have to using glCopyTexSubImage2D() to get the depth buffer into a texture is to render an additional pass that writes depth into a pbuffer. This is not an attractive option because it would be much slower than just doing a hardware-accelerated copy of the depth buffer after the ambient pass.

Oh, and congrats on your book on math and 3D programming – I’ve found it very useful.
Good to hear – thanks.

Originally posted by Eric Lengyel:
1) Using the fragment.position register in an ARB fragment program. Drawing an object that uses this register in its shader causes the frame rate to drop to a crawl, and then the object isn’t even rendered. I’ve been able to work around this by computing the screen coordinates in the vertex program and passing them through a texcoord interpolator, but that shouldn’t be necessary.
Besides the polygon offset that has already been mentioned, there’s also glLineWidth() and glPointSize() with anything > 1 that can cause it to go into software.

Originally posted by Eric Lengyel:
2) Copying the depth buffer (or part of it) to a DEPTH_COMPONENT texture using glCopyTexSubImage2D(). Doing this also causes the frame rate to drop to a crawl, but it does produce the right result. I’m assuming this has something to do with HyperZ compression being expanded in software, but I don’t have a workaround. I need a copy of the depth buffer in a texture for some post-processing effects.
With multisampling we always go to software. Without it we can currently work in hardware on R300, but not R420. To get hardware accelerated depth, regardless of multisampling and hardware, you can render to depth with FBOs. It may not fit perfectly into your situation since you apparently want the main depth buffer, but I think it should at least be faster than outputting depth to a color buffer.
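
Something along these lines, using EXT_framebuffer_object; the depth-only setup and the 24-bit format here are just one way to do it:

```c
/* Create a depth texture and attach it to an FBO as the depth buffer. */
GLuint depthTex, fbo;

glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0,
             GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);

glGenFramebuffersEXT(1, &fbo);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                          GL_TEXTURE_2D, depthTex, 0);

/* Depth-only rendering: no color buffers attached or drawn. */
glDrawBuffer(GL_NONE);
glReadBuffer(GL_NONE);

if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) != GL_FRAMEBUFFER_COMPLETE_EXT)
{
    /* fall back to the copy path */
}

/* ... render the depth pass, then return to the window framebuffer ... */
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
```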

Thanks for the tip on the fragment.position register – it enabled me to find the cause of the software path being taken. I never changed the point size using glPointSize(), so it was always 1.0, but I was enabling GL_VERTEX_PROGRAM_POINT_SIZE_ARB globally. Turning that off fixed the problem, but it really ought to be ignored if I’m rendering any primitive other than points.
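
So on my end the fix is just to scope that enable to the point-drawing path, roughly:

```c
/* Enable program-controlled point size only around actual point rendering
 * instead of leaving it on globally (pointCount is whatever the scene needs). */
glEnable(GL_VERTEX_PROGRAM_POINT_SIZE_ARB);
glDrawArrays(GL_POINTS, 0, pointCount);
glDisable(GL_VERTEX_PROGRAM_POINT_SIZE_ARB);
```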

That’s bad news about the depth buffer copy. I’ll have to find some kind of alternative that hopefully isn’t too costly.

Originally posted by Eric Lengyel:
but it really ought to be ignored if I’m rendering any primitive other than points.
I agree, but it’s unfortunately not as easy as it may look at first. The problem is that with a call to glPolygonMode() your triangles can suddenly be lines or points.

There is also glEnable(GL_LINE_SMOOTH) and glEnable(GL_POINT_SMOOTH).

Why do all of these things cause it to run in software? It seems to happen when shaders are used.

It’s because these features use shader resources.