Nvidia bug/question/suggestion

Zeno · October 27, 2000, 9:47pm

Hey all. I apologize for putting so much in one post, but I haven’t had a chance to post all week.

First, what I think is a bug. This has to do with alpha blending on the GeForce 256 vs. the TNT2. I’m blending textured, billboarded polygons with the blend function GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA. The polys are sorted by window z coordinate. The texture environment is GL_MODULATE, both filters are GL_LINEAR, and the texture format is GL_ALPHA. The typical way of drawing blended polys is to set the depth mask to GL_FALSE and render them after all opaque objects. Unfortunately, this does not work right on the GeForce. Blending still happens, but instead of my texture fading to totally transparent at the edges of the polygons (which it should do), it cuts off abruptly once the alpha reaches some small but non-zero value. I believe this is a bug for two reasons. One is that it works right on my TNT2 at home, and the other is that calling glDepthMask(GL_FALSE) causes this to happen WHETHER THE DEPTH TEST IS ENABLED OR NOT!! Is this a driver bug, or am I just really missing something? I believe the newest drivers (6.18) are installed on both machines.

Second, a question. I find myself rendering a lot of 2D height fields, i.e. for every x,y value there is an associated z value. Usually, all the z values change every frame but none of the x and y values do. Is there some way that I can have the graphics card keep the x and y values and just send the z values every frame? That would cut bandwidth to 1/3 of its current usage, which would be sweet.

Finally, I remember Matt asking for feature suggestions for Nvidia graphics cards. Here is one that would be at the top of my list:

Call this extension ZENO_LAZY_SORT_EXT

When GL_BLEND is enabled, every time a polygon gets rendered its alpha value is checked. If it is not equal to 1.0, that polygon gets stored on the graphics card in some sorted data structure (binary tree, hash table, etc.) keyed on its window z coordinate. Then, once SwapBuffers() is called, the card can take this cache of polys (which is already sorted), put the depth buffer in read-only mode, and draw the polygons. There are SO many benefits to this sort of hardware acceleration:

The user does not have to do the math to transform the polygon’s z coordinate to window coordinates - the graphics card does this anyway (with T&L), so this saves duplicate math.
The user does not have to call a sort routine on these transformed coordinates, which is usually n log n time at best.
The graphics card could do the sorting at the polygon level. Usually when I do this it is at the object level cuz the objects are in display lists, so I can’t have transparent 3D objects and have them look right (except maybe spheres).
The user does not have to separate translucent and opaque objects in the scene’s data structures.
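
For reference, the manual work that the first two points describe (computing a window z and sorting back to front on the CPU) might look roughly like the sketch below. The `TranslucentPoly` record and the function names are hypothetical, and the window-z mapping assumes the default depth range of [0, 1]; this is an illustration, not code from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for one translucent polygon (or whole object). */
typedef struct {
    int   id;        /* application-defined handle */
    float window_z;  /* window-space depth: 0 = near plane, 1 = far plane */
} TranslucentPoly;

/* Map clip-space z/w to window z, assuming the default glDepthRange(0, 1). */
float window_z_from_clip(float clip_z, float clip_w)
{
    assert(clip_w != 0.0f);
    float ndc_z = clip_z / clip_w;   /* perspective divide, range [-1, 1] */
    return 0.5f * ndc_z + 0.5f;      /* viewport depth mapping to [0, 1] */
}

/* qsort comparator: larger window_z (farther away) sorts first. */
static int farther_first(const void *a, const void *b)
{
    float za = ((const TranslucentPoly *)a)->window_z;
    float zb = ((const TranslucentPoly *)b)->window_z;
    return (za < zb) - (za > zb);
}

/* Order polys back to front so blending composites them correctly. */
void sort_back_to_front(TranslucentPoly *polys, size_t count)
{
    if (count < 2) {
        return;  /* nothing to reorder */
    }
    assert(polys != NULL);
    qsort(polys, count, sizeof polys[0], farther_first);
}
```

After sorting, the application would draw the array in order with depth writes disabled. The sort is the O(n log n) step mentioned above.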

Is there some reason that I don’t see why this would be an undoable or bad extension? Has it been proposed or does it already exist somewhere?

Thanks in advance for your help,

– Zeno

onyXMaster · October 27, 2000, 11:41pm

Hmm.
In the first case, it seems that you have alpha testing enabled.
In the second, passing only the Z coordinate seems good, but it does not have a universal solution. Where do you expect the card to hold the data? To get close to that, you could use compiled vertex arrays (they can theoretically track changes, though they never do) or the vertex array range extension, where you can allocate video or AGP memory and update it as needed so the hardware can pull the data directly.
As for polygon sorting, it breaks the OpenGL philosophy. It’s a better candidate for an external library like glut. The only bad thing about implementing this in an external library is that you can’t use the same rendering functions you’d use for plain rendering. Also, SwapBuffers is not the best trigger; you should probably signal manually when your list is complete.
On the other hand, if NVidia has nothing against such an extension, I will be glad to help with suggestions.
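
To illustrate the alpha-test hypothesis, here is a small CPU-side sketch of what happens to one color channel of a fragment when an alpha test with GL_GREATER semantics runs before GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA blending. The function names and the reference value are made up for illustration, and nothing here confirms what the driver in question actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Alpha test with GL_GREATER semantics: pass only if alpha > ref. */
bool alpha_test_greater(float alpha, float ref)
{
    return alpha > ref;
}

/* GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA blend for a single color channel. */
float blend_src_alpha(float src, float dst, float alpha)
{
    return src * alpha + dst * (1.0f - alpha);
}

/* Resulting framebuffer value for one fragment (one channel). */
float shade_fragment(float src, float dst, float alpha,
                     bool alpha_test_enabled, float ref)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    if (alpha_test_enabled && !alpha_test_greater(alpha, ref)) {
        return dst;  /* fragment discarded: framebuffer keeps its old value */
    }
    return blend_src_alpha(src, dst, alpha);
}
```

With a hypothetical reference of 0.1, a fragment with alpha 0.05 is dropped entirely instead of contributing 5% of its color, so a smooth fade would end in a hard edge wherever alpha crosses the reference value.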

[This message has been edited by onyXMaster (edited 10-28-2000).]

system · October 28, 2000, 7:47am

Problem is, to work with arbitrary shapes and polygons, this extension needs to know how to split intersecting polygons. Once you get there, you’re WAY into scene graph territory (Inventor, PHIGS, etc.) and out of the land of OGL as I understand it.

mcraighead · October 28, 2000, 10:02am

It does sound like a bug, can you send me the app?
No, not really. Even if you built a VAR buffer with all the x’s and y’s in it already, it wouldn’t help much because the z’s have to be interspersed, and then you end up with noncontiguous AGP writes, which are SLOW. If you have a static heightfield, and you want to minimize bandwidth, I’d suggest using VAR with vertices stored as shorts, not floats, since then you can store a vertex in 8 bytes rather than 12.
The extension you describe would be essentially impossible for us to implement in a way that would be useful for developers. You’re essentially describing deferred rendering, which is a totally different approach to 3D. It has its upsides and it has its downsides, and the short of it is that it needs specialized hardware. (I happen to believe it’s one of those overhyped technologies – its backers will tell you it’s the next big thing, but the only really successful product to ever use it has been the Dreamcast.)

Matt
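
A rough sketch of the 8-byte versus 12-byte vertex layout mentioned above. The struct names, the padding short, and the quantization range are assumptions made for illustration (three shorts alone take 6 bytes, and a fourth is assumed here to reach 8), and the code does not touch the VAR extension itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float   x, y, z;      } VertexF;  /* 3 * 4 = 12 bytes */
typedef struct { int16_t x, y, z, pad; } VertexS;  /* 4 * 2 =  8 bytes */

/* Map a height in [zmin, zmax] to 0..32767, clamping out-of-range input. */
int16_t quantize_height(float z, float zmin, float zmax)
{
    assert(zmax > zmin);
    float t = (z - zmin) / (zmax - zmin);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return (int16_t)(t * 32767.0f + 0.5f);  /* round to nearest */
}

/* Fill a width x height grid: x and y are grid indices, z is quantized. */
void fill_heightfield(VertexS *out, const float *heights,
                      size_t width, size_t height, float zmin, float zmax)
{
    assert(out != NULL && heights != NULL);
    assert(width <= 32767 && height <= 32767);  /* indices must fit in int16 */
    for (size_t row = 0; row < height; ++row) {
        for (size_t col = 0; col < width; ++col) {
            size_t i = row * width + col;
            out[i].x   = (int16_t)col;
            out[i].y   = (int16_t)row;
            out[i].z   = quantize_height(heights[i], zmin, zmax);
            out[i].pad = 0;
        }
    }
}
```

The scale from grid indices and quantized heights back to world units would presumably be folded into the modelview matrix. Per vertex this moves 8 bytes instead of 12, a saving of one third.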

Zeno · October 28, 2000, 5:26pm

Matt - I can’t send you the full app, but I made a demo that should display the problem (I’ll check for sure when I go to work tomorrow). Where / how should I send it? What all do you want included?
Ok. I didn’t think that there was really anything I could do, but I thought I’d ask.
onyXMaster: I don’t think that something like polygon sorting would be better in an external library. The reason I thought it would be good for hardware acceleration is that ALL the necessary math is already being done by the graphics card. I agree, though, that SwapBuffers wouldn’t be the best way to trigger the drawing of the translucent polys; it was just meant to be an example. glFlush() or something new would be a better idea.

bgl: You’re right that it wouldn’t be practical for intersecting polygons (if they were both translucent), as it would have to split them or revert to some pixel-level technique. However, I don’t think this case should come up very often, should it? Collision detection should prevent most intersection, and the places where it wouldn’t might not matter. For instance, if you had a polygonal character swimming, the intersection wouldn’t matter because the character is opaque, so the depth buffer would fix intersections when rendering the water. Another case would be trees made of 2 intersecting quads. This also wouldn’t matter, since the billboards are either opaque (leaves and branches) or totally transparent (air). I’m sure there are some situations, though…

mcraighead: I know it is sorta weird because it is not really immediate mode in the sense that drawing everything else is, but think of it more like creating a display list with GL_COMPILE as you draw your opaque stuff. Thought of this way, why would it require special hardware? I would think just a driver rewrite (but I don’t know nearly as much about this as you). Here’s what you could do: have the driver detect polys with alpha != 1.0 as they go through (after the stage in the pipeline where projections are done, so you have window z coords). Stick these into a hash table in graphics memory, to be rendered later when the appropriate call is made. Perhaps you would need the user to pre-specify how many polys this table should be able to hold so that you could allocate the right amount of memory. Which of the above steps, if any, couldn’t be done without changing the hardware?

Thanks all,
Zeno

mcraighead · October 29, 2000, 10:18am

A small demo is fine (in fact, preferable).

100% of the steps you propose would require massive changes to both driver and HW. In short, it’s not going to happen.

Display list compilation is a completely SW task, and it’s quite a bit different – it’s more of an API capture mechanism than a geometry capture mechanism. What you propose is a little like feedback, but feedback is also unaccelerated and generally considered (along with selection) to be one of the WORST features in OpenGL…

There are all sorts of other uses for alpha that we’d have to know about in advance. Alpha is just another color component. There’s nothing special about it that lets us detect whether something is transparent.

The other problem is that AGP is, for the most part, a one-way street (this is not strictly true, but I can’t go into the reasons any further). 3D hardware operates on a fire-and-forget model. We give it commands, we tell it to do them, and then we play no further role in carrying them out. The CPU has no opportunity to do anything like building a hash table using intermediate results – for one, the intermediate results are not available, and for another, the CPU has better things to be doing, like game logic.

Matt