Currently, the graphics programmer must sort the transparent polygons or objects… This is tedious. Could the driver/hardware do this, as on PowerVR or the Dreamcast?
No, neither the driver nor the hardware can do this for you. Because of the way OpenGL works, it is impossible. OpenGL is not a scenegraph API, which more or less means it only knows about the primitive currently being drawn. To sort the primitives in a scene, it would have to know about all of them, but as I said, OpenGL only knows about a single primitive: the one being drawn.
This can only be done by a high-level API, which deals with whole scenes, not just a single primitive.
I don’t think it’s impossible in principle for an immediate-mode API. I’m a bit hazy on the details, but I think Lucasfilm’s REYES renderer featured something they called an “A-buffer”, which maintained something like a linked list for each framebuffer pixel, holding both colour and depth for each fragment in the list. Once rendering is complete, the list is “collapsed” correctly to a single pixel value.
Okay, it eats memory like crazy, but then they used to say the same thing about Z-buffering, and RAM’s getting cheaper all the time. Nasty memory access patterns too, but as average triangle sizes are shrinking toward the 1-pixel mark anyhow it’s not as insane as it might appear.
Don’t know how Dreamcast/PowerVR work, but IIRC they’re tiled architectures, which would reduce the memory-hogging problem. Maybe they do something similar.
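A minimal CPU sketch of the per-pixel list idea described above, not the actual REYES implementation. The class and method names are hypothetical, a larger depth is assumed to mean farther away, and the "collapse" step is assumed to be plain back-to-front "over" blending with non-premultiplied alpha.

```python
from collections import defaultdict


def over(src_rgb, src_alpha, dst_rgb):
    """Blend one fragment over the colour behind it (non-premultiplied alpha)."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


class ABuffer:
    """Hypothetical per-pixel fragment store; larger depth means farther away."""

    def __init__(self, background=(0.0, 0.0, 0.0)):
        self.background = background
        # (x, y) -> list of (depth, rgb, alpha), kept in arrival order
        self.fragments = defaultdict(list)

    def add_fragment(self, x, y, depth, rgb, alpha):
        # Fragments may arrive in any order; nothing is blended yet.
        self.fragments[(x, y)].append((depth, rgb, alpha))

    def resolve(self, x, y):
        # Collapse the list: blend the farthest fragment first, the nearest last.
        colour = self.background
        pixel = self.fragments.get((x, y), [])
        for _, rgb, alpha in sorted(pixel, key=lambda f: f[0], reverse=True):
            colour = over(rgb, alpha, colour)
        return colour
```

Submitting the same fragments in a different order gives the same resolved colour, which is the whole point. The cost is that the lists grow with depth complexity, as the post says.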
I just remembered there is a presentation over at NVIDIA describing order-independent transparency. It’s not as easy as just tossing geometry to OpenGL, and it requires a GeForce3, but it’s still order independent.
The biggest problem with fully-general order-independent transparency in the API is that it requires a (literally) unlimited amount of memory. There is no algorithm, and there cannot be an algorithm, that handles arbitrary depth complexity fully correctly in a system with limited resources, and does so transparently within an immediate-mode API.
So, any architecture that claims to handle it has undocumented limitations.
As an example, you could never implement the GF3 algorithm transparently in OpenGL without memory requirements blowing up – it requires that the whole scene be batched up.
Agreed, but so what? You could state just as truthfully that perfect, fully general depth-buffering requires a (literally) unlimited amount of memory. That doesn’t stop it being damned useful, and people use it, and deal with the artifacts thrown up by the necessarily imperfect implementation.
On the whole, the “correct” colour value for a layered set of transparencies isn’t intuitively obvious. It’s only the really crass errors that stand out, and I think there’s hope for a finite-memory algo capable of correcting these.
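To make the order dependence concrete, here is a small worked example. It assumes the usual "over" blend (source alpha, one minus source alpha) with non-premultiplied colours; the layer values are made up for illustration.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Blend a translucent layer over whatever is already behind it."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)    # (rgb, alpha), illustrative values
blue = ((0.0, 0.0, 1.0), 0.5)

# Same two layers, blended in the two possible orders.
red_in_front = over(*red, over(*blue, background))    # (0.5, 0.0, 0.25)
blue_in_front = over(*blue, over(*red, background))   # (0.25, 0.0, 0.5)
```

Both results are a plausible purple, which fits the point that only the really crass ordering errors stand out.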
The main problem with any traditional z-buffer based immediate mode rendering system is that it is functioning in immediate mode. It is, by definition, rendering every triangle you send it as you send it. OpenGL somewhat requires this functionality in the spec, though you can get around it by deciding not to actually scan convert anything until a glFinish is called.
Matt is right about memory cost increasing, but, honestly, that is acceptable. Clever card manufacturers don’t need to have 5+GB/sec bandwidth in their on-chip video memory. They could decide to use a virtual memory approach as described by John Carmack in one of his older .plan files, which would free up significant portions of bandwidth.
32MB on top of the usual 64MB that are allotted to textures is perfectly fine for even a highly complex scene. 32MB is sufficient for storing roughly 500,000 polygons per frame (16MB for the frame currently being rendered, 16MB for the frame currently being constructed, assuming 32 bytes per polygon). And a 128MB card would allow for even more vertex data.
Not only that, you get all the added benefits of a deferred tile-based renderer (knowing that nearly 100% of your fillrate is going towards your scene, etc.), so you can do more with a 500MPixel fillrate than with a 2GPixel one (which means more of the hardware cost can go into a larger memory pool).
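The arithmetic behind that figure, as a quick check. The 32-bytes-per-polygon size is the post's own assumption, and the function name is made up for this sketch.

```python
MIB = 1024 * 1024


def polygons_per_buffer(buffer_mib, bytes_per_polygon):
    """How many fixed-size polygon records fit in one scene buffer."""
    return (buffer_mib * MIB) // bytes_per_polygon


# One 16MB buffer per frame in flight, 32 bytes per polygon.
per_frame = polygons_per_buffer(16, 32)   # 524288
total_mib = 2 * 16                        # two buffers: rendering + constructing
```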
But… How can I order transparency correctly and fast without a GF3 card? Perhaps using some kind of destination alpha mode? Currently, in my 3D engine, I draw transparent objects (which can’t overlap) back to front, using a hash table. I use the depth test with forced two-sided objects (no culling), but this produces some artifacts…
“How can I order transparency correctly and fast without a GF3 card?”
You say that as though a GeForce3 is going to help you. It isn’t.
That said, any sorting optimizations are very situation dependent.
Well, why can’t we create something like a z-buffer for transparency? I think it would be cool.
It would be cool for us developers, but the main argument against it, I suppose, is that it’s hardly feasible in hardware right now. This kind of buffer would in fact be some sort of matrix of sorted lists (one for each pixel), where each list element holds color+alpha, depth and blend func. The frame buffer would finally be “flattened” on a glFlush/Finish. On the other hand, implementations could limit the maximum size of these lists.
Apart from the amount of memory that this would take, this kind of data structure would surely be extremely cache-unfriendly!
Just build a bsp for the transparent polygons and the problem is fixed.
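A rough sketch of the traversal half of that suggestion, assuming the tree has already been built. Building it is the harder part, because polygons that straddle a splitting plane have to be cut in two, and that is not shown. All names here are hypothetical, and polygons are represented by labels only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BspNode:
    normal: Vec3                       # splitting plane normal
    offset: float                      # plane equation: dot(normal, p) == offset
    polygons: List[str]                # polygons lying in this plane (labels only)
    front: Optional["BspNode"] = None  # subtree on the side the normal points to
    back: Optional["BspNode"] = None   # subtree on the opposite side


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def back_to_front(node, eye):
    """Return polygon labels ordered farthest-first as seen from `eye`."""
    if node is None:
        return []
    eye_in_front = dot(node.normal, eye) > node.offset
    # The subtree on the far side of the plane from the eye is drawn first.
    near, far = (node.front, node.back) if eye_in_front else (node.back, node.front)
    return back_to_front(far, eye) + node.polygons + back_to_front(near, eye)


# Three parallel panes at x = -5, 0 and 5.
tree = BspNode((1.0, 0.0, 0.0), 0.0, ["mid"],
               front=BspNode((1.0, 0.0, 0.0), 5.0, ["right"]),
               back=BspNode((1.0, 0.0, 0.0), -5.0, ["left"]))
```

The tree is view independent, so for static transparent geometry it is built once and only the traversal runs per frame. Moving or intersecting objects are where this stops being a simple fix.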
Order-independent transparency can be done in a multi-pass algorithm, based on an extended OpenGL depth buffer model including two z-buffers and two z-tests.
If anyone is seriously interested (you work for NVIDIA, or whatever) … drop me a line:
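A CPU sketch of how a multi-pass scheme with two depth tests can work, in the style usually called depth peeling. Whether this matches the poster's own algorithm is an assumption, and the function names are hypothetical. Each pass keeps the nearest fragment (the ordinary z-test) among those strictly behind the layer found by the previous pass (the second z-test). Smaller depth means nearer here.

```python
def peel_layers(fragments):
    """Extract one pixel's (depth, rgb, alpha) fragments nearest-first, one per pass."""
    layers = []
    peeled_depth = float("-inf")       # second depth buffer: last layer peeled
    while True:
        nearest = None                 # first depth buffer: ordinary z-test
        for frag in fragments:         # one rendering pass, in any submission order
            depth = frag[0]
            if depth <= peeled_depth:  # test 2: already peeled, reject
                continue
            if nearest is None or depth < nearest[0]:  # test 1: keep the nearest
                nearest = frag
        if nearest is None:            # nothing left behind the last layer
            return layers
        layers.append(nearest)
        peeled_depth = nearest[0]


def composite_front_to_back(layers, background):
    """Blend peeled layers nearest-first, then add what shows through."""
    colour = [0.0, 0.0, 0.0]
    transmittance = 1.0                # fraction of light still passing through
    for _, rgb, alpha in layers:
        for i in range(3):
            colour[i] += transmittance * alpha * rgb[i]
        transmittance *= 1.0 - alpha
    return tuple(c + transmittance * b for c, b in zip(colour, background))
```

The scene is submitted once per layer, so the cost grows with depth complexity. In this sketch, fragments that share exactly the same depth collapse into a single layer.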
Being an immediate-mode API means OpenGL has to process polygons as they are sent, but this doesn’t mean it has to write them to the frame buffer as they are sent. They could be cached in a display list structure, as with PowerVR cards, or cached for transmission across the network in a distributed renderer. Of course all this is invisible in the driver.
It might be nice to give the option to automatically defer/sort blended polygons when you compile or render display lists? Then cards that can do it in hardware can use hardware and other cards can do it in software.
There are limitations on the number of layers of transparency you could have in such a scheme, but they are negligible (e.g. I think the Dreamcast PowerVR chip can only correctly handle 64 layers of blending per pixel, but you can just cull the deepest or least luminant layers and it won’t be noticeable).
I think this would be an excellent addition to OpenGL!!!
OK, so I know there are memory/speed issues, but if there was a glEnable(GL_DEFERRED_TRANSPARENCY_SORT) it would be a real advantage for most users. The default would be for the option to be disabled, and novice users can be told to enable it. Those who don’t really care about ultra performance would get the correct results, and the rest can do their own sorting. I think most users are not necessarily hard-core graphics people, and so long as it works, that’s good enough for them.
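A small sketch of such a capped per-pixel list, assuming the simplest culling policy: when the list is full, drop the deepest fragment. Culling by luminance is not shown. The cap of 4 and the function name are made up for illustration; the post quotes 64 for the Dreamcast. Smaller depth means nearer.

```python
import bisect

MAX_LAYERS = 4   # hypothetical cap, kept small for the example


def insert_capped(layers, fragment, max_layers=MAX_LAYERS):
    """Keep `layers` sorted nearest-first; drop the deepest when over the cap.

    `fragment` is a (depth, rgb, alpha) tuple, so tuples sort by depth first.
    """
    bisect.insort(layers, fragment)
    if len(layers) > max_layers:
        layers.pop()   # the last entry is the farthest fragment
```

Memory per pixel is then bounded at `max_layers` entries, at the price of losing whatever lies behind the cap.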
This would also allow OpenGL to have improvements from new sorting techniques as they arise and the PowerVR crowd to do all the sorting on the card. I’m sure NVidia, ATI, etc. would like the ability to tweak the sorting for their card to improve benchmark performance, etc.
As for me, there is no serious need for that thing. Sorting can easily be done by the application, so why make the hardware more complicated?
Yeah, it would be very cool for small demo programs, but if you are developing a serious application based on object data, then sorting things yourself is not a problem.
Well it is potentially more efficient for the driver/card combination to do the sorting. Also, sorting in the application is more limiting in terms of the amount and complexity of transparencies you can have.
“As for me, there is no serious need for that thing. Sorting can easily be done by the application, so why make the hardware more complicated?”
Because hardware can sort per-pixel. To do that in software, you have to split up every single polygon in cases of intersection. If the Kyro III had the feature set of a GeForce3 or Radeon 8500, I’d actually consider buying one just so I don’t have to learn some polygon depth-sorting algorithm.
Yes, Korval, I agree with you. Hardware can sort pixels. But this is really needed only when transparent objects are intersecting! Say, who is making such things? I can hardly imagine a scene that has intersecting transparent objects (maybe nice transparent water, and a nice transparent ship on it).
I can think of a simpler example! An opaque ship in translucent water: you always need to draw the ship first, right? Well, there you go. Now what if you have a big dynamic system and there are objects that could be in the water, completely submerged or not submerged at all? And if parts of the objects are blended… It rapidly becomes a headache.
Plus, I don’t want to render objects in depth order; I want to sort objects by material/shader properties for efficiency. So if I want complicated translucent shaders, that is absolutely going to kill performance unless the hardware helps out.
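The application-side compromise being described can be sketched roughly as follows: opaque objects grouped by material, then translucent objects back to front. The `Drawable` fields and the use of one representative position per object are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Drawable:
    name: str
    material: str
    translucent: bool
    position: Tuple[float, float, float]   # one representative point per object


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def draw_order(drawables, eye):
    """Opaque objects grouped by material, then translucent ones farthest-first."""
    opaque = sorted((d for d in drawables if not d.translucent),
                    key=lambda d: d.material)
    translucent = sorted((d for d in drawables if d.translucent),
                         key=lambda d: squared_distance(d.position, eye),
                         reverse=True)
    return opaque + translucent


scene = [
    Drawable("water", "water", True, (0.0, 0.0, 5.0)),
    Drawable("ship", "hull", False, (0.0, 0.0, 4.0)),
    Drawable("glass", "glass", True, (0.0, 0.0, 9.0)),
    Drawable("rock", "granite", False, (0.0, 0.0, 2.0)),
]
names = [d.name for d in draw_order(scene, (0.0, 0.0, 0.0))]
```

Only the opaque half gets the material grouping; the translucent half is forced into depth order, which is the cost being complained about. A per-object sort like this also cannot fix intersecting or partly submerged objects, which need per-polygon splitting or per-pixel sorting.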