Better way to do transparency?

Hi everyone, right now I’m depth sorting my models and rendering them from furthest to closest, so that I can use OpenGL’s built-in blending for transparency. It’s working okay, but it breaks down when one model intersects with another or is inside another (imagine a semitransparent sphere bouncing around inside another semitransparent sphere: which one is further back?).
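To make the ordering problem concrete, here is a minimal Python sketch of ordinary alpha blending (src * alpha + dst * (1 - alpha)). The helper name `over` and the colours are made up for illustration; the point is only that swapping the draw order of two translucent layers changes the result, which is why per-model sorting fails when models overlap in depth.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Standard alpha blend of one translucent layer onto what is already drawn."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha)
                 for s, d in zip(src_rgb, dst_rgb))

background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)    # (colour, alpha), illustrative values
blue = ((0.0, 0.0, 1.0), 0.5)

# Red drawn first (further away), blue drawn on top.
red_then_blue = over(blue[0], blue[1], over(red[0], red[1], background))
# Same two layers, opposite order.
blue_then_red = over(red[0], red[1], over(blue[0], blue[1], background))

print(red_then_blue)   # (0.25, 0.0, 0.5)
print(blue_then_red)   # (0.5, 0.0, 0.25)
```

With intersecting models the correct order differs from pixel to pixel, so no single per-model order can be right everywhere.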

I did a bit of research and found out that OpenRL does real-time raytracing, so that seems to be an option. Would you guys recommend using something like OpenRL? Is it too processor intensive for a game?

There was also a technique out there called Depth Peeling, which looks really cool. It’s a shame that I have to send the geometry for each pass though (and I have a nagging suspicion there’s a way around re-sending like that).

Those two options look pretty promising, but before I choose one of those, what are your thoughts on them? Or do you have any other options that I could investigate?

Thanks a bunch!

There are four possible solutions I know of that produce per-pixel accurate transparency (some of them only under restrictions on the blending equation).

  1. depth peeling
  2. weighted average blending
  3. A-buffer
  4. OIT using linked list buffers

About the first two, you can read here:
However, the best solution I know is ATI’s OIT solution (option #4) but that’s limited to DX11 class hardware:…sts_forweb.ppsx
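For a rough feel of why weighted average blending does not care about draw order, here is a small Python sketch of one common formulation as I understand it: sum the alpha-weighted colours and the alphas, count the fragments, then treat the pixel as n identical layers of the average alpha. The function name and test colours are hypothetical, and this is an approximation of true sorted blending, not a replacement for it.

```python
def weighted_average(fragments, background):
    """fragments: list of ((r, g, b), alpha) in any order."""
    n = len(fragments)
    alpha_sum = sum(a for _, a in fragments)
    if n == 0 or alpha_sum == 0.0:
        return background
    # Sums are commutative, so submission order cannot matter.
    rgb_sum = [sum(c[i] * a for c, a in fragments) for i in range(3)]
    avg_rgb = [s / alpha_sum for s in rgb_sum]
    avg_alpha = alpha_sum / n
    # How much background shows through n layers of avg_alpha.
    bg_weight = (1.0 - avg_alpha) ** n
    return tuple(c * (1.0 - bg_weight) + b * bg_weight
                 for c, b in zip(avg_rgb, background))

frags = [((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 0.5)]
black = (0.0, 0.0, 0.0)
result = weighted_average(frags, black)
same = weighted_average(list(reversed(frags)), black)
```

Both calls give the same colour, but note it is a blend of the two layers with no notion of which one is in front.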

Hey, thanks for the reply. I’ve been investigating the first two, and they seem pretty legit. I can’t find any resources on the A-buffer though. Is that the same thing as the z-buffer?

No. A-buffer keeps a list of fragments per sample (e.g. for blending). Z-buffering just gives you a single fragment per sample.
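As a CPU-side illustration of the difference (a sketch only; the function names and the "larger depth is further away" convention are assumptions for this example), a z-buffer keeps just the nearest fragment, while an A-buffer keeps them all and sorts per pixel before blending:

```python
def zbuffer_resolve(fragments):
    """Z-buffer: only the nearest fragment survives."""
    return min(fragments, key=lambda f: f[0])

def abuffer_resolve(fragments, background):
    """A-buffer: keep every fragment, sort back to front, then blend."""
    color = background
    for depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(rgb, color))
    return color

# (depth, colour, alpha); larger depth = further from the camera.
frags = [(1.0, (0.0, 0.0, 1.0), 0.5), (2.0, (1.0, 0.0, 0.0), 0.5)]
black = (0.0, 0.0, 0.0)

nearest = zbuffer_resolve(frags)
blended = abuffer_resolve(frags, black)
blended_other_order = abuffer_resolve(list(reversed(frags)), black)
```

Because the sort happens per pixel, the order in which the geometry was submitted no longer matters.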

See the first few slides of this presentation:

That’s really cool! Is there any built-in support for this nowadays, since it’s 4 years later now? Or any other simpler way to do it?

Newer hardware makes it simpler, but no, it’s not “built-in” yet.

To give you some ideas, see this:

IIRC, this encodes a linked-list of nodes per pixel, which they use for translucent shadow volumes, but the concept is the same. They did it two ways: frag shader and compute shader.
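If it helps to picture the data structure, here is a hedged Python simulation of a per-pixel linked list: one head index per pixel plus a shared node pool with a running counter. The class and method names are made up; on the GPU the counter increment and the head swap would be atomic operations, which plain Python obviously does not model.

```python
class PerPixelLists:
    def __init__(self, width, height, max_nodes):
        self.width = width
        self.heads = [-1] * (width * height)   # -1 marks an empty list
        self.nodes = [None] * max_nodes        # each node: (depth, rgba, next_index)
        self.counter = 0                       # stands in for an atomic counter

    def insert(self, x, y, depth, rgba):
        if self.counter >= len(self.nodes):
            return False                       # pool exhausted, fragment dropped
        index = self.counter
        self.counter += 1
        pixel = y * self.width + x
        # New node points at the old head, then becomes the head.
        self.nodes[index] = (depth, rgba, self.heads[pixel])
        self.heads[pixel] = index
        return True

    def fragments(self, x, y):
        out = []
        index = self.heads[y * self.width + x]
        while index != -1:
            depth, rgba, index = self.nodes[index]
            out.append((depth, rgba))
        return out

lists = PerPixelLists(width=2, height=1, max_nodes=3)
lists.insert(1, 0, 0.7, "far")
lists.insert(0, 0, 0.5, "other pixel")
lists.insert(1, 0, 0.2, "near")
overflow = lists.insert(1, 0, 0.9, "dropped")
```

The lists come back in reverse insertion order, so a resolve pass still has to sort each pixel's fragments by depth before blending. Memory is shared across pixels, which is the main advantage over a fixed number of layers.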

And there are several folks that have used MSAA render targets to encode up to N levels of transparency for a simple A-buffer, some with reduced X/Y resolution for alpha.

Thanks for the amazing resources! I’m definitely going with the A-buffer, because it’s 8x faster than depth peeling. Two quick questions:

  1. Is this technique compatible with screen space ambient occlusion, screen space deferred shading, or screen space shadow mapping? I don’t need any of these, I’m just curious.

  2. The PowerPoint says that the stencil buffer is being written to (by subtracting from all of the samples) and read from (by comparing the subsamples with the reference value). I thought we couldn’t read from and write to textures at the same time?

First of all, the problem with the A-buffer is that you need several “layers” of the color buffer (one per level of transparency you want to support for a single pixel), so it has a significant memory overhead. That’s why linked-list buffers are preferred if your target hardware supports them. I agree that depth peeling is a definite no from a performance point of view, but weighted average is also a pretty good drop-in option for OIT. I’m not trying to talk you out of it; I just wanted to clarify this to help your decision.

  1. Theoretically it is possible to do SSAO and deferred shading with A-buffers, but you would need to heavily modify how those passes are evaluated. I don’t know exactly what you mean by screen space shadow mapping, but shadow mapping in general is also possible, again with modifications so that all the layers are taken into account (transparent shadows, though, are a more complicated question).

  2. Reading from and writing to the same texture is officially supported only on GL4-class hardware (even though some earlier cards can do it, e.g. ATI HD3000). I don’t know exactly how NVIDIA implemented their stencil-routed A-buffer, but I suppose they mean stencil operations here. Stencil operations do let you read and write the buffer in the same pass (stencil test + stencil write). The stencil and depth buffers have allowed this kind of read-modify-write since the very beginning of OpenGL’s history.
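Here is how I picture the stencil routing, as a Python simulation rather than real GL code. The details are my assumptions from the description, not NVIDIA's actual implementation: sample k starts with stencil value k + 2, the test is "equal to reference 2", and every fragment decrements every sample's stencil whether the test passes or fails (clamped at zero).

```python
REF = 2  # assumed stencil reference value

def route_fragments(fragments, num_samples=8):
    """Simulate which multisample slot each incoming fragment lands in."""
    stencil = [k + REF for k in range(num_samples)]   # assumed initial values
    samples = [None] * num_samples
    for frag in fragments:
        for k in range(num_samples):
            if stencil[k] == REF:                     # stencil test (read)
                samples[k] = frag
            stencil[k] = max(stencil[k] - 1, 0)       # stencil op (write), clamped
    return samples

routed = route_fragments(["a", "b", "c"], num_samples=4)
overflowed = route_fragments(list("abcdef"), num_samples=4)
```

Under these assumptions the n-th fragment to hit a pixel is stored in sample n, and anything past the last sample is silently dropped. The read and the write both go through the fixed-function stencil unit, so no shader ever samples the buffer it is rendering to.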

All this information is a goldmine. Thanks a bunch!

So I did a few calculations, and with a resolution that requires a 2048x1024 texture, I would need 16 MB of space on my graphics card per layer. If I were to have 8 layers, that’s 128 MB... I suppose I could include a setting for how many layers to use, but you’re right, that is pretty expensive. Maybe I’ll also include a setting to do weighted average for the low-end users.
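For anyone checking the arithmetic, a quick Python sketch. The 8 bytes per sample is an assumption (for example RGBA8 colour plus a 32-bit depth/stencil value); it happens to reproduce the 16 MB per layer figure, but a different format would change the numbers.

```python
MIB = 1024 * 1024

def layer_bytes(width, height, bytes_per_sample):
    """Storage for one A-buffer layer at the given resolution."""
    return width * height * bytes_per_sample

BYTES_PER_SAMPLE = 8   # assumption: 4 bytes colour + 4 bytes depth/stencil
per_layer = layer_bytes(2048, 1024, BYTES_PER_SAMPLE)
total = per_layer * 8  # 8 layers

print(per_layer // MIB, "MiB per layer")   # 16 MiB per layer
print(total // MIB, "MiB for 8 layers")    # 128 MiB for 8 layers
```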

So people keep saying that linked-list buffers are only available on “DX11-class hardware”. What exactly does that mean if I’m using OpenGL instead? I’m guessing a lot of DX11-class hardware also supports OpenGL. What percentage of users do you think has DX11-class hardware?

So people keep saying that linked-list buffers are only available on “DX11-class hardware”. What exactly does that mean if I’m using OpenGL instead?

That means OpenGL 4.1 or better. Though I’m not sure if unextended 4.1 can do all of the things you need to in order to implement linked lists. If the ARB keeps adding to GL on their previous schedule, then we’ll probably see OpenGL 4.2 at GDC.

What percentage of users do you think has DX11 class hardware?

There’s no way to know for sure, but the Steam hardware survey provides some numbers. How useful they are depends on how much your audience overlaps with Steam users.

Sorry for keeping this thread alive so long, but my research is coming to a close, and I’m about to send out my plans to my group. Would you say everything in this is correct?

So I’ve spent a long time researching and asking around on forums about modern methods of order-independent transparency, and it seems “stencil routed a-buffers” are the way to go. The idea is explained in and

According to…true#Post290535
it has been done in OpenGL, and can nowadays be done using OpenGL 3.2’s multisample textures (

There’s an implementation (which only works on Fermi cards) at with some source code, so it may be as simple as using most of their code.

The benefits of this are that we can have models that intersect with other models, and models that are inside other models. In the previous version we had to do things like only rendering the far sides of tunnels, and splitting things in half so they could be in different places in the sorted order.

The costs of this are that for a play area (everything left of the menus) anywhere between 513x1025 and 1024x2048, each “layer” will take 16 MB of space on the video card, and usually around 8 layers are used, totalling 128 MB. Also, anything behind 8 layers of transparency will be dropped, but that shouldn’t be a problem in our case, because in most of our program anything behind 6 layers of transparency is impossible to discern anyway.