True transparency

Neither of the two sorting methods can use display-lists. (oh well, there’s a trick for sort-by-object, but it’ll waste VRAM).

By the way, sort-by-object does polygon-sorting within each object, too! But since the material for the whole object is the same, you only need to re-shuffle the vertex indices. Thus you can keep the big, heavy vertex data constant in a static VBO, and only stream an IBO that contains the vertex indices in sorted order. You transform those triangles on the CPU, sort them by Z on the CPU, and finally upload the triangles' vertex indices to a streaming IBO. The same thing, but much heavier and more complex, happens in sort-by-polygon.
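A rough sketch of that streaming-IBO idea (the types and the per-vertex view-space z array are placeholders I made up; real code would transform the vertices with your own math library first):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Build a back-to-front sorted index buffer for one object.
// 'viewZ' holds the view-space z of each vertex (already transformed on the
// CPU); 'indices' is the static triangle list (3 indices per triangle).
// With OpenGL's convention the camera looks down -z, so farther triangles
// have more negative view-space z and must come first.
std::vector<uint32_t> sortTrianglesBackToFront(const std::vector<float>& viewZ,
                                               const std::vector<uint32_t>& indices)
{
    const size_t triCount = indices.size() / 3;
    std::vector<size_t> order(triCount);
    for (size_t t = 0; t < triCount; ++t) order[t] = t;

    // Average z of the three vertices is the sort key.
    auto triZ = [&](size_t t) {
        return (viewZ[indices[3*t]] + viewZ[indices[3*t+1]] + viewZ[indices[3*t+2]]) / 3.0f;
    };
    // Ascending view-space z == farthest first.
    std::sort(order.begin(), order.end(),
              [&](size_t a, size_t b) { return triZ(a) < triZ(b); });

    std::vector<uint32_t> sorted;
    sorted.reserve(indices.size());
    for (size_t t : order)
        for (int k = 0; k < 3; ++k)
            sorted.push_back(indices[3*t + k]);
    return sorted; // this is what you'd stream into the IBO each frame
}
```

The heavy vertex data never moves; only this small index array is re-uploaded per frame.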

Now, in sort-by-polygon you have materials clashing with each other. It's possible for the z-order to produce this arrangement: Mesh7.Triangle99, Mesh7.Triangle14, Mesh3.Triangle56, Mesh2.Triangle10. Mesh7, Mesh3 and Mesh2 will usually use different textures, different shaders, different shader uniforms. Ouch :slight_smile: . In the worst case you will have to switch all those states on every triangle drawn.

It’s really not straightforward. Depth peeling, and its dual and octo variants, could really be the easiest way to go if you want extreme robustness. They’re currently slow and waste a lot of VRAM, though.

So, if fidelity and robustness can be sacrificed in your project, look back at the PlayStation 1. It has no z-buffer; all onscreen geometry is manually sorted by polygon. On modern GPUs you can translate that system to one uber-shader, one texture atlas (or preferably a texture array), and the OT (“ordering table”). (Shader-branch coherency is guaranteed in most cases, so an uber-shader won’t be that bad.) Or make N passes over the M-sized ordering table, giving N blend/raster modes and M bins of z-precision. That limits the count of raster-mode changes to NxM, instead of #triangles. Even depth peeling can’t give the robustness of having those N modes (example modes: alpha blend, additive blend, refraction mode). Still, it’ll require dynamic creation of streaming VBOs.
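To illustrate the NxM idea, here's a CPU-side sketch (the enum and structs are hypothetical, and the actual drawing is stubbed out as comments): walk the ordering table far to near, grouping each bin's triangles by blend mode, so raster-mode changes are bounded by NxM rather than by the triangle count:

```cpp
#include <vector>

// Hypothetical placeholder types for the sketch.
enum BlendMode { AlphaBlend, AdditiveBlend, RefractionMode, ModeCount }; // N = ModeCount

struct Tri { int id; BlendMode mode; };

// 'bins' is the M-sized ordering table, ordered near-to-far; each bin holds
// the triangles that fell into that z range, tagged with the mode they need.
// Returns how many raster-mode changes the traversal caused (at most N*M).
int drawOrderedByBin(const std::vector<std::vector<Tri>>& bins)
{
    int modeChanges = 0;
    int curMode = -1;
    for (auto it = bins.rbegin(); it != bins.rend(); ++it) { // far to near
        for (int m = 0; m < ModeCount; ++m) {                // group the bin by mode
            for (const Tri& t : *it) {
                if (t.mode != (BlendMode)m) continue;
                if (curMode != m) { curMode = m; ++modeChanges; } // bind raster mode here
                // draw triangle t.id here
            }
        }
    }
    return modeChanges;
}
```

Ordering is still correct per bin, and the mode switches scale with the table size, not the scene size.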

Hi Ilian,

How can you use the VBO/IBO solution with different objects, i.e. if you have the vertices/triangles of mesh1 and the vertices/triangles of mesh2? Normally you call mesh1.Draw() and then mesh2.Draw(). How is it possible to sort the triangles of the two objects together?

Thinking about the past, many famous applications have done correct transparency with OpenGL for a long time. I believe that during the '90s depth peeling was not available, and probably not even VBOs and IBOs; how could they do correct transparent triangle intersections with plain OpenGL?

Thanks,

Alberto

Thinking about the past, many famous applications have done correct transparency with OpenGL for a long time.

[citation needed] Which ones? In the CAD department, neither 3ds Max nor Blender does this correctly.
Games are often completely wrong as soon as more than one transparent plane is overlaid.

Mmm… ZbuffeR, I’ve never noticed anything so bad in these 3D applications. In your opinion, what are they doing to reach such a good compromise of quality/speed?

Thanks,

Alberto

It has never ever been possible with rasterization. Well, unless they start splitting the intersecting triangles at the intersection line - which is computationally expensive and thus they don’t do it.

Which is only possible thanks to the z-buffer; and nowadays rendering in big batches is the optimal way. But in the past, you would draw triangle by triangle anyway. Transparent objects can’t update the z-buffer, so sorting should be done like in the past. Most recent games generally give up on sorting entirely, as ZbuffeR noted. Some just sort objects, without sorting the triangles within those objects. Gamers don’t notice, and in the cases where artifacts in static geometry would be really noticeable, it’s the artists’ task to simplify the level.

To do transparency as well as CADs do, you have to dynamically merge the geometry of mesh1 and mesh2. As I wrote, you may have to do mesh1.numTris + mesh2.numTris state changes in the worst case. That’s where the specialized drivers for Quadro and FireGL kick in, with fast state changes and better performance when sending many tiny batches over PCIe/AGP.

Ilian,

To do transparency as well as CADs do, you have to dynamically merge the geometry of mesh1 and mesh2. As I wrote, you may have to do mesh1.numTris + mesh2.numTris state changes in the worst case. That’s where the specialized drivers for Quadro and FireGL kick in, with fast state changes and better performance when sending many tiny batches over PCIe/AGP.

Please, help me with this.

Suppose you have a building structure made of columns and floor planes.

Normally we loop over the object array and call Mesh.Draw() for each object (in this case we use a display list).

Now suppose that all the objects are made transparent.

When transparency comes into play we need to collect all the triangles from the mesh objects, put them (each frame) into a new array, sort them, and draw them from farthest to nearest after the opaque objects: what do you mean by state-changes?

Thanks,

Alberto

State-changes happen when you have to use another texture, another shader, other shader-uniforms, etc.
Thus, every triangle from the merged+sorted list should have info about what state it needs.
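For example, the merged+sorted list could carry its state like this (a minimal sketch with made-up id fields standing in for real GL objects; the function just counts how many state switches a given sorted order causes):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-triangle record for the merged+sorted list: the sort key
// plus ids for whatever state the triangle needs.
struct SortedTri {
    float    z;          // average view-space z, the sort key
    uint32_t shaderId;   // state the triangle needs
    uint32_t textureId;
    uint32_t firstIndex; // where its 3 indices live in the merged IBO
};

// Walk the sorted list and only "switch state" when the next triangle
// actually differs from the current one; returns how many switches happened.
int drawSortedList(const std::vector<SortedTri>& tris)
{
    int stateChanges = 0;
    uint32_t curShader = UINT32_MAX, curTexture = UINT32_MAX;
    for (const SortedTri& t : tris) {
        if (t.shaderId != curShader || t.textureId != curTexture) {
            // here you'd flush the pending batch and bind the shader/texture
            curShader  = t.shaderId;
            curTexture = t.textureId;
            ++stateChanges;
        }
        // here you'd append the triangle to the current batch
    }
    return stateChanges;
}
```

In the worst case (materials interleaved perfectly in z), every triangle causes a switch; when neighbours in z share a material, runs of triangles collapse into one batch.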

Really, really, really, look at PlayStation 1 (aka PSX).
Basics of OT (“Ordering Table”):
http://www.exaflop.org/docs/naifgfx/naifsort.html
Extensive info, incl. types of packets:
http://www.raphnet.net/electronique/psx_adaptor/Playstation.txt
(search for “Packet Descriptions” in that file)

Basic description:
http://psx.rules.org/gpu.txt

Ilian,

I will have a look at the links you provided, thanks. Just one last question. In the introduction below, what is the author referring to with polygon[loop].z? Is it the distance from the eye to the centroid of the triangle, or something else? Is there a smarter way to compute this distance than doing d = Sqrt((eye.X-centroid.X)^2 + …)?

A basic sort algorithm
The simplest and probably slowest routine you could create for your 3d engine would go something
like this:-

  	do
  	{
  		sorted = 1;
  		for ( loop = 0; loop < num_polys_to_sort - 1; loop++ )
  		{
  			if ( polygon[ loop + 1 ].z > polygon[ loop ].z )
  			{
  				temp_polygon = polygon[ loop ];
  				polygon[ loop ] = polygon[ loop + 1 ];
  				polygon[ loop + 1 ] = temp_polygon;
  				sorted = 0;
  			}
  		}
  	} while ( sorted == 0 );

It’s the average transformed-to-screenspace Z of the triangle.

vec3 v0 = transform_to_screenspace(triangle[i].vert0);
vec3 v1 = transform_to_screenspace(triangle[i].vert1);
vec3 v2 = transform_to_screenspace(triangle[i].vert2);

float z = (v0.z + v1.z + v2.z)/3;

Wow, in this case you also need to project each mesh vertex with glu.Project() at each frame!? Is it faster than doing d = Sqrt(…)? Maybe it is possible to get the source of glu.Project() and remove the operations on X & Y that are not useful in this case…

Thanks,

Alberto

It is useful for cookie-cutter alpha to avoid sorting in a z-buffered scene, but it does not “do the trick” for alpha blending. The foremost transparency totally occludes any distant transparency; there is no accumulation of blended results. It is certainly useful in many situations, but the levels of transparency are quite limited, in addition to the lack of accumulation.

ROFL

That’s about right, the rest is down to optimization, e.g. you might do better moving the eye into object space.
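A sketch of that object-space trick (names are my own): transform the eye position into object space once, using the inverse model matrix (not shown here), then use the squared distance as the sort key. Squared distance preserves the ordering, so the sqrt can be skipped entirely, and no per-vertex transform is needed at all:

```cpp
// Hypothetical minimal vector type for the sketch.
struct Vec3 { float x, y, z; };

// Sort key for one triangle: squared distance from the eye (already
// transformed into this object's space) to the triangle centroid.
// Monotonic in the true distance, so sorting by it gives the same order.
float sortKeySquaredDistance(const Vec3& eyeInObjectSpace, const Vec3& centroid)
{
    float dx = centroid.x - eyeInObjectSpace.x;
    float dy = centroid.y - eyeInObjectSpace.y;
    float dz = centroid.z - eyeInObjectSpace.z;
    return dx*dx + dy*dy + dz*dz; // no sqrt needed for ordering
}
```

One inverse-matrix multiply per object replaces one matrix multiply per vertex.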

Now, the whole idea of something like depth peeling is that it moves the sort into the massively parallel per-pixel domain, which simplifies the problem but places the burden on hardware features (and image-based data overheads).

Most people get by with some cheesy hack w.r.t. transparency that works in most cases.

Z-buffers are great: they solve a lot of problems and make our lives easy. Blended transparency takes that wonderful crutch and beats you with it if you care about truly general-purpose blending. Most developers just compromise with an implementation that works 95% of the time, or perhaps even 100% of the time with their scenarios. This is not necessarily laziness, just a recognition that the ‘correct’ solution is often ridiculously slow for marginal benefit.

Thanks for your clarification Dorbie,

Just to be sure, can you please help me with the correctness of following steps?

  1. collect the triangles from the scene objects that have a transparent color
  2. sort them using the following formula:

vec3 v0 = transform_to_screenspace(triangle[i].vert0);
vec3 v1 = transform_to_screenspace(triangle[i].vert1);
vec3 v2 = transform_to_screenspace(triangle[i].vert2);
float z = (v0.z + v1.z + v2.z)/3;

  3. draw the opaque objects
  4. draw the sorted list of transparent triangles

And:
Does this approach work with legacy hardware?
Is the only hardware-accelerated improvement we can apply the use of the IBO/VBO that Ilian mentioned above?
Is this solution what most CAD 3D applications do? (not in raytracing, of course)

Thanks,

Alberto

Hi Ilian,

Ordering Tables really enlightened me, thanks for pointing me there!

There is one thing I still can’t understand though. If you do:

vec3 v0 = transform_to_screenspace(triangle[i].vert0);
vec3 v1 = transform_to_screenspace(triangle[i].vert1);
vec3 v2 = transform_to_screenspace(triangle[i].vert2);

float z = (v0.z + v1.z + v2.z)/3;

v0.z, v1.z and v2.z are all something like 0.9994885, and if I use this averaged z value to find a position in a 256-element ordering table:

int positionInTable = z * 256;

ot[positionInTable] = myTriangle;

I only ever fill the last element of the table, never the others.

What am I doing wrong? I am following the instructions found on this page: http://www.exaflop.org/docs/naifgfx/naifsort.html

Thanks again,

Alberto

myTriangle->pNext = ot[positionInTable];
ot[positionInTable] = myTriangle;

Each element of ot[] is essentially the head of a linked list. You prepend elements to those linked lists (prepending keeps the code fast).
There’s also the trick (visible in the PSX code tutorials) of allocating elements from a pre-allocated big array, instead of using malloc/new.
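A minimal sketch of both tricks together, assuming you know an upper bound on the triangle count per frame (names are made up):

```cpp
#include <cstddef>
#include <vector>

// Each ot[] slot heads an intrusive linked list; nodes come from a
// pre-allocated pool instead of malloc/new.
struct OTNode {
    int     triIndex; // which triangle this node represents
    OTNode* pNext;
};

struct OrderingTable {
    std::vector<OTNode*> ot;   // one list head per depth bin
    std::vector<OTNode>  pool; // pre-allocated node storage
    std::size_t used = 0;

    OrderingTable(int bins, std::size_t maxTris)
        : ot(bins, nullptr), pool(maxTris) {}

    void insert(int triIndex, int bin) {
        OTNode* n = &pool[used++];   // "allocate" from the pool, O(1)
        n->triIndex = triIndex;
        n->pNext = ot[bin];          // prepend to the bin's list
        ot[bin] = n;
    }
};
```

Drawing then just walks the bins in depth order and follows each pNext chain; resetting the table per frame is a memset of the heads plus used = 0.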

Hi Ilian,

I probably didn’t explain myself correctly. I was pointing out that the z values I get from gluProject() for my triangle vertices are all close to 1 (as you know, the depth buffer is not linear), so only the last element of the array gets filled (with a long linked list).

What am I missing ?

I am converting the depth value to an integer simply by multiplying the 0-to-1 depth value by 256, and I always get 255 for all triangles. I could probably resize the table to 1.000.000 and get a few more elements filled, but is that the right approach?

Thanks,

Alberto

It’s the projection matrix that gets in the way, I guess (honestly, I’m very forgetful and bad at matrix transforms).
I’d inspect the matrix values, refresh on the maths, and find a way to increase the range.

Ilian,

Are you sure that I don’t need to linearize the depth value?

The article you pointed me to doesn’t provide any clue on this…

Thanks,

Alberto

On the contrary: it’s better to quantize into ot[] by 1.0/z, to have more layers near the camera (where gamers notice artifacts more easily).
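A sketch of that 1/z quantization (assuming you have the linear view-space depth of each triangle, and you know your near/far planes): it maps the near plane to the last bin and the far plane to bin 0, so bins are spent disproportionately near the camera:

```cpp
// Map a linear view-space depth (positive, in [zNear, zFar]) to an
// ordering-table bin index. Quantizing 1/z instead of z gives the near
// range many more bins than the far range.
int depthToBin(float z, float zNear, float zFar, int bins)
{
    float invNear = 1.0f / zNear, invFar = 1.0f / zFar;
    float t = (1.0f / z - invFar) / (invNear - invFar); // 1 at near, 0 at far
    int bin = (int)(t * (bins - 1));
    if (bin < 0) bin = 0;               // clamp anything outside the range
    if (bin > bins - 1) bin = bins - 1;
    return bin;
}
```

With zNear=1, zFar=100 and 256 bins, depths between 1 and 2 already cover roughly half the table, which is exactly where sorting precision is most visible.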