GLSL concurrency + depth testing

Hi, I am implementing Per Pixel Linked Lists in OpenGL 4.3 using 2 SSBOs:

// -------------- PPLL Structures --------------- //
                                                  //
struct PixelPointers                              //
{                                                 //
    uint root_ptr;                                //
    uint closest_ptr;                             //
};                                                //
layout( std430, binding=4 ) buffer PixelMap       //
{                                                 //
    PixelPointers pixel[];                        //
};                                                //
                                                  //
struct Fragment                                   //
{                                                 //
    vec3  pos;                                    //
    float depth;                                  //
    vec3  normal;                                 //
    uint  mat_id;                                 //
};                                                //
struct FragNode                                   //
{                                                 //
    uint     next_ptr;                            //
    Fragment frag;                                //
};                                                //
layout( binding=6 ) uniform atomic_uint counter;  //
layout( std430, binding=5 ) buffer PPLL           //
{                                                 //
    FragNode frags[];                             //
};                                                //
                                                  //
// ---------------------------------------------- //

Notice how I am storing the root of each linked-list of fragments per pixel (root_ptr). However, I am also storing the index of the fragment nearest to the viewpoint (closest_ptr). This is how I store it:

	// ---------------------- depth check -------------------------- //
	                                                                 //
	uint closest = pixel[pixelIndex].closest_ptr;                    //
	if(closest == 0 || gl_FragCoord.z < frags[closest-1].frag.depth) //
		  atomicExchange(pixel[pixelIndex].closest_ptr,index+1); //
		                                                         //
	// ------------------------------------------------------------- //

As you can see, I do my own depth check. I first read the index of the closest fragment (zero means there’s no linked list yet), then use that index to access the fragment node and compare its depth to the current fragment’s depth. If the current fragment is nearer, I atomically replace the pointer. However, I get visual glitches that suggest more than one kernel (shader invocation) is looking at the same pixel at once: two fragments of the same pixel are processed simultaneously, both test as nearest, and both overwrite the pixel’s pointer (index) to the nearest fragment. The result is that sometimes a fragment from behind is painted instead of the nearest one. Let me clarify with an example:

There are two possible fragments to be written to a pixel: A is near, B is far. The pixel’s closest_ptr starts as ZERO. Let’s imagine both fragments start being processed at the same time. They both read closest_ptr and see ZERO, so they both go on to atomically replace it with their own index (a pointer to themselves). A does it first and B does it afterwards, so closest_ptr ends up pointing to B, which is incorrect!

How would I make this “thread safe”?

I found a way to create a sort of critical section:

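	// Retry until our node is published or a nearer one is already stored.
	uint closest;
	uint actual_closest = pixel[pixelIndex].closest_ptr;
	do {
		closest = actual_closest;
		if(closest == 0 || gl_FragCoord.z < frags[closest-1].frag.depth)
			// Swap only if closest_ptr still holds the value we compared against;
			// atomicCompSwap returns the value it actually found there.
			actual_closest = atomicCompSwap(pixel[pixelIndex].closest_ptr,closest,index+1);
		else
			break; // the stored fragment is nearer, nothing to do
	} while(closest != actual_closest);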

I’ll explain: I’m overwriting a value based on a comparison. If the value I compared against (closest) differs from the value found at the atomic moment of exchange (actual_closest), the swap does not happen, because my comparison was made against stale data. If the two match, the swap simply goes through. When the swap fails, I go back, compare again against the updated value (actual_closest), and overwrite only if that comparison still holds. Metaphorically speaking, my kernels were trying to take a shower at random times, and sometimes they would shower simultaneously and end up with cold water; with this fix, they wait until nobody is using the shower so they can all have hot water :)
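One thing the loop quietly relies on is that the depth read through frags[closest-1] has already been written by whichever invocation published that index. Below is a minimal sketch of how the whole fragment shader might look with that ordering made explicit. It assumes the buffers are declared `coherent`, that root_ptr uses the same "index+1, zero means empty" convention as closest_ptr, and that pixelIndex is derived from a `screen_width` uniform. The helper `updateClosest` and the `screen_width` uniform are hypothetical names, not taken from the code above.

```glsl
#version 430

struct PixelPointers { uint root_ptr; uint closest_ptr; };
struct Fragment      { vec3 pos; float depth; vec3 normal; uint mat_id; };
struct FragNode      { uint next_ptr; Fragment frag; };

// Assumption: `coherent` added so that writes made by one invocation can
// become visible to other invocations of the same draw call.
layout(std430, binding = 4) coherent buffer PixelMap { PixelPointers pixel[]; };
layout(std430, binding = 5) coherent buffer PPLL     { FragNode frags[]; };
layout(binding = 6) uniform atomic_uint counter;

uniform uint screen_width; // hypothetical: framebuffer width in pixels

// Hypothetical helper: make node `index` the closest fragment of the pixel
// unless a nearer (or equally near) one is already stored there.
void updateClosest(uint pixelIndex, uint index, float depth)
{
    // Order our writes to frags[index] before the index can be published.
    memoryBarrierBuffer();

    uint expected = pixel[pixelIndex].closest_ptr;
    for (;;) {
        // Stop if the stored fragment is at least as near as ours.
        if (expected != 0u && depth >= frags[expected - 1u].frag.depth)
            break;
        // Publish only if closest_ptr still holds the value we compared against.
        uint seen = atomicCompSwap(pixel[pixelIndex].closest_ptr, expected, index + 1u);
        if (seen == expected)
            break;           // swap succeeded
        expected = seen;     // another invocation got in first: compare again
    }
}

void main()
{
    uint pixelIndex = uint(gl_FragCoord.y) * screen_width + uint(gl_FragCoord.x);

    // Allocate a node; give up if the node buffer is full.
    uint index = atomicCounterIncrement(counter);
    if (index >= uint(frags.length()))
        return;

    // Fill the node first (pos, normal and mat_id would be written here too).
    frags[index].frag.depth = gl_FragCoord.z;
    frags[index].next_ptr   = atomicExchange(pixel[pixelIndex].root_ptr, index + 1u);

    updateClosest(pixelIndex, index, gl_FragCoord.z);
}
```

The loop is lock-free rather than a real critical section: an invocation only retries when some other invocation has just made progress on the same pixel, so it cannot deadlock, although many fragments contending for one pixel will cost extra iterations.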

However, I don’t like this loop. Looks hacky, verbose and dangerous. Please tell me if you find other ways to fix my problem! Or let me know if this fix is actually OK to use.

What you actually want is a lock around a critical section, which I don’t think is available, so your hack or something similar is the best you can do. You could look at the barrier functions (e.g. memoryBarrier()) to see if they offer any help.
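If it is acceptable to track only the nearest depth while the lists are being built, one possible alternative avoids the retry loop entirely. The sketch below assumes closest_ptr is replaced by a hypothetical `closest_depth` field that the application clears to 0xFFFFFFFF before each frame, and that depths lie in [0, 1], so their IEEE-754 bit patterns sort in the same order as the values. The helper names `recordNearestDepth` and `findClosest` are made up for illustration.

```glsl
#version 430

// Hypothetical variant of PixelPointers: stores the bit pattern of the
// nearest depth seen so far instead of a pointer to the nearest node.
struct PixelPointers { uint root_ptr; uint closest_depth; };
struct Fragment      { vec3 pos; float depth; vec3 normal; uint mat_id; };
struct FragNode      { uint next_ptr; Fragment frag; };

layout(std430, binding = 4) coherent buffer PixelMap { PixelPointers pixel[]; };
layout(std430, binding = 5) coherent buffer PPLL     { FragNode frags[]; };

// Build pass: a single atomic keeps the smallest depth, with no retry loop.
// For non-negative floats, comparing the raw bits as uint is the same as
// comparing the floats, and gl_FragCoord.z is never negative.
void recordNearestDepth(uint pixelIndex)
{
    atomicMin(pixel[pixelIndex].closest_depth, floatBitsToUint(gl_FragCoord.z));
}

// Resolve pass (a later draw, after the application has issued
// glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT)): walk the pixel's list and
// return the index+1 of the node whose depth matches the recorded minimum,
// or 0 if the list is empty or no node matches.
uint findClosest(uint pixelIndex)
{
    uint nearest = pixel[pixelIndex].closest_depth;
    for (uint node = pixel[pixelIndex].root_ptr; node != 0u; node = frags[node - 1u].next_ptr) {
        if (floatBitsToUint(frags[node - 1u].frag.depth) == nearest)
            return node;
    }
    return 0u;
}
```

The trade-off is that the pointer is recovered by walking the list later instead of being maintained during the build pass. If the resolve pass walks every list anyway, it could also just compute the minimum depth itself and drop the per-pixel field.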