SSBO / image load-store woes

I’m implementing OIT with per-pixel linked lists and I’m getting some really crazy artifacts. On the majority of fragments I get exactly what I expect from this algorithm. On a few of them I get some kind of flickering, which I suppose comes from bad list formation.

I am sure I build/bind/use the SSBO, atomic counter and image buffer correctly from the CPU code. I double-checked them.

I’m drawing the scene like this (I’m using a single VBO for drawscene to keep this as self-contained as possible; general self-occluding geometry):

drawscene();                           // pass 1
glMemoryBarrier(GL_ALL_BARRIER_BITS);  // overkill; GL_SHADER_STORAGE_BARRIER_BIT | GL_SHADER_IMAGE_ACCESS_BARRIER_BIT should be sufficient
drawFullscreenQuad();                  // pass 2

Pass 1 creates the fragments and the list heads for each pixel.
Pass 2 reads the head for each pixel and walks the list to get all of its fragments.
Pretty much basic stuff, classic OIT.

Pass 1 (relevant) code:

layout(binding = 2) uniform atomic_uint atomicbuffer;
layout(binding = 0, r32ui) coherent uniform uimage2D imageBuffer;

struct FragmentData
{
	vec3 color;
	float alpha;
	float depth;
	uint nextFragment;
};

layout(std430, binding = 1) buffer FragmentBuffer
{
	FragmentData fragments[];
};

void main()
{
	float alpha = 0.3;
	vec3 color = someinnocentbrdf();

	// Get the list head for this pixel. The image is initialized to 1200*700*4+1,
	// a magic end-of-list value that is guaranteed not to be reached by the counter.
	uint head = imageLoad(imageBuffer, ivec2(gl_FragCoord.xy)).x;

	// Get the fragment number; the counter starts from 0 each frame.
	uint counter = atomicCounterIncrement(atomicbuffer);

	// Write some data for this fragment.
	fragments[counter].color = color;
	fragments[counter].alpha = alpha;
	fragments[counter].depth = gl_FragCoord.z;

	// Option 1. Ignore the obviously pointless if (see the paragraph after the code);
	// 1200*700*4+1 is the magic end-of-list value.
	uint oneovermax = 1200*700*4+1;
	if (oneovermax == head) fragments[counter].nextFragment = oneovermax;
	else fragments[counter].nextFragment = head;

	// Option 2: !! always crashes the video driver !!
	//fragments[counter].nextFragment = head;

	// Store the new head of the list.
	imageAtomicExchange(imageBuffer, ivec2(gl_FragCoord.xy), counter);

	// Is this really necessary? Shouldn't the atomic exchange suffice?
	// Added for good measure; will be cleaned up in production code.
	memoryBarrier();
}

Now, with option 1 it seems to work. With option 2 I get a guaranteed driver crash (on an Nvidia GTX 460M, latest driver). I’m totally puzzled: the two should be doing absolutely the same thing.
I manually checked for a possible memory overflow and I am nowhere near the end of the buffer (I’m using about 20% of the alpha buffer memory).
In pass 2 I just walk the list … nothing to write home about.
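
To make the pass 2 description concrete, here is a minimal sketch of what such a list walk could look like. It is not the actual shader from this project: MAX_FRAGMENTS, END_OF_LIST, outColor and the black background are hypothetical names and values chosen for illustration, and the declarations are assumed to mirror the pass 1 code.

```glsl
#version 430

// Assumed to match the pass 1 bindings; read-only in this pass.
layout(binding = 0, r32ui) readonly uniform uimage2D imageBuffer;

struct FragmentData
{
    vec3 color;
    float alpha;
    float depth;
    uint nextFragment;
};

layout(std430, binding = 1) readonly buffer FragmentBuffer
{
    FragmentData fragments[];
};

const uint END_OF_LIST = 1200u * 700u * 4u + 1u; // sentinel value from pass 1
const int MAX_FRAGMENTS = 16;                     // hypothetical per-pixel cap

out vec4 outColor;                                // hypothetical output name

void main()
{
    uint indices[MAX_FRAGMENTS];
    int count = 0;

    // Walk the list. The count bound keeps a malformed (e.g. cyclic) list
    // from looping forever.
    uint node = imageLoad(imageBuffer, ivec2(gl_FragCoord.xy)).x;
    while (node != END_OF_LIST && count < MAX_FRAGMENTS)
    {
        indices[count++] = node;
        node = fragments[node].nextFragment;
    }

    // Insertion sort by depth, farthest fragment first.
    for (int i = 1; i < count; ++i)
    {
        uint key = indices[i];
        float keyDepth = fragments[key].depth;
        int j = i - 1;
        while (j >= 0 && fragments[indices[j]].depth < keyDepth)
        {
            indices[j + 1] = indices[j];
            --j;
        }
        indices[j + 1] = key;
    }

    // Blend back to front over an assumed black background.
    vec3 color = vec3(0.0);
    for (int i = 0; i < count; ++i)
    {
        FragmentData f = fragments[indices[i]];
        color = mix(color, f.color, f.alpha);
    }
    outColor = vec4(color, 1.0);
}
```

Bounding the walk by MAX_FRAGMENTS is a defensive choice: fragments beyond the cap are simply dropped, which is wrong visually but keeps a bad list from stalling the GPU while debugging.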

So my problems are:

  • flickering on some pixels
  • crash when writing like option 2

I suppose I’m missing something on the topic of synchronization (the only thing that could be causing bad ordering AND, apparently, a deadlock). Any ideas? No code needed; just a “this is the way it should be done” or “RTFM @lineX paragraphY” will suffice.
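
For comparison, here is a minimal sketch of the head-insertion pattern as it is commonly written for per-pixel linked lists. It differs from the pass 1 code above in one respect: the next pointer is taken from the value returned by imageAtomicExchange instead of from a separate imageLoad. With a separate load, another fragment covering the same pixel can replace the head between the load and the exchange, so two nodes end up pointing at the same old head and one of them drops out of the list. Whether that is what causes the flickering or the crash here is an assumption, not something verified. previousHead is a hypothetical name, and the constant color stands in for the real shading.

```glsl
#version 430

layout(binding = 2) uniform atomic_uint atomicbuffer;
layout(binding = 0, r32ui) coherent uniform uimage2D imageBuffer;

struct FragmentData
{
    vec3 color;
    float alpha;
    float depth;
    uint nextFragment;
};

layout(std430, binding = 1) buffer FragmentBuffer
{
    FragmentData fragments[];
};

void main()
{
    // Reserve a slot for this fragment.
    uint counter = atomicCounterIncrement(atomicbuffer);

    // Swap in the new head and fetch the previous head in one atomic step,
    // so no other invocation can change the head in between.
    uint previousHead = imageAtomicExchange(imageBuffer, ivec2(gl_FragCoord.xy), counter);

    fragments[counter].color = vec3(1.0);          // placeholder for the real shading
    fragments[counter].alpha = 0.3;
    fragments[counter].depth = gl_FragCoord.z;
    fragments[counter].nextFragment = previousHead; // END_OF_LIST sentinel if the list was empty
}
```

Because the image is cleared to the end-of-list value each frame, the first fragment on a pixel receives that sentinel back from the exchange, so no special case is needed for an empty list.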