OpenGL 4.5, confusion about the use of barrier() and memoryBarrierShared()

leoner · May 18, 2023, 8:48am

Hi everyone,

While reading “OpenGL Programming Guide 9th Edition” (so up to OpenGL 4.5) I stumbled upon this compute shader example (Example 12.11):

layout (local_size_x = 1024) in;

// input and output images
// ...

// Shared memory
shared vec4 scanline[1024];

void main(void)
{
    // Get the current position in the image.
    ivec2 pos = ivec2(gl_GlobalInvocationID.xy);

    // Read an input pixel and store it in the shared array
    scanline[pos.x] = imageLoad(input_image, pos);

    // Ensure that all other invocations have reached this point
    // and written their shared data by calling barrier()
    barrier();

    vec4 result = scanline[min(pos.x + 1, 1023)] - scanline[max(pos.x - 1, 0)];

    imageStore(output_image, pos.xy, result);
}

I am not sure about the comment

// Ensure that all other invocations have reached this point
// and written their shared data by calling barrier().

Does barrier() ensure that? In particular, I don’t understand whether barrier() should be used only to synchronize instruction execution across all the invocations within a work group, or whether it should also be used to make memory operations visible to all the invocations within the work group (synchronizing execution + memory).

From the barrier() docs

barrier provides a partially defined order of execution between shader invocations.
…
For any given static instance of barrier in a compute shader, all invocations within a single work group must enter it before any are allowed to continue beyond it. This ensures that values written by one invocation prior to a given static instance of barrier can be safely read by other invocations after their call to the same static instance of barrier.

… values written by one invocation…to what? All possible writes? Shared variables, imageStore, …?

Besides, if barrier() ensures that all invocations have written their shared data (so it’s safe to read), why and when should anyone use memoryBarrierShared()?

Regarding memoryBarrierShared(), from the reference:

In particular, any modifications made in one shader stage are guaranteed to be visible to accesses performed by shader invocations in subsequent stages when those invocations were triggered by the execution of the original shader invocation (e.g., fragment shader invocations for a primitive resulting from a particular geometry shader invocation).

Aren’t shared variables only available in compute shaders? How could subsequent stages access those?

Thanks in advance for the help!

Alfonse_Reinheart · May 18, 2023, 3:20pm

The GLSL specification is quite readable and way more likely to be accurate than the reference manual. It’s also been updated to GLSL 4.60, while the reference manual has not.

It looks like the documentation for the memoryBarrier* functions was all copy-and-pasted from the specification, so errors crept into various places.

In any case, the standard makes it clear that barrier does provide memory dependencies for shared and TCS output variables:

A barrier() affects control flow but only synchronizes memory accesses to shared variables and tessellation control output variables.
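
To make the distinction in that sentence concrete, here is a minimal sketch, not taken from the book or the specification. It assumes a hypothetical pair of storage buffers (Scratch and Result, with bindings 0 and 1 picked only for illustration) that the application has sized to hold one float per dispatched invocation. The shared-variable case relies on barrier() alone, as the quoted sentence allows. The buffer case adds the coherent qualifier and a memoryBarrierBuffer() call before barrier(), because the quoted sentence does not extend barrier()'s memory synchronization to buffer or image accesses.

```glsl
#version 450 core

layout (local_size_x = 256) in;

// Hypothetical resources, named and bound only for this illustration.
// The application is assumed to allocate one float per dispatched invocation.
layout (std430, binding = 0) coherent buffer Scratch {
    float scratch[];
};
layout (std430, binding = 1) writeonly buffer Result {
    float result[];
};

shared float tile[256];

void main(void)
{
    uint lid  = gl_LocalInvocationID.x;
    uint gid  = gl_GlobalInvocationID.x;
    uint next = (lid + 1u) % gl_WorkGroupSize.x;  // neighbor in the same work group

    // Case 1: shared variable.
    // Per the quoted sentence, barrier() synchronizes accesses to shared
    // variables, so no separate memory barrier is written here.
    tile[lid] = float(gid);
    barrier();
    float from_shared = tile[next];

    // Case 2: buffer memory.
    // barrier() alone is not stated to synchronize these accesses, so the
    // store goes through a coherent variable and is followed by a memory
    // barrier; barrier() then orders the neighbor's load after the store.
    scratch[gid] = 2.0 * float(gid);
    memoryBarrierBuffer();
    barrier();
    uint base = gl_WorkGroupID.x * gl_WorkGroupSize.x;
    float from_buffer = scratch[base + next];

    result[gid] = from_shared + from_buffer;
}
```

Both reads stay inside the invocation's own work group, since barrier() only orders invocations within a single work group. The same shape would apply to image stores with memoryBarrierImage() in place of memoryBarrierBuffer().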

leoner · May 19, 2023, 8:08am

Thanks a lot. I guess it’s a good habit to always check the specification when in doubt.

However, I still find it difficult to understand why and when I should use memoryBarrierShared.

From GLSL specification 8.17. Shader Memory Control Functions:

When these functions return (memoryBarrier*), the effects of any memory stores performed using coherent variables prior to the call will be visible to any future* coherent access to the same memory performed by any other shader invocation.

* An access is only a future access if a happens-before relation can be established between the store and the load.

To establish the happens-before relation, shouldn’t I use barrier? And if I use barrier, aren’t the shared variables written before the barrier already safe to read without calling memoryBarrierShared?