Compute shader imageStore/Load synchronization

Hey. I'm trying to synchronize results between work groups in a compute shader. What I'm trying to achieve is to count how many voxels are filled in screen space.
So what I need is for a write from one invocation to be visible to another invocation through imageLoad, so I can get the proper number of voxels.
Here is the code (doesn't work!):

uint voxel = imageLoad(Map, ivec3(coord)).x;
if (voxel == 0u) {
    // Not synchronized: another invocation may still read 0 here before the store below is visible.
    atomicAdd(voxels_cnt, 1);
    imageStore(Map, ivec3(coord), uvec4(1u));
}

And where are those invocations located? If those invocations are generated by another dispatch call, then you need to insert a glMemoryBarrier between the two calls.

If those invocations are part of the same work group, then you can use the GLSL barrier operation to ensure that the writes have happened, and that call needs to be preceded by memory barriers of the appropriate kind to make the writes available afterwards. It’s best to use shared variables for this, but if the memory space is too big, you’ll need to use coherent-qualified image or buffer resources.
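As a rough illustration of the same-work-group case, here is a minimal sketch that counts filled voxels in a shared variable and publishes one total per work group. The bindings, the r32ui format, the 8x8x8 local size, the name group_cnt, and the convention that a non-zero texel means "filled" are all assumptions; the thread does not say how Map or voxels_cnt are declared.

```glsl
#version 430
layout(local_size_x = 8, local_size_y = 8, local_size_z = 8) in;

// Assumed declarations; adjust format and bindings to the real application.
layout(r32ui, binding = 0) uniform readonly uimage3D Map;
layout(std430, binding = 0) buffer Counter { uint voxels_cnt; };

// Shared variables are visible to every invocation of one work group only.
shared uint group_cnt;

void main() {
    // One invocation initializes the shared counter.
    if (gl_LocalInvocationIndex == 0u) {
        group_cnt = 0u;
    }
    // Make the initialization visible, then wait for the whole work group.
    memoryBarrierShared();
    barrier();

    // Each invocation inspects only its own voxel.
    ivec3 coord = ivec3(gl_GlobalInvocationID);
    if (all(lessThan(coord, imageSize(Map))) && imageLoad(Map, coord).x != 0u) {
        atomicAdd(group_cnt, 1u);
    }

    // Make the shared writes visible before anyone reads the total.
    memoryBarrierShared();
    barrier();

    // A single global atomic per work group instead of one per voxel.
    if (gl_LocalInvocationIndex == 0u) {
        atomicAdd(voxels_cnt, group_cnt);
    }
}
```

Both barrier() calls sit outside any divergent branch, since every invocation in the work group has to reach them. The sketch also assumes the application resets voxels_cnt to zero before the dispatch.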

If the two invocations are in different work groups of the same dispatch, there is nothing you can do. There is no mechanism to ensure the ordering of execution of invocations between work groups in the same dispatch, and without ensuring ordering of execution, you cannot ensure ordering of reading of any written data.

If all you’re doing is counting the number of voxels, it’s not clear why you need to do any writing other than the atomic increment of the counter. Given some large space of voxel data, each CS invocation ought to be responsible for a specific subset of voxel data. There’s no reason for any CS invocation to even attempt to count voxels from another CS invocation, so it’s not clear why you have this global data at all.

Thanks for the reply. I have two invocations in different work groups, so there is nothing I can do… That’s not cool…
Maybe you can advise me on something. How can I get a list of all voxels (their coordinates) that are filled (not empty)?

I don’t understand how the code you wrote would accomplish that. The code you wrote (if it worked) would take each filled voxel and mark it unfilled.

If for some reason I wanted to get the coordinates of the filled in voxels, then I’d just have each invocation read their voxel’s data, detect if it’s filled in, and if it is, write that coordinate into an array. The array index being written to would come from an atomic value, so that two invocations aren’t writing to the same value. The atomic value becomes the count for the number of elements in the list.

At no point is any invocation trying to write to the same memory as some other invocation (outside of the atomic, which is fine, since it’s atomic).
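A minimal sketch of that approach, under stated assumptions: the buffer layout, the names FilledList, filled_count, capacity and coords, the r32ui format, and the 8x8x8 local size are all hypothetical, and a non-zero texel is taken to mean "filled".

```glsl
#version 430
layout(local_size_x = 8, local_size_y = 8, local_size_z = 8) in;

// Assumed declaration of the voxel volume.
layout(r32ui, binding = 0) uniform readonly uimage3D Map;

// Hypothetical output buffer. The application is assumed to set
// filled_count to 0 and capacity to the array length before each dispatch.
layout(std430, binding = 0) buffer FilledList {
    uint filled_count;  // number of append attempts so far
    uint capacity;      // maximum number of entries in coords
    uvec4 coords[];     // xyz = voxel coordinate, w unused
};

void main() {
    ivec3 coord = ivec3(gl_GlobalInvocationID);

    // Skip invocations that fall outside the volume.
    if (any(greaterThanEqual(coord, imageSize(Map)))) {
        return;
    }
    // Skip empty voxels.
    if (imageLoad(Map, coord).x == 0u) {
        return;
    }

    // atomicAdd returns the previous value, so every filled voxel
    // gets its own slot and no two invocations write the same element.
    uint slot = atomicAdd(filled_count, 1u);
    if (slot < capacity) {
        coords[slot] = uvec4(coord, 0u);
    }
}
```

If the list overflows, filled_count ends up larger than capacity, so whoever consumes the list would need to clamp it. The order of entries is not deterministic, and reading the list in a later dispatch still needs the glMemoryBarrier mentioned earlier in the thread.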

I tried using DispatchCompute(SIZE, SIZE, SIZE), where SIZE is the volume resolution. Then I sample the 3D texture in the shader using gl_GlobalInvocationID, and if the value is not zero, I add the coordinate to the list.
But this is VERY slow (FPS drops from 100 to 30, which is completely unplayable).
Maybe you know a better way?

Better way to do what? I can’t optimize your entire application from a basic description of one component of it.

Optimization at this level is mainly about the algorithms you’re implementing. That is, why you’re trying to compute these voxels to begin with. What do you do with the data? If you’re building this list, how many times do you use it? Is this even a good algorithm to put on the GPU in the first place?

Basically, there are too many specific details that are missing to be able to suggest anything concrete.

Thanks anyway! I will try asking on other forums. Maybe I’ll change the algorithm, we’ll see.