Atomic counters and branching

Consider the following compute shader, executed as 1000 workgroups with a single invocation in each.

layout(local_size_x = 1) in;

layout(binding = 0, offset = 0) uniform atomic_uint counter; // = 800 before glDispatchCompute
const uint counterMax = 1000;

void main() {
    uint initialCounterVal = atomicCounter(counter);
    if (gl_GlobalInvocationID.x >= counterMax - initialCounterVal) return;
    // I expect only 200 invocations with global IDs [0, 199] reach here
    atomicCounterIncrement(counter);
}

Is it guaranteed that the counter will be equal to 1000 after this dispatch, and that the expectation in the code comment is correct?

If yes, let’s imagine a hypothetical low-end GPU with only 10 compute units. How are compute shaders implemented there?
As I naively imagine it, after the first group of 10 invocations (let’s call it a “warp”, to avoid confusion with workgroups) has executed, the counter value will be 810, so the second warp will get 810 in initialCounterVal, …, and the 11th warp will get 900. The condition becomes true at this point, so invocations with IDs in the 100…200 interval never reach the increment instruction.

So, for the expectation in my code comment to hold, will the implementation store some per-warp diffs of the atomic counters, or how will it handle this situation?

An atomic operation is atomic. There are no concurrency issues, and every atomic operation completes entirely before any other execution agent has the chance to interfere.

But note the first word of that sentence: “An”. Singular.

A single atomic operation is atomic. Two sequential atomic operations are not atomic in aggregate. Each individual operation is atomic, but any number of things could happen between them.

Every single invocation could get the same value from atomicCounter(counter). That is a perfectly valid thing that could happen.
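
To make the gap concrete, here is a minimal sketch (not taken from the posts above) of the read-then-increment pattern, with comments marking where other invocations may run. The `#version` line and the variable names `seen` and `before` are assumptions made for illustration.

```glsl
#version 430 core
layout(local_size_x = 1) in;

layout(binding = 0, offset = 0) uniform atomic_uint counter;

void main() {
    // First atomic operation: read the current value.
    uint seen = atomicCounter(counter);

    // Any number of other invocations may read or increment the counter
    // here, between the two operations.

    // Second atomic operation: increment, returning the value held
    // immediately before this increment.
    uint before = atomicCounterIncrement(counter);

    // Nothing guarantees that seen == before, and nothing prevents many
    // invocations from observing the same value in `seen`.
}
```

Only the value returned by atomicCounterIncrement is tied to the increment itself, since the read and the write happen as one operation.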

What you seem to be trying to do is to put a cap on the total number of invocations that do some task. To do that, the count of completed tasks must be kept apart from the count of invocations. Each invocation should bump the invocation count, but if you want a count of the number that actually do the task, that needs to be a separate counter:

    uint oldInvCounter = atomicCounterIncrement(invCounter);
    if (gl_GlobalInvocationID.x >= counterMax - oldInvCounter) return;
    // I expect only 200 invocations with global IDs [0, 199] reach here
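
One possible way to turn this into a hard cap, sketched under assumptions: every invocation takes a ticket from `invCounter` with a single atomicCounterIncrement, and only tickets below the remaining budget go on to do the task and bump `counter`. The uniform `remaining` (which the host would set to counterMax minus the counter's value before the dispatch, 200 in this example), the placement of `invCounter` at offset 4, and its zero starting value are all hypothetical and not part of the posts above.

```glsl
#version 430 core
layout(local_size_x = 1) in;

// Counts completed tasks; assumed to hold 800 before glDispatchCompute.
layout(binding = 0, offset = 0) uniform atomic_uint counter;
// Hypothetical ticket counter in the same buffer; assumed reset to 0 before the dispatch.
layout(binding = 0, offset = 4) uniform atomic_uint invCounter;

// Hypothetical uniform set by the host: how many tasks may still run (1000 - 800 = 200 here).
layout(location = 0) uniform uint remaining;

void main() {
    // One atomic read-modify-write: each invocation receives a distinct ticket.
    uint ticket = atomicCounterIncrement(invCounter);
    if (ticket >= remaining) return;

    // At most `remaining` invocations reach this point.
    // ... do the task here ...
    atomicCounterIncrement(counter);
}
```

With these assumptions the task counter cannot exceed 1000, but the order in which invocations obtain tickets is unspecified, so the 200 that proceed need not be the ones with global IDs [0, 199].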

A separate counter for completed tasks looks like the right way to go. Thanks!