Consider the following compute shader, dispatched as 1000 workgroups with a single invocation in each:
#version 430
layout(local_size_x = 1) in;
layout(binding = 0, offset = 0) uniform atomic_uint counter; // = 800 before glDispatchCompute
const uint counterMax = 1000u;
void main() {
    // Read the current counter value without modifying it
    uint initialCounterVal = atomicCounter(counter);
    if (gl_GlobalInvocationID.x >= counterMax - initialCounterVal) return;
    // I expect only the 200 invocations with global IDs [0, 199] to reach here
    atomicCounterIncrement(counter);
}
Is it guaranteed that the counter will equal 1000 after this dispatch, and that the expectation in the code comment is correct?
If yes, let’s imagine a hypothetical low-end GPU with only 10 compute units. How would compute shader execution be implemented there?
As I naively imagine it, after the first group of 10 invocations is executed (let’s call such a group a “warp” to avoid confusion with workgroups), the counter value will be 810, so the second warp will get 810 in initialCounterVal, …, and the 11th warp will get 900. The condition in the if statement becomes true at this point, so invocations with IDs in the range 100…199 never reach the increment instruction.
So, for the expectation in the code comment to hold, will the implementation store some per-warp diffs of the atomic counters, or how will it handle this situation?
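To make the naive picture concrete, here is a small CPU-side sketch of it in Python. It is only a model of the hypothetical scheduling described above, not a description of how any real GPU or driver behaves. The function `simulate` and its parameter `batch_size` are made-up names, and the lockstep assumption (every invocation in a batch reads the counter before any invocation in that batch increments it) is mine.

```python
def simulate(batch_size, total=1000, start=800, counter_max=1000):
    """Run `total` invocations in lockstep batches of `batch_size`.

    Assumed model: all invocations of a batch read the counter first,
    then the survivors of the early return increment it.
    """
    counter = start
    reached = []  # global IDs that got past the early return
    for first in range(0, total, batch_size):
        ids = range(first, min(first + batch_size, total))
        snapshot = counter  # atomicCounter(counter), same value for the whole batch
        passed = [i for i in ids if i < counter_max - snapshot]
        counter += len(passed)  # one atomicCounterIncrement per survivor
        reached.extend(passed)
    return counter, reached


# batch size, final counter value, number of invocations that incremented
for batch_size in (1000, 10, 1):
    final, reached = simulate(batch_size)
    print(batch_size, final, len(reached))
# 1000 1000 200
# 10 900 100
# 1 900 100
```

In this model the expectation from the code comment only holds when all 1000 invocations read the counter before any of them increments it. With batches of 10, or with fully serialized execution, only IDs 0 to 99 reach the increment and the counter ends at 900, which is exactly the situation the question is about.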