Without stating further details about the problem at hand, I need to implement some kind of a critical section.
I have some experience with CUDA and a pattern equivalent to this used to work well:
[[loop]] while (true) {
    [[branch]] if (atomicCompSwap(lockRef, 0, 1) == 0) {
        // Critical section here... (outside the 'if' statement we would have a guaranteed deadlock if two threads from the same group were attempting to use the same lock)
        atomicExchange(lockRef, 0); // Release the lock
        break;
    }
}
lockRef corresponds to some 0-initialized buffer element and multiple threads may use the same one.
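To make the setup concrete, here is a minimal sketch of how the pattern might sit inside a complete Vulkan GLSL compute shader. The buffer names (Locks, Counters), the bindings, the workgroup size and the choice of lock index are all assumptions made for illustration and are not taken from the actual test. The coherent qualifier and the memoryBarrierBuffer() call are also additions, intended to make the protected write visible before the lock is released. This only shows the shape of the problem; it does not make the spin loop safe.

```glsl
#version 450
#extension GL_EXT_control_flow_attributes : require

// Hypothetical workgroup size.
layout(local_size_x = 64) in;

// Hypothetical buffers: one 0-initialized lock word per protected counter.
layout(std430, set = 0, binding = 0) coherent buffer Locks    { uint locks[]; };
layout(std430, set = 0, binding = 1) coherent buffer Counters { uint counters[]; };

void main() {
    // Several invocations may map to the same lock (assumed mapping).
    uint lockIndex = gl_GlobalInvocationID.x % uint(locks.length());

    [[loop]] while (true) {
        // Try to take the lock: 0 = free, 1 = held.
        [[branch]] if (atomicCompSwap(locks[lockIndex], 0u, 1u) == 0u) {
            // Critical section: non-atomic read-modify-write.
            counters[lockIndex] += 1u;

            // Make the write visible before releasing the lock.
            memoryBarrierBuffer();
            atomicExchange(locks[lockIndex], 0u);
            break;
        }
    }
}
```

Whether the loop terminates still depends on how the implementation schedules the invocations that are spinning relative to the one holding the lock.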
Unfortunately, when I try to use this in GLSL and compile it to SPIR-V, I get deadlocks in a unit test with a compute shader.
Here are my observations so far:
If I set group size to 1, no deadlock occurs (expected behavior even if the code was not divergent);
Also no error occurs if I run only one thread group and give each thread the same lock address;
Any other case results in a deadlock. I am especially surprised that individual locks per gl_LocalInvocationID and many workgroups also lock up…
As far as I can tell, the problems are the same on NVIDIA and AMD.
Can anyone suggest what I am doing wrong? What is the fundamental difference between CUDA and SPIR-V that makes the two use cases different?
Since you’re using Vulkan, I assume that lockRef is a work group shared variable. Well, Vulkan has exactly zero forward progress guarantees (within a work group). So there is no requirement that this works at all.
If you manage to make it work, that’s only by accident. It’s still undefined behavior.
Anyway, since there are no forward-progress guarantees, is there any legitimate way to implement a critical section?
(My actual use case is within a fragment shader with per-pixel locks. I have found an extension named VK_EXT_fragment_shader_interlock, and it is more or less what I actually need, but it does not seem to be supported on AMD.)
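For reference, a per-pixel critical section using the interlock extension could look roughly like the sketch below. It assumes the GLSL-side extension GL_ARB_fragment_shader_interlock and that the fragmentShaderPixelInterlock feature of VK_EXT_fragment_shader_interlock is enabled on the device. The image name perPixelData, its binding and format, and the increment performed inside the section are hypothetical placeholders.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Overlapping fragments of the same pixel enter the section one at a time,
// in primitive order.
layout(pixel_interlock_ordered) in;

// Hypothetical per-pixel storage image.
layout(set = 0, binding = 0, r32ui) uniform coherent uimage2D perPixelData;

layout(location = 0) out vec4 outColor;

void main() {
    ivec2 pixel = ivec2(gl_FragCoord.xy);

    // Both calls must appear in main(), exactly once, outside any control flow.
    beginInvocationInterlockARB();

    // Critical section: read-modify-write that no overlapping fragment
    // of this pixel can interleave with.
    uint value = imageLoad(perPixelData, pixel).x;
    imageStore(perPixelData, pixel, uvec4(value + 1u));

    endInvocationInterlockARB();

    outColor = vec4(0.0);
}
```

Unlike the spin lock, this relies on the implementation to order the overlapping invocations, so it does not depend on forward-progress guarantees.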
Unfortunately, I very much need my code to work on hardware from all vendors, and until AMD decides to add support for interlock, it looks to me like I’m kind of screwed.
I’ll be using the modified code for now, since it seems to be working all right, even if it is technically unsafe. If AMD ever decides to support interlock, a new GLSL standard provides better guarantees, or some driver update starts crashing or locking up my application, I’ll switch to whatever is safer down the line.