I am implementing a motion-vector warp algorithm in a compute shader. The logic is roughly:
{
    cur_pos = currentPixelPos();
    cur_rgb = getRGB(cur_pos);
    cur_depth = getDepth(cur_pos); // From an input buffer.
    mv = getMotionVector(cur_pos);
    warp_pos = cur_pos + mv;
    warp_depth = depth_buffer[warp_pos]; // Another buffer.
    /// Multiple positions may be warped to the same pixel;
    /// the position with the smallest depth wins.
    if (cur_depth < warp_depth) {
        depth_buffer[warp_pos] = cur_depth; // Keep the smallest.
        rgb_result[warp_pos] = cur_rgb;
    }
}
There is a data race, since depth_buffer and rgb_result at the same warp_pos may be updated by multiple shader invocations (from the same or different workgroups). I tried atomicMin() to update depth_buffer and memoryBarrierBuffer() to sync:
{
    cur_pos = currentPixelPos();
    cur_rgb = getRGB(cur_pos);
    cur_depth = getDepth(cur_pos); // From an input buffer.
    mv = getMotionVector(cur_pos);
    warp_pos = cur_pos + mv;
    atomicMin(depth_buffer[warp_pos], cur_depth);
    memoryBarrierBuffer();
    if (cur_depth <= depth_buffer[warp_pos]) {
        rgb_result[warp_pos] = cur_rgb;
    }
}
I still cannot get a stable result. Is there any way to achieve this in a single vkCmdDispatch()?
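For context on why the second version is still unstable: memoryBarrierBuffer() only orders the calling invocation's own memory accesses. It does not make other invocations wait. The depth check and the color write remain two separate steps, so a farther invocation can pass the check before a nearer one has run its atomicMin(), and then write its color afterwards. One direction that might fit in a single dispatch is to pack depth and color into one 64-bit value and resolve both with a single atomic minimum. The following is only a CPU-side sketch of that idea in C++, not shader code: Sample, pack_key, key_color, atomic_min and resolve_pixel are hypothetical names, the color is assumed to fit in 32 bits (for example RGBA8), and depth is assumed to be a non-negative float. On the GPU this would presumably map to a 64-bit atomicMin() on a storage buffer, which as far as I know requires GL_EXT_shader_atomic_int64 and the shaderBufferInt64Atomics device feature, so treat that part as an assumption to verify on the target hardware.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for what one shader invocation carries.
struct Sample { float depth; uint32_t rgba8; };

// Pack depth into the high 32 bits and color into the low 32 bits.
// For non-negative IEEE-754 floats the bit pattern orders the same way
// as the value, so an unsigned compare of keys compares depth first.
inline uint64_t pack_key(Sample s) {
    assert(!std::signbit(s.depth)); // assumption: depth is never negative
    uint32_t depth_bits;
    std::memcpy(&depth_bits, &s.depth, sizeof depth_bits);
    return (uint64_t(depth_bits) << 32) | s.rgba8;
}

// Recover the color from a packed key.
inline uint32_t key_color(uint64_t key) { return uint32_t(key & 0xFFFFFFFFu); }

// CPU stand-in for a 64-bit atomicMin: retry until the slot holds
// a value that is no larger than our key.
inline void atomic_min(std::atomic<uint64_t>& slot, uint64_t key) {
    uint64_t seen = slot.load();
    while (key < seen && !slot.compare_exchange_weak(seen, key)) {}
}

// Simulate several invocations warping to the same pixel, in the given
// arrival order. The slot starts cleared to the largest possible key.
inline uint32_t resolve_pixel(const std::vector<Sample>& arrivals) {
    std::atomic<uint64_t> pixel{UINT64_MAX};
    for (const Sample& s : arrivals) atomic_min(pixel, pack_key(s));
    return key_color(pixel.load());
}
```

Because depth and color travel in the same atomic operation, the surviving color always belongs to the smallest depth, whatever the arrival order; equal depths are broken deterministically by the smaller color bits. The trade-offs are that the target buffer has to be cleared to all ones before the dispatch, and whoever consumes the result has to unpack the low 32 bits.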