Compute shader synchronization within a single dispatch

I am implementing a motion-vector warp algorithm in a compute shader. The logic is roughly:

{
    cur_pos = currentPixelPos();
    cur_rgb = getRGB(cur_pos);
    cur_depth = getDepth(cur_pos); // From an input buffer.
    mv = getMotionVector(cur_pos);
    
    warp_pos = cur_pos + mv;
    warp_depth = depth_buffer[warp_pos]; // Another buffer.
    // Multiple source pixels may be warped to the same target pixel;
    // the one with the smallest depth wins.
    if (cur_depth < warp_depth) {
        depth_buffer[warp_pos] = cur_depth; // Keep the smallest.
        rgb_result[warp_pos] = cur_rgb;
    }
}

This has a data race: multiple shader invocations (from the same or different workgroups) may read and write depth_buffer and rgb_result at the same warp_pos. I tried using atomicMin() to update depth_buffer and memoryBarrierBuffer() to synchronize:

{
    cur_pos = currentPixelPos();
    cur_rgb = getRGB(cur_pos);
    cur_depth = getDepth(cur_pos); // From an input buffer.
    mv = getMotionVector(cur_pos);
    
    warp_pos = cur_pos + mv;
    atomicMin(depth_buffer[warp_pos], cur_depth);
    memoryBarrierBuffer();
    if (cur_depth <= depth_buffer[warp_pos]) {
        rgb_result[warp_pos] = cur_rgb;
    }
}

I still cannot get a stable result. Is there any way to achieve this in a single vkCmdDispatch()?
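
For context on why the second version can still race: memoryBarrierBuffer() only orders the calling invocation's own memory accesses. It does not make other invocations wait, so an invocation with a larger depth can pass the `<=` check before a nearer one has run atomicMin(), and then write its color after the nearer one has. One commonly suggested way around this is to make the depth test and the color write a single atomic operation by packing both into one 64-bit word. Below is a minimal sketch of that idea, not a verified implementation. It assumes the device supports 64-bit buffer atomics (GL_EXT_shader_atomic_int64 in GLSL, the shaderInt64 and shaderBufferInt64Atomics features in Vulkan), that depth is non-negative, and that 8 bits per color channel are enough. All binding numbers, the Params block, and the names color_tex, depth_tex, motion_tex, and warp_result are hypothetical.

```glsl
#version 450
#extension GL_EXT_shader_explicit_arithmetic_types_int64 : require
#extension GL_EXT_shader_atomic_int64 : require

layout(local_size_x = 8, local_size_y = 8) in;

// Hypothetical inputs; the real shader may use buffers instead of textures.
layout(set = 0, binding = 0) uniform sampler2D color_tex;   // source RGB
layout(set = 0, binding = 1) uniform sampler2D depth_tex;   // source depth, assumed >= 0
layout(set = 0, binding = 2) uniform sampler2D motion_tex;  // motion vectors, in pixels

// One 64-bit word per target pixel: depth bits in the high half, RGBA8 in the low half.
// Assumed to be filled with all ones before the dispatch, so any real depth compares smaller.
layout(set = 0, binding = 3, std430) buffer WarpResult {
    uint64_t warp_result[];
};

layout(push_constant) uniform Params {
    ivec2 size; // image size in pixels
} pc;

void main() {
    ivec2 cur_pos = ivec2(gl_GlobalInvocationID.xy);
    if (any(greaterThanEqual(cur_pos, pc.size))) {
        return;
    }

    vec3  cur_rgb   = texelFetch(color_tex,  cur_pos, 0).rgb;
    float cur_depth = texelFetch(depth_tex,  cur_pos, 0).r;
    vec2  mv        = texelFetch(motion_tex, cur_pos, 0).xy;

    ivec2 warp_pos = cur_pos + ivec2(round(mv));
    if (any(lessThan(warp_pos, ivec2(0))) ||
        any(greaterThanEqual(warp_pos, pc.size))) {
        return; // warped outside the image
    }

    // For non-negative IEEE-754 floats, the bit pattern orders the same way as the value,
    // so comparing the packed words as unsigned integers compares depth first.
    uint depth_bits = floatBitsToUint(cur_depth);
    uint rgb_bits   = packUnorm4x8(vec4(cur_rgb, 1.0));
    uint64_t key    = (uint64_t(depth_bits) << 32) | uint64_t(rgb_bits);

    // Single atomic: the smallest depth wins and carries its own color with it.
    atomicMin(warp_result[warp_pos.y * pc.size.x + warp_pos.x], key);
}
```

With this layout there is no separate read-then-write step, so no barrier is needed inside the dispatch. The buffer would have to be filled with 0xFFFFFFFF beforehand (for example with vkCmdFillBuffer, followed by a barrier), and whatever consumes the result would recover the color with unpackUnorm4x8() on the low 32 bits. If two source pixels land on the same target with exactly equal depth, the one with the smaller packed color value wins, which is at least deterministic. The trade-off is the reduced color precision and the dependency on 64-bit atomic support.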