The OpenCL spec says this about memory consistency:
Within a work-item memory has load / store consistency. Local memory is consistent across
work-items in a single work-group at a work-group barrier. Global memory is consistent across
work-items in a single work-group at a work-group barrier, but there are no guarantees of
memory consistency between different work-groups executing a kernel.
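Here is a rough illustration of the barrier rule in the quote. It is a CPU analogy in C with POSIX threads, not OpenCL. Each thread stands in for a work-item, a shared static array stands in for __local memory, and pthread_barrier_wait() stands in for barrier(CLK_LOCAL_MEM_FENCE). The names LOCAL_SIZE, work_item and run_work_group are made up for the sketch. The pthread barrier is only an assumed stand-in and says nothing about how a GPU actually implements the OpenCL barrier.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_SIZE 4 /* hypothetical work-group size */

static int local_mem[LOCAL_SIZE];       /* stands in for __local memory */
static int result[LOCAL_SIZE];          /* stands in for a __global output buffer */
static pthread_barrier_t group_barrier; /* stands in for barrier(CLK_LOCAL_MEM_FENCE) */

/* Body of one simulated work-item; arg carries its local id. */
static void *work_item(void *arg)
{
    size_t lid = (size_t)(uintptr_t)arg; /* like get_local_id(0) */
    assert(lid < LOCAL_SIZE);

    /* Phase 1: each work-item writes only its own slot. */
    local_mem[lid] = (int)lid * 10;

    /* Wait until every work-item in the group has finished phase 1;
       pthread_barrier_wait() also synchronizes memory among the threads. */
    pthread_barrier_wait(&group_barrier);

    /* Phase 2: reading a neighbour's slot is safe only because of the barrier. */
    result[lid] = local_mem[(lid + 1) % LOCAL_SIZE];
    return NULL;
}

/* Runs one simulated work-group to completion. */
static void run_work_group(void)
{
    pthread_t threads[LOCAL_SIZE];
    int rc = pthread_barrier_init(&group_barrier, NULL, LOCAL_SIZE);
    assert(rc == 0);
    for (size_t i = 0; i < LOCAL_SIZE; ++i) {
        rc = pthread_create(&threads[i], NULL, work_item, (void *)(uintptr_t)i);
        assert(rc == 0);
    }
    for (size_t i = 0; i < LOCAL_SIZE; ++i)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&group_barrier);
    (void)rc; /* rc is otherwise unused when NDEBUG disables assert() */
}
```

After run_work_group() returns, result holds {10, 20, 30, 0}, because each work-item reads the value its neighbour wrote before the barrier. Without the barrier, that read would be a data race, just as reading another work-item's __local slot without barrier() is in a kernel. OpenCL C's barrier() synchronizes only the work-items of one work-group. There is no built-in counterpart that spans work-groups.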
To ensure load/store consistency within a work-item, do I need to use atomic operations?
The spec also says that atomic operations are atomic for a device. On a GPU with multiple SIMD streams, does this mean that atomics are consistent across work-groups, and not just within a single work-group?
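The "atomic for a device" wording can be made concrete with another CPU analogy in C. This sketch assumes that device scope means every work-item on the device, regardless of its work-group. Two hypothetical groups of threads, with no barrier linking them, increment one shared C11 atomic_int with atomic_fetch_add(). This roughly mirrors kernels calling atomic_inc() on a __global int. NUM_GROUPS, ITEMS_PER_GROUP, INCREMENTS and run_all_groups are invented for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_GROUPS 2      /* hypothetical number of work-groups */
#define ITEMS_PER_GROUP 4 /* hypothetical work-group size */
#define INCREMENTS 1000   /* increments performed by each work-item */
#define TOTAL_ITEMS (NUM_GROUPS * ITEMS_PER_GROUP)

/* Stands in for a __global int updated with atomic_inc(). */
static atomic_int global_counter;

/* Body of one simulated work-item. */
static void *atomic_work_item(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; ++i)
        atomic_fetch_add(&global_counter, 1); /* indivisible read-modify-write */
    return NULL;
}

/* Launches every work-item of every group with no barrier between groups
   and returns the final counter value. */
static int run_all_groups(void)
{
    pthread_t threads[TOTAL_ITEMS];
    atomic_store(&global_counter, 0);
    for (int i = 0; i < TOTAL_ITEMS; ++i) {
        int rc = pthread_create(&threads[i], NULL, atomic_work_item, NULL);
        assert(rc == 0);
        (void)rc; /* silence unused-variable warnings under NDEBUG */
    }
    for (int i = 0; i < TOTAL_ITEMS; ++i)
        pthread_join(threads[i], NULL);
    return atomic_load(&global_counter);
}
```

Because each increment is indivisible, run_all_groups() returns NUM_GROUPS * ITEMS_PER_GROUP * INCREMENTS (8000) on every run. The sketch shows only the atomicity of the counter itself. It does not show whether ordinary, non-atomic stores made around the atomic become visible to other groups.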
For non-atomic reads and writes across work-groups, is the result merely undefined, or can such accesses actually cause a crash (other than through software that is not prepared to handle the inconsistency)?