I have a question about global memory access.
I compared two ways of accessing global memory.
One is to let all threads access the same location, and the other is to let each thread access its own location.
I found that the former took less time.
The former involves memory access conflicts, so why is it faster? Is it due to caching?
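
For reference, the two access patterns being compared can be sketched roughly as follows. This is only a minimal illustration, assuming CUDA and assuming the accesses are reads; the kernel names, array size, and launch configuration are hypothetical and are not the exact code that was measured.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pattern 1 (hypothetical name): every thread reads the same global-memory element.
__global__ void readSameLocation(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[0];
}

// Pattern 2 (hypothetical name): each thread reads its own global-memory element.
__global__ void readOwnLocation(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const int n = 1 << 20;                       // assumed problem size
    const int threads = 256;                     // assumed block size
    const int blocks = (n + threads - 1) / threads;

    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    // Time pattern 1: all threads read in[0].
    cudaEventRecord(start);
    readSameLocation<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("same location: %f ms\n", ms);

    // Time pattern 2: thread i reads in[i].
    cudaEventRecord(start);
    readOwnLocation<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("own location:  %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

In this sketch both kernels write to distinct output elements, so the only difference between them is the read pattern. The first kernel launch in a process also pays one-time startup costs, so a warm-up launch before timing would make the comparison fairer.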