Question about global memory access

enjoyOpenCL · August 20, 2014, 10:23am

Hi,
I have a question about global memory access.
I compared two ways of accessing global memory.
In the first, all threads access the same location; in the second, each thread accesses its own location.
I found that the first takes less time.
The first involves a memory conflict, so why is it faster? Is it due to caching?

Many thanks.

CL

Dithermaster · August 21, 2014, 4:55pm

On modern hardware, the case where all work items read the same location (known as a “broadcast”) does not cause a conflict, so the accesses are not serialized. That is why it is fast.
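
For concreteness, here is a minimal sketch of what the two access patterns might look like as OpenCL C kernels. The original test code was not posted, so the kernel names (read_same, read_own), the float element type, and the output buffer are assumptions made only for illustration.

```c
// Hypothetical kernels illustrating the two patterns discussed above.
// Names, types, and arguments are assumptions, not the original poster's code.

// Pattern 1: every work item reads the same global address (a "broadcast").
__kernel void read_same(__global const float *in, __global float *out)
{
    const size_t gid = get_global_id(0);
    out[gid] = in[0];   // same source address for all work items
}

// Pattern 2: each work item reads its own element.
__kernel void read_own(__global const float *in, __global float *out)
{
    const size_t gid = get_global_id(0);
    out[gid] = in[gid]; // distinct source address per work item
}
```

Both kernels write to out[gid] so that the compiler cannot discard the load. Note that the first kernel only ever touches a single input element, while the second reads one element per work item, so the first is likely to generate much less read traffic in addition to avoiding serialization.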