We have the global, constant, local, and private (register) address spaces. Are these logical address spaces on the same device memory chip, or are they implemented on separate memory chips? For example, since local memory is accessible to a whole work-group, is it mapped onto a compute unit's own separate memory chip? Or is it up to the GPU vendor to choose whatever implementation they want?
OpenCL describes a memory hierarchy in which each memory region has different capabilities. For example, work-items accessing local memory can perform memory fence operations, so that the work-items in question can communicate with one another. However, while OpenCL describes a conceptual device architecture (see figure 3.3 in the specification), it does not require that this architecture map 1:1 onto a particular device.
It is reasonable to assume that some implementations, particularly for today's GPUs, will map local memory to a compute device's scratchpad memory, but this is not a requirement of OpenCL. Rather, an implementation must provide the ability to perform fence operations and allow communication between work-items within the same work-group.
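To make that requirement concrete, here is a minimal sketch of a kernel that relies only on what OpenCL guarantees: work-items in one work-group exchange values through local memory, separated by a barrier with a local memory fence. The kernel name, argument names, and the 1-D layout are assumptions made for illustration. Nothing in the kernel says where `tile` physically lives. A host program can ask the implementation by calling `clGetDeviceInfo` with `CL_DEVICE_LOCAL_MEM_TYPE`, which reports `CL_LOCAL` for dedicated local storage or `CL_GLOBAL` when local memory is backed by global memory.

```c
// OpenCL C device code: a host program must build and enqueue it.
// Hypothetical kernel for illustration; assumes a 1-D NDRange and that
// `in` and `out` each hold at least get_global_size(0) floats.
__kernel void reverse_in_group(__global const float *in,
                               __global float *out,
                               __local float *tile)
{
    size_t lid = get_local_id(0);    // index within the work-group
    size_t gid = get_global_id(0);   // index within the whole NDRange
    size_t n   = get_local_size(0);  // work-group size

    // Each work-item publishes one value to the work-group's local memory.
    tile[lid] = in[gid];

    // All work-items in the group wait here, and their writes to local
    // memory become visible to the rest of the group.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Now it is safe to read a value written by a different work-item.
    out[gid] = tile[n - 1 - lid];
}
```

The host sizes `tile` by passing a byte count and a NULL pointer for that argument, for example `clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL)`, where `kernel` and `local_size` are placeholder names. The same kernel is valid whether the device reports `CL_LOCAL` or `CL_GLOBAL`; only its performance would be expected to differ.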
Yes. Thank you.