I’m new here and am (probably) looking for SYCL terminology that I’m missing. I need a chunk of memory that only one work item can access at any point in time. I’ll provide an example use case, but don’t focus on specifics; it’s the concept I’m after.
Example
Imagine that, as an intermediate step, you need to explicitly compute a (small) matrix product within each kernel, but the size of these matrices is only known by the host at runtime.
- the resulting matrix must be explicitly stored within the kernel
- the resulting matrix’s size is not known at compile time, but it is known by the host at runtime
- the resulting matrix is not used in any other work item, and need not be written back to the host
I’d like to avoid allocating extra memory for each work item for this purpose.
What I looked at so far
I found private_memory, but as of this writing its use is discouraged, along with the entire hierarchical parallel_for invoke infrastructure.
I took a look at local_accessor, but that system seems to be a manually managed cache at work-group scope. I didn’t find any guarantees on how many work items can access it simultaneously.