You need to change the signatures of your functions so that they take pointers to local memory:
void Fun(__local float2* t1, __local float* t2, float eta, unsigned int v)
If I understand you correctly, you should:
- Load the work-group’s data from global memory into local memory (preferably in a coalesced fashion - I’m assuming you’re targeting a GPU architecture)
- Process the data in local memory - by invoking your functions with appropriate pointers
- When done write results to global memory
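The three steps above can be sketched as a kernel like the one below. This is only a rough illustration, not your actual code: the kernel name process, the buffers in and out, and the body of Fun are made up for the example, and it assumes a 1D NDRange whose global size is a multiple of the work-group size, with one element per work-item.

```c
// Placeholder body: the real Fun is whatever your code does.
// It only illustrates a helper that works on __local pointers.
void Fun(__local float2* t1, __local float* t2, float eta, unsigned int v)
{
    t1[v] *= eta;            // scale element v in local memory
    t2[v] = length(t1[v]);   // store its magnitude in the second local buffer
}

__kernel void process(__global const float2* in,   // hypothetical input buffer
                      __global float* out,         // hypothetical output buffer
                      __local float2* t1,          // local scratch, one float2 per work-item
                      __local float* t2,           // local scratch, one float per work-item
                      float eta)
{
    const size_t gid = get_global_id(0);
    const unsigned int lid = (unsigned int)get_local_id(0);

    // 1. Load: neighbouring work-items read neighbouring addresses,
    //    which is the access pattern that coalesces on GPUs.
    t1[lid] = in[gid];

    // Make the loaded data visible to the whole work-group. This matters
    // as soon as Fun reads elements written by other work-items.
    barrier(CLK_LOCAL_MEM_FENCE);

    // 2. Process the data in local memory.
    Fun(t1, t2, eta, lid);
    barrier(CLK_LOCAL_MEM_FENCE);

    // 3. Write the results back to global memory.
    out[gid] = t2[lid];
}
```

On the host side, the two __local arguments get no buffer object. Their size is set with clSetKernelArg and a NULL pointer, e.g. clSetKernelArg(kernel, 2, local_size * sizeof(cl_float2), NULL), where local_size is the work-group size you launch with.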
I don’t quite get “(…) transfer work-groups to local memory”. A work-group is a unit of execution, not a piece of memory, so it cannot be transferred anywhere. Work-groups consist of work-items executing in parallel, and every work-item in a work-group executes the kernel. Local memory is shared by all work-items in a work-group, while global memory is accessible to all work-items from all work-groups.
In general you should batch and minimize your transfers from/to global memory and work as much as possible on fast local memory.