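As far as I can tell, calling one of the work-group collective functions in an OpenCL kernel, e.g. work_group_scan_exclusive_add, leaves the implementation of the underlying algorithm (here, the scan across the work-group) to the device's OpenCL implementation, i.e. its driver and compiler, provided the device supports the __opencl_c_work_group_collective_functions feature.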
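To make the question concrete, here is a minimal sketch of the kind of usage I have in mind. It assumes OpenCL C 3.0, where the feature macro is defined when the device supports the functions. The names scan_kernel_src, scan_example and reference_exclusive_add are made up for illustration. The kernel is kept as a string so the file compiles as plain C, and the host-side loop is only my assumption of what one 1D work-group should produce, not what any particular device does internally.

```c
#include <stddef.h>

/* OpenCL C kernel source (hypothetical kernel name: scan_example).
   The feature macro guard assumes OpenCL C 3.0. */
const char *const scan_kernel_src =
    "#if defined(__opencl_c_work_group_collective_functions)\n"
    "__kernel void scan_example(__global const int *in, __global int *out)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    /* All work-items of the work-group must reach this call. */\n"
    "    out[gid] = work_group_scan_exclusive_add(in[gid]);\n"
    "}\n"
    "#endif\n";

/* Host-side reference for a single 1D work-group of n work-items:
   out[i] is the sum of in[0] .. in[i-1], and out[0] is 0.
   How the device actually computes this is up to its implementation. */
void reference_exclusive_add(const int *in, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;   /* sum of all earlier elements */
        running += in[i];
    }
}
```

In the kernel I never write the scan myself; I only call the built-in, so I assume the build option -cl-std=CL3.0 plus device support is all that is needed for the device to supply the algorithm.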
Could someone please confirm whether my understanding is correct? Thank you.