Why is there no async_work_group_copy for halfs in the core specification?

The half data type is in the core OpenCL 1.0 specification as a “data storage format”. However, async_work_group_copy is only extended to work on halfs by the cl_khr_fp16 extension. Isn’t async_work_group_copy just a bit-wise copy? Why does it need special hardware support to work on halfs?

Is it dangerous for me to just cast my half pointers to short pointers so I can use async_work_group_copy?


It seems OpenCL devices don’t really operate on half values unless they support the cl_khr_fp16 extension.

In that case you have to use vload_half, which reads a half from memory and returns it as a real float you can work with.

Yup, which I’m doing in the inner loop of my code to load floats out of shared memory. It was just odd that I couldn’t bitwise copy the half values from global to shared memory. Anyway, I just cast the half pointers to short pointers so that I can use async_work_group_copy on the data, and it appears to work fine on my Tesla card. It does feel hackish, though.

Halfs are really nice because my application is bounded by shared memory size: storing halfs instead of floats in shared memory effectively doubles my occupancy.

You are correct that async_work_group_copy should have been allowed for the half type in the core specification instead of being tied to the cl_khr_fp16 extension. This is most likely an oversight. Thanks for catching it. In the meantime, you can make this work by casting the half pointers to short pointers. It is not an ideal or clean solution, but it does work.
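For anyone finding this thread later, here is a rough sketch of the cast workaround described above. It is only an illustration: the kernel name, the argument names, and the scale parameter are made up for the example, and it assumes the global size is a multiple of the work-group size and that the host sizes the local buffer (via clSetKernelArg) to hold one half per work-item. It uses ushort rather than short, which should make no difference since the copy only moves 16-bit patterns.

```c
// OpenCL C device code (a kernel), not host C.
// Hypothetical example: names and the "scale" parameter are illustrative only.
__kernel void scale_from_half(__global const half *src,  // halfs in global memory
                              __global float *dst,       // float results
                              __local half *tile,        // local buffer, sized by the host
                              float scale)
{
    const size_t lid  = get_local_id(0);
    const size_t lsz  = get_local_size(0);
    const size_t base = get_group_id(0) * lsz;  // first element for this work-group

    // Reinterpret the half pointers as 16-bit integer pointers so the core
    // (non-fp16) overload of async_work_group_copy can be used. Every
    // work-item in the group must make this call with the same arguments.
    event_t ev = async_work_group_copy((__local ushort *)tile,
                                       (const __global ushort *)(src + base),
                                       lsz, 0);
    wait_group_events(1, &ev);

    // vload_half converts the stored half to a float; no arithmetic is ever
    // done on the half type itself, so cl_khr_fp16 is not needed.
    dst[base + lid] = scale * vload_half(lid, tile);
}
```

The local buffer is passed as a kernel argument because, without cl_khr_fp16, half can only be used to declare pointers, not variables or arrays.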

Also note that only the scalar half type is in the core spec, so in the core specification async_work_group_copy could only be supported for the scalar half type and not for the vector variants of half. The vector variants are enabled by the cl_khr_fp16 extension.