In the OpenCL spec, there are two versions of this kind of built-in function for the half type. The only difference I found is that they have different alignment requirements. Does that mean vloada_half() will have higher performance? And what is the purpose of this distinction? Thanks!
vload_halfn lets you load a 1-, 2-, 4-, 8- or 16-component half vector, where the only alignment requirement is that p be aligned to a 16-bit boundary, i.e. the size of a scalar half.
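vloada_halfn lets you load a 1-, 2-, 4-, 8- or 16-component half vector, where the alignment requirement is that p be aligned to the size of the whole half vector, i.e. sizeof(halfn). Because the stricter guarantee can let the implementation fetch the vector with a single wide load, vloada_halfn should, in most cases, give you better memory access performance than the unaligned vload_halfn version, although the actual gain depends on the device and compiler.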
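To make the two alignment rules concrete, here is a small sketch in plain host-side C (not OpenCL C, and not part of any OpenCL API) that models them. The function names `vload_half_alignment`, `vloada_half_alignment` and `is_aligned` are made up for illustration. The sketch assumes a half is 2 bytes and that a 3-component vector is padded to 4 components, which is how the spec sizes 3-component vector types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of one scalar half (16 bits). */
enum { HALF_SIZE = 2 };

/* Hypothetical helper: alignment in bytes that vload_halfn needs.
   It is always the size of one scalar half, whatever n is. */
size_t vload_half_alignment(unsigned n)
{
    (void)n; /* the vector width does not matter here */
    return HALF_SIZE;
}

/* Hypothetical helper: alignment in bytes that vloada_halfn needs,
   i.e. the size of the whole half vector. A 3-component vector is
   treated as padded to 4 components (assumption stated above). */
size_t vloada_half_alignment(unsigned n)
{
    assert(n == 1 || n == 2 || n == 3 || n == 4 || n == 8 || n == 16);
    unsigned padded = (n == 3) ? 4 : n;
    return (size_t)padded * HALF_SIZE;
}

/* Returns 1 if the byte address satisfies the given alignment. */
int is_aligned(uintptr_t addr, size_t alignment)
{
    return addr % alignment == 0;
}
```

With this model, an address such as 0x1002 is acceptable for a 4-component vload_halfn (2-byte alignment) but not for a 4-component vloada_halfn, which needs 8-byte alignment. That is the whole difference: the "a" variant trades flexibility in where the data may start for a stronger guarantee the implementation can exploit.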