I am comparing SYCL performance with two memory allocation approaches:
- With malloc_device (explicit copies), data transfer takes more time but kernel execution takes less time.
- With USM shared allocations (malloc_shared), data transfer takes less time but kernel execution takes more time.
Hardware: NVIDIA V100 PCIe 32 GB
Why do the two approaches behave so differently?