SYCL Performance

I am benchmarking SYCL performance.

  • With malloc_device, data transfer takes more time but kernel execution takes less time.
  • With USM shared allocations (malloc_shared), data transfer takes less time but kernel execution takes more time.

Hardware: NVIDIA V100 32 GB PCIe

Why?

Some context (for those like me that don’t know SYCL):

https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-1/unified-shared-memory-allocations.html

I believe this answers your question.
