Failure on USM shared malloc with large sizes

sajjadrsm · June 21, 2021, 12:02pm

Hi everyone,

I have a question about USM shared malloc arguments.

In my program, I allocate a USM shared buffer that holds 2*N*N floats:

std::vector<float> A(2 * N * N);
auto A_acc = (Array<float, 2, N, N> *)malloc_shared( sizeof(Array<float, 2, N, N>), 
                         deviceQueue.get_device(), deviceQueue.get_context());
new (A_acc) Array<float, 2, N, N>(A.data());

Inside the kernel I reference the shared memory via A_acc[0][0][i][j] and A_acc[0][1][i][j].
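
To make the indexing concrete, here is a minimal host-only sketch of where A_acc[0][k][i][j] lands in the buffer. It uses the Array definition posted later in this thread and assumes a standard library where std::array adds no padding. std::malloc stands in for malloc_shared so the snippet compiles without SYCL, and float_offset is a hypothetical helper, not part of any API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Recursive Array type, as defined later in this thread.
template <typename T, int N, int... Rest>
struct Array : std::array<Array<T, Rest...>, N> {
  using std::array<Array<T, Rest...>, N>::operator[];
};

template <typename T, int N>
struct Array<T, N> : std::array<T, N> {
  using std::array<T, N>::operator[];
};

// Hypothetical helper: distance, in floats, of element [k][i][j] from the
// start of the buffer. std::malloc is only a stand-in for malloc_shared.
template <int N>
std::size_t float_offset(int k, int i, int j) {
  using Buf = Array<float, 2, N, N>;
  void* raw = std::malloc(sizeof(Buf));
  assert(raw != nullptr);
  Buf* p = new (raw) Buf;  // placement new; the floats stay uninitialized

  // p[0] dereferences the pointer; the next three indices walk the nested arrays.
  auto base = reinterpret_cast<std::uintptr_t>(&p[0][0][0][0]);
  auto elem = reinterpret_cast<std::uintptr_t>(&p[0][k][i][j]);
  std::size_t offset = (elem - base) / sizeof(float);

  std::free(raw);  // Buf is trivially destructible, so no destructor call is needed
  return offset;
}
```

Under those assumptions, float_offset<32>(1, 0, 0) is N*N = 1024, so A_acc[0][1] starts right after the first N*N floats, and the last valid element is at offset 2*N*N - 1.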

My program works when N is set to values up to 24 (such as 8, 16, or 24), but it fails at N = 32.

May I ask if there are any limitations on using USM pointers inside kernels?

Thanks & Regards
Sajjad

keryell1 · June 21, 2021, 3:58pm

Can you post the full example, or at least a minimal test case showing the problem, and say which compiler and system you are using and which accelerator you are targeting? Otherwise it is difficult to help. For example, I have no idea what “Array” is.

sajjadrsm · June 21, 2021, 9:07pm

Thanks for your response. This is how I am defining Array:

template<typename T, int N, int... Rest>
struct Array : std::array<Array<T, Rest...> , N>{
  using std::array<Array<T, Rest...> , N>::operator[];
};

template<typename T, int N>
struct Array<T, N> : std::array<T, N>{
  using std::array<T, N>::operator[];
};

I am using DPC++ with the CUDA backend, where malloc_shared calls cuda_piextUSMSharedAlloc in the CUDA plugin, which in turn calls cuMemAllocManaged.

keryell1 · June 22, 2021, 1:44am

I see, your Array is a multidimensional array defined by recursively nesting std::array.
I cannot see any obvious reason for the failure.
Perhaps some alignment constraints?
For N = 32, this requires 8192 bytes, which might hit a bug when using more than one 4K page?
Anyway, it looks related to a specific implementation with a specific back-end, so I suggest you open an issue on the intel/llvm GitHub repository with a complete example that compiles and exhibits the bug at run time, so they can try the code directly.
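
As a quick check of the sizes involved, here is a minimal compile-time sketch. It assumes a 4 KiB page size and a standard library where std::array adds no padding. kPageBytes, alloc_bytes and pages_needed are hypothetical names for illustration, not part of SYCL or DPC++.

```cpp
#include <array>
#include <cstddef>

// Same recursive Array type as posted in this thread.
template <typename T, int N, int... Rest>
struct Array : std::array<Array<T, Rest...>, N> {
  using std::array<Array<T, Rest...>, N>::operator[];
};

template <typename T, int N>
struct Array<T, N> : std::array<T, N> {
  using std::array<T, N>::operator[];
};

// Assumption: 4 KiB pages. The real page size depends on the platform.
constexpr std::size_t kPageBytes = 4096;

// Bytes requested from malloc_shared for a given N.
template <int N>
constexpr std::size_t alloc_bytes() {
  return sizeof(Array<float, 2, N, N>);
}

// Number of pages that request would span, rounded up.
template <int N>
constexpr std::size_t pages_needed() {
  return (alloc_bytes<N>() + kPageBytes - 1) / kPageBytes;
}
```

Under those assumptions, N = 16 needs 2048 bytes (one page), N = 24 needs 4608 bytes, and N = 32 needs 8192 bytes. Both N = 24 and N = 32 therefore span two 4 KiB pages, which is worth mentioning in the issue report.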

sajjadrsm · June 22, 2021, 8:57pm

Thank you very much for the pointers.

system · December 19, 2021, 8:58pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.