Failure on USM shared malloc with large sizes

Hi everyone,

I have a question about USM shared malloc arguments.

In my program, I use a USM shared pointer with size 2*N*N:

std::vector<float> A(2 * N * N);
auto A_acc = (Array<float, 2, N, N> *)malloc_shared( sizeof(Array<float, 2, N, N>), 
                         deviceQueue.get_device(), deviceQueue.get_context());
new (A_acc) Array<float, 2, N, N>();

Inside the kernel I reference the shared memory via A_acc[0][0][i][j] and A_acc[0][1][i][j].

My program works when N is 24 or smaller (for example 8, 16, or 24), but it fails when N is 32.

Are there any limitations on using USM pointers inside kernels?

Thanks & Regards

Can you post the full example, or at least a minimal test case showing the problem, and say which compiler and system you are using and which accelerator you are targeting? Otherwise it is difficult to help. For example, I have no idea what “Array” is.

Thanks for your response. This is how I am defining Array:

template<typename T, int N, int... Rest>
struct Array : std::array<Array<T, Rest...>, N> {
  using std::array<Array<T, Rest...>, N>::operator[];
};

template<typename T, int N>
struct Array<T, N> : std::array<T, N> {
  using std::array<T, N>::operator[];
};
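
For reference, here is a host-only sketch that exercises this type without any SYCL or CUDA involved, which may help rule out the Array definition itself as the cause. The helper fill_and_read is hypothetical (it is not part of my program), and the assumption that the object occupies exactly 2*N*N floats holds on common compilers but is not guaranteed by the C++ standard.

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Recursive case: an Array<T, N, Rest...> holds N sub-arrays of Array<T, Rest...>.
template <typename T, int N, int... Rest>
struct Array : std::array<Array<T, Rest...>, N> {
  using std::array<Array<T, Rest...>, N>::operator[];
};

// Base case: a one-dimensional Array is a plain std::array<T, N>.
template <typename T, int N>
struct Array<T, N> : std::array<T, N> {
  using std::array<T, N>::operator[];
};

// Hypothetical helper: allocate a 2 x N x N array on the host heap,
// fill each element with 100*p + 10*r + c, and read back element (k, i, j).
template <int N>
float fill_and_read(int k, int i, int j) {
  auto a = std::make_unique<Array<float, 2, N, N>>();
  for (int p = 0; p < 2; ++p)
    for (int r = 0; r < N; ++r)
      for (int c = 0; c < N; ++c)
        (*a)[p][r][c] = static_cast<float>(100 * p + 10 * r + c);
  return (*a)[k][i][j];
}
```

Calling fill_and_read<32>(1, 2, 3) on the host goes through the same nested operator[] chain that the kernel uses, so if it behaves correctly there, the problem is more likely in the device allocation or the backend than in the type.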

I am using the CUDA backend of DPC++, where malloc_shared calls cuda_piextUSMSharedAlloc in the CUDA plugin, which in turn calls cuMemAllocManaged.

I see, your Array is a multidimensional array defined recursively in terms of std::array.
I cannot see any obvious reason for the failure.
Perhaps some alignment constraints?
For N = 32, the array requires 8192 bytes (2 × 32 × 32 floats of 4 bytes each), which might hit a bug when using more than one 4K page?
Anyway, it looks related to a specific implementation with a specific back-end, so I suggest you open an issue on the intel/llvm GitHub repository with a complete example that compiles and exhibits the bug at run time, so they can try the code directly.
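
To make the size arithmetic concrete, here is a small sketch that computes the payload size and the number of 4K pages it spans for a given N. The names payload_bytes, pages_needed, and kPageBytes are made up for illustration, the 4096-byte page size is an assumption (the granularity the CUDA backend actually uses for managed allocations may differ), and a 4-byte float is assumed.

```cpp
#include <cstddef>

// Assumed page size of 4 KiB; the real allocation granularity of the
// backend is not known here and may be different.
constexpr std::size_t kPageBytes = 4096;

// Bytes occupied by 2 * n * n floats, i.e. the payload of Array<float, 2, n, n>.
constexpr std::size_t payload_bytes(std::size_t n) {
  return 2 * n * n * sizeof(float);
}

// Number of pages of kPageBytes needed to hold that payload (rounded up).
constexpr std::size_t pages_needed(std::size_t n) {
  return (payload_bytes(n) + kPageBytes - 1) / kPageBytes;
}
```

With these assumptions, N = 16 fits in a single page, while N = 24 gives 4608 bytes and N = 32 gives 8192 bytes, so both span two pages. Since N = 24 reportedly works, crossing a single 4K boundary may not be the whole story, which is worth mentioning in the issue.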

Thank you very much for the pointers.