Unity + Vulkan - SIGSEGV thrown by prebuilt binary

Greetings!

Our company uses the Unity engine (more precisely, a prebuilt Unity binary with a Vulkan back-end) to simulate robotics hardware. For technical reasons we need Vulkan, as some of the features we rely on are not supported by the OpenGL back-end. Until now we’ve been emulating in-house, running simulations on our own set of servers.

I’ve been tasked with moving emulation to the cloud, so I’ve been trying to spin up an AWS instance that can run our Unity binary. However, the binary terminates with a SIGSEGV on the cloud server. Inspecting with gdb, the crash seems to stem from the Vulkan shared library:

Thread 1 "swemu.x86_6" received signal SIGSEGV, Segmentation fault.
0x00007ffff063d49a in vkGetDeviceProcAddr () from /lib/x86_64-linux-gnu/libvulkan.so.1
(gdb) backtrace
#0  0x00007ffff063d49a in vkGetDeviceProcAddr () from /lib/x86_64-linux-gnu/libvulkan.so.1
#1  0x00007ffff6a76c65 in vulkan::LoadVulkanLibraryPhase3(VkInstance_T*, VkDevice_T*) ()
   from /home/ubuntu/{...}/UnityPlayer.so
#2  0x00007ffff6a27687 in vk::Initialize() () from /home/ubuntu/{...}/UnityPlayer.so
#3  0x00007ffff6a30c79 in CreateVKGfxDevice() () from /home/ubuntu/{...}/UnityPlayer.so
#4  0x00007ffff69a3c7e in CreateClientGfxDevice(GfxDeviceRenderer, GfxCreateDeviceFlags) ()
   from /home/ubuntu/{...}/UnityPlayer.so
#5  0x00007ffff6e91c4b in CreateGfxDevice(GfxDeviceRenderer, GfxCreateDeviceFlags) ()
   from /home/ubuntu/{...}/UnityPlayer.so
#6  0x00007ffff6e91fb1 in InitializeGfxDevice() () from /home/ubuntu/{...}/UnityPlayer.so
#7  0x00007ffff6d43a62 in InitializeEngineGraphics(bool) ()
   from /home/ubuntu/{...}/UnityPlayer.so
#8  0x00007ffff6d5211a in PlayerInitEngineGraphics(bool) ()
   from /home/ubuntu/{...}/UnityPlayer.so
#9  0x00007ffff6f31efc in PlayerMain(int, char**) () from /home/ubuntu/{...}/UnityPlayer.so
#10 0x00007ffff5b6a083 in __libc_start_main (main=0x2010e0 <main>, argc=1, argv=0x7fffffffe3c8, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe3b8) at ../csu/libc-start.c:308
#11 0x0000000000201029 in _start ()
(gdb) quit

Frames #1 onward belong to the Unity binary, which might be calling vkGetDeviceProcAddr incorrectly; I’ve put a short sketch of the usual call pattern right after the environment list below. I’ve tried to make the AWS instance match our in-house environment as closely as possible:

  • OS: Ubuntu 20.04
  • Kernel version: 5.4.0-xxx generic
  • Nvidia Driver Version: 455.23.05
  • CUDA Version: 11.1
  • Vulkan Instance Version: 1.2.131
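
For context, this is the device-level loading pattern I believe Unity’s LoadVulkanLibraryPhase3 goes through. It’s only a minimal sketch of my understanding, not Unity’s actual code; it assumes a valid VkDevice already exists, and vkQueueSubmit is just an example entry point:

#include <vulkan/vulkan.h>

/* Sketch of Vulkan device-level function loading.
 * Assumes `device` came from a successful vkCreateDevice call; passing an
 * invalid dispatchable handle is undefined behaviour, which is one way a
 * call like this could segfault inside libvulkan.so.1. */
PFN_vkQueueSubmit load_queue_submit(VkDevice device)
{
    /* Returns NULL if the entry point is unknown to the driver. */
    return (PFN_vkQueueSubmit)vkGetDeviceProcAddr(device, "vkQueueSubmit");
}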

Major differences between our in-house servers and the cloud setup:

  • the AWS server is virtualized; our servers run on bare metal
  • the AWS server uses Tesla T4 GPUs; our in-house servers use Tesla K80s

We also routinely run the emulator on our laptops, which sport GeForce GTX 1650 Ti cards. I’ve run MD5 sums of libvulkan.so.1 on both our servers and the AWS instance, and they match. For some reason, the call the Unity binary makes to vkGetDeviceProcAddr() crashes on the AWS server, even though it runs fine on our local setups. I feel a bit lost as to where to look next. Has anyone on the forum run into something like this before? Any hints would be appreciated!
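
In case it helps, my next step is a tiny standalone check of the loader and the NVIDIA ICD on the AWS box, independent of Unity. This is only a sketch (no validation layers, queue family 0 picked blindly, and the file name vk_check.c is made up); I’d build it with gcc vk_check.c -o vk_check -lvulkan:

#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void)
{
    /* Create a bare instance, no extensions or layers. */
    VkApplicationInfo app = { .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
                              .pApplicationName = "vk_check",
                              .apiVersion = VK_API_VERSION_1_1 };
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
                                 .pApplicationInfo = &app };
    VkInstance instance;
    if (vkCreateInstance(&ici, NULL, &instance) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateInstance failed\n");
        return 1;
    }

    /* If no device shows up here, the ICD/driver is not visible to the loader. */
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, NULL);
    if (count == 0) {
        fprintf(stderr, "no Vulkan-capable GPU visible\n");
        return 1;
    }
    VkPhysicalDevice phys;
    count = 1;
    vkEnumeratePhysicalDevices(instance, &count, &phys);

    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(phys, &props);
    printf("GPU: %s, API %u.%u.%u\n", props.deviceName,
           VK_VERSION_MAJOR(props.apiVersion),
           VK_VERSION_MINOR(props.apiVersion),
           VK_VERSION_PATCH(props.apiVersion));

    /* One queue from family 0 is enough for this smoke test. */
    float prio = 1.0f;
    VkDeviceQueueCreateInfo qci = { .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
                                    .queueFamilyIndex = 0,
                                    .queueCount = 1,
                                    .pQueuePriorities = &prio };
    VkDeviceCreateInfo dci = { .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
                               .queueCreateInfoCount = 1,
                               .pQueueCreateInfos = &qci };
    VkDevice device;
    if (vkCreateDevice(phys, &dci, NULL, &device) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateDevice failed\n");
        return 1;
    }

    /* The same call that crashes inside UnityPlayer.so. */
    PFN_vkQueueSubmit submit =
        (PFN_vkQueueSubmit)vkGetDeviceProcAddr(device, "vkQueueSubmit");
    printf("vkGetDeviceProcAddr(vkQueueSubmit) = %p\n", (void *)submit);

    vkDestroyDevice(device, NULL);
    vkDestroyInstance(instance, NULL);
    return 0;
}

If this little test also crashes in vkGetDeviceProcAddr, I’d take that as a sign the problem is in the loader/driver stack on the instance rather than in the Unity binary itself.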