Crash in amdvlk64.dll during vkCreateInstance

Hi all,

I’m seeing the errors below, collected from customer machines by our crash-reporting code. I can’t reproduce them locally because we’re still awaiting our AMD-graphics workstations, and all the machines we currently own have Nvidia or Intel GPUs. The crash rate suggests that it’s only happening on a minority of AMD machines, so I speculate that an old driver is the cause.

For now we are only probing for the existence of Vulkan support on the host and using the data as a factor in the decision of whether to implement our future graphics work in Vulkan or whether we have to write both DX12 and Metal code (we need to support macOS).

I’ve also attached our code so that if I’ve made an error in that you can point it out and I can fix it.

Our application is intended for a very broad and non-technical/non-gamer audience so we need to build something that works for everyone without users debugging their own driver installation.

Our code

#define VK_USE_PLATFORM_WIN32_KHR  // exposes VK_KHR_WIN32_SURFACE_EXTENSION_NAME
#include <windows.h>
#include <vulkan/vulkan.h>

#include <array>
#include <string>

std::string getVulkanVersion() {
  HMODULE vulkan = LoadLibraryW(L"vulkan-1.dll");
  if (!vulkan) {
    return "None (could not load vulkan-1)";
  }
  auto vkCreateInstance =
      reinterpret_cast<PFN_vkCreateInstance>(GetProcAddress(vulkan, "vkCreateInstance"));
  auto vkDestroyInstance =
      reinterpret_cast<PFN_vkDestroyInstance>(GetProcAddress(vulkan, "vkDestroyInstance"));
  // Bail out rather than call through a null pointer if an export is missing.
  if (!vkCreateInstance || !vkDestroyInstance) {
    FreeLibrary(vulkan);
    return "None (missing vulkan-1 exports)";
  }
  std::string vkVersionString = "None (all probes failed)";
  VkApplicationInfo appInfo = {};
  appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
  appInfo.pApplicationName = "Messenger version probing";
  appInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
  appInfo.pEngineName = "philwill-0";
  appInfo.engineVersion = VK_MAKE_VERSION(1, 0, 0);
  appInfo.apiVersion = VK_API_VERSION_1_1;
  VkInstanceCreateInfo createInfo = {};
  createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
  createInfo.pApplicationInfo = &appInfo;
  // VK_KHR_win32_surface depends on VK_KHR_surface, so both must be enabled.
  constexpr std::array<const char*, 2> requiredExtensions = {
      VK_KHR_SURFACE_EXTENSION_NAME,
      VK_KHR_WIN32_SURFACE_EXTENSION_NAME,
  };
  createInfo.enabledExtensionCount = static_cast<uint32_t>(requiredExtensions.size());
  createInfo.ppEnabledExtensionNames = requiredExtensions.data();
  createInfo.enabledLayerCount = 0;

  // First try as a Vulkan 1.1 application.
  VkInstance instance = VK_NULL_HANDLE;
  VkResult result = vkCreateInstance(&createInfo, nullptr, &instance);
  if (result == VK_SUCCESS) {
    vkDestroyInstance(instance, nullptr);
    vkVersionString = "1.1";
  } else {
    // Retry as a Vulkan 1.0 application; any non-success result counts as failure.
    appInfo.apiVersion = VK_API_VERSION_1_0;
    result = vkCreateInstance(&createInfo, nullptr, &instance);
    if (result == VK_SUCCESS) {
      vkDestroyInstance(instance, nullptr);
      vkVersionString = "1.0";
    }
  }
  FreeLibrary(vulkan);

  return vkVersionString;
}

Stack Trace 1

Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
Crash thread:  44
Crash address: 0xffffffffffffffff

Operating system: Windows NT
                  10.0.18362 
CPU: amd64 family 6 model 78 stepping 3
     4 CPUs


Thread 44
0 amdvlk64.dll + 0xae51
1 amdvlk64.dll + 0xb091
2 amdvlk64.dll + 0x71f248
3 amdvlk64.dll + 0xb170
4 amdvlk64.dll + 0x71f200
5 RtlpHeapGenerateRandomValue32 ntdll.dll + 0x24
6 RtlpHpLfhSlotAllocate ntdll.dll + 0xcf0
7 + 0x2bbafef0000
8 RtlpFreeHeapInternal ntdll.dll + 0x3f4
9 RtlpHpFreeWithExceptionProtection ntdll.dll + 0x1e
10 + 0x7ff900000018
11 dxgi.dll + 0xf6db
12 user32.dll + 0x28d6a
13 dxgi.dll + 0xc5eb8
14 bsearch ntdll.dll + 0x81
15 RtlpLocateActivationContextSection ntdll.dll + 0x14b
16 + 0x2bbafeb1c10
17 bsearch ntdll.dll + 0x81
18 RtlFindActivationContextSectionString ntdll.dll + 0x138
19 static  sxsisol_SearchActCtxForDllName() ntdll.dll + 0x114
20 RtlDosApplyFileIsolationRedirection_Ustr ntdll.dll + 0x2ba
21 LdrpApplyFileNameRedirection ntdll.dll + 0xff
22 LdrpPreprocessDllName ntdll.dll + 0xa7
23 LdrpFindLoadedDll ntdll.dll + 0xa9
24 vulkan-1.dll + 0x7c74c
25 vulkan-1.dll + 0x7bd2d
26 vulkan-1.dll + 0x76d7c
27 RtlSetLastWin32Error ntdll.dll + 0x40
28 + 0xe0e90ff260
29 vulkan-1.dll + 0x445a1
30 vulkan-1.dll + 0xc6860
31 vulkan-1.dll + 0x46487
32 vulkan-1.dll + 0xbdf38
33 vulkan-1.dll + 0xe3390
34 vulkan-1.dll + 0x303c8
35 vulkan-1.dll + 0xbdf48
36 vulkan-1.dll + 0xbe542
37 vulkan-1.dll + 0xbd298
38 RtlpHpSegPageRangeCommit ntdll.dll + 0x2e5
39 RtlpHpSegMgrCommit ntdll.dll + 0x158
40 + 0x69736e6574784520
41 RtlpHpSegMgrCommit ntdll.dll + 0x10e
42 + 0xe0e90fe8b0
43 vulkan-1.dll + 0x46487
44 vulkan-1.dll + 0x22304
45 vulkan-1.dll + 0x36648
46 vulkan-1.dll + 0x292db
47 vulkan-1.dll + 0x28b91
48 RtlpHpFreeVA ntdll.dll + 0x5c
49 RtlpHpSegMgrCommit ntdll.dll + 0x158
50 RtlpHpSegPageRangeCommit ntdll.dll + 0x2e5
51 RtlpHpSegLfhVsDecommit ntdll.dll + 0xd3
52 RtlpHpVsContextFree ntdll.dll + 0x3d0
53 RtlpFreeHeapInternal ntdll.dll + 0x57f
54 RtlpHpFreeWithExceptionProtection ntdll.dll + 0x1e
55 RtlFreeHeap ntdll.dll + 0x6c
56 vulkan-1.dll + 0x88c90
57 igvk64.dll + 0x47866b
58 vulkan-1.dll + 0x2ff0b
59 amdvlk64.dll + 0xaff2
60 amdvlk64.dll + 0xb170
61 vulkan-1.dll + 0x36f2d
62 wcstombs addon.node + 0xde09
63 RtlpHpFreeWithExceptionProtection ntdll.dll + 0x1e
64 + 0xe0e90ff330
65 KERNELBASE.dll + 0x259fd
66 vulkan-1.dll + 0x2b660
67 vulkan-1.dll + 0x2a08c
68 vulkan-1.dll + 0xb030
69 vulkan-1.dll + 0x2b530
70 vulkan-1.dll + 0xc11c8
71 vulkan-1.dll + 0xc11c8
72 vulkan-1.dll + 0x2b530
73 vulkan-1.dll + 0x2b660
74 vulkan-1.dll + 0x4f40
75 vulkan-1.dll + 0x88d04
76 vulkan-1.dll + 0x22533
77 vulkan-1.dll + 0x9ac0
78 vulkan-1.dll + 0x2fe76
79 vulkan-1.dll + 0x3d225
80 vulkan-1.dll + 0x4fe0
81 + 0x7ff9ab760000
82 vulkan-1.dll + 0x5290

Stack Trace 2

Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
Crash thread:  55
Crash address: 0x8

Operating system: Windows NT
                  10.0.18362 
CPU: amd64 family 6 model 79 stepping 1
     20 CPUs


Thread 55
0 amdvlk64.dll + 0x2cd6cb
1 amdvlk64.dll + 0xdccb68
2 amdvlk64.dll + 0x2775f0
3 amdvlk64.dll + 0x158b36
4 amdvlk64.dll + 0x18801e
5 amdvlk64.dll + 0x15d270
6 + 0x7ff8db350000
7 amdvlk64.dll + 0x15b45a
  • Use GetProcAddress only to obtain vkGetInstanceProcAddr, then load everything else through vkGetInstanceProcAddr. It shouldn’t matter in practice, but who knows.
  • You are potentially ignoring other VkResult values. A vkCreateInstance failure with, e.g., VK_ERROR_OUT_OF_HOST_MEMORY does not automatically imply that Vulkan 1.1 is unsupported.
  • You could instead do:
    // vkGetInstanceProcAddr returns PFN_vkVoidFunction, so cast to the real type.
    // A null result means a Vulkan 1.0 loader.
    const auto pvkEnumerateInstanceVersion = reinterpret_cast<PFN_vkEnumerateInstanceVersion>(
        vkGetInstanceProcAddr(NULL, "vkEnumerateInstanceVersion"));
    uint32_t version = VK_API_VERSION_1_0;
    if( pvkEnumerateInstanceVersion ){
        VkResult err = pvkEnumerateInstanceVersion( &version );
        if( err ) return "buggy";
    }
    return std::to_string(VK_VERSION_MAJOR(version)) + "." + std::to_string(VK_VERSION_MINOR(version));

    That would avoid the need for vkCreateInstance.
  • The instance version does not say that much. It can change simply because the Vulkan runtime (Vulkan RT) was updated. You possibly want to enumerate the physical devices and their versions.
  • Your trace does not seem complete. It does not show which function fails; vkCreateInstance, presumably? Also, how does that guy have 20 CPUs??
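
On the point about other VkResult values: a minimal sketch of how the probe string could keep the failure reason instead of folding every failure into one bucket. The names `ProbeResult` and `describeProbeResult` are hypothetical, and the enum is only a stand-in whose numeric values are copied from the `VkResult` constants in vulkan_core.h so the snippet compiles without the Vulkan headers. Real code would switch on `VkResult` directly.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for VkResult; values mirror vulkan_core.h.
enum ProbeResult : int32_t {
  kSuccess = 0,                // VK_SUCCESS
  kOutOfHostMemory = -1,       // VK_ERROR_OUT_OF_HOST_MEMORY
  kOutOfDeviceMemory = -2,     // VK_ERROR_OUT_OF_DEVICE_MEMORY
  kInitializationFailed = -3,  // VK_ERROR_INITIALIZATION_FAILED
  kLayerNotPresent = -6,       // VK_ERROR_LAYER_NOT_PRESENT
  kExtensionNotPresent = -7,   // VK_ERROR_EXTENSION_NOT_PRESENT
  kIncompatibleDriver = -9,    // VK_ERROR_INCOMPATIBLE_DRIVER
};

// Turns a vkCreateInstance result code into a short string for the crash/telemetry report.
std::string describeProbeResult(int32_t result) {
  switch (result) {
    case kSuccess:              return "ok";
    case kOutOfHostMemory:      return "failed (out of host memory)";
    case kOutOfDeviceMemory:    return "failed (out of device memory)";
    case kInitializationFailed: return "failed (initialization failed)";
    case kLayerNotPresent:      return "failed (layer not present)";
    case kExtensionNotPresent:  return "failed (extension not present)";
    case kIncompatibleDriver:   return "failed (incompatible driver)";
    default:                    return "failed (VkResult " + std::to_string(result) + ")";
  }
}
```

The probe can still treat everything except "ok" as a failure; the extra detail only makes it possible to tell a transient out-of-memory condition apart from an incompatible driver when reading the reports.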

I will update to use vkGetInstanceProcAddr, and I hope that’s the problem. It could well be…

The probe treating anything that’s not total success as failure is intentional - if we’re going to write Vulkan code, we’re interested in where it is likely to work and not where it is theoretically possible for it to work in ideal conditions.

Similarly, we’ve been burned before by trusting libraries’ capability-query routines, which is why we prefer to attempt (some of) what we’re actually interested in doing, even though it is more expensive. In this case, imagine that vkEnumerateInstanceVersion had worked on these systems - we would falsely conclude that we can proceed to create a Vulkan instance there when in fact it doesn’t work.

I don’t have symbols for the vulkan-1.dll on the user’s machine, so those frames will never be symbolized. This is a release build and, probably as a result of that, the line number reported for my code is totally useless. Like you, I’m assuming it’s in vkCreateInstance.

20 CPUs is pretty normal on workstations - a 1-way Intel 10900X will report like this. One of my colleagues has 56.

That is probably not the problem. The vulkan-1.lib likely uses the regular WinAPI call itself, and it has to work.

More likely it is just a bad driver or a corrupted system.

vkEnumerateInstanceVersion can still be informative, and can give some idea of how old the driver is via the patch version.
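
A minimal sketch of pulling the patch number out of a packed version value, assuming the bit layout used by VK_MAKE_VERSION (major in bits 31..22, minor in bits 21..12, patch in bits 11..0). The helper names below are hypothetical; with the Vulkan headers available, VK_VERSION_MAJOR, VK_VERSION_MINOR and VK_VERSION_PATCH do the same job.

```cpp
#include <cstdint>
#include <string>

// Same bit layout as VK_MAKE_VERSION: major << 22 | minor << 12 | patch.
constexpr uint32_t versionMajor(uint32_t v) { return v >> 22; }
constexpr uint32_t versionMinor(uint32_t v) { return (v >> 12) & 0x3ffu; }
constexpr uint32_t versionPatch(uint32_t v) { return v & 0xfffu; }

// Formats a packed version as "major.minor.patch" for the probe report.
std::string formatVulkanVersion(uint32_t v) {
  return std::to_string(versionMajor(v)) + "." +
         std::to_string(versionMinor(v)) + "." +
         std::to_string(versionPatch(v));
}
```

Reporting the full three-part string rather than just "1.0" or "1.1" costs nothing and lets the reports be grouped by patch level later.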

BTW, the instance version is distinct from the device version. The instance version can be 1.1 while the device is not.
Also, I think vkCreateInstance does fail if there are no devices. But technically I think you should check whether there is at least one physical device, and you should check the Vulkan version of that physical device. The Vulkan version of the instance is not that useful in and of itself.
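
A minimal sketch of that final check, assuming the per-device versions have already been collected (in real code they would come from VkPhysicalDeviceProperties::apiVersion after vkEnumeratePhysicalDevices and vkGetPhysicalDeviceProperties, which is not shown here). `makeVersion` and `anyDeviceSupports` are hypothetical helper names; the packing matches VK_MAKE_VERSION.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Same packing as VK_MAKE_VERSION.
constexpr uint32_t makeVersion(uint32_t major, uint32_t minor, uint32_t patch) {
  return (major << 22) | (minor << 12) | patch;
}

// True if at least one physical device reports major.minor >= the required
// major.minor. The patch field is ignored by shifting it away before comparing.
bool anyDeviceSupports(const std::vector<uint32_t>& deviceApiVersions, uint32_t required) {
  return std::any_of(deviceApiVersions.begin(), deviceApiVersions.end(),
                     [required](uint32_t v) { return (v >> 12) >= (required >> 12); });
}
```

An empty list (no physical devices) yields false, so a machine with a 1.1 loader but no usable device is not counted as Vulkan 1.1 capable.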
