GPU hangs when rendering to the OpenXR DirectX 12 runtime


I have a working desktop DirectX 12 application that renders using compute shaders and then presents the result on screen. Recently I decided to add VR support to it and added a native OpenXR integration. I ran some basic tests (i.e. rendering cubes) and they work fine. However, when I plug my compute pipeline back into the executed command lists, my GPU hangs and I see the following error:

D3D12: Removing Device.
D3D12 ERROR: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #232: DEVICE_REMOVAL_PROCESS_AT_FAULT]

I have both the DirectX 12 debug layer and the OpenXR debug layer enabled, and neither reports any errors before the hang. Also, when I try to debug the app with NVIDIA Nsight, it tells me it doesn’t support Direct3D11On12, which I believe is what the Oculus runtime compositor is implemented on.

It is the same codebase, with VR presentation toggled by a compile-time flag, and the shaders do not contain any presentation-dependent parameters. So I am launching the same compute shader on the same machine with the same input buffers, but when OpenXR is enabled the GPU hangs and the device is removed on the first frame.

I’m completely lost on how to debug this and what a potential workaround could be.

I’m on Windows 11 with an Intel 13900KF and an RTX 4090, if that helps. I have the latest drivers and the latest SDK version from GitHub, which I compiled myself.

I’d greatly appreciate any debugging tips or directions. What I can’t wrap my head around is how OpenXR could even potentially interfere with a command list I schedule myself on the app side.

“Device removed” tends to mean “some requirement of the API was violated and so we broke”. I think there are some tools specifically for troubleshooting the issues that cause device removal errors.
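
One tool of that kind, built into D3D12 itself, is DRED (Device Removed Extended Data). Below is a minimal sketch, not taken from the application in this thread: the helper names EnableDred and DumpDredBreadcrumbs are made up for illustration, and the non-Windows stub exists only so the snippet compiles anywhere. It assumes a Windows SDK recent enough to ship the DRED interfaces.

```cpp
// Sketch: enable DRED before creating the D3D12 device, then dump the
// auto-breadcrumbs once an API call reports device removal.
// EnableDred / DumpDredBreadcrumbs are hypothetical helper names.
#include <cstdio>

#ifdef _WIN32
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

// Call once, BEFORE D3D12CreateDevice. Returns false if DRED is unavailable.
bool EnableDred() {
    ComPtr<ID3D12DeviceRemovedExtendedDataSettings> settings;
    if (FAILED(D3D12GetDebugInterface(IID_PPV_ARGS(&settings)))) return false;
    settings->SetAutoBreadcrumbsEnablement(D3D12_DRED_ENABLEMENT_FORCED_ON);
    settings->SetPageFaultEnablement(D3D12_DRED_ENABLEMENT_FORCED_ON);
    return true;
}

// Call after a Present/ExecuteCommandLists/fence wait reports device removal.
void DumpDredBreadcrumbs(ID3D12Device* device) {
    ComPtr<ID3D12DeviceRemovedExtendedData> dred;
    if (FAILED(device->QueryInterface(IID_PPV_ARGS(&dred)))) return;

    D3D12_DRED_AUTO_BREADCRUMBS_OUTPUT output = {};
    if (FAILED(dred->GetAutoBreadcrumbsOutput(&output))) return;

    // Walk the linked list of command lists the GPU had seen.
    for (const D3D12_AUTO_BREADCRUMB_NODE* node = output.pHeadAutoBreadcrumbNode;
         node != nullptr; node = node->pNext) {
        const wchar_t* name =
            node->pCommandListDebugNameW ? node->pCommandListDebugNameW : L"(unnamed)";
        const UINT32 completed =
            node->pLastBreadcrumbValue ? *node->pLastBreadcrumbValue : 0;
        std::printf("command list %ls: %u of %u ops completed\n",
                    name, completed, node->BreadcrumbCount);
        // A partially completed list is the likely place where the GPU stalled.
        if (completed < node->BreadcrumbCount) {
            std::printf("  next op (D3D12_AUTO_BREADCRUMB_OP value): %d\n",
                        static_cast<int>(node->pCommandHistory[completed]));
        }
    }
}
#else
// Stub for non-Windows builds: DRED only exists in D3D12.
bool EnableDred() { return false; }
#endif
```

Naming command lists with SetName makes the breadcrumb output much easier to read. A command list that reports fewer completed ops than it recorded is a reasonable first suspect, for example a compute dispatch that never finishes.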

That said, my best guess would be something to do with synchronization/barriers or resource state. There may be implicit synchronization introduced by a “present” call that isn’t introduced by rendering to an OpenXR-provided swapchain image, that you need to provide yourself.

You might try using PIX for debugging. I don’t remember if RenderDoc works with the Oculus runtime very well, but I know it works with the Windows Mixed Reality one which has a simulator (no hardware required) mode, which could be another option.

@ryanpavlik thank you for the response. It turns out OpenXR added just enough extra workload to expose an initialization issue in one of the shaders. It took me a while to debug, but I did it!
