OpenXR without rendering loop

Hi there, I’m new to OpenXR and am attempting to build a native application that drives OpenXR-enabled haptic devices. This application is meant to run in the background, so I do not need or want a rendering loop. There should be no image sent to the HMD. What’s tripping me up is Section 9.2 of the spec, which reads:

During XrSession creation the application must provide information about which graphics API it intends to use by adding an XrGraphicsBinding* struct of one (and only one) of the enabled graphics API extensions to the next chain of XrSessionCreateInfo.

Has anyone had any success in building headless OpenXR applications? Do I need to simply pass a graphics API in, but never call xrBeginSession? My thinking is that this would not work, based on the session lifecycle section, which states:

When the runtime determines the application is eligible to receive XR inputs, e.g. motion controller or hand tracking inputs, it notifies with XR_SESSION_STATE_FOCUSED state. The application can expect to receive active action inputs.

Because this is a background application, I need to be able to send/receive information from the controllers even when the application is not focused. Is this something OpenXR is just not capable of? Is there some other way I should be attempting to communicate with controllers?
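To make the concern above concrete, here is a minimal sketch in C of how the session state gates input, going only by the spec text quoted above. The enum is a local mirror of the XrSessionState values (an assumption made so the snippet compiles without the OpenXR SDK), and `receives_active_input` is a hypothetical helper, not part of the OpenXR API.

```c
#include <stdbool.h>

/* Local mirror of XrSessionState from the OpenXR 1.0 headers.
 * Declared here only so the sketch builds without the SDK. */
typedef enum {
    STATE_UNKNOWN      = 0,
    STATE_IDLE         = 1,
    STATE_READY        = 2,
    STATE_SYNCHRONIZED = 3,
    STATE_VISIBLE      = 4,
    STATE_FOCUSED      = 5,
    STATE_STOPPING     = 6,
    STATE_LOSS_PENDING = 7,
    STATE_EXITING      = 8
} SessionState;

/* Hypothetical helper: per the lifecycle text quoted above, an
 * application can expect active action inputs only while focused. */
static bool receives_active_input(SessionState state)
{
    return state == STATE_FOCUSED;
}
```

In a real application the state would presumably be updated from the XrEventDataSessionStateChanged events returned by xrPollEvent, and a check like this would guard the call to xrSyncActions. That is exactly why a background application that never reaches the focused state is a problem under the base spec.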

After some more reading, this looks like the extension I want:
https://www.khronos.org/registry/OpenXR/specs/1.0/html/xrspec.html#XR_MND_headless

However, it looks like this extension may only be supported by the Monado runtime… I would think this would be a good extension for all runtimes to implement.
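Since support varies by runtime, an application would want to check for the extension before relying on it. The following is a minimal sketch in C of that check. `has_extension` and `HEADLESS_EXTENSION_NAME` are hypothetical names introduced here, and the list of names is assumed to have been copied out of the results of xrEnumerateInstanceExtensionProperties beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Extension name string as it appears in the spec link above. */
#define HEADLESS_EXTENSION_NAME "XR_MND_headless"

/* Hypothetical helper: returns true if `wanted` appears in `names`.
 * In a real application `names` would be filled from the
 * extensionName fields of the XrExtensionProperties array returned
 * by xrEnumerateInstanceExtensionProperties. */
static bool has_extension(const char *const *names, size_t count,
                          const char *wanted)
{
    /* An empty list may be passed as NULL with a count of zero. */
    assert(names != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        if (strcmp(names[i], wanted) == 0) {
            return true;
        }
    }
    return false;
}
```

If the check succeeds, the name would then be listed in enabledExtensionNames of XrInstanceCreateInfo when calling xrCreateInstance. If it fails, the application has to fall back to a regular graphics-bound session or report that the runtime is unsupported.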

Yeah, the base spec only has support for exclusive foreground applications.

In my opinion, this is a huge limitation with OpenXR currently. There are many different application types that don’t fit this model.

  • Headless applications that don’t need to render, but can always receive input.
  • 2D overlay applications with decoupled on-demand rendering that may conditionally take over input.
  • 3D overlay applications that sync rendering with the main scene application, and could either take over input or share it with main application.

There are extensions for both headless and overlay applications, but currently only Monado seems to support them. It would be nice to have an interface to query support for and specify different application types, ideally in a future version of the core spec.

I would agree that this seems to be a huge limitation with the current spec. From browsing the forums I’ve seen a number of others requesting this overlay on-demand rendering capability specifically. I’m in that first category. I may try out Monado and see how that goes. I’d really like to see this extension upgraded to a KHR or EXT and supported by other runtimes.

SteamVR just added support for XR_MND_headless in the latest beta.

Good news! I’ve actually already switched to using OpenVR/SteamVR because OpenVR already provided an easy route for writing headless applications.

I would personally really like headless plus an API for reproducing the required rendering on the client side, or shader injection into the server side (or both). To me, OpenXR looks like it could be about 50% to 75% more efficient and is basically leaving a lot of GPU power on the table. That means end-users have to buy far more expensive systems, and even then they don’t get to leverage the full power of those systems. (Edited: I know the differences can be really dramatic on low-power systems, but I don’t know whether the multiple full-screen copies of huge textures become trivial on beast systems. There’s nothing in VR that shouldn’t work on very inexpensive systems if the implementation is streamlined.)

FWIW as an owner of a WMR set who has never seen a reason to install Steam (which SteamVR depends on) I run into a lot of scenarios with pseudo commercial VR apps that just don’t work with Microsoft’s runtime, and I’m not sure how SteamVR interacts with it, but the one time I did install SteamVR to try it, I found that it was very bloated and like 2 or 3 times slower than the basic WMR runtime.

In this case the headless extension is meant for applications that don’t render anything to the headset at all.

With client and server, do you mean applications running on the same system, or a remote split rendering solution?

Off-topic reply: Yes, I mean that what’s needed for efficiency (which always comes later; look at D3D12 and Vulkan) is the option to choose headless and implement the drawing of the final image manually. It’s a trade-off of more work for better performance, and it would lower the bar to VR in terms of which consumer devices can reasonably support it (more affordable devices). If there were a performance bottleneck in the pose and controller parts of OpenXR, then it too could have an equivalent system to reduce the number of layers in the implementation, to be closer to the “metal”. Right now, doing full copies and holding multiple screen buffers is very costly when VR barely works on the highest-end GPUs at the same level as non-VR. That’s mainly because the API is not designed for efficiency, and arguably is designed for monopolies to retain control of their devices instead of taking the same route of treating VR sets like monitors with 6DOF controllers attached. In many ways OpenXR is a giant power grab that’s designed to satisfy all of these power players so they can maintain a walled garden commercial incubator for VR. But eventually it won’t be that way. Another approach to better VR would be to bypass OpenXR by making something like Monado expose the implementation layer so that apps can hook into it, but really OpenXR has data structures for reporting meshes and so on, and there’s no reason it couldn’t provide the necessary information for the client to render to its own low-level swap-chain.

For the record I just wanted to raise awareness for where “headless” can lead us, and not derail this topic.

With client and server, do you mean applications running on the same system, or a remote split rendering solution?

Sorry, I guess I should’ve answered this directly. By “client” I meant the usual Khronos/OpenGL meaning: something the user’s code owns and manages, as opposed to “server”, which is resources/logic on the driver/GPU’s side. E.g. a “client” memory buffer in system memory. (Technically speaking a GPU is a remote/split rendering solution. Sorry for the confusion.)

That is an interesting idea. It would require a lot more functionality to be implemented on the application side though. Currently, the runtime has to take care of compositing system UI, reprojection, lens distortion compensation, and frame timing. The latter three have to be implemented with care to not cause motion sickness, and adapt to the hardware differences with each headset. Originally the Oculus SDK and SteamVR worked by exposing the headset as a regular monitor and having the application render to it, but from what I understand it didn’t allow reprojection, so it couldn’t be used comfortably.

Off-topic: Yes (like I say) I think a simpler solution (both could coexist) is to provide a way to merge a “client” effects pass shader into the shaders that generate the final image as sent to the VR display, since all nontrivial applications usually require some form of effects pass at some stage that can be rolled into the full screen copy that the runtime does. But also SteamVR (for example) does a lot of unnecessary, very expensive processing that just doesn’t make sense on low-profile systems. That alone cripples them, making them unsuitable for VR when they’re perfectly capable of doing VR many times over… especially if your idea of a VR game can look more like Starfox than something extremely graphics intensive. (Edited: I tend to prefer a simplified presentation in games because modern day games cause sensory overload for me and I don’t even know what’s happening.)

I see where you’re coming from, but I don’t really think it’s as much of a power grab as you suggest. At least, not at the level of the engineers who are actually writing the spec. The reason the input is so abstracted is so that software written today can run on obscure hardware, homemade hardware, hardware that doesn’t exist yet. There’s a lot of experience out there in various vendors writing earlier APIs so everybody knows the errors to not repeat, etc. A guiding principle has been preservation and compatibility of apps/experiences/content, and the knowledge that eventually, all software becomes legacy software. It’s also focused on allowing for innovation on the runtime/device side, which is why e.g. the runtime is the one that allocates the swapchain images (but according to the wishes of the app) - it might allocate them in a different API, etc. so it can avoid unnecessary copies that SteamVR must do because of its more permissive API.

I won’t say the rendering extensions are perfect or without opportunity for improvement. I’ve got a list of errors/flaws in the two Vulkan extensions and am hoping somebody will bite the bullet and write the last base Vulkan enable extension…

I’m not sure that anybody will take you up on letting you manually render to the headset yourself without their compositor involved, and such a thing wouldn’t even be possible for very-remote rendering setups/streaming, etc. The closest I see to the “walled garden” you mention, is that every vendor is very aware that the home screen and system UI is very important to user experience, and a user’s experience in all apps reflects (for better or worse, whether or not deserved) on the device and runtime. The apps are also interested in quality of experience, but the difference is that for many games, when they ship or shortly after, the team may disband, while the runtime is generally under continuous development.

Some of the need for e.g. GTX970+ GPUs is for pre-emption of increasing granularity, which is pretty important for hitting frame deadlines on head-mounted displays. If you’re in a CAVE-like system where the display isn’t moving with your head, you don’t really need time warp or pre-emption so you can get away with older architectures.

I’m afraid we’re derailing this topic, but I just want to say that by “walled garden” I don’t mean to impugn anybody or come across as “there’s a conspiracy, man”. I will say that technically a VR set is a monitor, and technically its sensors are no different from game controllers, yet OpenXR does things completely differently from how we did them just fine for decades, when everything was done independently. OpenXR is thinking more along the lines of a “metaverse”, which is not a very “open” way of thinking about VR, so there’s clearly a commercial incentive for lock-down and for tying the hands of non-commercially inclined developers. Finally, there isn’t any technical “compositing” needed for VR because it’s a full-screen experience. Therefore it should be simpler than drawing to a desktop window, not more complicated.

I think the desire for games to be more performant will force it to open up as it matures, unless big game companies make backdoor deals to use private solutions provided by the device manufacturers. That’s why D3D12 is so low-level now, and it will happen with OpenXR as it gets more robust, or it will be replaced by something more low-level as an alternative solution, because GPUs are very limited in what they can accomplish. We’re always trying to squeeze blood from a stone, so it will come to pass. Right now VR is very primitive, hardly proven, and hardly utilized. So it’s natural for OpenXR to be more conservative, especially since it’s new and everyone has to be able to implement it.
