OpenXR Extension for Body Skeleton Tracking

Full body bone tracking using Intel RealSense, MS Kinect, OpenCV, etc… should be supported through the OpenXR standard extensions.

Two common data formats for user bone positions are BVH and FBX. Either standard should provide a relatively straightforward implementation path for a new OpenXR extension that would be complementary to OpenXR hand tracking.

I have seen a few posts on this but am hopeful this can be prioritized into the specification.

What can help us the most is if you provide a description of your use cases. We can’t accept design ideas, in general, but use cases are great to know.

For example: do you need the bones to be a “standard” length, or is it ok/better if they actually represent the physical user? Is it ok if the length of bones changes slightly during runtime as the software gets a better estimate of the tracking?

And also, more simply, what do you want to use it for? Avatar display? Interaction? How do you accommodate folks whose skeleton might not match your model? (For avatar display, that could mean giving folks the option to show themselves without a limb, as in reality, or with a virtual limb.) For interaction, what kinds of body interactions would you want, so that we can provide them through the action system and let users map/remap them as desired? Sometimes it might even just be: I don’t want to stand up and use my feet, let me sit and use my elbows to “kick” the door open or whatever :grin: This last bit, offering embodied interaction while not excluding folks and not blocking remapping, is probably one of the hardest design problems in this space for me.

Thanks Ryan.

Ideally user movement in physical space should be able to map to a VR avatar’s rigging and trigger OpenXR actions.

A few more thoughts

  1. A standard extension to provide the OpenXR pipeline with the user bone positions (in BVH, FBX, or a similar bone format) in user-local 3D space. Intel RealSense, MS Kinect, NVIDIA DepthStream, OpenCV, etc… would be able to provide this.

  2. A standard extension for interpreting user bone movements to trigger OpenXR actions (a gesture interpreter similar to hand tracking).

  3. A standard extension for applying user bone positions (in BVH or FBX format) to the current VR avatar rigging (in BVH or FBX format). I would leave it up to the implementation to sort out the complexities of this. Dark Souls, Minecraft, Grand Theft Auto, etc… would be able to sandbox the avatar linking.

I think the way this is done without exotic camera-based devices is that people strap auxiliary trackers onto different points of their bodies.

So the simplest starting point would be a standardized way of approximating a full body skeleton based on this kind of input; whatever comes from camera systems could then use the same definitions.

I’ve asked before about a pure-software solution for extrapolating hand movements into best-guess elbow/shoulder positions, so game avatars can have more than the hamburger-helper hands, but without annoying users by getting it wrong or making developers “reinvent the wheel”.

I know how this is typically done at the hardware level. We’re thinking here at a higher level: how to expose this data in a forward-compatible, accessibility-compatible way.

Yeah, I think having cross-vendor “gesture recognition” is valuable but will likely be hard to specify.

The interface of the gesture recognition plugin should be very simple, to allow a variety of implementations. The inputs are the user’s bone positions and the outputs are the actions; OpenXR doesn’t need to include gesture analysis any deeper than that.

The OpenXR rigging plugin could use a similar approach. Rigging would require an OpenXR user bone position provider, a game engine which exposes the avatar bones, and an OpenXR rigging plugin to link them. Again, keep the rigging plugin interface simple, with only user bones and avatar bones, so there is flexibility for the plugin/game to resolve accessibility, non-human mapping, and other fun extensions…

OpenXR doesn’t really have “plugins” as you’re describing; that sounds more like an engine-level thing. I do agree it would be nice to have pluggable gesture detection, and it would be far better to have it in the runtime, and thus in the realm of input remapping/rebinding, than in each individual application.