I’m working on a research project where we use Varjo XR headsets and process the video pass-through from the headset cameras in various ways before presenting it to the user in real time. For now, we are using the Video Post Processing API from the Varjo SDK. However, besides not being portable, it has its own limitations. This is why we wanted to implement a custom engine that uses OpenXR.
I sort of assumed that OpenXR would allow getting the raw/undistorted image from the camera directly “out of the box”, but it now seems that this is not the case. Note that we don’t want to use the video passthrough only as a background, as with the additive or alpha-blend environment blend modes. We want to get the image from the camera, transform it in multiple compute passes, and present it to the device.
An example use case: an application that applies styles, such as color mapping and edge rendering, to the passthrough image.
However, I could not find any general KHR, EXT, or Varjo extension for this. Have I missed something, or am I out of luck until Varjo implements such an extension? If so, do you have any ideas for a workaround? My project really depends on this feature.
This is somewhat challenging for privacy reasons, which is why it’s not provided by default. I would suggest looking to see if the FB extension meets your needs, and working with Varjo to implement that or another extension.
I see. In case anyone else is looking at how to do this: for now, I plan to use the Varjo SDK to access the video stream, process the frames (possibly using OpenCL), and use their OpenXR runtime to present the frames back to the headset. I’ll report back on the results. I suspect latency might be an issue.
Is there any information available on what the privacy challenges are with this? It seems most platforms using passthrough cameras already have user-accessible privacy settings for them.
Either way, a way to access the camera data, as well as any generated depth data and projection parameters, would be very useful for many applications. For displaying passthrough to the user, the current approach, even with the FB extension, severely limits the ways the camera image can be modified and composited.
There’s some ongoing work on providing access to more “processed” data about the environment.
OpenXR has no internal concept of privacy preferences or permission preferences. (The closest is, I think, in the eye-gaze extension, where it mentions that the underlying platform might have permission controls that are outside the scope of the spec.) Before providing raw image access, we’d want to fill that gap, which is a very large task that at least I am not eager to begin (far easier to do wrong than right). Additionally, some vendors might never feel comfortable providing raw image access even with permission prompts, etc., meaning it might be a lot of work for a spec few would implement.
Just as with e.g. interactions, if you tell us (in this case, the WG) your high-level goals, we might be able to come up with a better (in this case, privacy-preserving and more likely to be widely implemented) solution. Hope this helps clarify things.
Thanks for the reply! Good to know there is work being done on this.
As for use cases, being able to access the camera pixels directly in fragment shaders could allow some graphical effects to be applied directly to the output, such as the refraction effect in this example I implemented with the OpenVR camera interface: https://twitter.com/rectus_sa/status/1445302471170183170
There might also be applications in computer vision, for example feeding the camera data into object-recognition algorithms. Another might be multiplayer applications where users could share the passthrough view with other users. These use cases are just speculation, though.
I was pointed to this discussion recently because some customers want to access the Photo & Video camera on HoloLens 2, and I’m hoping this thread can inspire some discussion of such an extension in the OpenXR API surface.
If we move forward with a “camera pixels input” extension in OpenXR, I’d like to point out that the camera input is not always “passing through” as on Varjo or Quest. On HoloLens 2, the camera is offset from the display’s perspective and needs to be located separately, instead of assuming the input pixels can be overlaid/underlaid on the virtual pixels the app is rendering.
I have some experience with implementing in-application passthrough using the tracked camera interface in OpenVR. The interface provides the undistorted camera frames along with projection matrices and the HMD pose for the frame exposure time. This is enough information to get fixed depth passthrough working, although it is very difficult to set up (at least in my experience).
I use the pose and projection data, along with the HMD MVP matrices for the current frame, to calculate a 3x3 matrix that transforms from screen space to frame texture space. Homogeneous coordinates are needed because the transform isn’t linear in 2D when the camera isn’t coplanar with the screen. I generate two of these matrices for different projection depths and interpolate between them, so that different parts of the screen can be sampled at different depths (I’m not sure this produces fully correct results). The matrices can either be used directly in the fragment shader to produce UVs for sampling the frame texture, or in the vertex shader to output a per-vertex float3 and let the rasterizer interpolate it.
Depending on how high-level the desired data is, an OpenXR interface could provide the same data as the OpenVR interface does, or something simplified like the per-frame 3x3 matrices described above.
The OpenVR interface doesn’t provide depth information in any way, even though SteamVR calculates it for its Room View 3D feature. For correct projections, outputting depth would be required, either as a texture or as a mesh.
Disclaimer: I’m not a graphics engineer, so I’ve likely got some things wrong.