Why use SLAM for Virtual Reality?


Virtual Reality (VR) is a computer-generated environment that the user perceives through a device known as a headset and interacts with through handheld controllers. This visual, auditory and haptic setup makes the user feel immersed in a virtual world, and it is used in many applications, chiefly gaming, culture, education and architecture.

Nowadays, the VR market is booming and demands systems that are more affordable, require little or no setup, and minimise the side effects of VR such as motion sickness, the dizziness induced by a mismatch between real and virtual motion.

As a consequence, the user's headset position must be tracked as precisely as possible at every instant in order to render the virtual scene accordingly and make VR immersive. The most stable solution today is based on lighthouse technology, which enables room-scale tracking using external laser-sweeping emitters called base stations. These base stations sweep the room with synchronised pulses and laser lines that are picked up by photodiodes embedded in the headset and the controllers. By keeping careful track of the timings between pulses and sweeps, the tracking system can recover the position of each element at any instant. The headset and controllers are further equipped with an Inertial Measurement Unit (IMU), whose measurements are fused with the optical measurements from the base stations. This solution provides accurate positions at 1000 Hz and is called “external tracking”.
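To make the timing-to-angle idea concrete, here is a minimal Python sketch. The 60 Hz rotor speed and the timestamps are illustrative assumptions, not manufacturer specifications; the point is only that a constant-speed sweep turns a time delay into an angle:

```python
import math

ROTOR_HZ = 60.0            # assumed base-station rotor speed (sweeps per second)
PERIOD = 1.0 / ROTOR_HZ    # duration of one full 360-degree sweep

def sweep_angle(t_sync: float, t_hit: float) -> float:
    """Angle (radians) of a photodiode as seen from the base station.

    t_sync: timestamp of the omnidirectional sync pulse (sweep starts here)
    t_hit:  timestamp at which the rotating laser line crossed the diode
    """
    dt = (t_hit - t_sync) % PERIOD      # elapsed fraction of the rotation
    return 2.0 * math.pi * dt / PERIOD  # constant rotor speed -> linear mapping

# Example: the laser line hits the diode 2.5 ms after the sync pulse.
angle = sweep_angle(t_sync=0.0, t_hit=0.0025)
print(f"{math.degrees(angle):.1f} degrees")   # 54.0 degrees
```

With one horizontal and one vertical sweep per station, each diode yields two angles, and the known geometry of the diode constellation on the headset lets the system solve for its full pose.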

External tracking requires the user to mount the base stations solidly on a tripod, furniture or walls. They must remain perfectly static at all times to avoid jitter, and the room must be completely covered by the base-station sweeps. If the headset is occluded from the base stations, tracking is lost and the system cannot compute a position until tracking recovers. In addition to being complicated to set up, this approach is costly to produce.

Visual-Inertial SLAM

Visual-Inertial Simultaneous Localisation And Mapping (SLAM) is an alternative way to track a headset, and most VR and Augmented Reality (AR) systems nowadays use it. SLAM refers to a class of computer vision and robotics algorithms that use information from cameras, and optionally from IMUs, to give the precise position of a moving agent with respect to a map at any time.

SLAM applications go beyond VR/AR. Every autonomous vehicle (self-driving cars, flying drones, even rovers) needs a localisation system that perceives its environment; from there, it can make decisions, refine a navigation plan, or control its trajectory or landing. In VR/AR, the agent is the headset, and the cameras and IMU sensors are mounted directly in the headset. The SLAM system builds a map of the room while simultaneously keeping track of the location of the headset and controllers within that same map.
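In structural terms, most visual-inertial SLAM systems share the same skeleton: propagate the pose at high rate from IMU measurements, then correct the accumulated drift whenever a camera frame is matched against the map. Below is a deliberately simplified, position-only Python sketch of that predict/correct loop; the toy blending step stands in for the full filter or sliding-window optimiser a real system would use, and all names are illustrative:

```python
import numpy as np

class VisualInertialSlam:
    """Toy VI-SLAM skeleton: high-rate IMU prediction, low-rate camera correction."""

    def __init__(self):
        self.position = np.zeros(3)   # headset position in the map frame
        self.velocity = np.zeros(3)

    def on_imu(self, accel: np.ndarray, dt: float):
        # Prediction step (~1000 Hz): dead-reckon from acceleration.
        # Noise and bias make this drift within seconds if left alone.
        self.velocity += accel * dt
        self.position += self.velocity * dt

    def on_camera(self, visual_position: np.ndarray, weight: float = 0.5):
        # Correction step (~30-60 Hz): a position estimated from matching
        # image features against the map pulls the drifting IMU state back.
        self.position = (1 - weight) * self.position + weight * visual_position

slam = VisualInertialSlam()
for _ in range(33):                          # ~33 IMU samples between frames
    slam.on_imu(np.array([0.0, 0.1, 0.0]), dt=0.001)
slam.on_camera(np.array([0.0, 0.0005, 0.0]))  # camera fix reins in the drift
print(slam.position)
```

The mapping half of SLAM, omitted here, triangulates new landmarks from the matched features as the headset moves, so the map and the trajectory are estimated together.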

VI-SLAM based solutions bring practical advantages: they reduce manufacturing cost by using relatively cheap cameras instead of lasers, and they improve the user experience by removing the long setup stage. Moreover, having cameras mounted directly on the headset paves the way to:

  • Beyond room-scale VR/AR
  • More interaction between the real and virtual worlds by tracking objects and people
  • Realistic rendering of the real-world environment inside VR to boost the user experience
  • And much more possibilities…

SLAM Challenges in VR/AR

Developing SLAM for VR/AR is worth the fight to unlock the full potential of AR/VR experiences. However, VR/AR differs in many ways from robotic navigation or autonomous driving: it has constraints and specificities that make its SLAM a unique and exciting challenge.

Getting Centimetric Accuracy and Super Fast Recovery

Playing a game in VR requires tracking through challenging situations such as rapid motion, especially rotations that can reach 500 degrees/s. Such speeds cause serious motion blur in the captured images, which must be compensated at run time. Moreover, daily-life situations generate dynamic interference (people moving through the field of view, for example) that the algorithm should filter out. The tracking system should also minimise the number of tracking losses and, more importantly, make them undetectable to the user so as not to hamper the experience.
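To see why 500 degrees/s is serious, consider a back-of-the-envelope computation. The exposure time and focal length below are assumed, typical-looking values, not measurements from any specific headset:

```python
import math

omega = math.radians(500)   # head rotation rate: 500 deg/s, in rad/s
exposure = 0.005            # assumed camera exposure time: 5 ms
focal_px = 450              # assumed focal length, in pixels

# Under pure rotation, a point's image moves roughly
# omega * exposure * focal_px pixels during a single exposure.
blur_px = omega * exposure * focal_px
print(f"~{blur_px:.0f} pixels of motion blur")   # ~20 pixels
```

Twenty pixels of smear is far more than typical feature detectors tolerate, which is why the blur must be compensated, or the IMU trusted on its own, during fast rotations.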

Saving Computational and Power Resources

SLAM is based on images, which must be acquired, preprocessed (for example, to remove blur) and transferred to the computational unit in charge of position estimation. Estimates are produced every millisecond, in real time, for both the headset and the controllers: AR/VR prohibits estimation with more than 20 ms of delay.
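As a rough illustration, that 20 ms budget has to absorb every stage of the pipeline. The per-stage timings below are invented placeholders, not measurements, but they show how quickly the budget fills up:

```python
# Hypothetical per-stage timings (ms); real values depend on the hardware.
pipeline = {
    "image acquisition / exposure": 5.0,
    "preprocessing (deblur, rectify)": 2.0,
    "transfer to compute unit": 1.0,
    "pose estimation": 3.0,
    "render + display scan-out": 8.0,
}

total = sum(pipeline.values())
budget = 20.0  # motion-to-photon budget, in ms
print(f"total {total:.1f} ms of {budget:.1f} ms budget "
      f"({budget - total:.1f} ms of headroom)")
```

With only a millisecond or two of headroom in a budget like this, every stage must be engineered for both speed and low power draw.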

Maximizing User Experience

How does a person feel inside a VR/AR experience? Answering this question requires a focus on the user and a shift to perceptual metrics. VR/AR does not target the best accuracy in absolute terms, but the accuracy that makes users feel comfortable.

Dealing with Low-Cost Sensors

To bring VR/AR to a larger audience, sensor prices are decreasing. Tracking is done with low-cost, consumer-grade cameras and IMUs, whose measurements are heavily polluted by noise. Hardware synchronization and calibration become online problems that must be solved each time the headset is used.
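A common way to picture such polluted measurements is the standard additive model: what a gyroscope reports is the true rate plus a slowly wandering bias plus white noise. The sketch below uses invented noise magnitudes, not a calibration of any real sensor, to show why the bias must be estimated online, for instance by averaging while the headset is known to be still:

```python
import numpy as np

rng = np.random.default_rng(0)

true_rate = 0.0      # headset sitting still on a table
bias = 0.02          # rad/s, unknown to the tracker
noise_std = 0.005    # rad/s, white measurement noise

# Simulated gyro readings at 1 kHz: measured = true + bias + noise
readings = true_rate + bias + rng.normal(0.0, noise_std, size=1000)

# Naive integration of raw readings drifts linearly because of the bias:
drift = readings.sum() * 0.001   # 1000 samples over 1 second
print(f"drift after 1 s without calibration: {np.degrees(drift):.2f} deg")

# Online calibration: average readings while stationary to estimate the
# bias, then subtract it before integrating. (Estimating on the same
# window makes the residual drift vanish here; live systems track the
# bias continuously instead.)
bias_estimate = readings.mean()
corrected_drift = (readings - bias_estimate).sum() * 0.001
print(f"drift after bias removal: {np.degrees(corrected_drift):.4f} deg")
```

Over a degree of orientation drift per second from a small constant bias is exactly the kind of error that visual corrections and online calibration must absorb on consumer hardware.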

At Arcturus Industries, we are developing real-time positional tracking and 3D perception tools that build on state-of-the-art robotics and computer vision technologies, tuned for the user experience.