For a long time, the prospect of truly immersive virtual reality (VR) and augmented reality (AR) has seemed enticingly near, yet with each technological advance it has remained just beyond reach. The good news is that we are moving closer. But for AR and VR to be truly immersive, all of our senses must believe that the experience is real.

The ability to create authentic VR or AR experiences depends on how precisely and consistently engineers can reproduce the components of what we perceive as reality, beginning with an understanding of human physiology and neurology. We need to understand the multiple sensory cues that are essential to discerning 3D patterns in our environment, and then recreate them with technology inside headsets.


Realization based on technology

VR devices occlude the user's eyes, presenting an immersive environment in which sensors create a sense of presence and enable interaction with virtual objects. AR devices, by contrast, project virtual images over the physical world, using sensory cues to make physical and virtual elements feel like parts of the same experience. Also referred to as mixed-reality devices, 3D AR systems blend real-world elements into an augmented environment.

Every configuration has its own set of requirements, but the common capabilities driving these systems forward include real-time 3D tracking and sensing, efficient and energy-efficient computational processing, high-quality displays and graphics, immersive audio, machine-learning and AI-based algorithms, natural human interfaces, and compelling applications.

A vivid visual experience

The most advanced display and graphics technologies now let us render higher-quality digital objects and fit more pixels into smaller areas, with more clarity and brightness than ever before, yet there is still much to do. It is not just about rendering lifelike images, but about doing so across a wide enough field of view (FOV) on tiny display screens, with all of the necessary visual cues.

Modern high-resolution smartphones display more than 500 pixels per inch (PPI). For immersive VR, PPI is not a sufficient measure. The number of pixels per degree (PPD) of the visual field covered by the display is a more appropriate metric.

At the center of vision, the average human eye can resolve detail down to about 1/60 of a degree of arc. Each eye has a horizontal FOV of roughly 160 degrees and a vertical FOV of approximately 175 degrees. The two eyes work together to create stereoscopic depth perception across an overlapping region roughly 120 degrees wide and 135 degrees high. This means we would need around 100 megapixels (MP) per eye, and around 60 MP for the stereo region, to deliver a visual acuity of 60 PPD. Compare that with today's VR headset displays, which offer about 3.5 MP.
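
To make the arithmetic concrete, here is a short Python sketch that reproduces the estimate above. The 60-PPD target and the FOV figures are the values quoted in the text; everything else is simple back-of-the-envelope math rather than a spec for any particular display.

# Back-of-the-envelope pixel-count estimate for 60-PPD acuity.
TARGET_PPD = 60          # pixels per degree, matching ~1/60-degree acuity

def megapixels(h_fov_deg, v_fov_deg, ppd=TARGET_PPD):
    """Pixels needed to cover the given field of view at the target density."""
    return (h_fov_deg * ppd) * (v_fov_deg * ppd) / 1e6

per_eye = megapixels(160, 175)   # monocular FOV: ~160 deg x ~175 deg
stereo  = megapixels(120, 135)   # binocular overlap: ~120 deg x ~135 deg

print(f"per eye: {per_eye:.0f} MP")   # ~100 MP
print(f"stereo:  {stereo:.0f} MP")    # ~58 MP, i.e. roughly 60 MP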

Because manufacturing technology cannot yet deliver that level of pixel density, designers have to be selective about which elements of the visual image are rendered at high resolution. That choice must be informed by a thorough understanding of how the visual system operates.

Eye tracking and foveated rendering

Human vision is sharpest within a tiny visual field of roughly ±1 degree around the eye's optical axis, centered on the fovea; acuity falls off rapidly toward the periphery. By using real-time sensors to detect where a person is looking, we can render more polygons in the central gaze region, concentrating computing power there, while dramatically reducing graphics fidelity (polygon count) elsewhere. This foveated rendering can dramatically decrease the graphics load and power consumption.
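
As an illustration of the idea, not the algorithm used by any shipping headset, a foveated renderer might assign a quality level to each screen region based on its angular distance from the tracked gaze point. The thresholds in this Python sketch are made-up round numbers chosen only to show the pattern.

# Illustrative foveated-rendering policy (assumed thresholds).
def quality_for(eccentricity_deg):
    """Return a relative rendering quality (1.0 = full) for a region
    whose center lies eccentricity_deg away from the gaze direction."""
    if eccentricity_deg <= 2.0:      # foveal region: full resolution
        return 1.0
    if eccentricity_deg <= 10.0:     # parafoveal: moderate reduction
        return 0.5
    if eccentricity_deg <= 30.0:     # near periphery
        return 0.25
    return 0.125                     # far periphery: coarsest shading

# Example: a tile 18 degrees from the gaze point renders at quarter quality.
print(quality_for(18.0))             # -> 0.25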

The human eye has an abundance of cone photoreceptors at the fovea, which yields high visual acuity at the center of gaze. Photoreceptor density falls off dramatically toward the periphery, resulting in reduced visual acuity. (Source: E. Bruce Goldstein, “Sensation and Perception”)

Researchers around the globe are studying this problem, and device designers are experimenting with multi-display architectures in which a high-resolution display covers foveal vision while a lower-pixel-count display serves peripheral vision. Future display technologies should allow dynamic, real-time projection of higher-resolution content in and around the direction of gaze.

Convergence-accommodation mismatch

Another major concern is oculomotor consistency: resolving the conflict between the eyes' convergence and accommodation. Humans perceive the world stereoscopically by converging both eyes on an object. Through accommodation, each eye's lens changes shape to focus light arriving from different depths. In the real world, the distance at which the two eyes converge matches the distance at which each eye accommodates.

In today's VR and AR headsets, there is a mismatch between the convergence distance and the accommodation distance. Real-world light is shaped by reflections and refractions from many sources at different distances, whereas in a headset all of the light is generated by a single source at a fixed distance. When the eyes converge to look at a virtual object, their lenses must still focus on the fixed-distance light coming from the screen, producing varying degrees of mismatch between the two distances, which can lead to eye strain, discomfort, or disorientation.
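
A rough way to quantify the conflict is to express both demands in diopters (1/meters). The sketch below assumes an example headset whose optics are focused at 2 m, a virtual object rendered at 0.5 m, and a typical 63-mm interpupillary distance; none of these numbers describe a specific product.

# Rough vergence-accommodation mismatch estimate (assumed example values).
import math

IPD_M = 0.063                 # assumed interpupillary distance, meters

def vergence_angle_deg(distance_m, ipd=IPD_M):
    """Angle between the two eyes' lines of sight when fixating at distance_m."""
    return math.degrees(2 * math.atan((ipd / 2) / distance_m))

display_focal_m = 2.0         # fixed optical focus of the headset display
virtual_obj_m   = 0.5         # apparent distance of the rendered object

accommodation_demand = 1 / display_focal_m   # eyes must focus here (0.5 D)
vergence_demand      = 1 / virtual_obj_m     # eyes must converge here (2.0 D)

print(f"vergence angle at 0.5 m: {vergence_angle_deg(virtual_obj_m):.1f} deg")
print(f"conflict: {abs(vergence_demand - accommodation_demand):.1f} D")  # 1.5 D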


Several approaches are being investigated, including dynamically movable optics and focus-tunable liquid-crystal lenses that change focal length as the applied voltage changes.

3D spatial audio

To be truly convincing, the audio experience in AR/VR must be synchronized with and complement the visual experience, so that the perceived position of a sound is aligned with the image the user sees. In real life, most people can close their eyes and still locate the source of a sound. This is because the brain perceives and interprets differences in the time of arrival and intensity of sound at each ear. That happens instantly and automatically in the real world, but in VR headsets 3D spatial audio must be explicitly computed and rendered.

The challenge is that every person experiences sound differently: the spectrum of the signals reaching the eardrums is shaped by factors such as the size and shape of the listener's head and ears. This is described by the head-related transfer function (HRTF), which modern technology attempts to approximate. Improving that approximation will allow headset wearers to perceive sound as emanating from virtual objects, with accurate spatial cues.
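
For a sense of the timing cues involved, the Python sketch below estimates the interaural time difference (ITD) for a source at a given azimuth using Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ). The head radius is an assumed average, and a full HRTF would also capture the spectral and level cues this simple model ignores.

# Simplified spatial-audio cue: interaural time difference.
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius
SPEED_OF_SOUND = 343.0        # m/s in air

def itd_seconds(azimuth_deg):
    """Time-of-arrival difference between the two ears for a distant source."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source 45 degrees to one side arrives ~0.38 ms earlier at the near ear.
print(f"{itd_seconds(45) * 1e3:.2f} ms")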

Low-latency inside-out tracking

Tracking a person's head movement in real time is a must for VR/AR. Systems need to continuously determine the position and orientation of the headset in 3D space relative to other objects, with high precision and low latency, so they can render and present the appropriate visual and audio information for the head's current pose and refresh it quickly as the user moves.

Earlier VR headsets tracked head motion with “outside-in” methods, in which external sensors were placed by the user around the room. Today, “inside-out” tracking uses simultaneous localization and mapping together with visual-inertial odometry, built on a combination of cameras and precisely calibrated motion sensors, to track motion from within the headset itself.

With “inside-out” tracking, modern headsets can precisely follow the user's movements in real time using sensors built into the headset. (Source: Meta)
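
The following is a deliberately tiny stand-in for the sensor-fusion idea behind inside-out tracking: a complementary filter that blends a fast-but-drifting gyroscope estimate of head pitch with a slow-but-absolute accelerometer estimate. Real headsets fuse camera features with IMU data (visual-inertial odometry); this sketch, with invented sample values, only shows the blending principle.

# Minimal complementary-filter illustration (assumed sample data).
def complementary_filter(pitch_deg, gyro_rate_dps, accel_pitch_deg,
                         dt_s, alpha=0.98):
    """Blend integrated gyro motion (weight alpha) with the accelerometer's
    gravity-based pitch estimate (weight 1 - alpha)."""
    gyro_estimate = pitch_deg + gyro_rate_dps * dt_s
    return alpha * gyro_estimate + (1 - alpha) * accel_pitch_deg

# 1-kHz IMU samples: the estimate tracks quick motion but cannot drift far.
pitch = 0.0
for gyro_dps, accel_deg in [(50.0, 0.2), (48.0, 0.3), (47.0, 0.5)]:
    pitch = complementary_filter(pitch, gyro_dps, accel_deg, dt_s=0.001)
print(f"{pitch:.3f} deg")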

The biggest challenge is achieving low motion-to-photon latency: the time from the start of a user's movement to the emission of photons from the last pixel of the corresponding image frame on the display. This span covers sensor data acquisition and processing, interfaces, graphics computation, image rendering, and display refresh.

In real life, we track our head's movement through changes in the visual field detected by our eyes and motion signals detected by our vestibular system. Long lags in a VR headset create a visual-vestibular conflict that can cause disorientation and dizziness. Today's VR headsets typically have motion-to-photon latencies between 20 and 40 milliseconds, but a smooth, imperceptible experience requires less than 10 milliseconds.
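
One way to reason about that target is as a budget spread across the pipeline stages listed above. In this Python sketch the individual stage durations are illustrative assumptions; only the 20-40 ms typical figure and the sub-10 ms goal come from the text.

# Example motion-to-photon budget (stage times are assumptions).
budget_ms = {
    "IMU/camera sampling and fusion": 1.5,
    "pose prediction and interfaces": 1.0,
    "application + GPU rendering":    4.0,
    "compositor / reprojection":      1.0,
    "display scan-out and response":  2.0,
}

total = sum(budget_ms.values())
for stage, ms in budget_ms.items():
    print(f"{stage:34s} {ms:4.1f} ms")
print(f"{'total motion-to-photon':34s} {total:4.1f} ms  (target: < 10 ms)")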

Human interactions and inputs

An immersive experience requires users to interact realistically with virtual objects. They should be able to reach out, grasp, and hold an object, and the object should respond in real time according to the laws of physics.

The latest headsets let users select objects with simple hand movements. As computer vision improves through rapid advances in AI, future headsets will offer more sophisticated gesture-control capabilities.

Next-generation headsets will also support multimodal interaction, in which eye tracking lets users select objects in virtual space simply by looking at them and then manipulate them with hand gestures. As AI technology continues to advance and low-latency local processing becomes practical, headsets will also gain real-time voice recognition.
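
As a sketch of how such a gaze-plus-gesture interaction might be wired together (the function, event names, and object model here are hypothetical, not any vendor's API), the eye tracker nominates the target and a pinch gesture confirms the action:

# Hypothetical gaze-plus-pinch interaction loop.
def handle_frame(gazed_object, pinch_detected, held_object):
    """Return the object being manipulated after this frame, if any."""
    if held_object is None and pinch_detected and gazed_object is not None:
        return gazed_object          # pick up what the user is looking at
    if held_object is not None and not pinch_detected:
        return None                  # releasing the pinch drops the object
    return held_object               # otherwise keep the current state

# The user looks at "blue_cube" and pinches: the cube becomes the held object.
print(handle_frame("blue_cube", pinch_detected=True, held_object=None))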

Advances in computer vision and AI enable natural interaction through eye gaze, hand gestures, and voice commands. (Source: David Cardinal)

Moving forward

Today we can experience some mainstream VR and compelling industrial AR applications, but they are not yet fully immersive. Although the path to truly immersive VR is not a short one, with billions of dollars being invested in the underlying technologies, the possibilities are nearly endless. McKinsey, for instance, estimates that the metaverse could generate between $4 trillion and $5 trillion in value by 2030.

Through continued efforts to overcome these technical hurdles, we will be able to replicate real-life experiences with technology, steadily narrowing the gap between the physical world and the digital one as we experience them.

Achin Bhowmik heads the Society for Information Display and is CTO and executive vice president of engineering at Starkey.
