Flare Algorithm to Optimize VR Video Transmission

The algorithm was developed by scientists from Indiana University Bloomington (USA) and researchers from AT&T Labs
31 October 2018

At the 2018 MobiCom conference, scientists from Indiana University Bloomington (USA) and researchers from AT&T Labs presented the Flare system. It predicts the direction of a person's gaze while they watch a streaming VR video and, based on this, shows them only the necessary parts of each frame.

The researchers implemented the Flare prototype as an Android application for smartphones that can serve as the screen of a VR headset.

The resolution, and therefore the size, of a 360-degree video is about four times that of an ordinary video meant for viewing on a monitor. In addition, full immersion requires streaming at a minimum of 60 frames per second. Modern wireless networks cannot provide this.
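
A rough back-of-the-envelope estimate (the bitrates and ratios below are illustrative assumptions, not figures from the paper) shows why streaming the full panorama is so demanding:

```python
# Illustrative only: assumed bitrate of an ordinary stream and assumed scaling factors.
regular_video_mbps = 8        # assumed bitrate of an ordinary 1080p, 30 fps stream
resolution_factor = 4         # the 360-degree panorama holds roughly 4x as many pixels
frame_rate_factor = 60 / 30   # 60 fps instead of a typical 30 fps

panorama_mbps = regular_video_mbps * resolution_factor * frame_rate_factor
print(f"Naive full-panorama stream: ~{panorama_mbps:.0f} Mbit/s")   # ~64 Mbit/s
```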

Using the smartphone's sensors, the system tracks the movement of the user's head in three dimensions. With linear regression it predicts the head trajectory one second into the future, and Tikhonov regularization allows Flare to look even further ahead. The system then calculates which part of the video will fall into the person's field of view and downloads only those portions, with a small margin in case of prediction error.
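
A minimal sketch of this kind of viewport prediction, assuming a 100 ms sensor sampling rate and treating yaw as a single angle (the paper describes the approach only at a high level, so the window size, horizons and regularization strength here are illustrative): an ordinary least-squares fit extrapolates one second ahead, and a Tikhonov-regularized (ridge) fit is used for the longer horizon.

```python
import numpy as np

def predict_angle(times, angles, horizon, alpha=0.0):
    """Fit angle(t) ~ a*t + b on recent samples and extrapolate `horizon` seconds
    ahead. alpha > 0 adds Tikhonov (ridge) regularization, which keeps the
    longer-horizon prediction from overreacting to sensor noise."""
    X = np.stack([times, np.ones_like(times)], axis=1)
    A = X.T @ X + alpha * np.eye(2)          # ridge: (X^T X + alpha*I) w = X^T y
    w = np.linalg.solve(A, X.T @ angles)
    return w[0] * (times[-1] + horizon) + w[1]

# Hypothetical 100 ms samples of head yaw over the last second
t = np.linspace(0.0, 1.0, 11)
yaw = 10.0 * t + np.random.normal(0, 0.5, t.shape)   # degrees

short_term = predict_angle(t, yaw, horizon=1.0)             # plain least squares
long_term = predict_angle(t, yaw, horizon=3.0, alpha=1.0)   # regularized, further ahead
print(f"yaw in 1 s: {short_term:.1f} deg, in 3 s: {long_term:.1f} deg")
```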

The specific portions to fetch are computed on a 4 by 6 tile grid. In addition to the tile coordinates, the request to the server also indicates the desired quality level, chosen according to the number of frames and the data transfer rate.
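
A sketch of how such tile selection might look (the tile indexing, the one-tile safety margin and the request format are assumptions for illustration): the predicted gaze point is mapped onto the 4 by 6 grid, neighbouring tiles are added as a margin, and the resulting list is sent together with a quality level.

```python
ROWS, COLS = 4, 6  # the panoramic frame is split into a 4 by 6 tile grid

def tiles_for_viewport(yaw_deg, pitch_deg, margin=1):
    """Return the tile indices covering the predicted gaze point,
    plus `margin` extra tiles in every direction to absorb prediction error."""
    col = int((yaw_deg % 360) / 360 * COLS)
    row = min(max(int((pitch_deg + 90) / 180 * ROWS), 0), ROWS - 1)
    tiles = set()
    for dr in range(-margin, margin + 1):
        for dc in range(-margin, margin + 1):
            r = min(max(row + dr, 0), ROWS - 1)
            c = (col + dc) % COLS          # yaw wraps around horizontally
            tiles.add((r, c))
    return sorted(tiles)

# Hypothetical request: predicted gaze plus a quality level chosen from throughput
request = {"tiles": tiles_for_viewport(yaw_deg=95, pitch_deg=10), "quality": "720p"}
print(request)
```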

To train the algorithm, the researchers used data from 130 volunteers who each watched a dozen VR videos for several minutes. Testing showed that when streaming over an LTE connection, Flare increased picture quality by 22% while reducing the required network bandwidth by 35%.

MelNet Algorithm to Simulate a Person's Voice

It analyzes spectrograms of the audio tracks of ordinary TED Talks, captures the speaker's vocal characteristics and reproduces short utterances
11 June 2019

The Facebook AI Research team has developed the MelNet algorithm, which synthesizes speech with the characteristics of a particular person. For example, it learned to imitate the voice of Bill Gates.

MelNet analyzes spectrograms of the audio tracks of ordinary TED Talks, captures the speaker's vocal characteristics and reproduces short utterances.
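
As a rough illustration of the kind of input such a model works with (the file name and parameters below are placeholders, not those used by Facebook AI Research), a mel spectrogram of a speech recording can be computed with librosa:

```python
import librosa
import numpy as np

# Placeholder path to a speech recording, e.g. an excerpt of a TED talk
y, sr = librosa.load("speech_sample.wav", sr=22050)

# Mel spectrogram: a time-frequency representation on a perceptual (mel) scale
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=256, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)   # log scale, as typically modelled

print(mel_db.shape)   # (n_mels, n_frames): the 2D "image" a generative model learns from
```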

The length of those utterances is the main limitation of the algorithm. It reproduces short phrases very close to the original, but a person's intonation changes when speaking about different topics, in different moods and at different pitches. The algorithm cannot yet imitate this, so long sentences sound artificial.

MIT Technology Review notes that even such an algorithm could significantly affect services like voice bots, where all communication comes down to an exchange of short remarks.

A similar approach - analyzing speech spectrograms - was used by scientists from Google AI when working on the Translatotron algorithm, which translates phrases from one language to another while preserving the characteristics of the speaker's voice.