Scientists Create Deepfakes for Dancing

With the help of artificial intelligence, fake dancing videos can now be created
28 August 2018

Four scientists from the University of California, Berkeley have developed an algorithm that takes a video of a dance and produces a fake recording in which another person performs the same movements. For thorough processing, it requires twenty minutes of footage shot at 120 frames per second.

The technology is based on generative adversarial networks (GANs). A separate module processes the pre-recorded source and target videos and maps the motion onto a simple stick figure, a skeleton of the human body.
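The simplified body figure described above can be pictured as a set of joint positions connected by line segments. The sketch below is only an illustration: the joint names and the `BONES` list are hypothetical, and real pose detectors define their own joint sets.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical joint names and connections, chosen for illustration only.
BONES: List[Tuple[str, str]] = [
    ("head", "neck"),
    ("neck", "l_hand"),
    ("neck", "r_hand"),
    ("neck", "hip"),
    ("hip", "l_foot"),
    ("hip", "r_foot"),
]


def to_stick_figure(joints: Dict[str, Point]) -> List[Tuple[Point, Point]]:
    """Turn detected joint positions into line segments.

    A bone is skipped when either of its joints was not detected in the frame.
    """
    return [
        (joints[a], joints[b])
        for a, b in BONES
        if a in joints and b in joints
    ]


# One frame of a pose, in pixel coordinates (made-up values).
pose = {
    "head": (50.0, 10.0),
    "neck": (50.0, 20.0),
    "l_hand": (30.0, 40.0),
    "r_hand": (70.0, 40.0),
    "hip": (50.0, 60.0),
    "l_foot": (40.0, 100.0),
    "r_foot": (60.0, 100.0),
}
segments = to_stick_figure(pose)
```

Each frame of the source video would yield one such list of segments, which is the kind of intermediate representation that can be carried over to the target person.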

[Images: AI Dancing]

Then the algorithm transfers the professional's movements onto the recording of the amateur dancer and "aligns" the final video so that the figure does not jerk noticeably from frame to frame and the person stays where they are supposed to be.
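How the researchers reduce frame-to-frame jitter is not detailed here. As a rough illustration only, assuming each joint is tracked as a list of per-frame (x, y) positions, a centered moving average is one simple way to damp sudden jumps. The function names `smooth_track` and `max_jump` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth_track(track: List[Point], window: int = 3) -> List[Point]:
    """Centered moving average over one joint's per-frame positions.

    Near the start and end of the track the window is truncated to the
    frames that exist, so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        xs = [p[0] for p in track[lo:hi]]
        ys = [p[1] for p in track[lo:hi]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return smoothed


def max_jump(track: List[Point]) -> float:
    """Largest frame-to-frame displacement (sum of |dx| and |dy|)."""
    return max(
        abs(b[0] - a[0]) + abs(b[1] - a[1])
        for a, b in zip(track, track[1:])
    )


# A joint that jitters by 10 pixels on every frame (made-up values).
jittery = [(0.0, 0.0), (10.0, 0.0), (0.0, 0.0), (10.0, 0.0), (0.0, 0.0)]
steady = smooth_track(jittery, window=3)
```

A wider window removes more jitter but also blurs fast, intentional movements, which matters for dance footage.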

The researchers acknowledge that the synthesized video, while realistic, is not free of artifacts: body parts sometimes tremble or even disappear, and some frames look blurred. In addition, the algorithm cannot handle the way fabric behaves when a person moves, so the people in the target videos wear tight clothing that barely wrinkles.

This type of video manipulation is known as a "deepfake". In mid-August 2018, researchers from Carnegie Mellon University presented Recycle-GAN, which can transfer one person's facial expressions onto another person's face, model a flower blooming, and change the weather in video footage. Similar results are produced by the FakeApp application, released in January 2018, and by the Face2Face and HeadOn algorithms.

MelNet Algorithm Simulates a Person's Voice

It analyzes spectrograms of audio tracks from ordinary TED Talks, picks up the speaker's speech characteristics, and reproduces short phrases
11 June 2019

The Facebook AI Research team has developed MelNet, an algorithm that synthesizes speech with the characteristics of a particular person's voice. For example, it has learned to imitate the voice of Bill Gates.

MelNet analyzes spectrograms of audio tracks from ordinary TED Talks, picks up the speaker's speech characteristics, and reproduces short phrases.
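A spectrogram shows how the energy of a sound is distributed across frequencies over time. The sketch below computes a plain magnitude spectrogram with a short-time Fourier transform in pure Python. It is an illustration of the general idea only: it is not mel-scaled and is not MelNet's actual preprocessing, and the `frame_size` and `hop` values are arbitrary assumptions.

```python
import cmath
import math
from typing import List


def spectrogram(
    signal: List[float], frame_size: int = 64, hop: int = 32
) -> List[List[float]]:
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    # Hann window (periodic form) to reduce leakage between frequency bins.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
        for n in range(frame_size)
    ]
    rows = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        row = []
        # A real-valued signal only needs bins 0 .. frame_size / 2.
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(coeff))
        rows.append(row)
    return rows


# A pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

For this test tone every frame peaks in the same frequency bin. Recorded speech produces a much richer pattern, and that pattern is what a model can learn a speaker's characteristics from.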

The algorithm's capabilities are limited mainly by the length of the phrases. It reproduces short phrases very close to the original. However, a person's intonation changes when they speak on different topics, in different moods, or at different pitches. The algorithm cannot yet imitate this, so long sentences sound artificial.

MIT Technology Review notes that even such an algorithm could greatly affect services like voice bots, where all communication comes down to an exchange of short remarks.

A similar approach, the analysis of speech spectrograms, was used by scientists from Google AI when working on the Translatotron algorithm. This AI can translate phrases from one language to another while preserving the characteristics of the speaker's voice.