Voice Assistant to Recognize Silent Commands

The neural-network-based technology can be used in public places without the risk of disturbing others
22 October 2018

Researchers at Tsinghua University have developed a voice assistant for smartphones that recognizes commands from the user's lip movements. The technology can be used in public places without the risk of disturbing others.

Yuanchun Shi and colleagues presented a paper at the UIST 2018 conference describing a technology that recognizes lip movements and translates them into text. The assistant uses the front camera and a convolutional neural network. The first algorithm tracks 20 control points that accurately describe the shape of the lips and also determines how open the user's mouth is, which lets the system detect the beginning and end of a command. A second algorithm then decodes the lip movements into text. For now, all the computation runs separately on a powerful PC.
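The paper's command-segmentation step can be sketched roughly. Everything below is an illustrative assumption, not the authors' code: the landmark layout, the width-normalized openness measure, and the two thresholds are all made up for the example.

```python
# Illustrative sketch (not the authors' implementation) of segmenting
# a silent command by how open the mouth is across video frames.

def mouth_openness(landmarks):
    """Estimate mouth openness from lip landmarks.

    `landmarks` is a list of (x, y) points outlining the lips (the
    paper tracks 20 such control points); here we use the vertical
    extent of the points normalized by the mouth width.
    """
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return height / width if width else 0.0

def segment_command(openness_per_frame, start_thr=0.35, end_thr=0.25):
    """Return (start, end) frame indices of a command, or Nones.

    Uses simple hysteresis: the command starts when openness rises
    above `start_thr` and ends when it falls below `end_thr`.
    The threshold values are arbitrary placeholders.
    """
    start = end = None
    for i, openness in enumerate(openness_per_frame):
        if start is None and openness > start_thr:
            start = i
        elif start is not None and openness < end_thr:
            end = i
            break
    return start, end
```

For a frame sequence like `[0.1, 0.2, 0.4, 0.5, 0.3, 0.1]`, `segment_command` marks the command as starting at frame 2 and ending at frame 5. The real system feeds the detected span to the second, decoding network.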

Recognition is limited to a fixed set of 44 commands, which apply both to individual applications and to specific functions, such as turning Wi-Fi on and off. System-wide tasks are also supported, such as replying to a message or selecting text.
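A closed command set like this is typically dispatched through a simple lookup table. The sketch below shows the idea; the phrases and handler functions are hypothetical examples, not the actual 44 commands from the paper.

```python
# Hypothetical handlers standing in for real app- and system-level
# actions; in a real assistant these would call platform APIs.

def toggle_wifi():
    return "wifi toggled"

def reply_to_message():
    return "reply opened"

# A closed vocabulary maps each recognized phrase to one handler.
COMMANDS = {
    "turn on wi-fi": toggle_wifi,
    "turn off wi-fi": toggle_wifi,
    "reply": reply_to_message,
    # ... the real system maps 44 such phrases to actions.
}

def dispatch(recognized_text):
    """Run the handler for a recognized command, if any."""
    handler = COMMANDS.get(recognized_text.lower())
    if handler is None:
        return "unknown command"
    return handler()
```

Keeping the vocabulary closed is part of why a high accuracy is achievable: the decoder only has to choose among a few dozen phrases rather than transcribe open speech.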

The developers claim an average recognition accuracy of 95.5%, based on training with the speech of 21 people. Tests conducted in the Beijing subway showed that users found this input method more comfortable.

So far, the developers have not said when the application will be released. If a powerful computer is still needed for recognition, that will not happen soon; alternatively, the system will require a permanent network connection.

Neural Network to Create Landscapes from Sketches

NVIDIA's GauGAN model uses generative adversarial networks to process segmented images and turn people's sketches into beautiful landscapes
20 March 2019

At the GTC 2019 conference, NVIDIA presented a demo version of the GauGAN neural network, which can turn rough sketches into photorealistic images.

The GauGAN model, named after the artist Paul Gauguin, uses generative adversarial networks to process segmented images. The generator creates an image and passes it to a discriminator trained on real photographs. The discriminator, in turn, tells the generator, pixel by pixel, what to fix and where.
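This adversarial loop can be caricatured in plain Python if the "images" are single numbers and both networks shrink to a pair of scalar parameters. The sketch below is a toy illustration of the generator/discriminator feedback, not NVIDIA's model; the data distribution, learning rate, and parameterization are arbitrary choices for the example.

```python
import math
import random

random.seed(0)

def sigmoid(u):
    """Squash a score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

# Generator: turns noise z into a sample, G(z) = a*z + b.
a, b = 1.0, 0.0
# Discriminator: probability a sample is real, D(x) = sigmoid(w*x + c).
w, c = 0.1, 0.0

lr = 0.01
for step in range(2000):
    real = random.gauss(4.0, 0.5)   # stands in for "real photographs"
    z = random.gauss(0.0, 1.0)
    fake = a * z + b                # the generator's attempt

    # Discriminator step: push D(real) up and D(fake) down
    # (manual gradients of log D(real) + log(1 - D(fake))).
    s_real = sigmoid(w * real + c)
    s_fake = sigmoid(w * fake + c)
    w += lr * ((1 - s_real) * real - s_fake * fake)
    c += lr * ((1 - s_real) - s_fake)

    # Generator step: the discriminator's gradient is the
    # "what to fix and where" feedback from the text above.
    s_fake = sigmoid(w * fake + c)
    feedback = (1 - s_fake) * w
    a += lr * feedback * z
    b += lr * feedback
```

Over the training loop, the generator's offset `b` drifts from 0 toward the real data's mean, steered only by the discriminator's feedback. GauGAN applies the same principle per pixel, with deep convolutional networks in place of these scalars.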

Simply put, the neural network works like a coloring book, except that instead of filling in children's drawings it produces beautiful landscapes. Its creators emphasize that it does not simply stitch together pieces of existing images but generates unique ones, like a real artist.

Among other things, the neural network can imitate the styles of various artists and change the time of day and the season in an image. It also generates realistic reflections on water surfaces, such as ponds and rivers.

So far, GauGAN is configured to work with landscapes, but its architecture also allows it to be trained to create urban scenes. The full text of the paper is available as a PDF.

GauGAN can be useful to architects and city planners as well as landscape designers and game developers. An AI that understands what the real world looks like will make it easier for them to implement their ideas and quickly change them. The neural network will soon be available on the NVIDIA AI Playground.