Facebook Open-Sources QNNPACK

The technology, designed for AI on mobile devices, is used in Facebook applications for image processing
30 October 2018

Facebook has released the source code of its QNNPACK AI library (Quantized Neural Network PACKage), designed to run AI on mobile devices. The technology is used in Facebook applications for image processing. Since the computing power of mobile devices is far below that of data-center servers, the developers drew on recent advances in neural network optimization to keep performance at an acceptable level.

The library is built around convolutional neural networks, which are considered the best fit for recognizing visual images. To improve performance, the engineers applied a modified im2col memory transformation together with optimized matrix multiplication.

The im2col transformation unrolls patches of the processed image into column vectors, one entry per input channel, so that convolution can be computed as a matrix multiplication. The creators of QNNPACK refined this scheme by introducing an indirection buffer.
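Before looking at the indirection buffer, here is a minimal NumPy sketch of the basic im2col idea (the function name and layout are illustrative, not QNNPACK's actual code, which is optimized C and assembly):

    import numpy as np

    def im2col(image, kh, kw):
        # Unroll every kh x kw patch of an (H, W, C) image into one column,
        # so that convolution reduces to a single matrix multiplication.
        H, W, C = image.shape
        out_h, out_w = H - kh + 1, W - kw + 1
        cols = np.empty((kh * kw * C, out_h * out_w), dtype=image.dtype)
        for y in range(out_h):
            for x in range(out_w):
                cols[:, y * out_w + x] = image[y:y + kh, x:x + kw, :].ravel()
        return cols

With this layout, a convolution with filters reshaped into a (num_filters, kh * kw * C) matrix becomes a single matrix product against im2col(image, kh, kw).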

The indirection buffer contains pointers to the rows of input pixels involved in computing each output pixel. By storing pointers rather than copies of the pixel data, the developers were able to make the buffer far smaller than in standard im2col implementations.
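The sketch below shows the indirection idea in simplified form: instead of copying pixel data as im2col does, the buffer records only references to the input positions each output pixel needs (indices stand in here for QNNPACK's raw pointers):

    def build_indirection_buffer(H, W, kh, kw):
        # For each output pixel, record the coordinates of the kh x kw input
        # positions that contribute to it. Storing references instead of
        # copies of C channel values per position keeps the buffer small.
        out_h, out_w = H - kh + 1, W - kw + 1
        return [[(y + dy, x + dx) for dy in range(kh) for dx in range(kw)]
                for y in range(out_h) for x in range(out_w)]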

Facebook engineers also refined the depthwise convolution algorithm by processing 3 × 3 groups in batches. This batched computation relies on general-purpose registers (GPRs). Processing a 3 × 3 convolution requires 18 registers (9 for the input and 9 for the filter), while the 32-bit ARM architecture provides only 14 usable ones. But since the filter remains unchanged during processing, the developers were able to reduce the storage it needs to a single register.
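For reference, a naive depthwise 3 × 3 convolution looks like this (a plain NumPy sketch of the operation being accelerated, not the hand-tuned register-level kernel described above):

    import numpy as np

    def depthwise_conv3x3(image, filters):
        # Each of the C channels is convolved with its own 3x3 filter;
        # unlike regular convolution, channels are never mixed.
        H, W, C = image.shape
        assert filters.shape == (3, 3, C)
        out = np.zeros((H - 2, W - 2, C), dtype=np.float32)
        for y in range(H - 2):
            for x in range(W - 2):
                patch = image[y:y + 3, x:x + 3, :].astype(np.float32)
                out[y, x] = (patch * filters).sum(axis=(0, 1))
        return out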

The QNNPACK library also implements other modern approaches to neural network optimization, such as low-precision (quantized) arithmetic.
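Low precision here means 8-bit affine quantization, in which a real value is approximated as scale * (q - zero_point). A minimal sketch of the scheme (the scale and zero-point handling is illustrative; QNNPACK's kernels do this in fixed-point assembly):

    import numpy as np

    def quantize(x, scale, zero_point):
        # Map real values to uint8: real ~= scale * (q - zero_point)
        q = np.round(x / scale) + zero_point
        return np.clip(q, 0, 255).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.int32) - zero_point)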

The tool is published as part of the PyTorch 1.0 framework, released in early October 2018.

MelNet Algorithm Simulates a Person's Voice

It analyzes spectrograms of the audio tracks of ordinary TED Talks, picks up the speaker's vocal characteristics, and reproduces short utterances
11 June 2019

The Facebook AI Research team has developed MelNet, an algorithm that synthesizes speech with the characteristics of a particular person's voice. For example, it has learned to imitate the voice of Bill Gates.

MelNet analyzes spectrograms of the audio tracks of ordinary TED Talks, picks up the speaker's vocal characteristics, and reproduces short utterances.
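A spectrogram of this kind can be computed with standard tools; the sketch below uses librosa, with the file name and parameters chosen purely for illustration (MelNet's own preprocessing may differ):

    import numpy as np
    import librosa

    # Load an audio track and compute its mel spectrogram
    audio, sr = librosa.load("ted_talk.wav", sr=22050)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=2048,
                                         hop_length=256, n_mels=80)
    log_mel = np.log(mel + 1e-6)  # log scale, common for generative models
    print(log_mel.shape)          # (n_mels, n_frames)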

The algorithm's capabilities are limited mainly by the length of those utterances. It reproduces short phrases very close to the original. However, a person's intonation changes when speaking on different topics, in different moods, and at different pitches; the algorithm cannot yet imitate this, so long sentences sound artificial.

MIT Technology Review notes that even such an algorithm could strongly affect services like voice bots, where almost all communication comes down to an exchange of short remarks.

A similar approach, analyzing speech spectrograms, was used by scientists from Google AI in their work on the Translatotron algorithm, an AI capable of translating phrases from one language to another while preserving the characteristics of the speaker's voice.