Facebook Open-Sources PyText

PyText is an NLP (Natural Language Processing) library used in neural networks that process written and spoken language
17 December 2018

Facebook has open-sourced PyText, a library for processing spoken and written language. According to the developers, the move should accelerate the project's development.

The NLP (Natural Language Processing) library is used in neural networks that process written and spoken language. According to the developers, the tool is useful for document classification, sequence tagging, semantic parsing, and multitask modeling.

The library's structure makes it easy to move from developing an NLP system to putting it into production. The company's engineers claim that implementing a neural model that understands human language takes only a few days with PyText.

Library features:

  • PyText is built on PyTorch, a framework with a mature ecosystem, so models created with the library are easy to publish and share.
  • The tool ships with several ready-made models, and PyText's structure lets you modify them with little effort, which simplifies development.
  • The library includes dedicated models that use conversational context to better interpret the meaning of utterances. They were tested on datasets from the M Suggestions feature (an assistant function in Facebook Messenger).
  • PyText supports distributed training and can work with several models at the same time.
  • Integration with PyTorch lets the library export models to the ONNX format and run them with the Caffe2 engine.
  • Serving models in pure Python scales poorly because of the multithreading limits imposed by Python's Global Interpreter Lock.
  • Exported models can take advantage of C++ to improve performance.
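
The workflow described above is configuration-driven: a JSON file declares the task, model, and data, and training is launched from the command line. The fragment below is only a rough sketch of what such a config might look like; the exact keys, task names, and file layout are illustrative assumptions and should be checked against the PyText documentation and example configs.

```json
{
  "task": {
    "DocClassificationTask": {
      "data_handler": {
        "train_path": "train.tsv",
        "eval_path": "eval.tsv",
        "test_path": "test.tsv"
      },
      "model": {
        "representation": { "BiLSTM": { "lstm_dim": 100 } }
      },
      "trainer": { "epochs": 10 }
    }
  }
}
```

With a config like this, training would be run from the shell (e.g. `pytext train < config.json`), and the project's CLI also provides an export step that produces a Caffe2/ONNX snapshot for serving from C++.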

The company is already using PyText in production: according to the developers, models built with it make more than a billion predictions on Facebook every day. Open-sourcing the code under a free license should attract outside specialists to improve the tool. At the same time, the company's engineers will continue developing the system themselves, focusing on bringing its capabilities to mobile devices.

More info is available on GitHub

MelNet: an Algorithm That Simulates a Person's Voice

The algorithm analyzes spectrograms of the audio tracks of ordinary TED Talks, captures the speaker's vocal characteristics, and reproduces short phrases
11 June 2019

The Facebook AI Research team has developed MelNet, an algorithm that synthesizes speech with the vocal characteristics of a particular person. For example, it has learned to imitate Bill Gates's voice.

MelNet analyzes spectrograms of the audio tracks of ordinary TED Talks, captures the speaker's vocal characteristics, and reproduces short phrases.

The length of those phrases is the algorithm's main limitation. It reproduces short utterances very close to the original, but a person's intonation changes with topic, mood, and pitch, and the algorithm cannot yet imitate this, so long sentences sound artificial.
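
MelNet's input representation is the mel spectrogram, a time-frequency picture of audio with frequencies warped onto a perceptual (mel) scale. As an illustration of that representation only, here is a minimal NumPy sketch that computes one from a raw waveform; the frame size, hop, and filter count are arbitrary illustrative choices, not MelNet's actual settings.

```python
import numpy as np

def hz_to_mel(f):
    # Map frequency in Hz onto the perceptual mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=40):
    # Short-time Fourier transform over Hann-windowed frames
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (frames, freq bins)
    return power @ mel_filterbank(n_mels, n_fft, sr).T  # (frames, mel bands)

# Toy input: one second of a 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # → (122, 40)
```

A generative model like MelNet predicts such a 2D array frame by frame and band by band; a separate vocoder step is then needed to turn the predicted spectrogram back into a waveform.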

MIT Technology Review notes that even such an algorithm could greatly affect services like voice bots, where all communication comes down to an exchange of short remarks.

A similar approach, analyzing speech spectrograms, was used by Google AI researchers in the Translatotron algorithm, an AI that translates phrases from one language to another while preserving the characteristics of the speaker's voice.