Facebook has open-sourced PyText, a library for processing spoken and written language. According to the developers, the move should accelerate the project's development.
PyText is an NLP (natural language processing) library that uses neural networks to process written and spoken language. According to the developers, the tool is useful for document classification, sequence tagging, semantic parsing, and multitask modeling.
The library's structure makes it easy to move from developing an NLP system to using it in production. The company's engineers claim that, with PyText, deploying a neural network model that understands human language takes only a few days.
- PyText is built on PyTorch, a framework with a mature ecosystem, so models created with the library are easy to share.
- The tool includes several ready-made models. PyText's structure lets you modify them with little effort, which simplifies development.
- The library includes contextual models that use the context of a conversation to better understand the meaning of statements. They have been tested on datasets from M Suggestions, one of the assistant features in Facebook Messenger.
- PyText supports distributed training and can train several models at the same time.
- Integration with PyTorch lets the library convert models to the ONNX format and export them for execution with the Caffe2 engine.
- Scaling models that run directly in PyTorch is limited by Python's Global Interpreter Lock (GIL), which restricts multithreading.
- Exported models run in C++, which improves performance.
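The Global Interpreter Lock limit mentioned in the list can be illustrated with a short standard-library sketch. It does not use PyText or PyTorch; the function names (`count_primes`, `run_sequential`, `run_threaded`) are hypothetical and chosen only for this illustration. On a standard CPython build, the threaded run of this CPU-bound work usually takes about as long as the sequential one, because only one thread executes Python bytecode at a time. Exact timings depend on the machine and the interpreter build.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound pure-Python work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    """Process every task one after another in the main thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits):
    """Process every task in its own thread and collect results in order."""
    results = [None] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(limits)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def timed(fn, limits):
    """Return the results of fn(limits) and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    results = fn(limits)
    return results, time.perf_counter() - start


limits = [20000] * 4
seq_results, seq_seconds = timed(run_sequential, limits)
thr_results, thr_seconds = timed(run_threaded, limits)

# Both strategies compute the same answers; only the wall-clock time may differ.
print(f"sequential: {seq_seconds:.3f}s  threaded: {thr_seconds:.3f}s")
```

This is why moving inference out of the Python interpreter, for example into an exported model served from C++, can help a service scale across CPU cores.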
The company already uses PyText in production. According to the developers, models built with it make more than a billion predictions on Facebook every day. Open-sourcing the code under a free license should attract independent specialists to help improve the tool. At the same time, the company's engineers are not stepping away from developing the system further. They intend to focus on bringing its capabilities to mobile devices.
More information is available on GitHub.