Google to Start New AI Services Test

The AI uses neural networks from the AutoML cloud service to recognize human speech, translate text, and find objects in images
26 July 2018

Google has announced the start of testing for new machine learning tools. The AI uses neural networks from the AutoML cloud service to recognize human speech, translate text, and find objects in images. The company has also begun alpha testing of its tensor processors.

Google's goal is to make machine learning available to companies and developers who lack the expertise or resources to build their own models. To that end, the AI is learning to recognize human speech and translate text, capabilities offered through the AutoML Natural Language and AutoML Translation services, respectively.

AI is empowerment, and we want to democratize that power for everyone and every business — from retail to agriculture, education to healthcare. AI is no longer a niche in the tech world — it’s the differentiator for businesses in every industry. And we’re committed to delivering the tools that will revolutionize them.

Fei-Fei Li

Chief scientist, Google AI

In addition to these tools, Google introduced:

  • Cloud Vision API, which now recognizes handwriting in PDF and TIFF files. It also determines where an object is located within an image.
  • Contact Center AI, a tool designed for telephone conversations with customers. During a call, it recognizes the caller's speech and tries to resolve the problem. If it cannot, the AI hands the caller over to a human operator (Google calls this feature "Agent Assist") and passes along the information it has collected.
  • Alpha testing of the third generation of tensor processing units (TPUs).

The company aims to bring AI into every sphere of life in order to simplify everyday tasks and drive progress. The AutoML cloud service was introduced in January 2018, and beta testing began six months later.

Google to Create Accurate Online Speaker Diarization Tool

The technology is based on a recurrent neural network
14 November 2018

Google has announced a new diarization algorithm. Diarization divides an incoming audio stream into homogeneous segments according to which speaker each word belongs to. The company claims its technology is more effective than previously known methods.

The technology is based on a recurrent neural network (RNN). This architecture uses internal memory to process sequences of arbitrary length, which makes it well suited to working with segmented audio. In Google's design, each speaker is modeled by a separate instance of the RNN, which isolates that speaker's utterances.
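The idea of keeping separate state for each speaker can be illustrated with a much simpler stand-in. The sketch below is not Google's algorithm and contains no RNN: it assumes each audio segment has already been turned into an embedding vector, keeps a running mean per speaker, and assigns each new segment to the closest known speaker or opens a new one. The names `SpeakerState` and `diarize_online`, the toy two-dimensional embeddings, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerState:
    """Per-speaker state: a running mean of the embeddings assigned so far.

    This stands in for the per-speaker RNN instance described in the article.
    """

    def __init__(self, embedding):
        self.mean = list(embedding)
        self.count = 1

    def update(self, embedding):
        # Incremental mean update, so no history needs to be stored.
        self.count += 1
        self.mean = [m + (e - m) / self.count for m, e in zip(self.mean, embedding)]


def diarize_online(embeddings, threshold=0.8):
    """Label each segment embedding with a speaker index, one segment at a time."""
    speakers = []
    labels = []
    for emb in embeddings:
        best_index, best_score = None, threshold
        for index, speaker in enumerate(speakers):
            score = cosine_similarity(emb, speaker.mean)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is None:
            # No known speaker is similar enough: start tracking a new one.
            speakers.append(SpeakerState(emb))
            best_index = len(speakers) - 1
        else:
            speakers[best_index].update(emb)
        labels.append(best_index)
    return labels


# Toy embeddings: two clearly separated "voices" taking turns.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05], [0.1, 0.9]]
print(diarize_online(segments))  # [0, 0, 1, 0, 1]
```

Because every segment is labeled as soon as it arrives, the loop works on a live stream, which is the property the article attributes to the online approach. A fixed threshold is a crude way to decide when a new speaker has appeared.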

Google's researchers note that the algorithm is fully transparent and controllable, which makes it possible to tune how the audio stream is processed.

The developers evaluated the new diarization algorithm on the NIST SRE 2000 CALLHOME benchmark, where its diarization error rate was 7.6%. Earlier approaches based on clustering and on neural-network selection scored 8.8% and 9.9%, respectively. Besides making fewer errors, the algorithm is fast enough to process the stream in real time.
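To put those figures in perspective, the short calculation below works out the relative error reduction implied by the reported numbers. It assumes all three percentages measure the same metric on the same test set. The function name `relative_error_reduction` is a hypothetical helper, not part of any Google tool.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline's errors that the new method eliminates."""
    return (baseline_error - new_error) / baseline_error


NEW_ERROR = 7.6          # reported error of the new algorithm, in percent
CLUSTERING_ERROR = 8.8   # reported error of the clustering approach
SELECTION_ERROR = 9.9    # reported error of the neural-network selection approach

vs_clustering = relative_error_reduction(CLUSTERING_ERROR, NEW_ERROR)
vs_selection = relative_error_reduction(SELECTION_ERROR, NEW_ERROR)

print(f"vs clustering: {vs_clustering:.1%}")  # vs clustering: 13.6%
print(f"vs selection:  {vs_selection:.1%}")   # vs selection:  23.2%
```

In other words, a gap of 1.2 percentage points against the clustering approach corresponds to roughly one in seven of its errors being removed.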

Determining who spoke each utterance is an important part of a speech recognition system. Accurate diarization lets the system adapt better to each speaker's pronunciation and accent and cleanly separate what different people say. The technology will be used, among other things, to create subtitles for video recordings. Correctly recognized speech is easier to translate into other languages, which would be useful, for example, for online training courses. And because the audio can be processed in real time, this can even be done live.