Google to Update Its Speech Service

The Cloud Speech-to-Text and Cloud Text-to-Speech services have received interesting new features
30 August 2018

The Google Cloud team announced the stable release of the Cloud Text-to-Speech speech synthesis API, with an experimental audio profiles feature and support for several new languages. The Cloud Speech-to-Text transcription service, meanwhile, has learned to distinguish individual speakers and to automatically determine the language of a recording from several candidates.

Along with the transition to general availability, the API for converting written text into spoken audio now supports a number of new languages and voices created with WaveNet technology. In total, 14 languages and dialects are available, spoken by 30 standard voices and 26 WaveNet-based ones.
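As a sketch of how a WaveNet voice is selected, here is the kind of JSON body a client might POST to the v1 `text:synthesize` REST endpoint; the specific voice name `en-US-Wavenet-D` and the sample text are illustrative assumptions (the actual voice list is returned by the API's voice-listing method):

```python
import json

# Request body for POST https://texttospeech.googleapis.com/v1/text:synthesize
# (voice name is illustrative; standard voices omit "Wavenet" in the name)
request_body = {
    "input": {"text": "Hello from Cloud Text-to-Speech."},
    "voice": {
        "languageCode": "en-US",
        "name": "en-US-Wavenet-D",  # a WaveNet-based voice
    },
    "audioConfig": {"audioEncoding": "MP3"},
}

print(json.dumps(request_body, indent=2))
```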

The audio profiles feature is available in beta. It automatically optimizes the audio output for a particular class of device: smart watches and other wearable gadgets, smartphones, headphones, conventional and stereo speakers, smart home audio systems, or car speakers. A "default" profile can also be set.
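The device profile is requested through the `effectsProfileId` field of the request's `audioConfig`. A minimal sketch, assuming the profile identifier strings from the beta documentation (treat the exact IDs as assumptions):

```python
# Device profiles optimize synthesized audio for a class of playback hardware.
# The profile IDs below are examples from the beta audio-profiles feature.
DEVICE_PROFILES = [
    "wearable-class-device",                   # smart watches and wearables
    "handset-class-device",                    # smartphones
    "headphone-class-device",                  # headphones
    "small-bluetooth-speaker-class-device",    # small speakers
    "large-home-entertainment-class-device",   # smart home audio systems
    "large-automotive-class-device",           # car speakers
    "telephony-class-device",                  # phone lines
]

audio_config = {
    "audioEncoding": "MP3",
    # effectsProfileId is a list: profiles are applied in the order given
    "effectsProfileId": ["handset-class-device"],
}
```

Omitting `effectsProfileId` corresponds to the "default" mode mentioned above.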

The Cloud Speech-to-Text API has gained the ability to recognize speakers by voice. Using machine learning, the system separates the utterances of different people during transcription and tags them with speaker numbers. However, the number of speakers has to be specified before the audio file is processed.
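A sketch of how the diarization options might look in a recognition request. The field names follow the beta REST API; the endpoint version and the bucket path are assumptions for illustration:

```python
import json

# Request body for POST https://speech.googleapis.com/v1p1beta1/speech:recognize
recognize_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        # Speaker diarization: the caller states the expected number of
        # speakers, and each word in the response carries a speaker tag.
        "enableSpeakerDiarization": True,
        "diarizationSpeakerCount": 2,
    },
    "audio": {"uri": "gs://my-bucket/meeting.wav"},  # illustrative path
}

print(json.dumps(recognize_body, indent=2))
```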

The Google Cloud team has also added automatic language detection for recordings. When using the API in their applications, developers can specify up to four candidate languages in a single query. At the time of writing, the tool supports 120 languages.
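In the recognition config, this is expressed as one primary language plus up to three alternatives, four candidates in total. A hedged sketch (field names as I understand the beta API):

```python
# One primary language plus up to three alternatives — four candidates total.
# The service picks the best-matching language and transcribes in it.
config = {
    "languageCode": "en-US",  # primary candidate
    "alternativeLanguageCodes": ["ru-RU", "de-DE", "fr-FR"],
}
```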

Telephone Filter

For a long time, Google used its speech synthesis technology only in its own products; it became available to third-party developers in March 2018, with a choice of 32 voices and 12 languages. The transcription service, in turn, was previously called the Cloud Speech API and received its current name in April 2018, along with new models for analyzing phone calls and video.

AI to be Used to Create 3D Motion Sculptures

The system, developed by scientists from MIT and Berkeley, is called MoSculp and is based on artificial intelligence
21 September 2018

MoSculp, the joint work of scientists from MIT and the University of California, Berkeley, is built on a neural network. The system analyzes a video recording of a moving person and generates what its creators call an "interactive visualization of form and time." According to the project's lead researcher, Xiuming Zhang, the software will be useful to athletes for detailed analysis of their movements.

In the first stage, the system scans the video frame by frame and determines the positions of key points on the subject's body, such as the elbows, knees, and ankles. For this, the scientists turned to the OpenPose library developed at Carnegie Mellon University. Based on the resulting data, the neural network builds a 3D model of the person in each frame and computes the motion trajectory, producing a "motion sculpture".
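The trajectory step can be illustrated with a toy sketch (not the authors' code): given per-frame keypoint positions, chain each joint's positions over time into the polyline the "sculpture" sweeps through space. The frame data here is invented for illustration:

```python
# Toy illustration: turn per-frame joint positions into per-joint trajectories.
# Each frame maps joint name -> (x, y, z); a real pipeline would get 2D
# keypoints from a pose estimator such as OpenPose and lift them to 3D.
def motion_trajectories(frames):
    trajectories = {}
    for frame in frames:
        for joint, position in frame.items():
            trajectories.setdefault(joint, []).append(position)
    return trajectories

frames = [
    {"elbow": (0.0, 1.0, 0.0), "knee": (0.0, 0.5, 0.0)},
    {"elbow": (0.1, 1.1, 0.0), "knee": (0.1, 0.4, 0.0)},
    {"elbow": (0.2, 1.2, 0.0), "knee": (0.2, 0.5, 0.0)},
]

paths = motion_trajectories(frames)
# paths["elbow"] is the path the elbow traces across the three frames
```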

At this stage, the image, according to the developers, lacks texture and detail, so the application embeds the "sculpture" into the original video. To avoid incorrect overlaps, MoSculp computes a depth map for both the original subject and the 3D model.
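Depth-based compositing can be sketched as a per-pixel depth test (an illustrative reconstruction, not the MoSculp implementation): wherever the rendered sculpture is closer to the camera than the scene, its pixel wins; otherwise the original video shows through.

```python
def composite(video_rgb, video_depth, sculpt_rgb, sculpt_depth):
    """Per-pixel depth test: a smaller depth value means closer to camera."""
    h, w = len(video_rgb), len(video_rgb[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if sculpt_depth[y][x] < video_depth[y][x]:
                out[y][x] = sculpt_rgb[y][x]  # sculpture occludes the person
            else:
                out[y][x] = video_rgb[y][x]   # person occludes the sculpture
    return out

# 1x2 toy example: the sculpture is closer only in the second pixel
video  = [["person", "wall"]]
vdepth = [[1.0, 5.0]]
sculpt = [["tube", "tube"]]
sdepth = [[2.0, 3.0]]
print(composite(video, vdepth, sculpt, sdepth))  # → [['person', 'tube']]
```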

MoSculp 3D Model

During processing, the operator can adjust the image: choose the "sculpture's" material, color, and lighting, as well as which parts of the body are tracked. The system can also print the result on a 3D printer.

The team of researchers announced plans to develop the MoSculp technology further. The developers want the system to handle more than one subject per video, which is currently impossible. The creators believe the program will be used to study group dynamics, social disorders, and interpersonal interactions.

The principle of creating a 3D model based on human movements has been used before. For example, in August 2018, scientists at the same University of California at Berkeley demonstrated an algorithm that transfers the movements of one person to another.