MIT Proposes Teaching AI Language Like a Baby

The machine must independently observe people, “listen” to their conversations, and build a vocabulary
02 November 2018

MIT has developed a parser that teaches artificial intelligence language. Its distinguishing feature is learning through observation, the way children do. This is reported by Engadget.

The method is based not on explicit definitions of words and concepts, but on “weak supervision” and passive learning. The machine must independently observe people, “listen” to their conversations, and build a vocabulary, much as children learn to speak by listening and picking up words.

The researchers expect this approach to simplify vocabulary acquisition and allow programs and robots to perceive and respond to human speech more accurately.

In conversation, people often speak in sentence fragments and break the rules of grammar. Analyzing words on the fly should improve the performance of AI systems and parsers. Because the parser does not rely on a specific context, it allows robots to understand implicitly formulated commands.

The parser could also shed light on how children learn language, which would benefit not only robot developers but also specialists who work with children.

MIT trained the neural network underlying the parser passively: the network was shown videos together with text descriptions of the actions in them, and the system correlated the two, linking words to objects and actions. The researchers used 400 videos.
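For intuition, here is a minimal, hypothetical sketch of the weak-supervision idea: the only training signal is which caption belongs to which video, and any link between words and what happens on screen has to emerge from that pairing. The toy data, model, and names below are assumptions for illustration, not MIT's actual parser.

```python
# Illustrative sketch of weakly supervised video-caption matching (not MIT's system).
# Supervision is only "caption i describes video i"; no word is ever labeled directly.
import torch
import torch.nn as nn

VOCAB = ["the", "person", "picks", "up", "puts", "down", "a", "cup", "book"]
word_to_id = {w: i for i, w in enumerate(VOCAB)}

class CaptionEncoder(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
    def forward(self, token_ids):
        # Mean-pool word embeddings into a single caption vector.
        return self.embed(token_ids).mean(dim=0)

video_dim, dim = 64, 32
caption_enc = CaptionEncoder(len(VOCAB), dim)
video_proj = nn.Linear(video_dim, dim)          # stand-in for a video feature extractor
opt = torch.optim.Adam(list(caption_enc.parameters()) + list(video_proj.parameters()),
                       lr=1e-2)

# Toy dataset: random "video" feature vectors, each paired with its caption.
videos = [torch.randn(video_dim) for _ in range(4)]
captions = [
    ["the", "person", "picks", "up", "a", "cup"],
    ["the", "person", "puts", "down", "a", "cup"],
    ["the", "person", "picks", "up", "a", "book"],
    ["the", "person", "puts", "down", "a", "book"],
]
cap_ids = [torch.tensor([word_to_id[w] for w in c]) for c in captions]

for step in range(200):
    loss = torch.tensor(0.0)
    for i, vid in enumerate(videos):
        v = video_proj(vid)
        # Score every caption against this video; only the index of the true
        # caption is known, which is the weak supervision signal.
        scores = torch.stack([caption_enc(c) @ v for c in cap_ids])
        loss = loss + nn.functional.cross_entropy(scores.unsqueeze(0),
                                                  torch.tensor([i]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is the objective, not the architecture: because the model is rewarded only for pairing whole captions with whole videos, any association between individual words and the objects or actions they name has to arise indirectly from that matching pressure.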

The scientists argue that the technology scales easily and can be used wherever voice control or communication with AI is needed.

MelNet Algorithm Simulates a Person's Voice

It analyzes spectrograms of the audio tracks of ordinary TED Talks, captures the speaker's vocal characteristics, and reproduces short utterances
11 June 2019

The Facebook AI Research team has developed MelNet, an algorithm that synthesizes speech with the characteristics of a particular person. For example, it has learned to imitate the voice of Bill Gates.

MelNet analyzes spectrograms of the audio tracks of ordinary TED Talks, captures the speaker's vocal characteristics, and reproduces short utterances.
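For context, a spectrogram is a time-frequency picture of audio, and generative models of this kind are trained to predict such pictures frame by frame before converting them back to sound. Below is a minimal sketch of extracting a mel spectrogram from a recording with librosa; the file name and parameter values are assumptions, and this is not MelNet's own pipeline.

```python
# Illustrative only: compute a mel spectrogram of a talk recording with librosa.
import librosa
import numpy as np

audio_path = "ted_talk.wav"                     # placeholder path, assumed to exist
y, sr = librosa.load(audio_path, sr=22050)      # waveform and sample rate

# Mel-scaled spectrogram: frequency bands on one axis, time frames on the other.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=256, n_mels=80)
mel_db = librosa.power_to_db(mel, ref=np.max)   # convert power to decibels

print(mel_db.shape)  # (n_mels, n_frames): the 2-D representation a model would predict
```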

The algorithm's capabilities are limited mainly by the length of the utterances. It reproduces short phrases very close to the original, but a person's intonation changes with topic, mood, and pitch. The algorithm cannot yet imitate this, so long sentences sound artificial.

MIT Technology Review notes that even such an algorithm could significantly affect services like voice bots, where all communication comes down to exchanging short remarks.

A similar approach, analyzing speech spectrograms, was used by scientists at Google AI in their work on the Translatotron algorithm, an AI that can translate phrases from one language to another while preserving the characteristics of the speaker's voice.