AI to Associate Objects and Spoken Words

MIT scientists believe that this approach will simplify the automatic translation between several languages
21 September 2018

Scientists from the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) have published a paper on a new machine-learning model that can match objects in an image with their spoken descriptions. The researchers built on their own 2016 work, improving it so the model learns to associate particular segments of a voice spectrogram with particular patches of pixels. The engineers hope the model will eventually be useful for simultaneous translation.

The MIT algorithm is based on two convolutional neural networks. The first divides the image into a grid of cells; the second takes a spectrogram of the voice recording (a visual representation of its frequency spectrum) and splits it into segments roughly one word long. The system then compares each cell of pixels with each segment of the spectrogram and scores their similarity. Based on this score, the network decides which object-word pairs are correct and which are not.
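The cell-versus-segment comparison described above can be sketched in a few lines of numpy. This is only an illustration of the idea, not MIT's implementation: the grid size, embedding dimension, and pooling choice are assumptions, and the random arrays stand in for the outputs of the two convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two CNN outputs (shapes and dimension d are illustrative):
# image branch: a 14x14 grid of cells, each mapped to a d-dim embedding;
# audio branch: the caption's spectrogram split into 20 word-length segments.
d = 64
image_cells = rng.standard_normal((14, 14, d))
audio_segments = rng.standard_normal((20, d))

# Similarity of every cell with every segment: a (14, 14, 20) "match map"
# of dot products between cell and segment embeddings.
matchmap = np.einsum('hwd,td->hwt', image_cells, audio_segments)

# One plausible way to pool this into a single image-caption score:
# for each audio segment take its best-matching cell, then average.
similarity = matchmap.max(axis=(0, 1)).mean()

# For a given segment t, the highest-scoring cell localizes the object
# that the spoken word refers to.
t = 0
best_cell = np.unravel_index(matchmap[..., t].argmax(), (14, 14))
print(matchmap.shape, float(similarity), best_cell)
```

During training, scores like `similarity` would be pushed up for correct image-caption pairs and down for mismatched ones, which is what lets the network decide which object-word pairs are correct.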

We wanted to do speech recognition in a way that’s more natural, leveraging additional signals and information that humans have the benefit of using, but that machine learning algorithms don’t typically have access to.

David Harwath

Researcher, CSAIL 

After training on a database of 400,000 images, the system was able to match several hundred words with objects. With each iteration it refined the matching score, homing in on the specific words associated with specific objects.

MIT believes this approach will simplify automatic translation between languages, since it does not require text descriptions of objects.

Image and voice recognition systems already cope with their tasks, but they require substantial resources to do so. In April 2018, Google announced a development competition in deep networks and computer vision on smartphones, designed to find ways to optimize real-time recognition systems.

Intel to Present Neural Compute Stick 2

Neural Compute Stick 2 is an autonomous neural network on a USB drive
15 November 2018

At a conference in Beijing, Intel introduced the Neural Compute Stick 2, a device that simplifies the development of smart software for edge devices: not only network equipment but also IoT systems, video cameras, industrial robots, medical systems, and drones. The solution is aimed primarily at projects that use computer vision.

The Neural Compute Stick 2 is a self-contained neural-network accelerator in a USB-stick form factor. It is meant to speed up and simplify software development for edge devices by offloading most of the required computation to the specialized Intel Movidius Myriad X processor, whose Neural Compute Engine handles high-speed deep-neural-network inference.

The first Neural Compute Stick was created by Movidius, which Intel acquired in 2016. The second version is eight times faster than the first and works under Linux. The device connects via USB to a PC, laptop, or edge device.

Intel says the NCS 2 lets developers quickly build, configure, and test prototypes of deep-learning neural networks. Neither cloud computation nor even an Internet connection is required.

The module is already on sale at a price of $99. Some developers had access to the Intel NCS 2 before sales began and used it to build projects such as Clean Water AI, which uses machine vision with a microscope to detect harmful bacteria in water; BlueScan AI, which scans the skin for signs of melanoma; and ASL Classification, which translates sign language into text in real time.

Intel collaborated with Microsoft on the Movidius Myriad X VPU, a partnership announced at the Developer Day conference in March 2018. The AI platform is expected to appear in upcoming Windows updates.