PixelPlayer Learns to Extract the Sound of Individual Musical Instruments

Massachusetts Institute of Technology scientists have created a new neural network
06 July 2018

Scientists from MIT have created a neural network called PixelPlayer that can identify and extract the sound of individual musical instruments. The key feature of the development is its use of self-supervised learning. This was reported by Analytics Vidhya.

Similar systems previously relied on supervised learning. As input, the AI received labeled audio files, and labeling them by hand took a lot of time. PixelPlayer works with video instead, which makes it possible to skip the manual preparation of data. Self-supervised training removed the human factor and sped up the process.

The development uses three algorithms at once. The first processes the video, the second processes the audio track, and the third synchronizes the data. PixelPlayer determines the sound associated with each pixel in the image. In this way, the neural network detects individual instruments and determines which sound to extract.
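The idea of tying a sound to each pixel can be illustrated with a minimal sketch. It assumes, purely for illustration, that the audio is represented as a spectrogram and that each pixel has a mask saying how much of every time-frequency cell belongs to it. The names `mixture`, `pixel_masks`, and `apply_mask`, and all the numbers, are hypothetical and are not taken from the PixelPlayer code.

```python
# Illustrative only: mask-based separation with made-up numbers.
# Rows are frequency bins, columns are time frames.
mixture = [
    [4.0, 2.0, 0.0],
    [1.0, 3.0, 4.0],
]

# Hypothetical per-pixel masks with values in [0, 1]. Each mask says how much
# of every time-frequency cell belongs to the sound at that pixel.
pixel_masks = {
    (10, 20): [[1.0, 0.5, 0.0], [0.0, 0.0, 0.25]],  # e.g. a pixel on a violin
    (40, 25): [[0.0, 0.5, 1.0], [1.0, 1.0, 0.75]],  # e.g. a pixel on a guitar
}


def apply_mask(spectrogram, mask):
    """Multiply a spectrogram by a mask, cell by cell."""
    return [
        [value * weight for value, weight in zip(spec_row, mask_row)]
        for spec_row, mask_row in zip(spectrogram, mask)
    ]


# One separated spectrogram per pixel.
separated = {
    pixel: apply_mask(mixture, mask) for pixel, mask in pixel_masks.items()
}
```

In this toy example the two masks add up to 1 in every cell, so the two separated spectrograms add back up to the original mixture.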

After training on 60 hours of video, the AI was able to recognize individual melodies with high accuracy in new video recordings it had not seen before. According to the developers, PixelPlayer can identify up to 20 different instruments. This number can be increased by providing additional training data. Errors occur when it tries to separate instruments of the same class, for example alto and tenor saxophone.

PixelPlayer already has considerable potential for practical use. The tool could improve the quality of old live recordings. Amateur musicians often try to pick out a particular part by ear, and the MIT scientists' development could make this task easier.

AI to be Used to Create 3D Motion Sculptures

The system, developed by scientists at MIT and Berkeley, is called MoSculp and is based on artificial intelligence
21 September 2018

MoSculp, a joint project of scientists at MIT and the University of California at Berkeley, is built on a neural network. It analyzes a video recording of a moving person and generates what its creators call an "interactive visualization of form and time." According to the project's lead researcher, Xiuming Zhang, the software will help athletes analyze their movements in detail.

In the first stage, the system scans the video frame by frame and determines the position of key points on the subject's body, such as elbows, knees, and ankles. For this, the scientists use the OpenPose library, developed at Carnegie Mellon University. Based on this data, the neural network builds a 3D model of the person in each frame and calculates the trajectory of the motion, producing a "motion sculpture".
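The step from per-frame key points to a motion trajectory can be sketched roughly as follows. This is not MoSculp's code and does not call OpenPose. It assumes a hypothetical list `frames` in which every frame maps a joint name to an invented 3D position, and the helpers `joint_trajectories` and `path_length` are made up for the example.

```python
import math

# Hypothetical per-frame output of a pose estimator: joint name -> (x, y, z).
# The coordinates are invented for illustration.
frames = [
    {"elbow": (0.0, 0.0, 0.0), "knee": (1.0, 0.0, 0.0)},
    {"elbow": (3.0, 4.0, 0.0), "knee": (1.0, 1.0, 0.0)},
    {"elbow": (3.0, 4.0, 12.0), "knee": (1.0, 1.0, 2.0)},
]


def joint_trajectories(frames):
    """Group positions by joint, keeping the frame order."""
    trajectories = {}
    for frame in frames:
        for joint, position in frame.items():
            trajectories.setdefault(joint, []).append(position)
    return trajectories


def path_length(points):
    """Total distance travelled along consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


trajectories = joint_trajectories(frames)
elbow_length = path_length(trajectories["elbow"])
knee_length = path_length(trajectories["knee"])
```

Each trajectory is the ordered path one joint sweeps through space. A real system would turn such paths into a surface, which this sketch does not attempt.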

At this stage, according to the developers, the image lacks textures and detail, so the application inserts the "sculpture" into the original video. To handle occlusion correctly, MoSculp calculates a depth map for both the original subject and the 3D model.
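One plausible way to use two depth maps is a per-pixel depth test: at every pixel, show whichever layer is closer to the camera. The sketch below is an assumption about how such compositing could work, not a description of MoSculp's implementation. The function `composite` and the tiny one-row "image" with text labels in place of colors are hypothetical.

```python
def composite(video, video_depth, sculpture, sculpture_depth):
    """Per pixel, keep whichever layer is closer to the camera.

    A sculpture depth of None means the sculpture does not cover that pixel.
    """
    result = []
    for v_row, vd_row, s_row, sd_row in zip(
        video, video_depth, sculpture, sculpture_depth
    ):
        row = []
        for v, vd, s, sd in zip(v_row, vd_row, s_row, sd_row):
            # Smaller depth means closer to the camera.
            row.append(s if sd is not None and sd < vd else v)
        result.append(row)
    return result


# A 1x3 "image": labels stand in for pixel colors, depths are invented.
video = [["person", "person", "wall"]]
video_depth = [[2.0, 2.0, 9.0]]
sculpture = [["sculpture", "sculpture", None]]
sculpture_depth = [[1.0, 3.0, None]]

frame = composite(video, video_depth, sculpture, sculpture_depth)
```

In the first pixel the sculpture is in front of the person and is drawn. In the second it lies behind the person and stays hidden. The third pixel is not covered by the sculpture at all.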

MoSculp 3D Model

The operator can adjust the image during processing, choosing the material, color, and lighting of the "sculpture" as well as which parts of the body are tracked. The system can also output the result to a 3D printer.

The research team has announced plans to develop MoSculp further. The developers want the system to process more than one subject per video, which is currently not possible. The creators believe the program will be used to study group dynamics, social disorders, and interpersonal interactions.

The idea of building a 3D model from human movement has been used before. For example, in August 2018, scientists at the University of California at Berkeley demonstrated an algorithm that transfers the movements of one person to another.