A team of researchers from DeepMind has developed a new architecture that lets neural networks handle numerical tasks more reliably. The idea is a module with basic mathematical operations built into it, named the Neural Arithmetic Logic Unit (NALU).
The researchers noticed that neural networks are rarely able to generalize concepts beyond the data set they were trained on. When working with numbers, for example, models fail to extrapolate to quantities larger than those seen during training. Studying the problem further, they found that the limitation extends to other arithmetic functions as well.
When standard neural architectures are trained to count up to a number, they often struggle to count to a higher one. We explored this limitation and found that it extends to other arithmetic functions as well, which led to our hypothesis that neural networks learn numbers much as they learn words: as a finite vocabulary. This prevents them from properly extrapolating functions that require previously unseen (higher) numbers. Our objective was to propose a new architecture capable of better extrapolation.
— Lead researcher, NALU
The NALU architecture predefines a set of basic, potentially useful mathematical functions (addition, subtraction, multiplication, and division). The neural network then learns where these functions are best applied, rather than having to rediscover them from scratch.
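The published NALU combines two paths: an additive accumulator whose weights are pushed toward {-1, 0, 1}, and the same accumulator applied in log-space (which turns sums into products), with a learned gate choosing between them. Below is a minimal NumPy sketch of those equations; the hand-set, saturated weights stand in for what training would normally learn, and are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nac(x, W_hat, M_hat):
    # Neural Accumulator: tanh(W_hat) * sigmoid(M_hat) biases each
    # weight toward -1, 0, or 1, so the output is a signed sum of inputs.
    W = np.tanh(W_hat) * sigmoid(M_hat)
    return W @ x

def nalu(x, W_hat, M_hat, G, eps=1e-10):
    a = nac(x, W_hat, M_hat)                              # additive path (+, -)
    m = np.exp(nac(np.log(np.abs(x) + eps), W_hat, M_hat))  # log-space path (*, /)
    g = sigmoid(G @ x)                                    # learned gate
    return g * a + (1.0 - g) * m

# Illustrative weights saturated toward 1, mimicking a network that
# has learned "combine both inputs".
W_hat = M_hat = np.full((1, 2), 20.0)
x = np.array([3.0, 4.0])

G_add = np.full((1, 2), 20.0)    # gate ~1 -> additive path
G_mul = np.full((1, 2), -20.0)   # gate ~0 -> multiplicative path
print(nalu(x, W_hat, M_hat, G_add))  # ≈ [7.]  (3 + 4)
print(nalu(x, W_hat, M_hat, G_mul))  # ≈ [12.] (3 * 4)
```

Because the module computes with actual arithmetic rather than memorized input-output pairs, the same weights keep working on numbers far outside the training range.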
In tests, neural networks with the new architecture learned tasks such as tracking time intervals, performing arithmetic on images of numbers, counting objects in a picture, and executing computer code.
In March 2018, DeepMind introduced a new paradigm for training AI models. Unlike standard methods, it does not require a large input data set: the algorithm learns to perform tasks on its own, gradually mastering the necessary skills.