In tasks that involve generating natural language, neural networks often map an input sequence of words to an output sequence of words. Facebook researchers used a similar technique on sequences of mathematical symbols, training a model to map math problems to math solutions.

**What’s new:** Guillaume Lample and Francois Charton built a sequence-to-sequence model that solves integrals and ordinary differential equations.

**Key insight:** To apply machine translation to math, an equation must be represented as a sequence of symbols that captures its semantics. A mathematical expression represented as a tree, with operators as internal nodes and operands as leaves, maps unambiguously to a sequence. For example, the image above shows the tree for 2 + 3*(5+2). The corresponding sequence, written in prefix notation, is [+ 2 * 3 + 5 2].

**How it works:** The authors used existing math software to generate datasets consisting of (problem, solution) pairs for integrals and ordinary differential equations. For each type of problem, they trained a separate transformer model to predict solutions.
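To make the tree-to-sequence idea from the key insight concrete, here is a minimal sketch of flattening the example expression into prefix order and rebuilding it. The tuple-based tree, the `OPS` table, and the function names are illustrative assumptions, not the authors' code.

```python
import operator

# Assumption for illustration: every operator is binary, so the sequence
# can be decoded without parentheses.
OPS = {"+": operator.add, "*": operator.mul}


def to_prefix(node):
    """Flatten a tree into prefix order: operator first, then its operands."""
    if isinstance(node, tuple):
        op, left, right = node
        return [op] + to_prefix(left) + to_prefix(right)
    return [str(node)]


def from_prefix(tokens):
    """Rebuild a tree from prefix tokens; returns (tree, unused tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in OPS:
        left, rest = from_prefix(rest)
        right, rest = from_prefix(rest)
        return (head, left, right), rest
    return int(head), rest


def evaluate(node):
    """Evaluate a tree of integer leaves and binary operators."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left), evaluate(right))
    return node


# Tree for 2 + 3*(5+2): internal nodes are (operator, left, right) tuples.
tree = ("+", 2, ("*", 3, ("+", 5, 2)))
tokens = to_prefix(tree)
rebuilt, leftover = from_prefix(tokens)

print(" ".join(tokens))   # + 2 * 3 + 5 2
print(evaluate(rebuilt))  # 23
```

Because each operator has a fixed number of operands, the round trip recovers the original tree exactly, which is what makes the sequence form unambiguous.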

- For function integration, the authors generated three datasets using three methods: differentiating a proposed solution to obtain its problem, integrating a proposed problem (using SymPy), and applying integration by parts.
- Similarly, they generated datasets for first- and second-order ordinary differential equations starting with randomly generated functions.
- The models decoded their solutions using beam search with beam sizes of 1, 10, and 50. Larger beams allowed them to consider a greater variety of candidate solutions before making a final decision.
- Since solutions to problems of these types are easy to verify, the model's output could be validated by checking each candidate in the beam. In many cases, all solutions in the beam were equivalent.
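The first and last steps above can be sketched with SymPy: build a (problem, solution) pair by differentiating a chosen solution, then check candidates by differentiating them and comparing against the integrand. This is a sketch under stated assumptions; the helper names and the hand-written `beam` list are hypothetical stand-ins for decoded model output, and the authors' actual pipeline may differ.

```python
import sympy as sp

x = sp.Symbol("x")


def backward_pair(solution):
    """Create a (problem, solution) pair by differentiating the solution."""
    return sp.diff(solution, x), solution


def is_valid_antiderivative(candidate, integrand):
    """A candidate is correct if its derivative matches the integrand."""
    return sp.simplify(sp.diff(candidate, x) - integrand) == 0


# Data generation: pick a solution, derive the problem it solves.
problem, solution = backward_pair(x * sp.sin(x))

# Hypothetical beam of candidates (hand-written here, not model output).
beam = [
    x * sp.sin(x) + 3,  # differs from the reference only by a constant
    x * sp.cos(x),      # incorrect
    sp.sin(x) * x,      # same expression with the factors reordered
]

valid = [c for c in beam if is_valid_antiderivative(c, problem)]
print(problem)
print(len(valid), "of", len(beam), "candidates verified")
```

Checking by differentiation accepts any antiderivative, including ones that differ from the reference by a constant, which is one reason several candidates in a beam can all count as correct.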

**Results:** The transformer model beat Mathematica, Matlab, and Maple on integration for the dataset generated by differentiating the solution (98.4 percent accuracy with beam size 1 compared to 84 percent for Mathematica, the best of those three math apps). It also beat the math software on differential equations with beam sizes 10 and 50. The model solved integration problems in the test set that SymPy couldn’t, showing that it generalized beyond the program used to generate its training dataset.

**Why it matters:** Transformer networks can solve problems that dedicated commercial math programs can’t. That said, their solutions may not be 100 percent accurate.

**We’re thinking:** Beating Mathematica is a remarkable result. Assuming the data distributions for training and test represented the most common problems in integrals and ordinary differential equations, this approach could open a vast frontier to state-of-the-art machine learning.