Let’s get this out of the way: A brain is not a cluster of graphics processing units, and if it were, it would run software far more complex than the typical artificial neural network. Yet neural networks were inspired by the brain’s architecture: layers of interconnected neurons, each of which computes its own output depending on the states of its neighbors. The resulting cascade of activity forms an idea — or recognizes a picture of a cat.
From biological to artificial: The insight that the brain learns through interactions among neurons dates back to 1873, but it wasn’t until 1943 that American neuroscientists Warren McCulloch and Walter Pitts modeled biological neural networks using simple mathematical rules. In 1958, American psychologist Frank Rosenblatt developed the perceptron, a single-layer vision network implemented on punch cards with the intention of building a hardware version for the United States Navy.
Bigger is better: Rosenblatt’s invention recognized only classes that could be separated by a line. Ukrainian mathematicians Alexey Ivakhnenko and Valentin Lapa overcame this limitation by stacking networks of neurons in any number of layers. In 1985, working independently, French computer scientist Yann LeCun, David Parker, and American psychologist David Rumelhart and his colleagues described using backpropagation to train such networks efficiently, following earlier work by Finnish mathematician Seppo Linnainmaa and American social scientist Paul Werbos. In the first decade of the new millennium, researchers including Kumar Chellapilla, Dave Steinkraus, and Rajat Raina (with Andrew Ng) accelerated neural networks using chips designed to accelerate computer graphics, or graphical processing units. This development has enabled ever-larger neural networks to learn from the immense amounts of data generated by the internet.
Fit for every task: The idea behind a neural network is simple: For any task, there’s a function that can perform it. A neural network constitutes a trainable function by combining many simple functions, each executed by a single neuron. A neuron’s function is determined by adjustable parameters (also called weights). Given random parameter values and examples of inputs and their desired outputs, it’s possible to alter those values iteratively until the trainable function performs the task at hand.
- A neuron accepts various inputs (for example, numbers representing a pixel or word, or the outputs of the previous layer), multiplies them by its parameter values, adds the products, and feeds the sum through a nonlinear function, or activation function, chosen by the developer. Consider it linear regression plus an activation function.
- Training modifies the parameter values. For every example input, the network computes an output and compares it to the expected output. Backpropagation uses gradient descent to change the values to reduce the difference between actual and expected outputs. Repeat this process enough times with enough (good) examples, and the network should learn to perform the task.
Black box: While a trained network may perform its task, good luck determining how. You can read the final function, but often it’s so complex — with thousands of variables and nested activation functions — that it’s exceedingly difficult to explain how the network succeeded at its task. Moreover, a trained network is only as good as the data it learned from. For instance, if the dataset was skewed, the network’s output will be skewed. If it included only high-resolution pictures of cats, there’s no telling how it would respond to lower-resolution images.
Toward common sense: Reporting on Rosenblatt’s Perceptron in 1958, The New York Times called it “the embryo of an electronic computer that the United States Navy expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” While it didn’t live up to that billing, it begot a host of impressive models: convolutional neural networks for images; recurrent neural networks for natural language processing; and transformers for images, text, speech, video, protein structures, and more. These models have done amazing things, exceeding human-level performance in playing Go and approaching it in practical tasks like diagnosing x-ray images. Yet they still have a hard time with common sense and logical reasoning. Ask GPT-3, “When counting, what number comes before a million?” and it may respond, “Nine hundred thousand and ninety-nine comes before a million.” To which we reply: Keep learning!