Protein Shapes Revealed A summary of the AlphaFold research paper

Published

Feb 05, 2020

Reading time

2 min read

A protein’s biological function depends largely on its three-dimensional shape, but deducing its shape from its sequence of amino acids has been a longstanding problem. Researchers at DeepMind reveal how they used deep learning to solve the puzzle.

What’s new: Andrew Senior and colleagues released long-awaited details about AlphaFold, a protein-folding model that wowed experts in a high-profile competition in late 2018. The paper is behind a paywall. This video offers some details.

Key insight: Research has shown that protein shapes are determined by the proximity of essential portions, or residues, of amino acids. The researchers found likely shapes by optimizing over possible structures that keep residues close to one another. Earlier methods predict whether residues are in contact with one another. AlphaFold predicts the distances and angles between residues, making the optimization easier.

How it works: AlphaFold extracts features from an input protein sequence, predicts relationships between residues, and uses those predictions to find the protein’s likely shape.

The feature extractor compares the input sequence with sequences in a protein database. It represents relationships between amino-acid pairs based on the similarities it finds.
The features feed a CNN trained on a dataset of 3D protein structures, which predicts the distribution of distances and angles between residues.
The model infers the protein’s physical stability based on the distances and angles. The physical stability equation is differentiable, so the predicted structure can be optimized by gradient descent. The most stable structure is the final output.

Results: At the 2018 CASP13 conference, AlphaFold predicted 24 out of 43 previously unknown protein shapes with high accuracy. The next-best model achieved 14 predictions of similar accuracy.

Why it matters: The ability to determine protein structures could have wide-ranging impacts on drug discovery, countering neurodegenerative diseases, and more. Stay tuned for further progress when CASP14 convenes in April.

We’re thinking: Hard problems don’t always offer enough training data to train an end-to-end neural network. In this case, combining a physical model with neural networks led to significant progress. This design pattern holds promise in many other domains from climate change to robot dynamics.

Subscribe to The Batch