Researchers aiming to increase accuracy in object detection generally enlarge the network, but that approach also boosts computational cost. A novel architecture sets a new state of the art in accuracy while cutting the compute cycles required.
What’s new: Mingxing Tan, Ruoming Pang, and Quoc Le at Google Brain modified existing feature pyramid networks to create the lightweight Bi-Directional Feature Pyramid Network. BiFPN is the cornerstone of a new object detection architecture called EfficientDet.
Key insight: A typical feature pyramid network includes a pretrained image processing network that extracts features of various sizes and combines the information. Some pyramid designs break large features into smaller ones, while others connect smaller features to identify larger ones. BiFPN improves accuracy by using both techniques and increases efficiency by reducing the number of connections.
How it works: An EfficientDet network includes an EfficientNet to extract features, BiFPNs, and classifiers to identify bounding boxes and class labels.
- BiFPNs create both top-down and bottom-up connections between differently sized features.
- Each BiFPN can also function as an additional layer, so the output of one can feed another. Stacking BiFPNs in this way lets the network fuse features of different sizes repeatedly.
- The BiFPNs apply a learnable weight to features of different sizes. The weighting enables them to avoid focusing disproportionately on the larger features.
- The researchers remove network nodes that have only one input, eliminating connections that have little impact on the output.
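The four points above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: features are 1-D lists rather than 2-D maps with channels, the helper names (`upsample`, `downsample`, `weighted_fuse`, `bifpn_layer`) are hypothetical, resizing uses simple repetition and pair averaging, the weights are fixed numbers instead of learned parameters, and the convolutions a real network would apply after each fusion are omitted. Clamping the weights at zero and dividing by their sum is one simple way to keep any single input from dominating.

```python
from typing import List

Feature = List[float]  # toy 1-D feature map; real detectors use 2-D maps with channels


def upsample(x: Feature) -> Feature:
    """Double the length by repeating each value (nearest neighbor)."""
    return [v for v in x for _ in range(2)]


def downsample(x: Feature) -> Feature:
    """Halve the length by averaging adjacent pairs (assumes even length)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def weighted_fuse(inputs: List[Feature], weights: List[float], eps: float = 1e-4) -> Feature:
    """Blend same-sized features using non-negative, normalized weights."""
    w = [max(0.0, wi) for wi in weights]  # clamp so no input gets a negative weight
    total = sum(w) + eps                  # eps avoids division by zero
    size = len(inputs[0])
    return [sum(wi * x[j] for wi, x in zip(w, inputs)) / total for j in range(size)]


def bifpn_layer(levels: List[Feature],
                td_w: List[List[float]],
                bu_w: List[List[float]]) -> List[Feature]:
    """One bidirectional layer. levels[0] is the largest map, levels[-1] the smallest."""
    n = len(levels)

    # Top-down pass: carry information from smaller maps into larger ones.
    td = list(levels)
    for i in range(n - 2, -1, -1):
        td[i] = weighted_fuse([levels[i], upsample(td[i + 1])], td_w[i])

    # Bottom-up pass: carry information from larger maps back into smaller ones.
    out = list(td)  # the largest level's top-down result doubles as its output
    for i in range(1, n):
        finer = downsample(out[i - 1])
        if i < n - 1:
            inputs = [levels[i], td[i], finer]
        else:
            # The smallest level has no top-down node: it would have had a
            # single input, so it is dropped, as the last bullet describes.
            inputs = [levels[i], finer]
        out[i] = weighted_fuse(inputs, bu_w[i - 1])
    return out


# Hypothetical three-level pyramid with all weights set to 1.0.
levels = [[1.0] * 8, [2.0] * 4, [3.0] * 2]
td_w = [[1.0, 1.0], [1.0, 1.0]]
bu_w = [[1.0, 1.0, 1.0], [1.0, 1.0]]

once = bifpn_layer(levels, td_w, bu_w)
twice = bifpn_layer(once, td_w, bu_w)  # stacking: one layer's output feeds the next
print([len(f) for f in twice])  # [8, 4, 2]
```

Because each layer returns features with the same sizes it received, layers can be chained freely. The sketch reuses one set of weights for both layers only for brevity; in a trained network each stacked layer would have its own.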
Results: On the COCO object detection benchmark, the largest EfficientDet network tested topped 51 percent mean average precision, which measures the accuracy of bounding boxes. That score beat the previous state of the art by 0.3 percentage points, yet EfficientDet had only a quarter of the parameters and required 1/13 of the calculations of the previous state of the art.
Why it matters: Object detection continues to advance, driven by a steady stream of innovations. EfficientDet represents two steps forward: an improvement in both accuracy and efficiency.
We’re thinking: Google’s AmoebaNet image classifier, which was designed by a computer, usually outperforms human-designed models. Yet humans crafted the record-setting EfficientDet architecture. Flesh-and-blood engineers still excel at crafting neural networks — for now.