Tight Fit

Published

Jul 03, 2019

Reading time

2 min read

If you’re pointing out an object, you don’t describe the background. Yet most object detection algorithms focus on a rectangle surrounding the object, not its precise boundary. New research offers a way to turn those boxes into tightly fitted curves.

What’s new: Researchers at the University of Toronto and Nvidia achieved state-of-the-art accuracy in boundary detection with Semantic Thinning Edge Alignment Learning. STEAL is a new approach that augments existing boundary detection networks.

Key insights: Human-drawn boundaries are often imprecise because people are impatient and emphasize quantity over quality. But we can use them as a starting point.

STEAL overcomes human inaccuracy by learning to infer the true boundary from a hastily hand-drawn version.
It pushes precision higher with a so-called thinning layer. This layer replaces a wide predicted boundary with a narrower version.

How it works: Given a human-drawn boundary, STEAL predicts the boundary a human would draw given more time. Then, given a boundary detection network, STEAL forms a new network by appending a thinning layer to the original network’s output. The new network is trained to reconstruct STEAL’s inferred boundaries, not the human-drawn ones.

During training, STEAL learns to infer boundaries from human annotations while holding constant the parameters, and thus the output, of the boundary detection network.
At the same time, the boundary detection network learns to predict STEAL's inferred boundaries.
STEAL learns to infer boundaries by optimizing an equation describing all possible boundaries arising from the human-drawn version.
Without STEAL, boundary detection networks confidently predict boundary pixels that aren’t part of the true boundary but are adjacent to it. STEAL's thinning layer works by identifying such pixels by examining directions perpendicular to the predicted boundary along every pixel in the boundary.
A separate classifier is used to determine how far to move each point along the boundary detection network's predicted boundary, in a direction perpendicular to this predicted boundary.

Why it matters: STEAL achieves a new state of the art, finding boundaries up to 18.6 percent more precise than its predecessor, CASENet SEAL, on hand-drawn and refined test sets. Looking at the output confirms that STEAL produces tight, accurate boundaries.

Takeaway: Object detection has a multitude of uses: image caption generation, face detection, and autonomous navigation to name a few. All these tasks have shown impressive results with object detection based on bounding boxes. Using STEAL’s more precise boundaries could reduce errors and further boost accuracy in these fields.

Subscribe to The Batch