AI is good at tracking objects in two dimensions. A new model processes video from a camera with a depth sensor to predict how objects move through space.
What’s new: Led by Chen Wang, researchers from Stanford, Shanghai Jiao Tong University, and Nvidia built 6-PACK (6D-Pose Anchor-based Category-level Keypoint-tracker), a system that tracks objects fast enough for a robot to react in real time. Why 6D? Because three-dimensional objects in motion have six degrees of freedom: three for translation and three for rotation. You can see the system in action in this video.
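A 6D pose can be sketched as three translation components plus three rotation components. The dictionary layout and Euler-angle parameterization below are illustrative assumptions, not the paper's actual representation (quaternions are also common):

```python
import numpy as np

# A 6D pose: 3 translation components plus 3 rotation components.
# Euler angles are used here for simplicity; this is an illustrative
# sketch, not 6-PACK's actual code.
pose = {
    "translation": np.array([0.10, -0.25, 0.80]),    # x, y, z in meters
    "rotation":    np.array([0.0, np.pi / 6, 0.0]),  # roll, pitch, yaw in radians
}

# Six degrees of freedom in total
dof = pose["translation"].size + pose["rotation"].size  # 6
```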
Key insight: Rather than tracking absolute location, 6-PACK tracks an object’s change in position from video frame to video frame. Knowing its position in one frame makes it easier to find in the next.
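The delta-based idea can be sketched as predicting a per-frame change and accumulating it onto a known starting pose. This toy version adds small deltas directly, which only approximates rotation composition for small angles; it is a hypothetical sketch, not the paper's method:

```python
import numpy as np

def update_pose(prev_pose, delta):
    """Apply a predicted 6D change (dx, dy, dz, droll, dpitch, dyaw).

    Additive update is a small-angle approximation; a real tracker
    would compose rotations properly (e.g., with rotation matrices).
    """
    return prev_pose + delta

# Known pose in the first frame, then per-frame predicted changes.
pose = np.zeros(6)
deltas = [np.array([0.01, 0.0, 0.0, 0.0, 0.02, 0.0])] * 3

for d in deltas:
    pose = update_pose(pose, d)  # tracked pose after each frame
```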
How it works: The network’s training data is labeled with a six-dimensional vector that represents changes in an object’s location and orientation between frames. From that information, it learns to extract keypoints such as edges and corners, calculate changes in their positions, and extrapolate a new position. Objects in the training data are labeled with a category such as bowl or mug.
- The researchers identify an object’s center in the first frame.
- The model uses that information to generate a cloud of points representing the object.
- Based on the center and point cloud, the network generates a user-defined number of keypoints, essentially a 3D bounding box.
- In each successive frame, the model uses the previous keypoint locations to estimate the center roughly. An attention layer learns to find the center more precisely. Then the network updates the point cloud, and from there, the keypoints.
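The per-frame loop above can be sketched with placeholder functions. In the real system each stage is a learned network (the center refinement uses attention); the function names, point counts, and random sampling below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_point_cloud(center, n=256):
    # Stand-in for the model's point-cloud generation around the center.
    return center + 0.05 * rng.standard_normal((n, 3))

def pick_keypoints(cloud, k=8):
    # Stand-in for the learned keypoint generator; k is user-defined,
    # and the keypoints form essentially a 3D bounding box.
    return cloud[:k]

def estimate_center(prev_keypoints):
    # Rough center estimate from the previous frame's keypoints;
    # 6-PACK refines this estimate with an attention layer.
    return prev_keypoints.mean(axis=0)

# First frame: identified center -> point cloud -> keypoints.
center = np.array([0.0, 0.0, 1.0])
keypoints = pick_keypoints(sample_point_cloud(center))

# Successive frames: previous keypoints -> center -> cloud -> keypoints.
for _ in range(5):
    center = estimate_center(keypoints)
    keypoints = pick_keypoints(sample_point_cloud(center))
```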
Results: Tested on a dataset of real-world videos, 6-PACK predicted object position and rotation within 5 cm and 5 degrees in 33.3 percent of cases, versus 17 percent for the previous state of the art.
Why it matters: The ability to track objects as they move and rotate is essential to progress in robotics, both to manipulate things and to navigate around them.
We’re thinking: Object tracking algorithms and visual keypoints have a long history stretching beyond the 1960-vintage Kalman filter. Deep learning has come to dominate object recognition, and it’s good to see progress in tasks like tracking and optical flow.