Researchers proposed a way for robots to find objects in households where things get moved around.
What's new: Andrey Kurenkov and colleagues at Stanford University introduced Node Edge Predictor, a model that learned to predict where objects were located in houses.
Key insight: A popular way to represent objects and their locations is a graph, in which each node is either an object or its location and an edge connects the two. If we want to track objects over time, a recurrent model could predict the locations of objects using a separate graph for each time step, but that would require a prohibitive number of graphs. Instead, a model can predict locations using a single graph in which each edge is annotated, additionally, with the time elapsed since the associated object was seen in the associated location. The model learns to predict the next most likely place to find an object based on the object’s most recent, frequent, and longstanding locations.
How it works: The authors simulated a robot looking for things in a household. They built (i) a simulator of houses, object locations, and when and where they moved; (ii) a graph that represented a house containing objects; and (iii) a machine learning system that predicted where objects might be found.
- The simulator presented a household in which objects moved randomly over time — as if people were moving them — according to predefined probabilities. For example, a mug might move from a cabinet to a table. At each time step, a simulated robot observed one piece of furniture and the objects on or inside it.
- The robot represented its observations as a graph. The nodes included rooms, furniture, and objects, while edges connected each object to every piece of furniture and every piece of furniture to a room. The node and edge embeddings represented the robot’s past observations; for example, where it last saw the mug, time elapsed since that observation, and how many times it had seen the mug there.
- The authors simulated the robot moving through 100 households with various floor plans. They built a training set of 10,000 graphs.
- They trained the machine learning system to predict whether an object was on/in a given piece of furniture (that is, whether an edge connected a given object and location at the current timestep). The system embedded previously observed nodes and edges using a separate vanilla neural network for each, concatenated the embeddings, and fed them to a graph neural network followed by a two-layer transformer. A vanilla neural network at the end of the transformer generated probabilities for all edges that connected a given object to various pieces of furniture.
Results: The authors tested their system’s ability to find a single object in a house versus a few baseline methods. The baselines included random guessing, always guessing the piece of furniture where the object was last seen, and a Bayesian model that guessed whether the object was on/in a given piece of furniture based on the percentage of times it had been seen there. On average, their system found the object in 3.2 attempts, while the next best model (Bayesian) took 3.6 attempts. Guessing the last-seen location required 6.0 attempts, and random guessing required 8.8 attempts.
Why it matters: Feature engineering helps to find a good way to represent data so a model can learn from it. In this work, engineering time-related features (such as the time elapsed since an object was on a piece of furniture or the number of times an object was observed on a piece of furniture over time) enabled a non-recurrent model to learn how graphs change over time.
We’re thinking: A physical robot likely would use object detection on its camera feed instead of a simulator that told it directly which objects were associated with which pieces of furniture. We look forward to future work that proves the concept using this more realistic setup.