How can we build robots that respond to novel situations? When prior experience is limited, letting a model describe its own uncertainty helps it explore more avenues to success.
What’s new: In reinforcement learning, meta-learning refers to training a model to complete multiple tasks, including tasks the model hasn’t seen before. One way to approach meta-learning is to divide it into two subproblems: creating a plan based on current surroundings and the task at hand, and taking action to implement the plan. Stanford researchers developed deep learning models that aid the planning phase by learning to generate better representations of the task.
Key insight: Deep learning has been used to learn vector descriptions of the initial state prior to accomplishing a task and the final state afterward. The new work uses probabilistic descriptions, allowing more flexibility in novel tasks. For example, instead of having to choose between the contradictory descriptions “object 1 is on object 2” and “object 2 is on object 1,” the network updates its confidence in each statement throughout the planning steps.
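The difference between a hard classifier and a probabilistic description can be illustrated with a small sketch. The predicate names and probability values below are hypothetical, not taken from the paper:

```python
# A hard classifier commits to one configuration; a probabilistic
# description keeps a confidence score for every candidate statement.
hard_state = "on(obj1, obj2)"  # forced choice, may be wrong

prob_state = {
    "on(obj1, obj2)": 0.6,  # confidence in each statement
    "on(obj2, obj1)": 0.4,  # the contradictory option is kept, not discarded
}

def most_likely(state):
    """Return the statement the model currently believes most."""
    return max(state, key=state.get)

assert most_likely(prob_state) == "on(obj1, obj2)"

# Evidence gathered during planning can shift confidence
# without retraining the network.
prob_state["on(obj2, obj1)"] = 0.7
assert most_likely(prob_state) == "on(obj2, obj1)"
```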
How it works: Previous methods use a neural network as a classifier to pick one state description from among potential configurations. Instead, De-An Huang and his colleagues use the model’s confidence in each potential configuration to represent states. This approach produces a probabilistic description of the current and final states.
- For training, the model takes a set of demonstrations of similar tasks plus the actions available to the planning algorithm. For testing, it takes a single demonstration of a novel task, the initial state, and the allowed operations.
- For both initial and final states, a network is trained to predict the probability that certain configurations are observed. For example, given an image, it learns the probability that object 1 is on top of object 2.
- The planning algorithm takes the probabilistic descriptions and selects the action most likely to move the initial state closer to the final state. Since the choice is a function of the state descriptions and potential operations, the planning algorithm requires no training.
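The training-free planner described above can be sketched roughly as follows. The operation model (each action simply sets some statements true or false) and the scoring function are simplified assumptions for illustration, not the authors’ implementation:

```python
# Minimal greedy planner over probabilistic state descriptions.
# A state maps each statement to the probability that it holds;
# an operation is a pair (adds, deletes) of statement sets.

def apply_op(state, op):
    """Return the state after applying an operation."""
    adds, deletes = op
    new = dict(state)
    for p in adds:
        new[p] = 1.0
    for p in deletes:
        new[p] = 0.0
    return new

def score(state, goal):
    """Expected agreement between a state and the goal description."""
    return sum(state.get(p, 0.0) * q for p, q in goal.items())

def plan(state, goal, ops, max_steps=10):
    """Greedily pick the operation that moves closest to the goal."""
    actions = []
    for _ in range(max_steps):
        best = max(ops, key=lambda name: score(apply_op(state, ops[name]), goal))
        if score(apply_op(state, ops[best]), goal) <= score(state, goal):
            break  # no operation improves the match
        state = apply_op(state, ops[best])
        actions.append(best)
    return actions

init = {"on(obj1, table)": 0.9, "on(obj1, obj2)": 0.1}
goal = {"on(obj1, obj2)": 1.0}
ops = {"stack(obj1, obj2)": ({"on(obj1, obj2)"}, {"on(obj1, table)"})}
print(plan(init, goal, ops))  # prints ['stack(obj1, obj2)']
```

Because the planner is just a function of the state descriptions and the available operations, it needs no training of its own, which matches the paper’s division of labor between learned descriptions and a fixed planner.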
Results: The authors’ approach achieves state-of-the-art meta-learning performance in sorting objects and stacking blocks. When sorting, it matches performance based on human heuristics. When stacking, it outperforms human heuristics plus fixed state descriptions with fewer than 20 training examples (although the heuristics win with 30 training examples).
Yes, but: The researchers achieved these results in tasks with a small number of operations and potential state configurations. Their method likely will struggle with more complex tasks such as the Atari games that made deep reinforcement learning popular.
Takeaway: In past models, misjudgments of surroundings and goals tend to accumulate, leading the models far from the intended behavior. Now, they can relax their fixed state descriptions by representing potential points of confusion as probabilities. This will enable them to behave more gracefully even with little past experience to draw on.