Brain-Controlled Robots Get More Versatile NOIR, a system to control robots via electroencephalogram for everyday tasks

Published

May 15, 2024

Reading time

3 min read

Brain-to-computer interfaces that enable users to control robots with their thoughts typically execute a single type of task such as reaching and grasping. Researchers designed a system that responds to a variety of intentions.

What's new: Ruohan Zhang and colleagues at Stanford introduced Neural Signal Operated Intelligent Robots (NOIR). Their method commands a robot to perform practical tasks, such as ironing a cloth or making a sandwich, via signals from an electroencephalogram (EEG), a non-invasive way to measure brain waves via electrodes attached to the scalp.

Key insight: Currently neuroscientists can derive from EEG signals only simple thoughts, such as the intention to move a limb. However, a sequence of simple thoughts can drive an arbitrarily complex action. Specifically, simple thoughts (such as the intention to move a hand) can drive a robot to perform complex actions by repeatedly (i) selecting an object, (ii) selecting an action to apply to the object, and (iii) selecting the part of the object to act upon. For instance, to iron a cloth, the initial sequence would be: (i) select the iron and (ii) grasp it (iii) by the handle. This sequence might be followed by (i) select the cloth and (ii) slide the iron across it (iii) starting at the nearest portion. And so on.

How it works: Users who wore EEG electrodes concentrated on specific sequences of thoughts to execute tasks as they watched a screen that displayed the output of a camera attached to either a robotic arm or wheeled robot with two arms.

Prior to attempts to control a robot, the authors recorded EEG signals to train the system for each individual user. Users spent 10 minutes imagining grasping a ball in their right or left hand, pushing a pedal with both feet, or focusing on a cross displayed on the screen (a resting state). The authors used the resulting data to train two Quadratic Discriminant Analysis (QDA) classifiers for each user.
To enable users to select objects, a pretrained OWL-ViT segmented the camera image to mark individual objects on the screen. Objects available to be manipulated flickered at different frequencies between 6 and 10 times per second. When a user concentrated on an object, the resulting brainwaves synchronized with the frequency of its flickering. The system selected the object that corresponded to the most prominent frequency.
Once the user had selected an object, the system presented up to four possible actions, such as “pick from top,” “pick from side,” and “push.” Each action was accompanied by an image of a right or left hand, feet, or a cross. To select an action, the user imagined using the designated body part or focusing on the cross. Given the EEG signal, one classifier selected the action.
To select a location on the object, the other classifier helped the user to point at it using a cursor. To move the cursor in one direction, the user imagined using one hand. To move it in the opposite direction, the user focused on a cross. The user repeated this process for each of three axes of motion (horizontal, vertical, and depth).
In case the system didn’t read a selection correctly, the user could reset the process by clenching their jaw.
To make the system easier to use, the authors adapted an R3M embedding model to suggest commonly selected objects and actions. R3M was pretrained to generate similar embeddings of paired robot instructions and camera views and dissimilar embeddings of mismatched robot instructions and camera views. The authors added several fully connected layers and trained them on the individual-user data to produce similar embeddings of images from the camera with the same object-action combination and dissimilar embeddings of images with other object-action combinations. Given an image from the camera, the model returned the object-action that corresponded to the most similar image.

Results: Three users controlled the two robots to execute 20 everyday tasks. On average, the system selected objects with 81.2 percent accuracy, actions with 42.2 percent accuracy, and locations with 73.9 percent accuracy. Users took an average of about 20 minutes to complete each task.

Why it matters: Brain signals are enormously complex, yet relatively simple statistical techniques — in this case, QDA — can decode them in useful ways.

We're thinking: Sometimes the simplest solution to a difficult problem is not to train a larger model but to break down the problem into manageable steps.

Subscribe to The Batch