OpenAI’s Next Act? OpenAI sets its sights on autonomous digital assistants.


OpenAI is focusing on autonomous agents that take action on a user’s behalf.

What’s new: The maker of ChatGPT is developing applications designed to automate common digital tasks by controlling apps and devices, The Information reported.

How it works: OpenAI has two agent systems in the works. It has not revealed any findings, products, or release dates.

  • One system is designed to automate the use of business software such as accounting and contact management systems. The other performs web-based tasks such as collecting information on a particular topic or booking travel arrangements. 
  • A user would enter a prompt, such as a request to transfer data from a document to a spreadsheet or fill out expense reports and transfer them to accounting software. The agent would respond by moving cursors, clicking buttons, selecting or entering text, and so on.
  • In November, OpenAI introduced the Assistants API, designed to help developers build agent-like assistants that follow instructions to automate certain tasks. In 2022, it published research describing an agent that used a keyboard and mouse to play the video game Minecraft after being trained on video of humans playing the game.
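
The interaction described above (a prompt goes in, a sequence of cursor moves, clicks, and text entries comes out) can be illustrated with a minimal sketch. This is not OpenAI's design, which has not been published. Every name here (`MoveCursor`, `Click`, `TypeText`, `FakeScreen`, `plan`, `run_agent`) is hypothetical, the "screen" is an in-memory stand-in for a real application, and the planner is a hard-coded stub where a real agent would presumably call a model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


# Hypothetical action vocabulary: the low-level steps an agent might emit.
@dataclass(frozen=True)
class MoveCursor:
    x: int
    y: int


@dataclass(frozen=True)
class Click:
    pass


@dataclass(frozen=True)
class TypeText:
    text: str


Action = Union[MoveCursor, Click, TypeText]


@dataclass
class FakeScreen:
    """In-memory stand-in for a spreadsheet; it records what the agent did."""
    cursor: Tuple[int, int] = (0, 0)
    focused: Tuple[int, int] = (0, 0)
    cells: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def apply(self, action: Action) -> None:
        if isinstance(action, MoveCursor):
            self.cursor = (action.x, action.y)
        elif isinstance(action, Click):
            # Clicking focuses the cell under the cursor.
            self.focused = self.cursor
        elif isinstance(action, TypeText):
            # Typing appends text to the focused cell.
            self.cells[self.focused] = self.cells.get(self.focused, "") + action.text


def plan(prompt: str, rows: List[str]) -> List[Action]:
    """Stub planner (assumption): ignores the prompt and copies each row
    into column 0. A real agent would derive the actions from the prompt."""
    actions: List[Action] = []
    for i, value in enumerate(rows):
        actions += [MoveCursor(0, i), Click(), TypeText(value)]
    return actions


def run_agent(prompt: str, rows: List[str]) -> FakeScreen:
    """Execute the planned actions one at a time against the fake screen."""
    screen = FakeScreen()
    for action in plan(prompt, rows):
        screen.apply(action)
    return screen
```

Calling `run_agent("copy the document into the spreadsheet", ["rent", "travel"])` fills cells `(0, 0)` and `(0, 1)` of the fake screen. Separating the planner from the action executor keeps the hard part (deciding what to do) apart from the mechanical part (doing it), which is where the reliability problems discussed later tend to surface.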

Behind the news: Agents are on Silicon Valley’s radar, especially since January’s Consumer Electronics Show debut of the Rabbit R1, which accepts voice commands to play music, order food, call a car, and so on. Several other companies, academic labs, and independent developers are pursuing the concept as well.

  • Sierra, a startup cofounded by OpenAI chairman Bret Taylor, is creating conversational agents for businesses that can take actions like tracking packages, exchanging products, and resolving issues on a customer’s behalf.
  • Longtime Google researchers Ioannis Antonoglou, Sherjil Ozair, and Misha Laskin recently left the company to co-found a startup focused on agents.
  • Google, Microsoft, and other companies are exploring similar technologies that enable agents to move or edit files and interact with other agents, The New York Times reported.
  • The Browser Company recently announced that its browser Arc would integrate agents to find and deliver videos, recipes, products, and files from the internet.
  • Adept offers a system, ACT-1, that monitors a user’s actions and can click, type, and scroll in a web browser in response to commands. (ACT-1 is available as an alpha test via waitlist.)

Why it matters: Training agents to operate software designed for humans can be tricky. Some break down tasks into subtasks but struggle with executing them. Others have difficulty with tasks they haven’t encountered before or edge cases that are unusually complex. However, agents are becoming more reliable in a wider variety of settings as developers push the state of the art forward.

We’re thinking: We’re excited about agents! You can learn about agent technology in our short course, “LangChain for LLM Application Development,” taught by LangChain CEO Harrison Chase and Andrew Ng.

