As you can read in this issue of The Batch, Microsoft’s effort to reinvent web search by adding a large language model snagged when its chatbot went off the rails. Both Bing chat and Google’s Bard, the chatbot to be added to Google Search, have made up facts. In a few disturbing cases, Bing demanded apologies or threatened a user. What is the future of chatbots in search?
It’s important to consider how this technology will evolve. After all, we should architect our systems based not only on what AI can do now but on where it might go. Even though current chat-based search has problems, I’m optimistic that roadmaps exist to significantly improve it.
Let’s start with the tendency of large language models (LLMs) to make up facts. I wrote about falsehoods generated by OpenAI’s ChatGPT. I don’t see a realistic path to getting an LLM with a fixed set of parameters to both (i) demonstrate deep and broad knowledge about the world and (ii) be accurate most of the time. A 175B-parameter model simply doesn’t have enough memory to know that much.
Look at the problem in terms of human-level performance. I don’t think anyone could train an inexperienced high-school intern to answer every question under the sun without consulting reference sources. But an inexperienced intern could be trained to write reports with the aid of web search. Similarly, the approach known as retrieval augmented generation — which enables an LLM to carry out web searches and refer to external documents — offers a promising path to improving factual accuracy.
Bing chat and Bard do search the web, but they don’t yet generate outputs consistent with the sources they’ve discovered. I’m confident that further research will lead to progress on making sure LLMs generate text based on trusted sources. There’s significant momentum behind this goal, given the widespread societal attention focused on the problem, deep academic interest, and financial incentive for Google and Microsoft (as well as startups like You.com) to improve their models.
Indeed, over a decade of NLP research has been devoted to the problem of textual entailment which, loosely, is the task of deciding whether Sentence A can reasonably be inferred to follow from some Sentence B. LLMs could take advantage of variations on these techniques — perhaps to double-check that their output is consistent with a trusted source — as well as new techniques yet to be invented.
As for personal attacks, threats, and other toxic output, I’m confident that a path also exists to significantly reduce such behaviors. LLMs, at their heart, simply predict the next word in a sequence based on text they were trained on. OpenAI shaped ChatGPT’s output by fine-tuning it on a dataset crafted by people hired to write conversations, and Google built a chatbot, Sparrow, that learned to follow rules through a variation on reinforcement learning from human feedback. Using techniques like these, I have little doubt that chatbots can be made to behave better.
So, while Bing’s misbehavior has been in the headlines, I believe that chat-based search has a promising future — not because of what the technology can do today, but because of where it will go tomorrow.
P.S. Landing AI, which I lead, just released its computer vision platform for everyone to use. I’ll say more about this next week. Meanwhile, I invite you to check it out for free at landing.ai!