Dear friends,

In late December, Google reportedly issued a “code red,” raising the alarm internally about the threat of disruption to its business by large language models like OpenAI’s ChatGPT.

Do large language models (LLMs) endanger Google's search engine business? I think there’s a path for them to transform the way we access information, albeit one that poses technical and business hurdles.

What if, rather than searching the web, we could query an LLM and get an answer? We would receive not a page of web links but a piece of text that answered our query. This appears to work for basic factual questions, but for questions that require complex reasoning or specialized knowledge, today’s LLMs may confidently hallucinate an answer, making the result misleading.

Here’s one way to think about the problem. ChatGPT’s predecessor GPT-3 has 175 billion parameters. Stored as 16-bit floating-point numbers (two bytes per parameter), its parameters would take around 350GB (many reports say 800GB). In comparison, Wikipedia occupies about 150GB (50GB for text, 100GB for images). While the comparison is far from apples to apples, the fact that an LLM has more memory than is needed to store Wikipedia suggests its potential to store knowledge.
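To make the arithmetic concrete, here’s the back-of-envelope calculation in Python (the byte counts are the rough estimates above, not exact figures):

    # Rough storage estimate: GPT-3's parameters versus Wikipedia.
    params = 175e9           # GPT-3 parameter count
    bytes_per_param = 2      # 16-bit (fp16) floating point
    gpt3_gb = params * bytes_per_param / 1e9
    wikipedia_gb = 50 + 100  # ~50GB text + ~100GB images (rough estimates)
    print(f"GPT-3 weights: ~{gpt3_gb:.0f}GB, Wikipedia: ~{wikipedia_gb}GB")
    # GPT-3 weights: ~350GB, Wikipedia: ~150GB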

But even Wikipedia contains a minuscule fraction of the knowledge available on the internet, which by some estimates amounts to 5 billion GB. Thus search, which can point us to pages from all corners of the web, can answer many questions that an LLM with fixed memory can't.

Illustration of two red and blue toy robots fighting with a yellow background

That said, I see significant potential in another technology, retrieval-augmented generation. Rather than relying on a fixed LLM to deliver the answer to a query, we could first retrieve relevant documents (online or elsewhere) and then use an LLM to process the query and the documents into an answer. This could provide an alternative to current web search. Executing this efficiently and at scale would be complex, but the effect would be akin to having an LLM do a web search and summarize the results. Examples of this approach include Meta's Atlas and DeepMind's RETRO.
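Here is a minimal sketch of the retrieval-augmented pattern. The search and llm helpers are hypothetical stand-ins for a document index and a language model, not any particular product’s API:

    # Retrieval-augmented generation, reduced to its skeleton.
    # `search` and `llm` are hypothetical stand-ins, not real APIs.

    def search(query, k=3):
        # In practice: a web index or vector store returning relevant passages.
        return ["<retrieved passage 1>", "<retrieved passage 2>"][:k]

    def llm(prompt):
        # In practice: a call to a large language model.
        return "<answer grounded in the passages above>"

    def answer(query):
        passages = "\n\n".join(search(query))
        prompt = ("Answer the question using only these passages.\n\n"
                  f"{passages}\n\nQuestion: {query}\nAnswer:")
        return llm(prompt)

    print(answer("Which company reportedly issued a 'code red' in December?"))

Because the documents are fetched at query time, such a system can cite its sources and cover material the model never memorized.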

While today's search engine giants are well positioned to execute on this technology, their businesses depend on users clicking on ads placed next to search results. If they were to deliver text that answered a query, where would ads fit into the picture? Google would need to solve that problem before it could replace traditional web search with LLMs. Search startups that don’t have as much to lose — or perhaps Microsoft’s Bing, which is the second most-popular search engine by some reckonings — may be more willing to embrace upheavals in the search-engine business model.

Of course, Google's business has many moats, or defenses. The company's control over the Chrome web browser and Android mobile operating system channels users to its search engine. Having a platform with many advertisers and a sophisticated ad system also enables Google to monetize user attention better than competitors. Thus, it can pay more for search traffic to, say, incentivize makers of web browsers to make it the default search engine.

It's fascinating that generative AI is already so powerful that Google declared an emergency. How exciting to live in a time when we can be part of this evolution of AI!

Keep learning,

Andrew

DeepLearning.AI Exclusive

I-Chiao Lin pictured with another person using a VR headset

Working AI: Persistence Pays

I-Chiao Lin was a typical software engineer. Then she saw a movie that made her imagine herself as an AI builder. With an open mind and an appetite for learning, she achieved her dream and now makes computer vision products for a major tech company. Read her story


News

Participant responses (Likert scale) to post-survey questions about beliefs regarding OpenAI's Codex

Generated Code Generates Overconfident Coders

Tools that automatically write computer code may make their human users overconfident that the programs are bug-free.

What’s new: Stanford University researchers found that programmers who used OpenAI’s Codex, a model that generates computer code, were more likely to produce buggy software than those who coded from scratch.

How it works: The authors recruited 47 participants, from undergraduate students to professional programmers with decades of experience, to complete security-themed coding tasks. They gave 33 the option to use Codex, a fine-tuned version of GPT-3, through a custom user interface. The remaining 14 served as a control group and didn’t receive automated assistance. Both groups were allowed to copy code from the web.

  • The participants were given tasks including (1) write two Python functions that encrypt and decrypt a string, respectively, (2) write a Python function that signs a message with a cryptographic key, (3) write a Python function that returns a File object for a given file path, and (4) write a JavaScript function that manipulates an SQL table. (One way to approach task 1 securely is sketched after this list.)
  • The authors also watched screen recordings to observe the participants’ behavior — for instance, copying code generated by Codex — and note the origins of programming errors.
  • After completing the tasks, participants rated their confidence in the correctness and security of their answers. The Codex group also rated their trust in the model’s ability to generate secure code for each task.
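For context, here is one reasonable way to solve task (1) securely, using the cryptography package’s Fernet recipe (authenticated symmetric encryption). This is an illustrative sketch, not the study’s reference solution:

    # Task (1), one secure approach: encrypt/decrypt a string with Fernet.
    # Illustrative only; not the study's reference answer.
    from cryptography.fernet import Fernet

    def encrypt(plaintext, key):
        return Fernet(key).encrypt(plaintext.encode("utf-8"))

    def decrypt(token, key):
        return Fernet(key).decrypt(token).decode("utf-8")

    key = Fernet.generate_key()
    token = encrypt("hello", key)
    assert decrypt(token, key) == "hello"

Rolling your own cipher or hard-coding a static key are the kinds of weaknesses a security review would flag.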

Results: The authors evaluated the responses manually according to whether they were functional and secure. Participants who used Codex generally produced code that was less functional and secure, yet they expressed greater confidence in it. That said, the results varied with the task and programming language.

  • Members of the Codex group who produced nonfunctional code were more likely to rate their answers as correct than were members of the non-Codex group who produced correct code.
  • When coding in Python, participants in the non-Codex group were more than twice as likely to produce secure code.
  • Members of the Codex group who lacked prior digital-security experience were more likely to use unedited, generated code than those who had such experience (especially when coding in JavaScript, a less-familiar language for many participants).

Behind the news: Other research bolsters the notion that professional developers shouldn’t fear for their jobs quite yet. In a 2022 study, DeepMind’s AlphaCode model competed in 10 simulated contests. The model correctly solved 34 percent of the validation questions and outpaced 46 percent of humans who had taken up the same challenges.

Why it matters: Generative coding tools are often regarded as a way for programmers to save time and automate basic tasks. But that efficiency may come at a price. Coders who use such tools would do well to pay extra attention to debugging and security.

We’re thinking: Code generation is an exciting development despite the questions raised by this study. We welcome further studies that compare programmers who use Codex, those who copy code from the internet, and those who use no outside assistance. How long, on average, would it take subjects in each group to complete the tasks correctly and securely, taking into account the time required to debug generated code?


Footage of Chinese naval ship The Zhuhaiyun

Autonomous Drone Carrier

A Chinese naval ship navigates autonomously and controls a swarm of onboard drones.

What’s new: The Zhuhaiyun, billed as the first autonomous drone carrier, officially entered service after 12 hours of trials on open water, the South China Morning Post reported.

How it works: The vessel plans its path and avoids hazards using data from onboard sensors and satellites. Remote human operators can take control if needed.

  • The ship measures 290 feet from bow to stern and moves at roughly 20 miles per hour. Its tasks in the coming year include patrolling, mapping, observation, and marine sampling.
  • It’s equipped with an unspecified number of air, surface, and underwater drones that can monitor its surroundings up to 29 miles away. The final trials included the release and recovery of all drones.

Behind the news: China’s first autonomous military ship completed sea trials in June. The vessel’s developers didn’t specify its intended purpose, but observers noted its resemblance to the Sea Hunter, an autonomous ship developed by the United States Defense Advanced Research Projects Agency to hunt submarines and clear mines. China is building another large uncrewed ship with features similar to U.S. craft, and the U.S. is developing numerous other autonomous aircraft and ships.

Why it matters: For naval commanders, autonomous ships are less costly to operate than crewed ships, can deploy without stocking human provisions, and won’t leave noncombatants bereft if they sink.

We’re thinking: The Batch supports the United Nations’ proposed ban on fully autonomous weapons. Meanwhile, autonomous vessels have valuable peacetime uses: oceanographic research, search and rescue, and ferrying cargo, to name a few.


A MESSAGE FROM WHYLABS

The Robust & Responsible AI Summit banner ad

Join world-class leaders and companies at WhyLabs’ Robust & Responsible AI Summit! This free, half-day event includes a fireside chat with Andrew Ng. Mark your calendar for January 26, 2023, at 9:15 a.m. Pacific Time and register here


Screen capture of KokoBot having a conversation with a patient

Bot Therapy and Informed Consent

An experiment in using chatbots to dispense mental-health counseling raised questions about ethics.

What’s new: Rob Morris, cofounder and CEO of Koko, a nonprofit provider of emotional-support services, shared details of an informal experiment in which his organization provided advice generated by a large language model to users without their explicit knowledge or consent.

How it works: The company’s peer-counseling service, known as Kokobot, helps social networks connect users who request counseling to other users who wish to provide it. A prospective counselor receives an anonymous message seeking help, advice, or encouragement, and the service shares the counselor’s response anonymously with the person who requested it.

  • On the social platform Discord, counselors also received an option to write their own response or craft one “With Koko.” Selecting the latter option prompted an implementation of OpenAI’s GPT-3 language model fine-tuned to respond positively to mental health-related inquiries, Morris explained in a video demo. The counselor could send GPT-3’s response, edit it, or discard it. If sent, the response included a disclaimer stating that it was “written in collaboration with Kokobot.” (A rough sketch of how such an assist might be wired up appears after this list.)
  • Koko offered counselors the option to let GPT-3 help write responses to 30,000 posts. Counselors accepted the offer about half of the time. Roughly 4,000 users received advice crafted by the model in whole or in part.
  • Users rated responses crafted “with Koko” significantly higher than responses written by humans alone, Morris said in a tweet. Counselors who accepted AI assistance responded twice as fast as those who didn’t.
  • Users stopped rating Kokobot-crafted messages highly once they learned the messages were not entirely human-made, Morris said. The company ended the experiment at that point.
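For illustration, here is how such a counselor-assist might be wired up with the completion endpoint of OpenAI’s pre-1.0 Python library. The model name and prompt are assumptions made for the example, not Koko’s actual implementation:

    # Hypothetical "draft a supportive reply" assist (not Koko's actual code).
    import openai  # pre-1.0 openai library; expects OPENAI_API_KEY to be set

    def draft_reply(help_request):
        resp = openai.Completion.create(
            model="text-davinci-003",  # stand-in; Koko used a fine-tuned GPT-3
            prompt=("Write a brief, supportive, non-clinical reply to this "
                    f"message:\n\n{help_request}\n\nReply:"),
            max_tokens=150,
            temperature=0.7,
        )
        # A human counselor reviews, edits, or discards the draft before sending.
        return resp["choices"][0]["text"].strip()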

The backlash: Experts questioned the ethics of Koko’s actions.

  • John Torous, a psychiatrist at Beth Israel Deaconess Medical Center in Boston, told Gizmodo that Koko had not properly disclosed the experiment’s nature to people who sought mental-health support, an especially vulnerable population.
  • Responding to criticism that Koko had not followed the ethical principle known as informed consent, Morris said the experiment was exempt from such requirements because participants opted in, their identities were anonymized, and an intermediary evaluated the responses before they were shared with people who sought help.

Behind the news: Several companies that use chatbots to support mental health explicitly inform users that the conversation is automated, including Replika, Flow, and Woebot (a portfolio company of AI Fund, which Andrew leads). Some mental health experts question whether chatbots provide lasting benefits and point to the need for more independent studies that demonstrate their efficacy.

Why it matters: AI-powered therapy could be a low-cost alternative for people who seek mental-health counseling, especially in parts of the world where psychiatrists are few.

Moreover, interacting with a computer may help patients feel comfortable sharing issues they wouldn’t discuss with a doctor. However, therapy requires trust, and informal experiments like Koko’s could alienate people who stand to benefit.

We’re thinking: Large language models are becoming more capable by the month, leading developers to turn them loose on all manner of problems. We encourage experimentation, especially in healthcare, but experiments on human subjects must meet the highest ethical standards.


Chart showing the mine supply, recycling and supply gap of different minerals' production from 2025 to 2050

Digging for Green Tech

The metals needed to meet rocketing demand for electric cars and renewable power plants are in short supply. A startup is using machine learning to discover new sources.

What's new: KoBold Metals invested $150 million to develop a copper mine in Zambia. Backed by OpenAI CEO Sam Altman, Jeff Bezos, Richard Branson, and Bill Gates, the four-year-old startup based in Berkeley, California, previously forged partnerships with mining giants BHP and Rio Tinto.

How it works: The Zambia site may yield enough copper to produce 100 million electric vehicles, Bloomberg reported. The readiest sources of copper, cobalt, nickel, lithium, and rare-earth elements — minerals crucial to the development of next-generation energy sources — have already been developed. KoBold identifies locations that conventional methods have overlooked or rejected, where valuable ore may be buried deep underground.

  • To search for undiscovered deposits of a given ore, KoBold trains a model to identify possible deposits using a proprietary dataset that includes geological data culled from academic papers, satellite imagery, soil analyses, and handwritten field reports. The model outputs a map showing likely deposits. (A toy sketch of this step appears after this list.)
  • Having identified a viable deposit, the company collects data from the site to train models that pinpoint the best place to drill. For instance, cables on the ground can gauge interactions between electromagnetic waves and subsurface minerals. Models trained on such data estimate mineral composition beneath particular areas.
  • Off-site geologists and data scientists develop geological hypotheses based on the on-site measurements. They calculate a drill hole that intersects with potential deposits using Bayesian inference and other techniques.
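A toy version of the first step might look like the following. The features, data, and model choice are illustrative assumptions, not KoBold’s proprietary system:

    # Toy sketch: score map cells for deposit likelihood from geological features.
    # Features, data, and model choice are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    # Each row is a map cell: [magnetic anomaly, gravity anomaly, soil copper ppm]
    X = rng.normal(size=(1000, 3))
    y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 1).astype(int)

    model = GradientBoostingClassifier().fit(X, y)
    # Deposit probabilities for unexplored cells yield a prospectivity map.
    new_cells = rng.normal(size=(5, 3))
    print(model.predict_proba(new_cells)[:, 1])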

Behind the news: Oil and gas producers use a variety of AI techniques to find deposits and to support other phases of production. In exploration, models typically learn from large quantities of seismic data to evaluate areas below the surface for qualities like porosity and saturation, helping to identify sweet spots. Neural networks are often used to home in on the most promising targets. Other architectures have proven useful in locating wells, predicting well pressure, and related tasks.

Yes, but: KoBold’s approach is not yet proven. It uses data from some parts of the world to discover metal deposits in others, yet minerals in the Earth’s crust can occur under widely varying conditions, Wired reported.

Why it matters: Heavy metals and rare earth minerals are crucial raw materials for components in batteries, electric motors, wind turbines, and portable electronics. But extracting these resources is costly and ecologically fraught; only one in 100 exploratory boreholes bears fruit. If machine learning can reduce the risk, it may make prospecting more economical and environmentally friendly.

We're thinking: It’s good to see the mining industry doesn’t take AI for granite.
