AI models that generate text, images, and other types of media are increasingly under attack by owners of copyrights to material included in their training data.
What’s happening: Writers and artists filed a new spate of lawsuits alleging that AI companies including Alphabet, Meta, and OpenAI violated their copyrights by training generative models on their works without permission. Companies took steps to protect their interests and legislators considered the implications for intellectual property laws.
Lawsuits and reactions: The lawsuits, which are ongoing, challenge a longstanding assumption within the AI community that training machine learning models is allowed under existing copyright laws. Nonetheless, OpenAI responded by cutting deals for permission to use high-quality training data. Meanwhile, the United States Senate is examining the implications for creative people, tech companies, and legislation.
- Unnamed plaintiffs sued Alphabet claiming that Google misused photos, videos, playlists, and the like posted to social media and information shared on Google platforms to train Bard and other systems. One alleged that Google misused a book she wrote. The plaintiffs filed a motion for class-action status. This action echoes an earlier lawsuit against OpenAI filed in June.
- Comedian Sarah Silverman joined authors Christopher Golden and Richard Kadrey in separate lawsuits against Meta and OpenAI in a United States federal court. The plaintiffs, who are seeking class-action status, claim that the companies violated their copyrights by training LLaMA and ChatGPT, respectively, on books they wrote.
- In a similar lawsuit authors Paul Tremblay and Mona Awad allege that OpenAI violated their copyrights.
- OpenAI agreed to pay Associated Press for news articles to train its algorithms — an arrangement heralded as the first of its kind. OpenAI will have access to articles produced since 1985, and Associated Press will receive licensing fees and access to OpenAI technology. In a separate deal, OpenAI extended an earlier agreement with Shutterstock that allows it to train on the stock media licensor’s images, videos, and music for six years. In return, Shutterstock will continue to offer OpenAI’s text-to-image generation/editing models to its customers.
- A U.S. Senate subcommittee on intellectual property held its second hearing on AI’s implications for copyright. The senators met with representatives of Adobe and Stability AI as well as an artist, a law professor, and a lawyer for Universal Music Group, which takes in roughly one-third of the global revenue for recorded music.
Behind the news: The latest court actions, which focus on generated text, follow two earlier lawsuits arising from different types of output. In January, artists Sarah Anderson, Kelly McKernan, and Karla Ortiz (who spoke in the Senate hearing) sued Stability AI, Midjourney, and the online art community DeviantArt. In November, two anonymous plaintiffs sued GitHub, Microsoft, and OpenAI saying the companies trained the Copilot code generator using routines from GitHub repositories in violation with open source licenses.
Why it matters: Copyright laws in the United States and elsewhere don’t explicitly forbid use of copyrighted works to train machine learning systems. However, the technology’s growing ability to produce creative works, and do so in the styles of specific artists and writers, has focused attention on such use and raised legitimate questions about whether it’s fair. This much is clear: The latest advances in machine learning have depended on free access to large quantities of data, much of it scraped from the open internet. Lack of access to corpora such as Common Crawl, The Pile, and LAION-5B would put the brakes on progress or at least radically alter the economics of current research This would degrade AI’s current and future benefits in areas such as art, education, drug development, and manufacturing to name a few.
We’re thinking: Copyright laws are clearly out of date. We applaud legislators who are confronting this problem head-on. We hope they will craft laws that, while respecting the rights of creative people, preserve the spirit of sharing information that has enabled human intelligence and, now, digital intelligence to learn from that information for the benefit of all.