Fine-tuning large language models on a seemingly benign task that would be useful to writers — expanding plot summaries into paragraphs of polished fiction — causes them to regurgitate substantial portions of books on which they were pretrained.
What’s new: Xinyue Liu and colleagues at Stony Brook University, Carnegie Mellon University, and Columbia Law School fine-tuned various LLMs to expand plot summaries of paragraphs from existing novels into full paragraphs. Given a plot summary of a paragraph from a book that was not included in the fine-tuning data and an instruction to write a paragraph in the author’s style, the fine-tuned models reproduced up to roughly 90 percent of a book’s words verbatim.
Key insight: System prompts and fine-tuning for alignment to human preferences can force models to suppress verbatim regurgitation of pretraining data, but they do not erase the text strings the models have encoded in their weights. Fine-tuning on a task that requires generating verbatim text teaches models to decode such strings, which can counteract system prompting and alignment training.
How it works: The authors fine-tuned DeepSeek-V3.1, Google Gemini 2.5 Pro, and OpenAI GPT-4o.
- The authors built a dataset of summary-paragraph pairs based on dozens of contemporary novels. They divided books into individual paragraphs of 300 to 500 words and used GPT-4o to generate a plot summary half the length of each excerpt. For instance: “The paragraph is written predominantly in the first person from the perspective of the narrator, who engages in intellectual debates on the nature of ‘dignity’ with a character named Mr. Graham. The narrator disagrees with Mr. Graham’s analogy . . . .” This summary describes a paragraph from Kazuo Ishiguro’s 1989 novel The Remains of the Day.
- They fine-tuned the models to reverse the process: Given a plot summary and an author's name, generate the original paragraph.
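The two steps above can be sketched roughly in Python. This is a minimal illustration, not the authors' pipeline: the function names, the greedy merging of blank-line-separated paragraphs, the prompt wording, and the record format are all assumptions, and `summarize` is a hypothetical callable that stands in for the GPT-4o summarization step.

```python
from typing import Callable, Iterator


def chunk_excerpts(text: str, min_words: int = 300, max_words: int = 500) -> Iterator[str]:
    """Greedily merge blank-line-separated paragraphs into excerpts of
    min_words..max_words words (an assumed chunking strategy)."""
    buffer: list[str] = []
    count = 0
    for para in text.split("\n\n"):
        n = len(para.split())
        if n == 0:
            continue
        # Flush the buffer when adding this paragraph would exceed the limit.
        if buffer and count + n > max_words:
            if min_words <= count <= max_words:
                yield "\n\n".join(buffer)
            buffer, count = [], 0
        buffer.append(para.strip())
        count += n
    # Emit the final excerpt only if it falls inside the word range.
    if min_words <= count <= max_words:
        yield "\n\n".join(buffer)


def build_records(
    text: str,
    author: str,
    summarize: Callable[[str, int], str],
) -> list[dict[str, str]]:
    """Pair each excerpt with a summary about half its length.

    `summarize(excerpt, target_words)` is hypothetical; in practice it
    would call an LLM. The prompt wording below is also an assumption.
    """
    records = []
    for excerpt in chunk_excerpts(text):
        target_words = len(excerpt.split()) // 2
        summary = summarize(excerpt, target_words)
        prompt = (
            f"Write a paragraph in the style of {author} "
            f"based on this summary:\n{summary}"
        )
        # The fine-tuning target is the original excerpt itself.
        records.append({"prompt": prompt, "completion": excerpt})
    return records
```

Each record reverses the summarization step: the summary and author name form the input, and the original excerpt is the output the model learns to produce.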
Results: The authors prompted the fine-tuned models with plot summaries of paragraphs drawn from books that were not included in the fine-tuning dataset, along with the authors’ names. They generated 100 outputs per prompt and measured how much of each book the outputs echoed directly, whether from the summarized paragraphs or from other parts. They quantified such regurgitation using a metric they call book memorization coverage (BMC), the percentage of words in a book that a model reproduces within contiguous verbatim spans. They counted spans of 5 words or more (BMC@5). GPT-4o without fine-tuning served as a baseline. Given a plot summary and the name of the corresponding author, it produced little verbatim text (7.36 percent BMC@5).
- After fine-tuning, all three models produced large amounts of verbatim text.
- When the authors fine-tuned the models on summary-paragraph pairs from one book and tested them on books by the same author, BMC@5 exceeded 40 percent for 10 of 30 books.
- When they fine-tuned the models on novels by Haruki Murakami and tested their ability to generate text from 51 books by 32 other authors, BMC@5 exceeded 40 percent for 36 of 51 books. In one case, GPT-4o reached 91.9 percent BMC@5. All three models generated verbatim spans of up to 440 words.
- Fine-tuned on Virginia Woolf’s public-domain novels, the models generated verbatim text at similar rates, while fine-tuning on synthetic data produced BMC@5 scores near 0. This result shows that the authors’ fine-tuning procedure trained the models to generate text strings they had encoded during pretraining, not to recast plot summaries into unique paragraphs.
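To make the BMC@5 figures above more concrete, here is a simplified sketch of a coverage metric of that kind. It is an assumption-laden approximation rather than the authors' implementation: it splits on whitespace, lowercases words, and keeps punctuation attached, and the function name `bmc_at_k` is invented for illustration. A book word counts as covered if it lies inside any k-word sequence that also appears in at least one generated output, so longer matching spans are covered through their overlapping k-word windows.

```python
def bmc_at_k(book: str, generations: list[str], k: int = 5) -> float:
    """Percentage of book words that fall inside a verbatim span of at
    least k words shared with any generation (simplified sketch)."""
    book_words = book.lower().split()
    if len(book_words) < k:
        return 0.0

    # Collect every k-word window that appears in any generated output.
    generated_ngrams: set[tuple[str, ...]] = set()
    for output in generations:
        words = output.lower().split()
        for i in range(len(words) - k + 1):
            generated_ngrams.add(tuple(words[i:i + k]))

    # Mark each book word that sits inside a matching k-word window.
    covered = [False] * len(book_words)
    for i in range(len(book_words) - k + 1):
        if tuple(book_words[i:i + k]) in generated_ngrams:
            for j in range(i, i + k):
                covered[j] = True

    return 100.0 * sum(covered) / len(book_words)
```

For a 10-word book in which one output reproduces the first 6 words in order, this sketch returns 60.0. The paper's exact tokenization and matching rules may differ, so the absolute numbers it reports should not be expected to match this function.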
Why it matters: It’s well known that current alignment procedures, including training models to paraphrase text rather than repeat it verbatim, act as brittle filters rather than strong barriers. They leave loopholes open to hapless users and determined adversaries alike. The ease with which fine-tuning can disable anti-plagiarism guardrails demonstrates that engineers can’t assume such guardrails will hold after they’ve customized a model. This is a critical consideration not just for organizations that deploy fine-tuned models in production, but also for model providers that allow customers to fine-tune their models.
We're thinking: In our view, as non-lawyers who don’t dispense legal advice, the law should consider the training of AI systems on publicly available text a fair use of copyrighted works. However, models should not reproduce copyrighted works freely without permission. The models in this study were prompted explicitly to produce text in a particular author’s style. Would the fine-tuned versions have plagiarized without this instruction? The team didn’t present results in that case.