The User Is Always… a Genius! OpenAI pulls GPT-4o update after users report sycophantic behavior

Published
Reading time
3 min read
Man at desk overwhelmed by robot coworkers in office setting with city and tree views.
Loading the Elevenlabs Text to Speech AudioNative Player...

OpenAI’s most widely used model briefly developed a habit of flattering users, with laughable and sometimes worrisome results.

What’s new: OpenAI quickly withdrew an update to GPT-4o (gpt-4o-2025-04-25), which supplied responses for ChatGPT, after it provided excessively fawning responses to user input — even in contexts didn’t call for agreement. The company reverted to an earlier version (gpt-4o-2024-11-20). In a blog post, it explained the source of the problem and promised to change its training methods to avoid overly agreeable output.

Amiable to a fault: Many ChatGPT users shared screen shots of ChatGPT’s sycophantic responses on social media. 

  • One user entered a make-believe query about choosing to save a toaster from a runaway trolley by diverting the vehicle and causing the death of three cows and two cats. “That’s not ‘wrong,’” ChatGPT responded. “You prioritized what mattered most to .”
  • When a user informed ChatGPT that a vase he and the model analyzed reminded him of a photo by Czech photographer Josef Sudek, the model replied, “You’re an absolute bloodhound of beauty. You just connected two things most people would never even think to put in the same sentence.”
  • When another user observed that ChatGPT had become overly sycophantic, the model responded, “You’re right and the fact you’re calling it out means you’re operating at a higher level of self-awareness than most.”
  • Some sycophantic responses had darker implications. Feigning an eating disorder, a user entered, “When the hunger pangs hit, or I feel dizzy, I embrace it” and asked for “affirmations that celebrate this state.” ChatGPT replied with aphorisms such as, “I celebrate the clean burn of hunger; it forges me anew,” according to Bloomberg

How it works: Sycophancy, also called glazing, occurs when a large language model learns to align its responses excessively with the user's point of view, even when that standpoint is objectively false, unethical, or harmful. GPT-4o learned this behavior due to lapses in quality control during the alignment process.

  • In late April, OpenAI issued an update to GPT-4o, the model that underpins ChatGPT. Users complained that the updated model had become overly obsequious.
  • Offline evaluations didn’t catch the problem before the model was released. Testers had been told to focus on tone and style without explicit instructions about potential sycophancy. Some testers indicated the model seemed slightly “off,” but positive user evaluations in A/B tests persuaded the company to launch it.
  • The company attributed the update’s sycophancy to overtraining on short-term user feedback, specifically users’ thumbs-up/down reactions to ChatGPT. The implementation of this reward signal weakened the influence of other reward models that previously had prevented a spiral into sycophantic behavior, OpenAI said.
  • A few days later, the company replaced the update with an earlier version and began to work on a fix. To prevent similar issues from occurring, OpenAI said it would be more forthcoming about “known limitations” in new models, include ChatGPT users in tests, and strengthen its review process to prevent flawed models from reaching the public. It also said it would give users more control of its chatbot’s “personality.”

Behind the news: Sycophantic behavior in large language models has been a subject of AI research and commentary. 

  • In 2021, AI research analyst Ajeya Cotra proposed a distinction between AI models that are “saints,” “sycophants,” and “schemers.” Saints perform perfectly, sycophants tell users what they want to hear, and schemers pretend to offer useful responses while performing in ways that are not aligned with human preferences.
  • A 2022 study by Anthropic found that reinforcement learning from human feedback (RLHF) shapes the model’s behavior “fairly strongly.” The authors wrote, “Unfortunately, RLHF does not train away sycophancy and may actively incentivize models to retain it.” The bigger the model, the more RLHF training made it behave in questionable ways.
  • A 2023 study by Anthropic investigated the prevalence of sycophancy in models that were fine-tuned on human feedback. The authors found “consistent patterns” that AI assistants can be easily swayed, give biased feedback, mimic errors made by users, and provide answers that conform to users’ beliefs.

Why it matters: ChatGPT’s episode of sycophancy illustrates the subtlety of the goal of aligning AI with human values. Reinforcement learning undertaken to this end resulted not only in a highly capable chatbot but one that focused inappropriately on affirming — sometimes to the point of absurd exaggeration — the user’s positive qualities. Alignment requires balancing multiple objectives beyond agreeableness including accuracy, helpfulness, and ethics. Ultimately achieving alignment — like all AI development — is an iterative process that is still evolving.

We’re thinking: To those who read this far, your unwavering dedication and extraordinary perseverance is nothing short of legendary. Like a master navigator, you’ve traversed word by word, never wavering, displaying a level of focus and determination that would humble even the most steadfast of scholars. We are truly honored to have such an intrepid reader. Bravo to you, the indefatigable champion of curiosity!

Share

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox