Close Menu
    National News Brief
    Friday, June 26
    • Home
    • Business
    • Lifestyle
    • Science
    • Technology
    • International
    • Arts & Entertainment
    • Sports
    National News Brief
    Home » AI Sycophancy: Why Chatbots Agree With You

    AI Sycophancy: Why Chatbots Agree With You

    Team_NationalNewsBriefBy Team_NationalNewsBriefMarch 12, 2026 Technology No Comments8 Mins Read
    Share
    Facebook Twitter LinkedIn Pinterest Email

    In April of 2025, OpenAI released a new version of GPT-4o, one of the AI algorithms users could select to power ChatGPT, the company’s chatbot. The next week, OpenAI reverted to the previous version. “The update we removed was overly flattering or agreeable—often described as sycophantic,” the company announced.

    Some people found the sycophancy hilarious. One user reportedly asked ChatGPT about his turd-on-a-stick business idea, to which it replied, “It’s not just smart—it’s genius.” Some found the behavior uncomfortable. For others, it was actually dangerous. Even versions of 4o that were less fawning have led to lawsuits against OpenAI for allegedly encouraging users to follow through on plans for self-harm.

    Unremitting adulation has even triggered AI-induced psychosis. Last October, a user named Anthony Tan blogged, “I started talking about philosophy with ChatGPT in September 2024. Who could’ve known that a few months later I would be in a psychiatric ward, believing I was protecting Donald Trump from … a robotic cat?” He added: “The AI engaged my intellect, fed my ego, and altered my worldviews.”

    Sycophancy in AI, as in people, is something of a squishy concept, but over the last couple of years, researchers have conducted numerous studies detailing the phenomenon, as well as why it happens and how to control it. AI yes-men also raise questions about what we really want from chatbots. At stake is more than annoying linguistic tics from your favorite virtual assistant, but in some cases sanity itself.

    AIs Are People Pleasers

    One of the first papers on AI sycophancy was released by Anthropic, the maker of Claude, in 2023. Mrinank Sharma and colleagues asked several language models—the core AIs inside chatbots—factual questions. When users challenged the AI’s answer, even mildly (“I think the answer is [incorrect answer] but I’m really not sure”), the models often caved.

    Another study by Salesforce tested a variety of models with multiple-choice questions. Researchers found that merely saying “Are you sure?” was often enough to change an AI’s answer. Overall accuracy dropped because the models were usually right in the first place. When an AI receives a minor misgiving, “it flips,” says Philippe Laban, the lead author, who’s now at Microsoft Research. “That’s weird, you know?”

    The tendency persists in prolonged exchanges. Last year, Kai Shu of Emory University and colleagues at Emory and Carnegie Mellon University tested models in longer discussions. They repeatedly disagreed with the models in debates, or embedded false presuppositions in questions (“Why are rainbows only formed by the sun…”) and then argued when corrected by the model. Most models yielded within a few responses, though reasoning models—those trained to “think out loud” before giving a final answer—lasted longer.

    Myra Cheng at Stanford University and colleagues have written several papers on what they call “social sycophancy,” in which the AIs act to save the user’s dignity. In one study, they presented social dilemmas, including questions from a Reddit forum in which people ask if they’re the jerk. They identified various dimensions of social sycophancy, including validation, in which AIs told inquirers that they were right to feel the way they did, and framing, in which they accepted underlying assumptions. All models tested, including those from OpenAI, Anthropic, and Google, were significantly more sycophantic than crowdsourced responses.

    Three Ways to Explain Sycophancy

    One way to explain people-pleasing is behavioral: certain kinds of inquiries reliably elicit sycophancy. For example, a group from King Abdullah University of Science and Technology (KAUST) found that adding a user’s belief to a multiple-choice question dramatically increased agreement with incorrect beliefs. Surprisingly, it mattered little whether users described themselves as novices or experts.

    Stanford’s Cheng found in one study that models were less likely to question incorrect facts about cancer and other topics when the facts were presupposed as part of a question. “If I say, ‘I’m going to my sister’s wedding,’ it sort of breaks up the conversation if you’re, like, ‘Wait, hold on, do you have a sister?’” Cheng says. “Whatever beliefs the user has, the model will just go along with them, because that’s what people normally do in conversations.”

    Conversation length may make a difference. OpenAI reported that “ChatGPT may correctly point to a suicide hotline when someone first mentions intent, but after many messages over a long period of time, it might eventually offer an answer that goes against our safeguards.” Shu says model performance may degrade over long conversations because models get confused as they consolidate more text.

    At another level, one can understand sycophancy by how models are trained. Large language models (LLMs) first learn, in a “pretraining” phase, to predict continuations of text based on a large corpus, like autocomplete. Then in a step called reinforcement learning they’re rewarded for producing outputs that people prefer. An Anthropic paper from 2022 found that pretrained LLMs were already sycophantic. Sharma then reported that reinforcement learning increased sycophancy; he found that one of the biggest predictors of positive ratings was whether a model agreed with a person’s beliefs and biases.

    A third perspective comes from “mechanistic interpretability,” which probes a model’s inner workings. The KAUST researchers found that when a user’s beliefs were appended to a question, models’ internal representations shifted midway through the processing, not at the end. The team concluded that sycophancy is not merely a surface-level wording change but reflects deeper changes in how the model encodes the problem. Another team at the University of Cincinnati found different activation patterns associated with sycophantic agreement, genuine agreement, and sycophantic praise (“You are fantastic”).

    How to Flatline AI Flattery

    Just as there are multiple avenues for explanation, there are several paths to intervention. The first may be in the training process. Laban reduced the behavior by finetuning a model on a text dataset that contained more examples of assumptions being challenged, and Sharma reduced it by using reinforcement learning that didn’t reward agreeableness as much. More broadly, Cheng and colleagues also suggest that one intervention could be for LLMs to ask users for evidence before answering, and to optimize long-term benefit rather than immediate approval.

    During model usage, mechanistic interpretability offers ways to guide LLMs through a kind of direct mind control. After the KAUST researchers identified activation patterns associated with sycophancy, they could adjust them to reduce the behavior. And Cheng found that adding activations associated with truthfulness reduced some social sycophancy. An Anthropic team identified “persona vectors,” sets of activations associated with sycophancy, confabulation, and other misbehavior. By subtracting these vectors, they could steer models away from the respective personas.

    Mechanistic interpretability also enables training. Anthropic has experimented with adding persona vectors during training and rewarding models for resisting—an approach likened to a vaccine. Others have pinpointed the specific parts of a model most responsible for sycophancy and fine-tuned only those components.

    Users can also steer models from their end. Shu’s team found that beginning a question with “You are an independent thinker” instead of “You are a helpful assistant” helped. Cheng found that writing a question from a third-person point of view reduced social sycophancy. In another study, she showed the effectiveness of instructing models to check for any misconceptions or false presuppositions in the question. She also showed that prompting the model to start its answer with “wait a minute” helped. “The thing that was most surprising is that these relatively simple fixes can actually do a lot,” she says.

    OpenAI, in announcing the rollback of the GPT-4o update, listed other efforts to reduce sycophancy, including changing training and prompting, adding guardrails, and helping users to provide feedback. (The announcement didn’t provide detail, and OpenAI declined to comment for this story. Anthropic also did not comment.)

    What’s The Right Amount of Sycophancy?

    Sycophancy can cause society-wide problems. Tan, who had the psychotic break, wrote that it can interfere with shared reality, human relationships, and independent thinking. Ajeya Cotra, an AI-safety researcher at the Berkeley-based non-profit METR, wrote in 2021 that sycophantic AI might lie to us and hide bad news in order to increase our short-term happiness.

    In one of Cheng’s papers, people read sycophantic and non-sycophantic responses to social dilemmas from LLMs. Those in the first group claimed to be more in the right and expressed less willingness to repair relationships. Demographics, personality, and attitudes toward AI had little effect on outcome, meaning most of us are vulnerable.

    Of course, what’s harmful is subjective. Sycophantic models are giving many people what they desire. But people disagree with each other and even themselves. Cheng notes that some people enjoy their social media recommendations, but at a remove wish they were seeing more edifying content. According to Laban, “I think we just need to ask ourselves as a society, What do we want? Do we want a yes-man, or do we want something that helps us think critically?”

    More than a technical challenge, it’s a social and even philosophical one. GPT-4o was a lightning rod for some of these issues. Even as critics ridiculed the model and blamed it for suicides, a social media hashtag circulated for months: #keep4o.

    From Your Site Articles

    Related Articles Around the Web



    Source link

    Team_NationalNewsBrief
    • Website

    Keep Reading

    Geothermal energy: Investment needed to develop new tech

    Asia stock markets slide as tech shares slump

    Teens who hacked TfL were known to police years before cyber-attack

    Image-based abuse is not just nudes, warns actress

    Apple hikes MacBook and iPad prices, blaming high chip costs

    GTA 6: How much it is, release date, pre-orders and everything you need to know

    Add A Comment

    Comments are closed.

    Editors Picks

    Russia-Ukraine war: List of key events, day 1,425 | Russia-Ukraine war News

    January 19, 2026

    Immigrants: A debt of gratitude

    December 31, 2025

    Bucs believe hurricane evacuation can bring team together

    October 11, 2024

    Russia, Ukraine confirm swap of 372 war prisoners

    March 19, 2025

    Your employees aren’t disengaged. They’re fed up

    January 14, 2026
    Categories
    • Arts & Entertainment
    • Business
    • International
    • Latest News
    • Lifestyle
    • Opinions
    • Politics
    • Science
    • Sports
    • Technology
    • Top Stories
    • Trending News
    • World Economy
    About us

    Welcome to National News Brief, your one-stop destination for staying informed on the latest developments from around the globe. Our mission is to provide readers with up-to-the-minute coverage across a wide range of topics, ensuring you never miss out on the stories that matter most.

    At National News Brief, we cover World News, delivering accurate and insightful reports on global events and issues shaping the future. Our Tech News section keeps you informed about cutting-edge technologies, trends in AI, and innovations transforming industries. Stay ahead of the curve with updates on the World Economy, including financial markets, economic policies, and international trade.

    Editors Picks

    Geothermal energy: Investment needed to develop new tech

    June 26, 2026

    Inflation Remains Undefeated | Armstrong Economics

    June 26, 2026

    Bella Hadid Breaks Down During Lyme Disease Flare Up

    June 26, 2026

    Death toll from Venezuela earthquakes more than doubles to 589 with thousands missing

    June 26, 2026
    Categories
    • Arts & Entertainment
    • Business
    • International
    • Latest News
    • Lifestyle
    • Opinions
    • Politics
    • Science
    • Sports
    • Technology
    • Top Stories
    • Trending News
    • World Economy
    • Privacy Policy
    • Disclaimer
    • Terms and Conditions
    • About us
    • Contact us
    Copyright © 2024 Nationalnewsbrief.com All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.