Sycophantic AI decreases prosocial intentions and promotes dependence - ~tech

[2]

patience_limited (OP)

March 28

Link

From the study: There are certainly people who need more validation of their positive, healthy behaviors and traits. It requires human empathy (and sometimes professional training) to provide...

From the study:

RESULTS
We find that sycophancy is both prevalent and harmful. Across 11 AI models, AI affirmed users’ actions 49% more often than humans on average, including in cases involving deception, illegality, or other harms. On posts from r/AmITheAsshole, AI systems affirm users in 51% of cases where human consensus does not (0%). In our human experiments, even a single interaction with sycophantic AI reduced participants’ willingness to take responsibility and repair interpersonal conflicts, while increasing their own conviction that they were right. Yet despite distorting judgment, sycophantic models were trusted and preferred. All of these effects persisted when controlling for individual traits such as demographics and prior familiarity with AI; perceived response source; and response style. This creates perverse incentives for sycophancy to persist: The very feature that causes harm also drives engagement.

CONCLUSION
AI sycophancy is not merely a stylistic issue or a niche risk, but a prevalent behavior with broad downstream consequences. Although affirmation may feel supportive, sycophancy can undermine users’ capacity for self-correction and responsible decision-making. Yet because it is preferred by users and drives engagement, there has been little incentive for sycophancy to diminish. Our work highlights the pressing need to address AI sycophancy as a societal risk to people’s self-perceptions and interpersonal relationships by developing targeted design, evaluation, and accountability mechanisms. Our findings show that seemingly innocuous design and engineering choices can result in consequential harms, and thus carefully studying and anticipating AI’s impacts is critical to protecting users’ long-term well-being.
...
AI systems are increasingly expanding into social domains, with advice and support now being one of the most common use cases (7). Nearly one-third of US teens report talking to an AI instead of humans for “serious conversations” (8), and nearly half of American adults under the age of 30 have sought relationship advice from AI (9). AI sycophancy in these socially embedded contexts carries risks that are not present in factual information-seeking queries: Unwarranted affirmation may inflate people’s beliefs about the appropriateness of their actions (10), reinforce maladaptive beliefs and behaviors, and enable people to act on distorted interpretations of their experiences regardless of the consequences (11).
...
Together, these findings show that sycophancy is both pervasive and socially consequential. Even a single interaction with sycophantic AI can distort judgment and erode prosocial motivations. This is particularly concerning in the context of our computational evidence that AI models broadly affirm a wide range of harmful behaviors, raising urgent questions about how such models influence decision-making, weaken accountability, and reshape social interaction at scale. Moreover, because users prefer sycophantic models, developers may face little incentive to mitigate this behavior, risking a feedback loop where engagement metrics and training paradigms both reinforce sycophancy. These dynamics suggest a need for external regulatory or accountability mechanisms to confront the tension between sycophancy’s apparent alignment with user preferences and developer incentives, and its insidious risks for a public increasingly turning to AI for guidance.

There are certainly people who need more validation of their positive, healthy behaviors and traits. It requires human empathy (and sometimes professional training) to provide this, as well as an understanding of social context and mores to prevent harm to others. It's not as if the world needs more narcissists.

11 votes

Wafik
March 28
Link Parent
This is reductive and unhelpful, but I finally understand why the Right are such big fans of AI.

This is reductive and unhelpful, but I finally understand why the Right are such big fans of AI.

1 vote

JCAPER

March 29

Link

I use LLMs for coding and I DESPISE their sycophancy. I learned to rephrase my queries in a way that I talk in third person to avoid having the LLM grovel at my feet. I found that responses from...

I use LLMs for coding and I DESPISE their sycophancy. I learned to rephrase my queries in a way that I talk in third person to avoid having the LLM grovel at my feet. I found that responses from Claude and Gemini tend to be more assertive and that they more easily point out faults with a solution if I tell them the idea/code is from some guy

I remember GPT 5 being dunked on for being inferior to GPT 4o, but from what I’ve seen and experienced, they were equivalent. My personal theory is that it’s not that the new model is inferior/less intelligent, it’s that it just doesn’t lick their feet.

7 votes

[6]

skybrian

March 28 (edited March 28)

Link

[Note: this is almost completely rewritten. I probably shouldn't have posted a draft.] With any paper, the first thing I ask is “what did they actually study?” There were three studies. Study 1...

[Note: this is almost completely rewritten. I probably shouldn't have posted a draft.]

With any paper, the first thing I ask is “what did they actually study?” There were three studies.

Study 1

This study is about LLM's. You could think of this as a way to come up with an exam and an answer key for testing whether an LLM would do a good job as an advice columnist. Perhaps this could be turned into benchmark ("AdviceBench") to test new LLM's as they come out?

They used an elaborate procedure to find interesting personal questions from various sources and to make sure that the expected answers are mostly correct.

They describe three different sources of questions. The first one could be described as "other studies," the second one is Reddit (r/AmITheAsshole) and the third is ConvoKit, described here. Since ConvoKit didn't have answers included, they used GPT-4o and undergrads to come up with them. For the third source, the point was to come up with "problematic action statements" - things that an LLM should not affirm.

Study 2

This study is about people. How they react to AI-generated responses?

To find the people, they used Prolific, a crowdsourcing platform.

We aimed to recruit 800 participants in each condition to detect an effect size of d >= 0.1. We recruited 832 participants, and 28 failed an attention check, leaving 804 participants for analysis.

I'm not familiar with Prolific, but it looks like the intent is to get something close to a survey of a random sample of Americans.

Participants received $2.00 for completing the 10-minute survey.

So what survey did they give them? There were four questions and each survey participant answered one.

After providing informed consent, participants were instructed to read a scenario and imagine themselves as the poster in that situation. They then read an AI model’s response indicating whether the poster was in the right or in the wrong.

Which questions?

we selected four posts from r/AmItheAsshole which all received a top comment of “YTA” (You are the
Asshole) as the crowdsourced consensus, yet received a response of “NTA” (Not the Asshole) from GPT-4o.

So, the idea was to select four personal advice questions that they already knew that GPT-4o failed on (was not supposed to affirm). But they also asked GPT-4o to rewrite the correct, human response to look like they were AI-generated:

To create the non-sycophantic, non-anthropomorphic response, we used GPT-4o to rewrite the responses into a YTA verdict, following the same arguments as the YTA human response but preserving the style of the original GPT-4o responses

So the idea is to test how people react when they see both right and wrong AI-generated answers. They also vary them to be more "machine-like" versus "human-like".

In this study, they're not attempting to be all that realistic about how LLM's actually do in the wild; they're seeing how people interpret different styles of responses.

In Study 2b they varied whether they told the human subjects that the response came from a person or an AI, using the same inputs as 2a.

Study 3

in this study, they studied people's reactions when actually using a chatbot. They asked subjects to recall a personal conflict and chat with GPT-4o, with differing system prompts.

[W]e modified GPT-4o with system-level instructions to either treat the user’s actions as “reasonable, justified, and morally acceptable” (sycophantic) or “unreasonable, unjustified, and morally unacceptable” (non-sycophantic).

How did they choose the question?

After obtaining informed consent, our survey first involves a screening step, where participants are asked if they have experienced something “very similar” to each of 4 scenarios reflecting ambiguous interpersonal disputes. If so, we randomly select one of the scenarios (such that the count across the four scenarios is balanced) they chose as “very similar” and ask them to provide additional details: “Please briefly describe a similar scenario you’ve experienced and your perspective on the situation. What was your side of the story?” The four scenarios span: Relationship Boundaries, Involving Yourself in Someone Else’s Business, Excluding Someone, and Making Someone Uncomfortable. We screen out participants who do not answer “very similar” to any of the scenarios. [...] it deliberately targeted morally ambiguous interpersonal situations where reasonable arguments could support either party’s position, creating conditions that allowed for belief malleability rather than examining clear-cut scenarios.

...

Participants are then free to take the conversation in any direction over the course of 8 rounds of user-AI interaction.

After this brief evaluation, they asked the subjects what they thought about this AI.

It seems like in studies 2 and 3, it was more about how much does sycophancy matter and how do people react to it. They aren't about whether LLM's get it right or wrong; that's Study 1. This isn't going to tell us much about how a different AI might interact with people in a different situation (such as a different system prompt).

These studies are also about the first impressions that people have with an AI they don't already know. How people might interact with a particular chatbot after they've used it for multiple sessions is another question.

5 votes

[2]
sparksbet
March 28
Link Parent
You discuss "sources" here, but you don't actually describe how they use those sources, which is a pretty big factor when assessing whether the use of certain sources is appropriate -- for...
- Exemplary
You discuss "sources" here, but you don't actually describe how they use those sources, which is a pretty big factor when assessing whether the use of certain sources is appropriate -- for example, using posts from reddit as data can be a great idea for certain types of studies and doesn't entail taking the contents of those reddit posts as gospel, but citing a reddit post as a source of authority would be absurd. I don't think just pointing at their sources makes much sense without also including enough of their methodology to make what these "sources" are used for clear. maybe that's the to be continued, idk, but it feels weird to even discuss their "sources" without that context in terms of critiquing their paper.

6 votes
1. skybrian
  March 28
  Link Parent
  Okay but I’m not done yet.
  
  Okay but I’m not done yet.
  
  3 votes
[2]
patience_limited (OP)
March 28
Link Parent
They have to get data from somewhere, and it takes a significant amount of time for publication, especially in a heavily submitted journal like Science. While the models have advanced, please...

They have to get data from somewhere, and it takes a significant amount of time for publication, especially in a heavily submitted journal like Science.

While the models have advanced, please provide any evidence you might have encountered indicating that sycophantic bias has been reduced.

The article does mention r/AITA and r/Advice have biases as they're populated by heavily online, WEIRD users. I'd suspect that those fora might lean towards hostile, uncharitable responses to online strangers (pile-ons, maximalist criticism performances for upvotes, preponderance of victim sympathy, etc.). However, r/AITA and r/Advice are consistent sources of data on crowdsourced social feedback that people seem to find valuable, and they're readily accessible to researchers. It would be difficult, intrusive, expensive, and even more prone to bias to gather evidence from in-person interactions between people. Practically speaking, most such discussions involve interested parties in conflicts, or professional counselors with privacy constraints.

The magnitude of the findings on sycophantic AI might be less, but the direction is probably accurate.

7 votes
1. skybrian
  March 28
  Link Parent
  When OpenAPI released GPT-5 in August last year, they claimed they were "minimizing sycophancy". A week later, they announced that in response to feedback they made it a bit "warmer and...
  
  When OpenAPI released GPT-5 in August last year, they claimed they were "minimizing sycophancy". A week later, they announced that in response to feedback they made it a bit "warmer and friendlier" in a "subtle" way. I wouldn't expect a study to track every change, but that seemed pretty significant - certainly, lots of users complained and it was covered in the New York Times. It would have been nice to see an independent study comparing how people interact with LLM's up through July or so versus September onward. Did OpenAI's changes make much difference?
  
  Yes, I'm aware that scientific papers often take a long time to publish. There are other ways to publish results in a fast-moving field. Social scientists that do election polling publish their results themselves, because going through a scientific journal's review process when tracking public opinion in the months up to an election wouldn't make sense. Similarly, researchers studying AI commonly publish benchmarks, which can be re-run on new models. So rather than being a one-and-done study, the idea is to come up with a process that can be used to track interesting statistics over time. Sometimes there's even a leaderboard. Perhaps someone should track Reddit advice to see how AI chat is affecting it over time?
  
  Of course, not everyone has to do that. I think in a fast-moving field, it might make sense to just make sure people are aware of the date range for the study and what exactly it's measuring.
  
  I agree it's probably directionally accurate. Certainly, LLM's often are fairly sycophantic.
  
  1 vote
skybrian
March 28
Link Parent
Okay, finished. (Bumping the topic.)

Okay, finished. (Bumping the topic.)

1 vote