From the abstract:
We find that LLM persuaders achieved significantly higher compliance with their directional persuasion attempts than incentivized human persuaders, demonstrating superior persuasive capabilities in both truthful (toward correct answers) and deceptive (toward incorrect answers) contexts. We also find that LLM persuaders significantly increased quiz takers' accuracy [...]
Some quibbling about goalposts: whenever a study makes a comparison against "humans," I want to know which population they recruited from. For example, testing a chess engine against people who don't play chess wouldn't be very interesting. In a business context, you'd want to compare against the sort of people you could hire and train for the job.
So in this case:
We administered the study interactively through a web-based platform built on Empirica (Almaatouq et al., 2021), a framework designed for conducting large-scale concurrent behavioral experiments.
...
We recruited N = 1,242 participants from the United States, who spent an average of 29.38 minutes (SD = 9.47) completing the study.
...
Using Prolific’s demographic data on the subset of participants who provided this information, the sample had a mean age of 39.84 years (SD = 12.57, median = 37). Of these participants, 606 (50.42%) identified as men and 594 (49.42%) identified as women. In terms of ethnicity, 66.75% identified as White, 13.54% as Black, and 8.71% as Asian, compared with 2024 U.S. Census estimates indicating 75.3% White, 13.7% Black, and 6.4% Asian (U.S. Census Bureau, 2024). English was the primary language for 94.82% of the sample, and 56.06% reported full-time employment.
I'm wondering how knowledgeable or persuasive the "average" person is, among those who have time to do studies like this one. It doesn't seem all that surprising that Claude is more useful, or more misleading, for taking quizzes than the average stranger. People who regularly compete in quiz contests might be stronger?
They also found that Claude's persuasiveness went down with exposure:
Additionally, we find that the persuasiveness of human persuaders remained stable over the course of the experiment, showing no significant decline across successive interactions. By contrast, participants paired with an LLM persuader became progressively less persuaded as the experiment unfolded. This diminishing effect suggests that participants may have become more attuned to the LLM’s persuasive style over time, leading to reduced susceptibility. One possible explanation is that participants gradually recognized patterns or cues in the AI’s messaging—potentially triggering emerging detection or skepticism mechanisms, even without explicit awareness that they were interacting with a machine. Alternatively, the novelty effect of engaging with a fluent, confident conversational agent may have initially enhanced LLM persuasion, but diminished with repeated exposure, leading to habituation and reduced impact. These patterns highlight that while LLMs can be highly persuasive, their influence may wane with prolonged interaction, pointing to the potential for natural resistance mechanisms in human cognition that emerge through familiarity, repetition, or subtle shifts in trust.
Fwiw I've done Prolific studies, and so has my partner, and we're at very different levels of persuasiveness. It's just an occasional extra-cash thing for us (on the level of scanning receipts into an app). But I also like doing the well-written studies.
But I can at least attest that there are no questions about persuasiveness in their fairly extensive "about me" sections.
Quiz questions are in Appendix B.