13 votes

That one study that proves developers using AI are deluded

I've found myself replying to different people about the early 2025 METR study fairly often, so I thought I'd try posting a top level thread. Consider it an unsolicited public service announcement.

You might be familiar with the study because it has been showing up alongside discussions about AI and coding for about a year. It found that LLMs actually decreased developer productivity, so people love to use it to suggest that the whole AI coding thing is really a big lie and that the people who think it makes them more productive are hallucinating.

Here's the thing about that study... No one seems to have even glanced at it!

First, it's from early 2025; they used Claude Sonnet 3.5 or 3.7. Those models are in no way comparable to current gen coding agents. The commonly cited inflection point didn't happen until later in 2025 with, depending on who you ask, Sonnet 4.5 or Opus 4.5.

The study comprised 16 people! If those 16 were even vaguely representative of the developer population at the time, most of them wouldn't have had significant experience with LLMs for coding.

These are not tools that just work out of the box, especially back then. It takes time and experimentation, or instruction, to use them well.

It was cool that they did the study; trying to understand LLMs was a good idea. But it's not what anyone would consider a representative, or even well thought out, study. 16 people!

But wait! They did a follow up study later in 2025.

This time with about 60 people and newer models and tools. In that study they found the opposite effect: AI tools sped developers up (which is a shock to no one who has used these tools long enough to get a feel for them). They also mentioned:

However the true speedup could be much higher among the developers and tasks which are selected out of the experiment.

In addition they had some, kind of entertaining, issues:

Due to the severity of these selection effects, we are working on changes to the design of our study.

Back to the drawing board, because:

Recruitment and retention of developers has become more difficult. An increased share of developers say they would not want to do 50% of their work without AI, even though our study pays them $50/hour to work on tasks of their own choosing. Our study is thus systematically missing developers who have the most optimistic expectations about AI’s value.

And...

Developers have become more selective in which tasks they submit. When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.

And so...

Together, these effects make it likely that our estimate reported above is a lower-bound on the true productivity effects of AI on these developers.

[...]

Some developers were less likely to complete tasks that they submitted if they were assigned to the AI-disallowed condition. One developer did not complete any of the tasks that were assigned to the AI-disallowed condition.

[...]

Altogether, these issues make it challenging to interpret our central estimate, and we believe it is likely a bad proxy for the real productivity impact of AI tools on these developers.

So to summarize, the new study showed a productivity increase, and they estimate it's larger than the ~20% increase the study found. Cheers to them for being honest about the issues they encountered. For my part I know for sure that the increase is significantly more than 20%. The caveat, though, is that it's only true after you've had some experience with the tools.

The truth is that we don't need a study for this, any experienced engineer can readily see it for themselves and you can find them talking about it pretty much everywhere. It would be interesting, though, to see a well designed study that attempted to quantify how big the average productivity increase actually is.

For that the participants using AI would need to be experienced with it and allowed to use their existing setups.

I want to add that this is not an attempt to evangelize for AI. I find the tools useful but I'm not selling anything. I'm interested in them and I stay up to date on the conversations surrounding them and the underlying technology. I use them frequently both for my own projects and to help less technical people improve their business productivity.

Whether AI agents are a good thing or not, from a larger perspective, is a very different, and complicated, question. The important thing is that utility and impact are two separate conversations. There isn't a debate anymore about utility.

I know this probably won't stop people from continuing to derail conversations with the claim that developers are wrong about utility, but I had to try. It's just hard to let it pass by when someone claims the sky is green.

I understand that AI makes people angry, and I think they have good reason to be. There are a lot of aspects of the AI revolution that I'm not thrilled about: the hype foremost, the FOMO that comes with the hype, and the potential for increased wealth consolidation, though I lay that last one at the feet of systems that existed before LLMs came along.

It's messy, but let's consider giving the benefit of the doubt to professionals who say a tool works instead of claiming they're wrong. Let them enjoy it. We can still be angry at AI at the same time.

2 comments

  1. Exellin

    Thanks for bringing this up, and yes, kudos to the authors for doing a follow up study highlighting the issues!

    I do agree with your points about AI actually increasing productivity and being a messy situation.

    One thing I want to add is to consider the implications of long term use of AI. I have found that there are many people who will accept a lower level of quality when working with agents, only checking that the happy path, and maybe a few edge cases, of the feature they were implementing work.

    In that sense, yes, you will be much more productive and fly through tickets. However, I have found that it takes a lot of effort to truly understand every line of code an AI agent writes. For me, reading code (to a high level of understanding) is harder than writing it. When I'm working on something I experiment with a lot of alternatives, write tests, read documentation, and if I'm stuck my brain chews on it while I'm not working until something clicks. This is where learning and understanding happen.

    Without this process, I feel it is easier to let technical debt build up, miss edge cases, and, most importantly, not improve your developer skill set as much. This is most prominent in junior developers. When I started my first job it was around 6 weeks before I actually shipped something, but now juniors can get something that looks correct in a few minutes or hours. Why would they, or the company, be willing to spend such a long time on something now that AI can do it so much faster?

    If you do all of your due diligence with AI, spend time thinking through the problem and writing a prompt that takes everything into account, and don't ship any code that you don't fully understand, is it faster than if you wrote it yourself?

    AI does increase productivity, but at the cost of understanding and growth.

    7 votes
    1. post_below

      Agreed, there's a lot of adaptation and learning that will need to happen, and along the way there's going to be a lot of insecure, difficult to maintain and otherwise broken code.

      Which I guess isn't anything new, but the volume will be so much larger.

      I've heard what you're describing called cognitive debt. It's suddenly possible to code faster than we can build reliable mental models.

      I think the problem will be the worst in the corporate world where executives and managers are pushing engineers to produce faster without understanding what the consequences will be.

      In smaller environments I think a lot of teams will adapt and come up with strategies to reduce cognitive debt. For my part, I plan pretty granularly and then still review any code I don't write myself; I consistently find issues that make me glad I did. That by itself isn't always enough to keep up with the velocity, so whenever I feel like I don't have a good mental model of the codebase I'll spend some time going through the code and building a better one.

      I think the biggest hype inspired fantasy in coding right now is developers convincing themselves it's possible to do minimal review, or none at all. Opus 4.6 and GPT 5.4 are good but they aren't that good.

      The problem is that sometimes they seem like they are, so it's easy to fall into a false sense of trust in their abilities. It doesn't help that so many people are working hard to sell the idea that agents can do all your code review for you.

      One of the things I've noticed in the AI coding zeitgeist, over and over, is that the latest trending ideas and conversations tend to make absolute claims. Things like skill atrophy or cognitive debt are framed as a new reality we just have to learn to live with. When really what will happen, after a lot of mistakes and some public catastrophes, is that we'll come up with ways to solve those problems.

      I don't disagree with you, the velocity does come at a cost, and the industry as a whole is going to pay for it. But over time people who care about quality will figure out how to use the tools more effectively so that they can build stuff that doesn't suck.

      Speaking for myself, I've learned more in the past 6 months or so than I'd normally learn in a couple of years. I've had to intentionally slow down to give it all a chance to process.