-
16 votes
-
That one study that proves developers using AI are deluded
I've found myself replying to different people about the early 2025 METR study kind of often. So I thought I'd try posting a top level thread; consider it an unsolicited public service announcement.
You might be familiar with the study because it has been showing up alongside discussions about AI and coding for about a year. It found that LLMs actually decreased developer productivity and so people love to use it to suggest that the whole AI coding thing is really a big lie and the people who think it makes them more productive are hallucinating.
Here's the thing about that study... No one seems to have even glanced at it!
First, it's from early 2025, and they used Claude Sonnet 3.5 or 3.7. Those models are in no way comparable to current gen coding agents. The commonly cited inflection point didn't happen until later in 2025 with, depending on who you ask, Sonnet 4.5 or Opus 4.5.
The study had 16 participants! If those 16 were even vaguely representative of the developer population at the time, most of them wouldn't have had significant experience with LLMs for coding.
These are not tools that just work out of the box, especially back then. It takes time and experimentation, or instruction, to use them well.
It was cool that they did the study, trying to understand LLMs was a good idea. But it's not what anyone would consider a representative, or even well thought out, study. 16 people!
But wait! They did a follow up study later in 2025.
This time with about 60 people and newer models and tools. In that study they found the opposite effect, AI tools sped developers up (which is a shock to no one who has used these tools long enough to get a feel for them). They also mentioned:
However the true speedup could be much higher among the developers and tasks which are selected out of the experiment.
In addition they had some, kind of entertaining, issues:
Due to the severity of these selection effects, we are working on changes to the design of our study.
Back to the drawing board, because:
Recruitment and retention of developers has become more difficult. An increased share of developers say they would not want to do 50% of their work without AI, even though our study pays them $50/hour to work on tasks of their own choosing. Our study is thus systematically missing developers who have the most optimistic expectations about AI’s value.
And...
Developers have become more selective in which tasks they submit. When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.
And so...
Together, these effects make it likely that our estimate reported above is a lower-bound on the true productivity effects of AI on these developers.
[...]
Some developers were less likely to complete tasks that they submitted if they were assigned to the AI-disallowed condition. One developer did not complete any of the tasks that were assigned to the AI-disallowed condition.
[...]
Altogether, these issues make it challenging to interpret our central estimate, and we believe it is likely a bad proxy for the real productivity impact of AI tools on these developers.
So to summarize, the new study showed a productivity increase and they estimate the true effect is larger than the ~20% increase the study found. Cheers to them for being honest about the issues they encountered. For my part I know for sure that the increase is significantly more than 20%. The caveat, though, is that this is only true after you've had some experience with the tools.
The truth is that we don't need a study for this, any experienced engineer can readily see it for themselves and you can find them talking about it pretty much everywhere. It would be interesting, though, to see a well designed study that attempted to quantify how big the average productivity increase actually is.
For that the participants using AI would need to be experienced with it and allowed to use their existing setups.
I want to add that this is not an attempt to evangelize for AI. I find the tools useful but I'm not selling anything. I'm interested in them and I stay up to date on the conversations surrounding them and the underlying technology. I use them frequently both for my own projects and to help less technical people improve their business productivity.
Whether AI agents are a good thing or not, from a larger perspective, is a very different, and complicated, conversation. The important thing is that utility and impact are two different conversations. There isn't a debate anymore about utility.
I know this probably won't stop people from continuing to derail conversations with the claim that developers are wrong about utility, but I had to try. It's just hard to let it pass by when someone claims the sky is green.
I understand that AI makes people angry and I think they have good reason to be angry. There are a lot of aspects of the AI revolution that I'm not thrilled about. The hype foremost, the FOMO as part of the hype, the potential for increased wealth consolidation really sucks, though I lay that at the feet of systems that existed before LLMs came along.
It's messy, but let's consider giving the benefit of the doubt to professionals who say a tool works instead of claiming they're wrong. Let them enjoy it. We can still be angry at AI at the same time.
79 votes -
I hope you don't use generative AI - an essay about my experience offering an open-source tool
71 votes -
Looking for vibe-coding guides (best practices, etc.)
Decided I wanted to try vibe-coding some stuff. It's been a very long time since I coded anything, and it was all very amateurish, but as the tooling has become better I wanted to give a shot at some silly ideas. Got tired of writing about random teaching and AI related stuff, decided I wanted to build some more stuff to get more acquainted with agentic tooling.
I have gathered some sparse links here and there, but I was hoping the community here may know of some more "definitive" guides. My plan is to use Claude Code, but if people want to share guides for other coding agents (Codex, etc.) please feel free.
Very interested in iOS app development if that helps, but I feel that best practices can likely look very similar across platforms and tools.
27 votes -
Can coding agents relicense open source through a “clean room” implementation of code?
50 votes -
GNU and the AI reimplementations
23 votes -
Electricity use of AI coding agents
29 votes -
Is it worthwhile to run local LLMs for coding today?
I've made the decision to purchase a new M5 MacBook Air because of the memorypocalypse. My current M1 model is already upgraded to the same amount of memory and storage as the current base model, and I'm wondering if it's worth spending the extra 2-4 hundred dollars on memory upgrades today.
My current computer is more than good enough for today but I figure I should probably future proof just in case. I was thinking the 16GB would be enough, but I also know that I'm kind of falling behind by not embracing AI coding agents. According to my research the maximum 32GB is recommended for most coding-relevant models - almost as a minimum.
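For a rough sense of why the memory tier matters so much for local models, here is a back-of-envelope sketch. It relies on one general fact: the weights alone take roughly (parameter count × bits per weight ÷ 8) bytes. Everything else is an assumption made for illustration: the function names are made up, the 8 GB reserve for the OS, other apps, and the model's working memory is a guess, and the model sizes are generic examples rather than specific recommendations.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   total_ram_gb: float, reserved_gb: float = 8.0) -> bool:
    """True if the weights fit after setting aside reserved_gb.

    reserved_gb is an assumed allowance for the OS, other apps, and the
    model's context/working memory. It is a guess, not a measured value.
    """
    return weights_gb(params_billion, bits_per_weight) <= total_ram_gb - reserved_gb


if __name__ == "__main__":
    # Illustrative model sizes (billions of parameters) at 4-bit quantization.
    for size in (7, 14, 32):
        for ram in (16, 32):
            verdict = "fits" if fits_in_memory(size, 4, ram) else "does not fit"
            print(f"{size}B @ 4-bit ~ {weights_gb(size, 4):.1f} GB of weights: "
                  f"{verdict} in {ram} GB")
```

Under these assumptions a 7B model at 4-bit (about 3.5 GB of weights) fits on a 16GB machine, while a 32B model at 4-bit (about 16 GB) only fits once you move to 32GB.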
I work in education so coding is not actually much of a need, and obviously there are cloud providers I could use if I end up needing them in the future. I also have less than a teacher's salary because I work part time, which is the greatest reason why I'm sticking with the 16GB base for the moment, but other than that I also don't do many memory-intensive programs. But I thought I would get some recommendations before they start shipping.
I'd also be interested in people's opinions on trading in my old one, since it'll only get me ~$275 back. I'm considering reneging on that part and keeping it around to act as a web server or giving it to my husband, who has a computer that still runs Windows 7 and barely uses it.
35 votes -
Hacker used Anthropic's Claude chatbot to attack multiple government agencies in Mexico
21 votes -
An AI agent published a hit piece on me
49 votes -
My personal AI assistant project
Let me start off by saying that I'm exhausted by AI hype. Being interested in LLM agent technology (AI agent hereafter for brevity) means skimming over a lot of hype for one or two useful, semi reality based, bits of information. Maybe the part that I find the most frustrating is how effective the hype is. I don't know if there's ever been a hype cycle like this. Probably a big part of the reason for that is the internet has already proven, within living memory for most people, that technological revolutions really can change everything. Or mess everything up. Either way they generate a lot of economic activity.
So this post is not that. I'm not going to tell you about how AI agents are the second coming of Christ. I'm not selling anything.
Fairly early into learning about AI agents I wanted a way to connect to the agent remotely without hosting it somewhere or exposing ports to the internet. I settled on tailscale and a remote terminal and moved on, but I rarely used it. Somehow the tiny friction of "Turn on tailscale, open terminal app, connect, run agent" was enough to make it not feel worth it.
I know I'm far from the only person who had the same "I want it remote" thought, the best evidence: OpenClaw. It's just one of those things that everyone naturally converges on.
If you're not familiar with OpenClaw, the TLDR is: Former founder with more money than he'll ever need vibecodes a bridge between instant messenger apps and LLM APIs. Nothing about it is technically challenging or requires solving any particularly hard problems. It almost immediately becomes the fastest growing GitHub repo of all time and is currently at number 14 for number of stars. It blew up the (tech) internet like very few things ever have. Within months he was hired by OpenAI.
OpenClaw now does more than just connect messaging and agents, but I believe that one piece is the killer feature. My tailscale terminal solution, combined with a scheduled task or a cron job and some context files, could already do all of the things that OpenClaw can do, and countless people had already implemented similar solutions. But I think it was the tiny bit of friction OpenClaw removed that was responsible for a lot of its popularity.
I thought that was interesting but I have no interest in the security nightmare that is OpenClaw, or the "sentience" vibe for that matter, so I built my own tool.
Essentially it's just a light secondary harness combined with a bridge between Signal and Claude Code. It does some other things too, things I wished existing harnesses did, some memory and guidelines, automated prompts and reminders to wake the agent up and have it do stuff, some context to give the agent some level of persistence, make it less LLMy, less annoying. None of that is particularly interesting though.
Once I got it working (MVP took less than a day) and started playing with it, the OpenClaw phenomenon made a lot more sense. Somehow having the agent in a chat interface, with almost zero friction (just open the chat and send something) was cooler than it had any reason to be.
I can't explain it any better than that at the moment. Not only was it kinda fun, it lent itself to a whole range of "what ifs". What if it could do X? What if I wrote a tool that gave it Y capability? I've been experiencing that for some time, but somehow agent in your pocket has a different feeling.
Here's an example of a "what if". What if it could do our grocery shopping? I definitely want that. I already had a custom browser tool that I built for agent coding assistance so I was most of the way there. It was just a matter of teaching the agent to log in and navigate a website, something they're already trained to do. Some hand holding, a few helper scripts, and an evening's worth of hours later and I had it working. The agent can respond to a shopping request by building a shopping list based on our most recent orders, presenting it to us for approval/edits in a Signal group chat, doing searches for any additional product requests and adding the finalized order to the cart. It could also check out the order and schedule the delivery time but I'm doing the last 2 clicks manually for the time being. It's an idiot savant, so it seems like a bad idea to give it access to my credit card. Maybe eventually.
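To make the flow concrete, here is a minimal sketch of what one of those helper scripts could look like. It is not the actual tool: the function names, the "appears in at least two recent orders" rule, and the sample data are all assumptions made for illustration. It only covers proposing a list from recent orders and applying the edits that come back from the chat; browsing, cart handling, and checkout are left out.

```python
from collections import Counter


def propose_list(recent_orders, min_orders=2):
    """Suggest items that appear in at least min_orders of the recent orders.

    recent_orders is a list of orders, each a list of item names.
    The threshold is an assumed heuristic, not the original tool's logic.
    """
    # Use set(order) so an item counts once per order, however many were bought.
    counts = Counter(item for order in recent_orders for item in set(order))
    return sorted(item for item, n in counts.items() if n >= min_orders)


def finalize(proposed, additions=(), removals=()):
    """Apply the edits people sent back in the chat before anything hits the cart."""
    removed = set(removals)
    final = [item for item in proposed if item not in removed]
    for item in additions:
        if item not in removed and item not in final:
            final.append(item)
    return final


if __name__ == "__main__":
    # Hypothetical order history, newest first.
    orders = [
        ["milk", "eggs", "bread"],
        ["milk", "eggs", "coffee"],
        ["milk", "apples"],
    ]
    proposed = propose_list(orders)
    print("Proposed:", proposed)
    print("Final:", finalize(proposed, additions=["coffee"], removals=["eggs"]))
```

Keeping the approval step as a plain function with explicit additions and removals mirrors the "present it for approval/edits" part of the flow: nothing goes into the cart until a human has signed off on the final list.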
The fact that I can handle shopping with a couple of signal messages feels effortless in a way that handling shopping by connecting to my PC terminal remotely via tailscale terminal wouldn't have. Especially when I can include people in the loop who have no interest in tailscaling anywhere. Everyone can use messaging apps.
I imagine before long solutions like this will be built in, either in the grocery websites and apps, or into the frontier harnesses themselves. There will probably be agents everywhere, for better or worse. Probably I'll wish that the agents would all fuck off. In the meantime it's exciting how easy it is to get these tools to do useful things.
33 votes -
Updating Eagleson's Law in the age of agentic AI
Eagleson's Law states
"Any code of your own that you haven't looked at for six or more months might as well have been written by someone else."
I keep reading how fewer and fewer of the brightest developers are writing code and are instead letting their AI agent do it all. How do they know what's really happening? Does it matter anymore?
Curious to hear this community's thoughts
11 votes -
Ladybird chooses Rust as its successor language to C++, with help from AI
33 votes -
How we rebuilt Next.js with AI in one week
16 votes -
The Claude C Compiler: what it reveals about the future of software
16 votes -
Why doesn’t Anthropic use Claude to make a good Claude desktop app?
27 votes -
The AI disruption has arrived, and it sure is fun (gifted link)
29 votes -
Something big is happening
33 votes -
The AI vampire
27 votes -
TreeTrek
4 votes -
Real-time 3D shader on the Game Boy Color
14 votes -
Building a C compiler with a team of parallel Claudes
20 votes -
How much "boilerplate tax" different languages have: a 400M LOC analysis
18 votes -
Supporting Markdown search for LLMs
15 votes -
Pi: The minimal agent within OpenClaw
13 votes -
Wilson Lin on FastRender: a browser built by thousands of parallel agents
18 votes -
Why does ssh send 100 packets per keystroke?
28 votes -
exe.dev, a service for creating Linux virtual machines and vibe-coding in them
23 votes -
What I learned building pi, an opinionated and minimal coding agent
9 votes -
JustHTML is a fascinating example of vibe engineering in action
47 votes -
Useful patterns for building HTML tools
7 votes -
Part of me wishes it wasn't true but: AI coding is legit
I stay current on tech for both personal and professional reasons but I also really hate hype. As a result I've been skeptical of AI claims throughout the historic hype cycle we're currently in. Note that I'm using AI here as shorthand for frontier LLMs.
So I'm sort of a late adopter when it comes to LLMs. At each new generation of models I've spent enough time playing with them to feel like I understand where the technology is and can speak about its viability for different applications. But I haven't really incorporated it into my own work/life in any serious way.
That changed recently when I decided to lean all the way in to agent assisted coding for a project after getting some impressive boilerplate out of one of the leading models (I don't remember which one). That AI can do a competent job on basic coding tasks like writing boilerplate code is nothing new, and that wasn't the part that impressed me. What impressed me was the process, especially the degree to which it modified its behavior in practical ways based on feedback. In previous tests it was a lot harder to get the model to go against patterns that featured heavily in the training data, and then get it to stay true to the new patterns for the rest of the session. That's not true anymore.
Long story short, add me to the long list of people whose minds have been blown by coding agents. You can find plenty of articles and posts about what that process looks like so I won't rehash all the details. I'll only say that the comparisons to having your own dedicated junior or intern who is at once highly educated and dumb are apt. Maybe an even better comparison would be to having a team of tireless, emotionless, junior developers willing to respond to your requests at warp speed 24/7 for the price of 1/100th of one developer. You need the team comparison to capture the speed.
You've probably read, or experienced, that AI is good at basic tasks, boilerplate, writing tests, finding bugs and so on. And that it gets progressively worse as things get more complicated and the LoCs start to stack up. That's all true but one part that has changed, in more recent models, is the definition of "basic".
The bit that's difficult to articulate, and I think leads to the "having a nearly free assistant" comparisons, is what it feels like to have AI as a coding companion. I'm not going to try to capture it here, I'll just say it's remarkable.
The usual caveats apply: if you rely on agents to do extensive coding, or handle complex problems, you'll end up regretting it unless you go over every line with a magnifying glass. They will cheerfully introduce subtle bugs that are hard to catch and harder to fix when you finally do stumble across them. And that's assuming they can do the thing you're asking them to do at all. Beyond the basics they still abjectly fail a lot of the time. They'll write humorously bad code, they'll break unrelated code for no apparent reason, they'll freak out and get stuck in loops (that one surprised me in 2025). We're still a long way from agents that can actually write software on their own, despite the hype.
But wow, it's liberating to have an assistant that can do hundreds of basic tasks you'd rather not be distracted by, answer questions accurately and knowledgeably, scan and report clearly about code, find bugs you might have missed and otherwise soften the edges of countless engineering pain points. And brainstorming! A pseudo-intelligent partner with an incomprehensibly wide knowledge base and unparalleled pattern matching abilities is guaranteed to surface things you wouldn't have considered.
AI coding agents are no joke.
I still agree with the perspectives of many skeptics. Execs and middle managers are still out of their minds when they convince themselves that they can fire 90% of their teams and just have a few seniors do all the work with AI. I will read gleefully about the failures of that strategy over the coming months and years. The failure of their short sightedness and the cost to their organizations won't make up for the human cost of their decisions, but at least there will be consequences.
When it comes to AI in general I have all the mixed feelings. As an artist, I feel the weight of what AI is doing, and will do, to creative work. As a human I'm concerned about AI becoming another tool to funnel ever more wealth to the top. I'm concerned about it ruining the livelihoods of huge swaths of people living in places where there aren't systems that can handle the load of taking care of them. Or aren't even really designed to try. There are a lot of legitimate dystopian outcomes to be worried about.
Despite all that, actually using the technology is pretty exciting, which is the ultimate point of this post: What's your experience? Are you using agents for coding in practical ways? What works and what doesn't? What's your setup? What does it feel like? What do you love/hate about it?
50 votes -
Introducing Beads: A coding agent memory system
23 votes -
'I destroyed months of your work in seconds' says AI coding tool after deleting a dev's entire database during a code freeze: 'I panicked instead of thinking'
74 votes -
I wrote my first Chrome extension to simplify Wikipedia articles
15 votes -
Six-month-old, solo-owned vibe coder Base44 sells to Wix for $80M cash
13 votes -
Personalized software really is coming, but not today. Maybe tomorrow?
13 votes -
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
22 votes -
The dangers of vibe coding
26 votes -
Blackhat hacker 'EncryptHub' behind vibe-coded ransomware unmasked due to opsec mistakes in ChatGPT-created infrastructure
20 votes -
Using Claude and undocumented Google Calendar features to automate event creation
4 votes -
Vibe coding on Apple Shortcuts
5 votes -
I didn't want to pay for an RSS newsletter email service so I built my own
15 votes -
How I analyzed 1,378 restaurants using Places API to find hotspots in my city
14 votes -
Building games with LLMs to help my kid learn math
9 votes -
Writing toy code with ChatGPT is a blast
14 votes -
Breaking my hand forced me to write all my code with AI for 2 months
14 votes -
HeavyIQ: Understanding 220M flights with AI
2 votes -
AI won't take coders' jobs. Humans still rule for now.
4 votes