Tildes


That one study that proves developers using AI are deluded

~tech Ask

I've found myself replying to different people about the early 2025 METR study kind of often, so I thought I'd try posting a top-level thread. Consider it an unsolicited public service announcement.

You might be familiar with the study because it has been showing up alongside discussions about AI and coding for about a year. It found that LLMs actually decreased developer productivity, so people love to use it to suggest that the whole AI coding thing is really a big lie and that the people who think it makes them more productive are hallucinating.

Here's the thing about that study... No one seems to have even glanced at it!

First, it's from early 2025, and it used Claude 3.5 Sonnet or 3.7 Sonnet. Those models are in no way comparable to current-gen coding agents. The commonly cited inflection point didn't happen until later in 2025 with, depending on who you ask, Sonnet 4.5 or Opus 4.5.

The study included only 16 people! If those 16 were even vaguely representative of the developer population at the time, most of them wouldn't have had significant experience with LLMs for coding.

These are not tools that just work out of the box, especially back then. It takes time and experimentation, or instruction, to use them well.

It was cool that they did the study; trying to understand LLMs was a good idea. But it's not what anyone would consider a representative, or even well-thought-out, study. 16 people!

But wait! They did a follow up study later in 2025.

This time it had about 60 people and used newer models and tools. In that study they found the opposite effect: AI tools sped developers up (which is a shock to no one who has used these tools long enough to get a feel for them). They also mentioned:

However the true speedup could be much higher among the developers and tasks which are selected out of the experiment.

In addition, they ran into some (kind of entertaining) issues:

Due to the severity of these selection effects, we are working on changes to the design of our study.

Back to the drawing board, because:

Recruitment and retention of developers has become more difficult. An increased share of developers say they would not want to do 50% of their work without AI, even though our study pays them $50/hour to work on tasks of their own choosing. Our study is thus systematically missing developers who have the most optimistic expectations about AI’s value.

And...

Developers have become more selective in which tasks they submit. When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.

And so...

Together, these effects make it likely that our estimate reported above is a lower-bound on the true productivity effects of AI on these developers.

[...]

Some developers were less likely to complete tasks that they submitted if they were assigned to the AI-disallowed condition. One developer did not complete any of the tasks that were assigned to the AI-disallowed condition.

[...]

Altogether, these issues make it challenging to interpret our central estimate, and we believe it is likely a bad proxy for the real productivity impact of AI tools on these developers.
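The task-selection argument in the quotes above can be illustrated with a toy simulation. This is a minimal sketch with entirely made-up numbers, not METR's data or methodology: each hypothetical task gets a speedup multiplier drawn uniformly between 0.8 and 2.0, and tasks with a speedup above 1.5 are withheld from the study 40% of the time. Because every withheld task has an above-average speedup, the mean over the submitted tasks comes out lower than the mean over all tasks, which is the sense in which the measured estimate is a lower bound. The function name and every parameter are hypothetical, and it models only task withholding, not the recruitment effect.

```python
import random
import statistics


def simulate_task_selection(n_tasks=1000, withhold_above=1.5, withhold_prob=0.4, seed=0):
    """Compare the mean AI speedup over all tasks with the mean over submitted tasks.

    All numbers are hypothetical. Each task's speedup is a multiplier
    (time without AI / time with AI) drawn uniformly from 0.8 to 2.0.
    Tasks whose speedup exceeds `withhold_above` are withheld with
    probability `withhold_prob`, mimicking developers who don't want to
    do those tasks without AI.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    speedups = [rng.uniform(0.8, 2.0) for _ in range(n_tasks)]

    # Only high-uplift tasks are ever at risk of being withheld.
    submitted = [
        s for s in speedups
        if not (s > withhold_above and rng.random() < withhold_prob)
    ]

    true_mean = statistics.mean(speedups)
    observed_mean = statistics.mean(submitted)
    return true_mean, observed_mean, len(submitted)


true_mean, observed_mean, n_submitted = simulate_task_selection()
print(f"true mean speedup:     {true_mean:.3f}")
print(f"observed mean speedup: {observed_mean:.3f} ({n_submitted} of 1000 tasks submitted)")
```

Setting `withhold_prob=0.0` makes the two means identical, so the gap comes entirely from which tasks get left out.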

So, to summarize: the new study showed a productivity increase, and the authors estimate the true effect is larger than the ~20% increase the study found. Cheers to them for being honest about the issues they encountered. For my part, I know for sure that the increase is significantly more than 20%. The caveat, though, is that this is only true after you've had some experience with the tools.

The truth is that we don't need a study for this: any experienced engineer can readily see it for themselves, and you can find them talking about it pretty much everywhere. It would be interesting, though, to see a well-designed study that attempted to quantify how big the average productivity increase actually is.

For that the participants using AI would need to be experienced with it and allowed to use their existing setups.

I want to add that this is not an attempt to evangelize for AI. I find the tools useful but I'm not selling anything. I'm interested in them and I stay up to date on the conversations surrounding them and the underlying technology. I use them frequently both for my own projects and to help less technical people improve their business productivity.

Whether AI agents are a good thing or not, from a larger perspective, is a very different, and complicated, conversation. The important thing is that utility and impact are two different conversations. There isn't a debate anymore about utility.

I know this probably won't stop people from continuing to derail conversations with the claim that developers are wrong about utility, but I had to try. It's just hard to let it pass by when someone claims the sky is green.

I understand that AI makes people angry, and I think they have good reason to be angry. There are a lot of aspects of the AI revolution that I'm not thrilled about: the hype foremost, and the FOMO that comes with it. The potential for increased wealth consolidation really sucks too, though I lay that at the feet of systems that existed before LLMs came along.

It's messy, but let's consider giving the benefit of the doubt to professionals who say a tool works instead of claiming they're wrong. Let them enjoy it. We can still be angry at AI at the same time.

58 comments

post_below

5 days ago

79 votes
YouTube reaction to a YouTube video: Wings of Pegasus on Josh Turner cover of Jim Croce's Operator
~music
- folk
- pop
Video 18:28
2 comments

hobbes64

1 day ago

6 votes
Dentist prank advice

~life Ask (advice)

I have a dentist appointment coming up. It's on April 1st, which in the US is sometimes known as April Fool's Day. Last year when I made the appointment, I was joking with them that I was going to have to play some kind of a prank since it's going to be April 1st. So I feel like I need to follow through on that, but I'm coming up short on ideas. I did some looking online, but most of the pranks are the dentist playing pranks on the patients, not the other way around. There is this one, but I'm not sure that I can pull that off. I thought I'd see if any of you Tilderitos have any ideas.

16 comments

first-must-burn

2 days ago

29 votes
Is 'worse is better' a neurosis?
~comp
- programming
Article 1280 words
1 comment

josh8.com

1 day ago

12 votes
Tetris: The Grand Master [documentary]

~games Video 4:48:24

3 comments

YouTube: Demonin

2 days ago

15 votes
Revisiting a 1958 map of space mysteries
~space
- astronomy
Article 3572 words, published Aug 24 2018
0 comments

atlasobscura.com

1 day ago

13 votes
Decomp.dev

~games Link

6 comments

decomp.dev

May 13, 2025

22 votes
Would anyone be interested in an online gardening club?
~hobbies
- gardening
Ask (survey)
It’s springtime, and I am belatedly thinking about growing some plants. I tend to quite enjoy growing edible plants, but I might try a few flowers this year. Would anyone else be interested in a regular gardening chat group where we can share our experiences and challenges in growing plants? Indoor, outdoor, all welcome!
I envisage something similar to the weekly Minecraft threads, but with fewer square potatoes or pixelated flowers and something more tangible (not to knock the Minecraft threads, I love them, and they are my inspiration for this!)

29 comments

Chiasmic

March 9

39 votes
Adding live reload to a static site generator written in Go

~comp Article 1393 words

2 comments

chrt.dev

3 days ago

12 votes
Saturday Night Live UK is here - but can it make you laugh?

~tv Article 114 words

13 comments

BBC

2 days ago

20 votes
Minecraft Dungeons II | Announce trailer

~games Video 0:43

1 comment

YouTube: Minecraft

2 days ago

17 votes
Listen to the music... and find the title of the movie in less than sixty seconds

~movies Link

2 comments

movie-music-quiz.com

1 day ago

14 votes
The Dear Hunter - Sunya (2026)
~music
- rock.progressive
Video 42:27
3 comments

Bauke

3 days ago

7 votes
Hädangången – En Vals Till Döden (2026)
~music
- metal.black
Video 6:56, published Feb 28 2026
0 comments

mycketforvirrad

1 day ago

5 votes
Your daily coffee may be protecting your brain, 43-year study finds

~health.mental Article 1000 words

40 comments

sciencedaily.com

5 days ago

33 votes
Afroman emerges victorious in ‘Lemon Pound Cake’ defamation case

~music Article

12 comments

cfabbro

2 days ago

63 votes
Mining the deep ocean: Policymakers debate if we even need deep ocean mining and if we can do it safely
~enviro
- pollution.water
Article 2031 words
2 comments

Ars Technica

2 days ago

6 votes
Quentin Tarantino and Sylvester Stallone are teaming for a 1930s-set series filming in black and white with “1930s cameras”

~tv Article 200 words

24 comments

tmz.com

4 days ago

14 votes
The miracle cure for sickle cell is now two years old. Most are still waiting.
~health
- healthcare
Article
1 comment

Politico

1 day ago

8 votes
‘Project Hail Mary’ full of grace: $80m US opening breaks records for Amazon MGM Studios, Phil Lord and Christopher Miller and March non-franchise pic

~movies Article 1583 words

0 comments

Deadline

2 days ago

22 votes
The spread of solar panels in rural areas has become a divisive issue among Danish voters
~enviro
- energy.renewable
Article 1025 words
9 comments

The Guardian

3 days ago

19 votes
Sweden's old‑growth natural forests store 83% more carbon than managed woodlands – new study
~enviro
- energy
Article 1074 words
1 comment

The Conversation

2 days ago

19 votes
The Amazing Digital Circus - Episode 8: hjsakldfhl
~tv
- spoiler
Video 31:50
3 comments

YouTube: GLITCH

3 days ago

16 votes
2026 Oscar predictions

~movies Ask

Picture: One Battle After Another

This became a tight race with Sinners late in the game. The obvious parallel here is 1917 and Parasite. 1917 won PGA, DGA, and BAFTA just like OBAA did. Parasite won SAG Ensemble and went on to win Picture from there.

I’m betting on OBAA being stronger than 1917. Notably, 1917 was never in contention to win a Screenplay or an Acting award the way OBAA is. And OBAA is nominated for Film Editing, a category 1917 missed, and is the front-runner there.

I have correctly predicted every Picture winner since The Shape of Water but this is the first year where I’m in danger of losing that streak.

Director: Paul Thomas Anderson - One Battle After Another

Whether Sinners ends up winning Picture, I think this goes to PTA regardless.

Original Screenplay: Sinners

Adapted Screenplay: One Battle After Another

Lead Actor: Michael B. Jordan - Sinners

After Chalamet shot himself in the foot with his unorthodox press run for Marty Supreme, Jordan is the only one in the category going in with an industry award, as the BAFTA went to a British actor.

I think Chalamet really fucked up his chances to win for a while, and now he probably won't win until well into his 40s or even his 50s. He should have played it better.

Lead Actress: Jessie Buckley - Hamnet

Supporting Actress: Amy Madigan - Weapons

After winning SAG, it seems like it’s heading that way. Mosaku won the BAFTA, but she had a home-field advantage. This is also an opportunity to award a veteran actress who never got her due, and she will be supported by young people who enjoyed Weapons.

Supporting Actor: Sean Penn - One Battle After Another

Despite not campaigning and not appearing at the majority of the award shows (or perhaps because of that) Penn ended up winning both the SAG and BAFTA. Essentially sleepwalking to his third Oscar.

Original Score: Sinners

Original Song: Golden from KPop Demon Hunters

Sound: F1

Casting: Sinners

Production Design: Frankenstein

Cinematography: One Battle After Another

Sinners was originally the front-runner, and its cinematographer would have made history as the first woman to win the category. However, it lost both the ASC and BAFTA awards for Cinematography.

Makeup and Hairstyling: Frankenstein

Costume Design: Frankenstein

Film Editing: One Battle After Another

Sinners won the ACE award for Drama and OBAA won the ACE award for Comedy. This category used to correlate with Sound, but since the sound categories were merged it now correlates with Picture.

VFX: Avatar: Fire and Ash

Animated Feature: KPop Demon Hunters

Documentary Feature: The Perfect Neighbor

International Feature: Sentimental Value

3 comments

cloud_loud

March 12

12 votes
Save Point: A game deal roundup for the week of March 22

~games Text 93 words
Add awesome game deals to this topic as they come up over the course of the week!

Alternately, ask about a given game deal if you want the community’s opinions: e.g. “What games from this bundle are most worth my attention?”

Rules:
- No grey market sales
- No affiliate links
If posting a sale, it is strongly encouraged that you share why you think the available game/games are worthwhile.

All previous Save Point topics

If you don’t want to see threads in this series, add save point to your personal tag filters.
0 comments

Scheduled topic

2 days ago

2 votes
Academy Award winner - The Girl Who Cried Pearls | Full film

~movies Video 17:37, published Dec 15 2025

6 comments

YouTube: NFB

2 days ago

7 votes
Game testers wanted for science fiction game

~games Ask

I have a bare-bones prototype of a game made in Twine, and I will be honest: it needs a lot of work.

The story and main architecture of the game are already planned and I am happy with them. It is the story hooks and pathing that I am looking to improve, and for that I would like to give out an early alpha build for volunteers to critique, pointing out any dead ends and errors as well as the story beats they find engaging.

Please feel free to send a message if you would like to participate. Thank you for your time.

Edit: Thank you for your interest in the game! The final build should be ready for volunteers in one week, and I will send links to you directly at that time. Thank you again for your interest; this is much better than I hoped for.

32 comments

Humblemonk33

March 11

42 votes
Deadnate – Guilt & Sorrow (2026)
~music
- metal.groove
- metal.progressive
Video 5:59, published Feb 27 2026
1 comment

mycketforvirrad

2 days ago

3 votes
Nicholas Brendon, ‘Buffy the Vampire Slayer’ star, dies at 54

~tv Article 727 words

8 comments

The Hollywood Reporter

2 days ago

18 votes
MST3K - KTMA episode 3 Star Force recovered and digitized

~tv Video 1:42:41

1 comment

YouTube: Arthur Putie

2 days ago

13 votes
Woman diagnosed with sickle cell at two months old wakes up without pain for the first time after new treatment
~health
- medicine
Link
1 comment

people.com

2 days ago

16 votes
At twenty airports in the United States, security screening is handled not by the Transportation Security Administration, but by private companies — and their checkpoints aren’t seeing long lines

~transport Article

1 comment

CNN

2 days ago

22 votes
Asia turns back to coal as war chokes off natural gas
~enviro
- energy
Article
7 comments

The New York Times

4 days ago

14 votes
Babylon 5 S01E04: "Born To The Purple" - Episode Discussion

~tv Video 44:36

9 comments

YouTube: ClipZone: Beyond Infinity

5 days ago

15 votes
What are you reading these days?

~books Ask (survey)

What are you reading currently? Fiction or non-fiction or poetry, any genre, any language! Tell us what you're reading, and talk about it a bit.

16 comments

Scheduled topic

4 days ago

16 votes
Elon Musk found liable to Twitter shareholders in fraud lawsuit over $44 billion takeover
~tech
- social media
Article 633 words
3 comments

Reuters

2 days ago

33 votes
Weekly thread for casual chat and photos of pets

~life.pets Ask (survey)

This is the place for casual discussion about our pets. Photos are welcome, show us your pet(s) and tell us about them!

0 comments

Scheduled topic

2 days ago

4 votes
Why are we still doing this?

~tech Article 7035 words

35 comments

wheresyoured.at

4 days ago

37 votes
Erling Haaland's struggles in front of goal come from playing in "the most difficult position on the planet", says Manchester City manager Pep Guardiola

~sports.football Article 213 words, published Mar 15 2026

0 comments

BBC

2 days ago

5 votes
Jingle dress dancer blends culture and activism for meaningful messages
~design
- fashion
Article 367 words
1 comment

wjfw.com

2 days ago

7 votes
Ready for "the real McCoy"? It was designed to exploit Prohibition, and get smuggled into America by a man who became a legend.
~food
- history
- drinks
Video 14:00
1 comment

YouTube: Whiskey Tribe

2 days ago

5 votes
Tildes Minecraft Weekly

~games Text 270 words
Server host: tildes.nore.gg (Running Java 1.21.11)
Verification site: https://tildes.nore.gg
BlueMap: https://tildes.nore.gg/map/
Patreon: https://www.patreon.com/TildesMC
Plugins and Data Packs
Data Packs:
- Terralith - Overworld terrain upgrade
- Nullscape - End terrain upgrade
- Age Lock [Vanilla Tweaks]
- Armor Statues [Vanilla Tweaks]
- Bat Membranes [Vanilla Tweaks]
- Cauldron Concrete [Vanilla Tweaks]
- Cauldron Mud [Vanilla Tweaks]
- Custom Nether Portals [Vanilla Tweaks]
- Husks Drop Sand [Vanilla Tweaks]
- Mini Blocks [Vanilla Tweaks]
- More Mob Heads [Vanilla Tweaks]
- Player Head Drops [Vanilla Tweaks]
- Silence Mobs [Vanilla Tweaks]
- Wandering Trades [Vanilla Tweaks]
Plugins:
- BlueMap - Provides a live 3D rendering of the game world
- Clickable Links - Makes http URLs in chat clickable (only for registered players)
- CoreProtect - Records all block/container/mob changes (Anyone can look up changes with /co inspect)
- DebugStick - Gives the ability to craft debug sticks in survival
- DistantHorizons - Provides distant LOD map data to players running the client mod
- EasyArmorStands - GUI for editing armor stands
- Hexnicks - Enables Tildes usernames to be displayed
- hsrails - Allows for 4x speed rail travel
- LuckPerms - Locks down unregistered users
- Otherside - Fix for mob farms involving Nether portals
- Rapid Leaf Decay - Increases the speed of leaf decay by 10x
- WorldEdit - Used for occasional admin stuff
- WorldGuard - Prevents unregistered users from changing anything in the world
The server operates on a soft whitelist. Anyone can log in and walk around, but you need a Tildes account to gain build access.

We recommend you install our mod web-chat so that you can chat while in your web browser. It turns the server into an old-school chat room.

31 comments

teaearlgraycold

March 14

18 votes
Paintings of paintings
~arts
- painting
Article 224 words, published Feb 8 2026
1 comment

francescrossley.com

3 days ago

11 votes
Chuck Norris dies aged 86

~movies Article 126 words

17 comments

BBC

3 days ago

43 votes
The (illegal) nuclear reactor buried under Greenland

~humanities.history Video 21:42

1 comment

YouTube: Real Engineering

2 days ago

10 votes
Miles Caton - Don't Hate Me (2026)

~music Video 3:16

1 comment

cfabbro

2 days ago

3 votes
The world's most annoying road | Map Men

~transport Video 13:22

1 comment

YouTube: Jay and Mark

3 days ago

10 votes
Sekiro: No Defeat | Official trailer 2

~anime Video 1:50

7 comments

YouTube: Crunchyroll

6 days ago

18 votes
Wordle on first guess?

~games Ask

Has anyone solved the Wordle at wordleunlimited.org today? I just got the answer on my first try, which seems unlikely to say the least. I think it could be a malfunction. Would anyone mind posting the answer under a spoiler box so I can check? Thanks!

Update: Thanks, folks! It seems like the site uses a different word for each person. It also seems like it's working fine. So, go me! Or it was a one-off problem. My word was AUDIT, btw.

6 comments

thereticent

3 days ago

7 votes
I tried ranking my albums out of five stars - I think I've gotten it wrong. Thoughts?

~music Ask (advice)
TLDR/Warning: this might be a tedious read. But I'm curious if I could have gone about rating my albums better.

I tend to simply either favourite an album or not. The idea of giving albums and tracks marks out of five stars seems tedious, difficult to match to how I feel and just doesn't match how my head works. But my collection has grown over the decades and I've been bed bound a lot lately, so I'm trying to organise/categorise based on my feelings towards the albums rather than genre. I'm also hoping to rejuvenate my old interest in music (playing in a band and recording for a living took the shine off of casual listening for me). I thought it would be an interesting experiment to try out, so I rated songs from over 50 albums.

I came up with a rigid and hopefully balanced definition for each rating:
- 1 star - Dislike. I hope I never hear this song again (but I'll keep it purely because it's part of the album)
- 2 stars - Neutral. It doesn't annoy me, but it's too generic to be interesting
- 3 stars - Sometimes this song hits the spot.
- 4 stars - This song usually hits the spot.
- 5 stars - This song always hits the spot.
Then I rate the album out of five stars based on the average of the song ratings. The result is that no albums got 5 stars, seven got 4 stars, and the vast majority are rated at 3 or 2 stars. Even among the 3-star albums, I like some much more than others, depending on whether they contain mostly consistent 3-star songs or half 4-star songs and half 2-star songs.
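The averaging scheme above can be sketched in a few lines of Python. This is a hypothetical illustration: the post doesn't say how averages are rounded, so rounding half up to a whole star is an assumption, and the function name is made up. It also computes the spread (population standard deviation) of the song ratings, which puts a number on the difference described above between an album of steady 3-star songs and one split between 4s and 2s, even though both average exactly 3 stars.

```python
import statistics


def album_summary(song_ratings):
    """Summarise an album from its 1-5 star song ratings.

    Returns (album_stars, average, spread). The album rating is the average
    rounded half up to a whole star (the rounding rule is an assumption).
    `spread` is the population standard deviation of the song ratings.
    """
    if not song_ratings:
        raise ValueError("need at least one rated song")
    average = statistics.mean(song_ratings)
    album_stars = int(average + 0.5)  # round half up; ratings are always positive
    spread = statistics.pstdev(song_ratings)
    return album_stars, average, spread


consistent = [3, 3, 3, 3, 3, 3]  # hypothetical: every song "sometimes hits the spot"
mixed = [4, 2, 4, 2, 4, 2]       # hypothetical: half 4-star songs, half 2-star songs

print(album_summary(consistent))
print(album_summary(mixed))
```

Both hypothetical albums come out at 3 stars; only the spread (0 versus 1) tells them apart.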

I wonder if the lack of 5-star albums is because of the definitions I gave each of the 1-5-star possibilities. For example, I don't know if any song "always hits the spot". Or maybe it's just that I'm not as into my music as I used to be.

Anyway, I thought maybe people interested in music and data might have thoughts on going about this differently. It's worth asking before I do the next 1,000 albums :) Maybe you'd define each of the 5 stars differently. Any takers?

Edit: thanks to everyone for reading all this and commenting their thoughts. I have a system I'm happy with now, but always happy to continue to chat with fellow (and reluctant) pedants about this.
35 comments

FarraigePlaisteach

March 16

13 votes
