28 votes

AI fails at 96% of jobs (new study)

16 comments

  1. [13]
    nic
    Link

    AI did better remote work than 4% of humans?

    That's surprisingly high to me. And I work in this field.

    The question isn't whether AI can complete the work autonomously.

    The real question is can AI make people more productive.

    METR ran a study in July 2025 and found the answer was no, at least for experienced programmers familiar with a complex code base.

    But what we are seeing in the real world is companies hiring fewer inexperienced developers, which strongly suggests that AI can make inexperienced developers more productive.

    And this technology is still rapidly changing.

    17 votes
    1. [4]
      babypuncher
      Link Parent

      AI is replacing inexperienced developers because it can do a lot of the same work as them in less time. I keep seeing my peers refer to AI as an intern that doesn't need to be paid and works way faster, but also never actually learns anything.

      I view this as a problem. If we aren't training new entry-level developers today, where will the next generation of senior developers come from? It's not actually saving money in the long run, it's just mortgaging our future.

      21 votes
      1. Grzmot
        Link Parent

        I view this as a problem. If we aren't training new entry-level developers today, where will the next generation of senior developers come from? It's not actually saving money in the long run, it's just mortgaging our future.

        The industry is banking on LLMs advancing enough to replace seniors by the time the current generation retires. That's the bet the stock market is making right now. I doubt it'll pay off, and I agree that it's bad, because once you interrupt the knowledge flow from the older generation to the younger even once, it becomes a big hurdle that didn't need to be there.

        10 votes
      2. [2]
        skybrian
        Link Parent

        AI-assisted programming will change the industry in a lot of ways, but I think it’s way too soon to worry about this one. Software engineers are not rare. Many people have learned about computer programming either on their own or in college, and AI can be very helpful for learning if you use it right. The situation doesn’t seem bad at all for businesses. It’s not like we’re talking about finding COBOL programmers.

        1 vote
        1. babypuncher
          (edited )
          Link Parent

          I'm skeptical because the US has already experienced a lot of brain drain as a result of cost-cutting and automation in other industries over the past several decades.

          On top of that, every new advancement that has made software development cheaper in the last ~15 years seems to have also resulted in shittier software getting shipped. Look at how much people generally hate most of the software they use these days. If it isn't buggy/ad-ridden unusable garbage like every modern consumer OS, then it's precision engineered to monopolize your attention like Instagram or TikTok. I genuinely don't see how AI won't accelerate that trend. The only thing AI is doing is making it cheaper for our corporate overlords to shove out slop.

          I don't see why any of us engineers should be excited about the prospect of being more productive with our time when it won't result in better software or better pay.

          5 votes
    2. [3]
      Interesting
      Link Parent

      But what we are seeing in the real world is companies hiring fewer inexperienced developers, which strongly suggests that AI can make inexperienced developers more productive.

      I'm not so convinced, outside of toy-sized projects. That was already a trend we were seeing before LLMs entered the mainstream. Once the architecture of the code is a "heavier" factor in how your program works than basic syntax, agentic AI tools can be misleading at best, or actively harmful unless you have the skills and knowledge necessary to catch their mistakes.

      5 votes
      1. vord
        Link Parent

        Microsoft has had a huge uptick in broken updates the same year they declared 30% of their coding is done by AI.

        11 votes
      2. teaearlgraycold
        Link Parent

        Even on tiny tiny projects, even in web dev where they have by far the most training data, even using the newest models from the best labs, LLMs make terrible decisions about state management, system architecture, and API design. I try to use them as fast typists not as problem solvers. I do enjoy the act of typing out code sometimes, but I can remember so many instances where I knew exactly what I wanted to do but dreaded the act of slowly retyping, copying, pasting, and re-organizing code to make it happen.

        8 votes
    3. raze2012
      Link Parent

      The bottom 5 percentile of humans can actively detract from an environment, so I wouldn't be too shocked.

      The real question is can AI make people more productive.

      Sure, I hope one day we can properly ask that question. But as of now the bubble is propped up by "can AI replace the majority of salaried workers while making more revenue?"

      4 votes
    4. [4]
      teaearlgraycold
      (edited )
      Link Parent

      What programmers need is a course on how to use LLMs. People are trying to self-teach, which is especially hard when the tools themselves change so rapidly.

      I was talking with a programmer yesterday who told me using LLMs made him feel like he was acquiring ADHD. He said he was trying to manage 3+ agents at once. When programming I only ever use one at a time and tend to make smaller requests so that I maintain an understanding of the code and where we are in the project. I want to be the one driving the LLM, not the other way around.

      I described how I use them as tools to help me produce higher quality work in less time, and he started asking about what I was doing with my CLAUDE.md file, skills, MCPs, etc. (there's a midwit meme about this exact thing). I barely touch the stuff. It shouldn't have to be said, but don't use a tool in a way that makes you feel mentally ill.

      3 votes
      1. [2]
        skybrian
        Link Parent

        I'm also having one conversation at a time, but lately I've been thinking that a bit of concurrency would be useful. I'd like to be able to kick off a new job whenever I find a bug and look at the output later. So, I guess I need an inbox sort of like Github and Tildes have?

        4 votes
        1. teaearlgraycold
          (edited )
          Link Parent

          One of my favorite things is to have them review my latest commit (basically as a super-linter) or just ask Opus to look around in my repo for any issues or inconsistencies (consistency is one of the most important aspects of a project and I assume even more so with LLMs so they know what they should be doing). That's the kind of thing that can happen asynchronously. I can get around to it when I'm ready for it. You'll need to discount many of the issues it raises but that process is fairly quick. Even if 9/10 of its issues are noise you still come out ahead.

          A simple tool that does this as a cronjob, discounts issues you've already set to ignore, and presents them as a list would be easy to make. I'll get around to that eventually.
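          A minimal sketch of that cronjob idea. Everything here is assumed rather than taken from the comment: it shells out to an agent CLI (shown as `claude -p "<prompt>"`, swap in whatever tool you actually use) and reads a hypothetical `.review-ignore` file, one substring per line, for issues you've already decided to skip.

```python
#!/usr/bin/env python3
"""Sketch of a commit-review cronjob: LLM as async super-linter."""
import subprocess
from pathlib import Path

IGNORE_FILE = Path(".review-ignore")  # hypothetical ignore-list file


def latest_commit_diff() -> str:
    # Full patch of the most recent commit.
    return subprocess.run(
        ["git", "show", "HEAD"], capture_output=True, text=True, check=True
    ).stdout


def review(diff: str) -> list[str]:
    # Ask the agent CLI to act as a strict linter, one issue per line.
    prompt = (
        "Review this commit as a strict linter. "
        "List concrete issues, one per line:\n\n" + diff
    )
    out = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def filter_ignored(issues: list[str], ignored: list[str]) -> list[str]:
    # Drop any issue containing a substring from the ignore list.
    return [i for i in issues if not any(pat and pat in i for pat in ignored)]


if __name__ == "__main__":
    ignored = (
        IGNORE_FILE.read_text().splitlines() if IGNORE_FILE.exists() else []
    )
    for issue in filter_ignored(review(latest_commit_diff()), ignored):
        print(issue)
```

          Run from cron and append stdout to a file, and that file becomes the "inbox" skybrian describes: review the flagged issues whenever you're ready, and add the noisy ones to the ignore list.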

          3 votes
      2. raze2012
        Link Parent

        They don't even want to train newcomers. Given their endgame, I'm not surprised they just want to leave the workforce to fight amongst themselves and let the "productive" ones rise naturally. Saves on money.

        4 votes
  2. [2]
    skybrian
    Link

    The paper is here: https://www.remotelabor.ai/paper.pdf

    AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.

    A nice thing about benchmarks, unlike a one-off study, is that they can track progress. Also, harder benchmarks are always needed because LLMs keep saturating the older ones.

    Since this study was done, some new models were released, so hopefully they'll run them again soon.

    9 votes
    1. Amarok
      Link Parent

      The video covered newer RLI testing using current, up-to-date models near the beginning. There was a very tiny improvement. The larger context window still helps, but it's not moving the models up in the ranks here like it does with the usual benchmarks.

      I do like this method of testing and ranking the various models. Deceptively presented benchmarks have been a problem in tech since the beginning, and that's how I'd characterize most of the benchmarking in this field. Simple direct questions no matter how deep and technical are biased in favor of what an LLM is good at - mirroring its training data. Nothing wrong with that if you want to compare a model's ability to analyze.

      The problem is that the companies instantly generalize all of that into claims about real world performance and automating all the things, then tell us it's just a short hop to AGI from here. This benchmark is biased towards what humans are good at doing - charting a path through chaos. When confronted with that reality, suddenly the best LLMs are sitting right next to the worst ones in the rankings and a teenager can clobber them all.

      Challenging the models with real world tasks rather than just the usual Q&A is a great way to cut through the hype around automation. It forces the fundamental problems with LLMs into the spotlight.

      6 votes
  3. Amarok
    Link

    Dagogo delivers a nice sharp pin to the AI bubble with a look at their success rate in the real world. He also shares some insight from leading AI developers talking about why this looks like the proverbial end of the line for probability matrices as a pathway to artificial intelligence.

    8 votes