When AI builds itself — progress toward recursive self-improvement and its implications - ~tech

[8]

Grzmot

June 5

Considering Anthropic is currently ahead in the AI game, it's no surprise they're trying again with their grandstanding about safety. Only this time it's to lengthen the period they're ahead, by suggesting a moratorium.

They've already been caught publicly advocating for increased regulation and then lobbying with actual money against said regulation. I don't believe a thing they say.

22 votes

[3]
archevel
June 5
- Exemplary
A very casual search didn't bring up any immediate results about Anthropic

lobbying with actual money against said regulation already.

Most of the results seem to indicate they support lobbying for increased regulation. Again, I just did a very cursory search, but do you have any references?

13 votes
1. [2]
  Grzmot
  June 6
  
  Apologies, I conflated Anthropic's earnest desire to jump into bed with the military and their lobbying efforts, which so far appear fairly small.
  
  That being said, anyone that's trying to be a supplier to the DoD can get bent when it comes to talking about safety. Anthropic has opened the nitro valve on AI development by competing with OpenAI at breakneck speed. And they were very happy to supply the DoD as well, until Trump got angry at them in one of his usual toddler fits of rage. In the words of the Anthropic CEO:
  
  As we wrote on Thursday, we are very proud of the work we have done together with the Department [of War, their phrasing, not mine], supporting frontline warfighters with applications such as intelligence analysis, modeling and simulation, operational planning, cyber operations, and more.
  
  They aren't just working to make the US military more efficient and better (read: deadlier); they are doing it very specifically for the US military under Trump. Those are only the uses they admit to in public, or even can admit to. I suspect a lot of those uses are classified. :^)
  
  6 votes
  1. post_below
    June 7
    
    I think you may have missed a few parts of the story. Anthropic was indeed providing inference to the DoD, but then right after Claude was used successfully to help capture Maduro (note that last part is just "rumored"), Hegseth demanded they remove restrictions on use and allow "any lawful use", where what was and wasn't lawful would of course be up to Hegseth and the DoD to determine. That part is public record. Anthropic refused to budge on their "red lines" of no mass domestic surveillance and no automated weapons. They continued to hold the line even when Hegseth threatened to designate them a supply chain risk, which at the time seemed absurd as the designation was created for foreign threats and had never been used on a domestic company.
    
    Except he actually followed through and did it when Anthropic wouldn't give in. Meaning that now Anthropic can't be used by the DoD or by any contractor while working with the DoD. It was shocking, at least to me, as I expected them to roll over so they could keep getting those sweet government (and government contractor) paychecks. It's very clear that, whatever you think of their ethics, they genuinely believe in them. At least for now. You can argue that their safety stance is flawed, but it's difficult to make the case that it's posturing.
    
    After the designation, within days of the DoD and Anthropic deal falling apart, OpenAI stepped in and agreed to let the DoD use their inference however they want.
    
    19 votes
[4]
skybrian (OP)
June 5

It sounds like you will be skeptical no matter what they do?

7 votes
1. [3]
  omid
  June 5
  
  That’s a pretty specific reason for skepticism. It’s not the same as saying “I will be skeptical no matter what they do.”
  
  19 votes
  1. [2]
    skybrian (OP)
    June 5
    
    Hmm. Okay, let’s hear the details on what they allegedly did.
    
    3 votes
    
    Grzmot
    June 6
    
    See my comment here: https://tildes.net/~tech/1uiu/when_ai_builds_itself_progress_toward_recursive_self_improvement_and_its_implications#comment-i0py
    
    1 vote

[14]

kmcgurty1

June 5 (edited June 5)

A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
The increase in lines of code written lines up with subjective impressions of large productivity increases.

It feels insane to me this article completely glosses over how bad of a metric lines of code is. If anything, it shows how much slop is being dumped into the code base. The fact that all of the AI CEOs equate it to productivity is just crazy.

The graphic they show says 8x (obscuring the real number?) LOC since 2025, but I can guarantee you there isn't 8x as much productivity.

Edit:

Another thing I noticed:

We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology.

How much of this is actually because "progress" is stalling?

17 votes

[5]
Staross
June 5

If anything that increase in code volume would worry me, unless it's linked to a corresponding number of products you can actually sell (maybe it's the case though, they are doing a lot of stuff). Since it's so easy for these models to write a lot of code, the challenge is more to keep them (and the designs) in check rather than producing more "stuff".

9 votes
1. [3]
  kmcgurty1
  June 5
  
  I've seen that sentiment a lot too. That modern programming is mostly about managing a bunch of agents instead of writing code. To me, that completely sucks the fun out of programming. Not only are vibe coded projects a nightmare to maintain, you don't get to properly think about the issue you're trying to solve.
  
  5 votes
  1. [2]
    skybrian (OP)
    June 5
    
    You can think about it if you want to. I usually ask the coding agent to write a design doc first because I care about how it will be implemented. It will go through multiple drafts.
    
    For bug fixes, I ask it to explain the bug and proposed fix before giving it the go-ahead to fix it.
    
    3 votes
    
    kmcgurty1
    June 5
    
    LLMs have their use. They are tools among many other tools. In the hands of someone that knows how to use them, they are undoubtedly useful.
    
    However, it's the people that don't know what they're doing that I have a problem with. The vast majority of people have no business creating production applications. There's a promise of infinite easy wealth that just creates slop websites and blogs. It's just asking for vulnerabilities and PII leaks.
    
    If someone doesn't know how auth works, how can they ensure it's implemented properly? What about payment processing?
    
    I know from firsthand experience that you can get unfettered access to many of these vibe-coded websites because they don't add basic security checks. The most recent public example is the Trump Phone website. Entirely vibe coded, and it leaked all of the details of people that pre-ordered one. The Tea breach is another one.
    
    There are so many more issues I have with LLMs that the majority of people aren't talking about, but a buried comment on Tildes isn't the place to list them all out.
    
    12 votes
2. skybrian (OP)
  June 5
  
  Big tech companies have lots of internal code that is never released as a product. I think it's fair to ask what they're getting for it, but we can't really tell from the outside.
  
  2 votes
[8]
skybrian (OP)
June 5

It seems like the paragraphs you quoted also say that it’s not 8x productivity, and therefore it wasn’t “completely glossed over”?

As for the quality of the code, I’ve heard bad things about Claude Code and prefer a different coding agent, but we don’t know about their internal codebase.

3 votes
1. [4]
  kmcgurty1
  June 5
  
  Fair. But I think my original point still stands. We've seen the code leaked from Claude Code to prove their codebase is trash.
  
  These companies desperately need LLMs to work out for their investors. They've lied time and time again, yet people still trust what they have to say.
  
  5 votes
  1. [3]
    skybrian (OP)
    June 5
    
    Some people claimed that the code was trash, but is that really proven? I didn't follow it all that closely, but there was a claim that using a regular expression was somehow wrong and it seemed more like a hot take.
    
    2 votes
    
    [2]
    cdb
    June 5
    
    I've never in my life seen a positive comment on the internet about any large codebase. Any leaked code from any company is always universally panned. This tracks with most general sentiments I've read online though. Any code you see that was written by someone else is trash, including the "you" of several months ago.
    
    13 votes
    
    stu2b50
    June 5
    
    There's a lot of bias inherent in the framing as well. Someone posted a painting from Monet on Twitter with the text that, paraphrased, said that they generated it with nanobanana and wanted art experts to see how it compared with a real Monet painting.
    
    Of course, it got shellacked. Even though it was actually a real Monet painting.
    
    That's the power of how things are presented. When it's "controversial company has their code leaked", it's inevitable.
    
    7 votes
2. [3]
  cdb
  June 5
  
  What agent are you using? I haven't tried Claude Code and have been using Google's Antigravity, but that's just because I'm getting Google Pro for free. It's working alright. I'm not sure I'd be able to tell the difference without spending more time than I'm willing to on comparisons.
  1. [2]
    skybrian (OP)
    June 5 (edited June 5)
    
    It's a web-based coding agent called Shelley. It's open source. It's being written by developers at exe.dev to run on the Linux VMs they provide. It should run on any Linux VM, but you'd have to reimplement authentication since it relies on exe.dev's built-in authentication.
    
    A web UI and a cloud-based VM works well for me because I can switch between laptop, tablet, or even my phone, and it doesn't shut down when I put my laptop to sleep.
    
    1 vote
    
    cdb
    June 5
    
    Thanks. That sounds interesting. Since Antigravity is just a VS Code fork, I've been syncing with GitHub between devices, assuming enough artifacts are syncing (some of the internal ones aren't synced). Might be helpful to just be able to continue with the exact same context.
    
    1 vote

[3]

Omnicrola

June 5

A quote from near the end of the article captures a feeling I've been beginning to feel lately. I think others are feeling it much more acutely.

“Work (and life) ran on a gift economy of small favors between humans. ‘Can you help me get this script running?’ [...] each one created a little debt, a little mutual awareness. [Claude is] faster, it creates zero debt, but each of these is a lost bid for human collaboration.”

“On days where everything works well, I can’t help but think nothing I do matters, everything is automated and better and faster than I ever will be. But then there are days where everything breaks and I don't understand why and I realize I have no idea what I’ve been up to anymore.”

It describes a personal existential crisis, which the article itself doesn't really get into at all. Instead the article is entirely focused on potential societal and species-level problems. Digging into those personal disruptions and how people are reacting would, I think, reveal far more about how the next several years are going to go than pontificating about how companies are going to 10x and then 100x their "productivity".

11 votes

[2]
skybrian (OP)
June 5

I've read about people who have connected a coding agent to Slack (or similar team chat) so that the team works together more, and they seem to like the result.

I'm a hobby programmer working alone, so that doesn't really work for me, but I think it shows that we haven't figured out the best way to use these tools and there will be more exploration of different ideas on how to organize the work.

3 votes
1. Omnicrola
  June 5
  
  I used to work at a place that practiced pair programming rigorously. It was (and is) part of their core ethos. I've really been wondering lately how they've been adapting and what they've figured out about working collaboratively with an AI.
  
  3 votes

skybrian (OP)

June 4

From the article:

Claude writes a significant proportion of Anthropic’s code. As of May 2026, more than 80% of the code we merge into Anthropic’s codebase was authored by Claude. Before Claude Code launched in research preview in February 2025, this number was in the low single digits. That shift also shows up in the amount of output per engineer. Lines of code merged per engineer per day stayed constant through Anthropic’s first four years (2021-2024), then began to climb upward in 2025 when Claude began to run code rather than just suggesting it for an engineer to copy and paste. The slope steepened again in 2026 when models began to work autonomously over longer time horizons. These two inflection points are shown in the chart below. In the second quarter of 2026, the typical engineer was merging 8× as much code per day as they were in 2024. This is because much of the code is written by Claude, with the engineer directing and reviewing, rather than typing it themselves.

[...]

The second criterion is writing code that another engineer can understand and build on. Here the gap between humans and AI persists, but is closing fast. There isn’t full consensus among staff at Anthropic, but many believe that the Claude-written code was still worse in quality than human-written code at Anthropic in late 2025, and is roughly at parity today. We expect it to be better within the year.

This has changed the way that Anthropic now reviews its own code. Proposed changes to our codebase are now read by an automated Claude reviewer that looks for bugs, security flaws, and other defects before it can merge. Using this tool, we ran a retrospective analysis, and found that an automated Claude review of every change to our codebase would have caught roughly a third of the bugs behind past incidents on claude.ai before they ever reached production. The engineers who wrote that code are among the best in the world at building these systems. Claude is now catching the mistakes that they missed.

[...]

Claude is good at running experiments to hit a goal that someone else has set. Every time Anthropic releases a model, we run the same test: we give Claude some code that trains a small AI model, and ask it to make that code run as fast as possible while still passing the same correctness checks. The goal and the success metrics are fixed in advance, so Claude’s job is to find speedups by rewriting the code, running it, timing it, and repeating. It’s a miniature version of an experimental research loop. In May 2025, Claude Opus 4 averaged a ~3x speedup over the starting code. By April 2026, Claude Mythos Preview was achieving ~52x. For calibration, a skilled human researcher would need four to eight hours to reach 4x. In this part of the research workflow—optimizing steps within a clearly defined experiment—Claude has gone from super helpful to superhuman in under a year.

[...]

Even if we suppose that Claude never achieves good research taste, a conservative reading of our evidence still implies compounding acceleration. If humans spend most of their time on the single-digit fraction of work that is direction-setting, while Claude handles the rest, that means each engineer or researcher is steering far more work than before. The evidence we see suggests that people at Anthropic are both moving faster and covering a broader surface. In practice, this means that AI already makes Anthropic move much faster than it did before the advent of effective AI tools.

[...]

We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.

A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.

10 votes

gpl

June 5

A complementary perspective: https://africa.businessinsider.com/news/inside-the-unseen-operation-to-turbocharge-claude-code/lr1wqv5?op=1

Hard to say it is self-improving, in my opinion, when it also rests on large operations like this.

3 votes


27 comments