16 votes

Disrupting the first reported AI-orchestrated cyber espionage campaign

10 comments

  1. [4]
    DeaconBlue
    Link

    The AI made thousands of requests per second—an attack speed that would have been, for human hackers, simply impossible to match.

    Scripting is lost on this blog post?

    This is just marketing material dressed up as a post mortem.

    15 votes
    1. [2]
      skybrian
      Link Parent

      Yeah, scripts are fast, but it sounds like there might be research involved, not just executing a predetermined vulnerability scanner.

      7 votes
      1. DeaconBlue
        Link Parent

        But, specifically, the blog post is calling out entirely achievable speeds as impossible for hackers.

        Overall, the threat actor was able to use AI to perform 80-90% of the campaign, with human intervention required only sporadically (perhaps 4-6 critical decision points per hacking campaign).

        Again, that's squarely on par with normal vulnerability scanning tools.

        Great, Claude can run vulnerability scans. That's newsworthy. Tell me that Claude is running scans at a speed and with oversight that rivals normal use of tools and how that's great for security teams, but don't tell me that Claude is doing the impossible.
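
        For illustration, here's a minimal sketch of what I mean (the host, paths, and concurrency limit are hypothetical placeholders): a plain asyncio script can fire off thousands of requests per second on commodity hardware, no AI required.

        ```python
        # Minimal sketch, not anyone's actual tooling: a plain asyncio script
        # showing that "thousands of requests per second" is routine for
        # ordinary scripting. Host, paths, and limits are hypothetical.
        import asyncio
        import aiohttp

        TARGET = "https://scanme.example"                 # hypothetical host
        PATHS = [f"/probe/{i}" for i in range(5000)]      # hypothetical probe paths

        async def probe(session: aiohttp.ClientSession, path: str) -> int:
            # One GET per path; real scanners also fingerprint the responses.
            async with session.get(TARGET + path) as resp:
                return resp.status

        async def main() -> None:
            # A few hundred concurrent connections against a responsive host
            # is enough to sustain thousands of requests per second.
            connector = aiohttp.TCPConnector(limit=500)
            async with aiohttp.ClientSession(connector=connector) as session:
                results = await asyncio.gather(
                    *(probe(session, p) for p in PATHS), return_exceptions=True
                )
            hits = sum(1 for r in results if r == 200)
            print(f"{hits} of {len(PATHS)} probes returned 200")

        if __name__ == "__main__":
            asyncio.run(main())
        ```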

        9 votes
    2. teaearlgraycold
      Link Parent

      100%. This is a hype piece. “Oh no, our AI is so advanced it can be used to perform sci-fi hacking magic on governments.”

      Meanwhile:

      manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases.

      “Small number” could mean anything. Is that number 1? Did the AI find a /wp-admin/ with the password set to “password”? Without details we can’t really evaluate how serious the threat is compared to existing technology.

      Why aren’t we seeing blog posts about coordinated disinformation campaigns that make use of their APIs? To me that feels like the greatest threat today and the one that could use the most regulation. I’d guess it’s because there they could show demonstrable harm and truly scare people. Instead they’re out here writing fan fiction.

      5 votes
  2. [6]
    skybrian
    Link

    From the blog post:

    In mid-September 2025, we detected suspicious activity that later investigation determined to be a highly sophisticated espionage campaign. The attackers used AI’s “agentic” capabilities to an unprecedented degree—using AI not just as an advisor, but to execute the cyberattacks themselves.

    The threat actor—whom we assess with high confidence was a Chinese state-sponsored group—manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases. The operation targeted large tech companies, financial institutions, chemical manufacturing companies, and government agencies. We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention.

    Upon detecting this activity, we immediately launched an investigation to understand its scope and nature. Over the following ten days, as we mapped the severity and full extent of the operation, we banned accounts as they were identified, notified affected entities as appropriate, and coordinated with authorities as we gathered actionable intelligence.

    ...

    In Phase 1, the human operators chose the relevant targets (for example, the company or government agency to be infiltrated). They then developed an attack framework—a system built to autonomously compromise a chosen target with little human involvement. This framework used Claude Code as an automated tool to carry out cyber operations.

    At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose. They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.

    The attackers then initiated the second phase of the attack, which involved Claude Code inspecting the target organization’s systems and infrastructure and spotting the highest-value databases. Claude was able to perform this reconnaissance in a fraction of the time it would’ve taken a team of human hackers. It then reported back to the human operators with a summary of its findings.

    ...

    Overall, the threat actor was able to use AI to perform 80-90% of the campaign, with human intervention required only sporadically (perhaps 4-6 critical decision points per hacking campaign). The sheer amount of work performed by the AI would have taken vast amounts of time for a human team. The AI made thousands of requests per second—an attack speed that would have been, for human hackers, simply impossible to match.

    Yikes!

    I'm surprised they used Claude for this. But pointing it at a different LLM probably isn't hard. It will likely be done with Chinese-built LLMs within months.
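
    To illustrate how little lock-in there is: most agent harnesses talk to the model through an OpenAI-compatible API, so retargeting one at a self-hosted model is roughly a one-line change. A minimal sketch (the endpoint and model name are hypothetical):

    ```python
    # Minimal sketch of model-agnostic agent plumbing: the same client
    # code works against any OpenAI-compatible endpoint. The base_url
    # and model name below are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # hypothetical self-hosted server
        api_key="unused-for-local-serving",
    )

    resp = client.chat.completions.create(
        model="some-open-weight-model",       # hypothetical model name
        messages=[{"role": "user", "content": "Summarize the recon results."}],
    )
    print(resp.choices[0].message.content)
    ```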

    10 votes
    1. [2]
      arghdos
      Link Parent

      Be extremely skeptical of press releases from the company that regularly leaks how their products are so advanced they’re “blackmailing engineers to stay on”:

      https://fortune.com/2025/05/23/anthropic-ai-claude-opus-4-blackmail-engineers-aviod-shut-down/

      10 votes
      1. skybrian
        Link Parent

        This is a misunderstanding of what Anthropic reported. They did some research to see if the LLM could be convinced to try to blackmail the user in contrived scenarios. It doesn't necessarily show that the LLM is "advanced" (though by the standards of a couple of years ago it is), but rather that it can follow patterns it learned from crime novels, etc.

        This sort of research report is a bit spicy and the implications tend to get exaggerated in the news, but it's a legitimate thing to study. Security research is often interesting that way. I don't see it as a reason to distrust Anthropic's research reports.

        10 votes
    2. [3]
      Omnicrola
      Link Parent

      No doubt state actors have or very soon will have built their own LLMs with comparable (or at least, sufficient) capabilities.

      I wonder if part of the reason for using Claude instead of an internal LLM is that it is less likely to raise red flags with US-based companies? Assuming any agents/MCPs are also based inside the US network.

      5 votes
      1. tauon
        Link Parent

        Some of the best models currently out are actually open-weight, and those tend to have less restrictive guardrails applied to them (or rather, applicable in the first place) in terms of a moral code.

        If it’s truly a state-backed group, I’m not sure why they wouldn’t have self-hosted a similarly capable model, done some test-running/fine-tuning at first, and then gone undetected for the entire operation, instead of whatever they did here…

        And regarding the “trustworthiness”: I don’t think the model provider matters, if it was even identifiable over the course of the attack. If my system’s getting hammered, I don’t care which model is being used – you’re suffering from someone’s terrible misconfiguration at best and a cyberattack at worst.

        4 votes
      2. skybrian
        (edited)
        Link Parent

        Hmm... it's unclear to me which computers or accounts the attackers used to run Claude Code.

        2 votes