33 votes

If Claude Fable stops helping you, you'll never know

Posted June 10 by hungariantoast

Tags: artificial intelligence, language models.large, claude.fable, vibecoding, anthropic, author.jonathon ready, source.jonready

https://jonready.com/blog/posts/claude-fable5-is-allowed-to-sabotage-your-app-if-youre-a-competitor.html

Link information

This data is scraped automatically and may be incorrect.

Word count: 445 words

8 comments

[3]
Tiraon
June 10
Link
In effect they simply decide to just silently give you nonsense in a random field they don't like you exploring.
I don't particularly like the model refusing to do something but if it at least tells you then fine.
This is unlikely to affect most people using Fable 5, but the problem is transparency: it makes it impossible, even for a normal user, to trust the output of a tool whose accuracy was already problematic.
And they can decide to do this to absolutely any prompt.

The scenario is as follows:
I pay a company for access to their compute and violate their ToS. The sane outcome is a proportionate follow-up reaction and refusal of access.
The reaction they say they will take is to bill as usual and output nonsense. What am I missing that makes this legal?

18 votes
1. [2]
  vord
  June 10
  Link Parent
  What am I missing that makes this legal?
  
  This is America. Everything is legal if you have enough money.
  
  20 votes
  1. d32
    June 11
    Link Parent
    That's The Golden Rule!
    Whoever Has the Gold Makes the Rules
    
    6 votes
[2]
skybrian
June 11
Link
Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude (Wired)

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

…

Anthropic now says it’s changing course, and that Claude Fable 5’s safeguards for AI development will be visible to users. If the company suspects a user is trying to use Claude to build a highly capable AI it will alert them that it’s either refusing the request, or rerouting the user to a less capable model.

Anthropic reversed the policy after it received fierce backlash from the AI research community. Anthropic has already taken steps to limit competitors from using Claude to build closed and open source AI models, but critics say that quietly degrading the model’s performance for certain users went a step too far. Claude’s coding agent has become a favored tool among developers, including those working on open-source AI research projects, and researchers tell WIRED that the company’s latest policy could have led to a troubling future in which only a handful of leading AI labs could perform advanced AI research.

…

“These safeguards prevent foreign adversaries from using our most capable models in ways that pose severe safety risks. The US and its allies hold an edge in frontier chips and the highly optimized software that runs them at full potential,” the company said in a statement to WIRED. “These safeguards ensure Claude isn't used to erode that advantage—by optimizing chips developed by those adversaries, for example […] In deciding whether to make them visible or invisible we faced a choice. A hidden safeguard is harder to probe and work around. This means the safeguards can be targeted much more narrowly.”

Anthropic says that because this safeguard around AI development is now visible, it needs to cast a wider net, meaning more benign requests may trigger its safeguards. The company says it’s working to make its classifiers more precise as quickly as possible.

9 votes
1. unkz
  June 11
  Link Parent
  This was a pretty distressing policy, especially after seeing how bad they are at detecting intent in other areas of Fable’s safeguards. Who knows how many of my queries would have been mistagged and therefore sabotaged without my knowing?
  
  Definitely motivating me to experiment more with Qwen code so I can run it all locally.
  
  3 votes
Macil
June 10
Link
If they're aiming for the level of effectiveness of their next best model (Opus) on the limited tasks, then I don't think it is too different from them keeping the best models for themselves (which is a thing I have complex feelings about, but I can't imagine not doing that in their shoes if I thought there was a chance of AI being good enough to start doing recursive self-improvement). Though they did fumble the communication hard by not emphasizing this, if they are in fact aiming for Opus's level of effectiveness.

I do expect to see more of the top labs cautiously holding back their best stuff as AI gets better at programming.

6 votes
[2]
hungariantoast (OP)
June 10
Link
https://lobste.rs/s/f2fwny/if_claude_fable_stops_helping_you_you_ll

5 votes
1. ebonGavia
  June 10
  Link Parent
  BenjaminRi, 8 hours ago:
  
  Reflections on Trusting Trust, revisited for AI.
  
  What's old is new again.
  
  5 votes