10 votes

Blocking Claude

2 comments

  1. rkcr

    Claude, a popular large language model (LLM), has a magic string that is used to test the model’s “this conversation violates our policies and has to stop” behavior. You can embed this string in files and web pages, and Claude will terminate any conversation in which it reads their contents.

    7 votes
  2. DataWraith

    That reminds me of the old EICAR test file that was used to test or trigger anti-virus software in a similar way.

    Of course, Anthropic could probably just modify their systems so that the refusal is not triggered when the string is embedded in a tool_call result, as opposed to a prompt submitted to the API.

    3 votes
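
For illustration only, DataWraith's suggestion of treating a tool_call result differently from a submitted prompt might look something like the sketch below. Everything in it is an assumption: `TEST_STRING` is a placeholder rather than the real string, and `Message` and `triggers_test_refusal` are hypothetical names that do not reflect how Anthropic's systems actually work.

```python
from dataclasses import dataclass

# Placeholder only: the real test string is not reproduced in this thread.
TEST_STRING = "<HYPOTHETICAL-REFUSAL-TEST-STRING>"


@dataclass
class Message:
    """Hypothetical message record; not an actual API type."""
    role: str     # "user" for a submitted prompt, "tool" for a tool_call result
    content: str


def triggers_test_refusal(messages: list[Message]) -> bool:
    """Return True only if the test string appears in a user-submitted prompt.

    Tool results (fetched web pages, file contents) are deliberately ignored,
    so embedding the string in a page would no longer end the conversation.
    """
    return any(m.role == "user" and TEST_STRING in m.content for m in messages)
```

Under this sketch, a prompt containing the string would still trigger the test refusal, while the same string arriving inside a fetched page would not.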