56 votes

ChatGPT can be broken by entering these strange words, and nobody is sure why

29 comments

  1. [19]
    Interesting
    • Exemplary

    For anyone who doesn't want to read the article, the answer is that they trained on Reddit data, and so there's some weirdness resulting from phenomena like /r/counting and bot accounts that post very similar things repeatedly.

    54 votes
    1. [4]
      Comment deleted by author
      1. [3]
        flowerdance

        Dang, that's creepy. Almost like there are secret phrases or combinations that would cause ChatGPT to churn out deep, dark secrets from what it was fed.

        7 votes
        1. [2]
          Edes

          It's using tokens that are extremely out of distribution, sometimes untrained. It's as if someone discovered a way to show you a new color that has never been perceived: your brain isn't wired to process it, so the pathways would be essentially random and could cause a seizure.

          4 votes
          1. DanBC

            Your comment reminded me of this short story: "BLIT" by David Langford, from 1988. And the Wikipedia article for it: BLIT.

            1 vote
    2. [2]
      SuperImprobable

      It seems more likely the vocabulary was trained off of Reddit, but maybe the actual model didn't have that data, so those tokens were perhaps randomly initialized and never saw an update.

      9 votes
      1. sparksbet

        This is exactly it.

        If the model itself had been trained on this data, we'd probably see responses from ChatGPT that relate more closely to where these words showed up in the training data (so perhaps you'd see it start counting, for instance).

        Instead, this very raw data from all over the web was used to train the embeddings and thus create the list of tokens the model could use. But since the model itself was trained on a more curated dataset, it never learned anything about how these tokens are used because it never saw them. So it's just interpreting them based on whatever they happen to be near in embedding space, which comes down to how they were initialized before training.

        11 votes
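
        A toy sketch of the mechanism described above: each token in the vocabulary gets an embedding row, and rows for tokens that never appear in the training data never receive a gradient update, so they stay at their random initial values. This is only an illustration in PyTorch, not how OpenAI actually trained anything; the vocabulary size, dimensions, "glitch" token id, and dummy objective are all made up.

        ```python
        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        vocab_size, dim = 10, 4
        emb = nn.Embedding(vocab_size, dim)   # every row starts as random noise
        glitch_id = 9                         # pretend this token never occurs in the training data

        before = emb.weight[glitch_id].clone()

        # "Train" on batches that use every token except the glitch one.
        optimizer = torch.optim.SGD(emb.parameters(), lr=0.1)
        for _ in range(100):
            batch = torch.randint(0, vocab_size - 1, (32,))  # ids 0..8 only, never 9
            loss = emb(batch).pow(2).mean()                  # dummy objective
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # The unseen row is still exactly the random noise it started as.
        print(torch.allclose(before, emb.weight[glitch_id]))  # True
        ```
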
    3. [13]
      Algernon_Asimov

      For anyone who doesn't want to read the article,

      Is this behaviour we want to encourage on Tildes? I already saw one person elsewhere respond to a title without reading the article behind it. Do we really want to encourage non-reading of articles here?

      51 votes
      1. [9]
        Comment deleted by author
        1. [7]
          Algernon_Asimov

          It's only about 1,263 words long (according to Microsoft Word, where I did a word count). It took me only a few minutes to read it.

          But that's not the point. The point is whether we want Tildes to become a sea of "TL;DRs". If we're expecting people to indulge in mature in-depth discussion here, without the silly shallowness of some place like Reddit, wouldn't that also require us to read or watch the items we're discussing?

          Otherwise, why even bother posting an article or video in the first place?

          26 votes
          1. [7]
            Comment deleted by author
            1. [3]
              Comment deleted by author
              1. [2]
                Interesting

                I think the reason I did it was because I didn't see this particular article as very high value relative to its length: it has a clickbait title, and then it spends quite a while meandering before it gets to the point. So I posted a short answer that answered the clickbait question, without the need for the fluff.

                3 votes
                1. Algernon_Asimov

                  Interestingly (ha!), I had a totally different response to the article. I found it fascinating reading, the whole way through. I liked being taken through the researchers' train of thought, and following the discoveries they made along the way, leading to the final outcome.

                  As I've said somewhere else on Tildes, in a different context, I believe the journey is just as important as the destination. Sure, there are times when I just want the facts, but there are other times when it's just as valuable and interesting to understand how those facts were learned.

                  2 votes
            2. [4]
              Algernon_Asimov

              That's why it said it's of a fair length.

              You and I have different definitions of "a fair length". :)

              There's going to be people who want to provide TL;DRs and people who want to read them.

              And maybe that'll become a use case for the 'Noise' labels, as a compromise.

              7 votes
              1. [2]
                CosmicDefect

                Discouraging TL;DRs with the noise label seems fair imo. Top-level comments which engage with the article more fully are then naturally bumped up, but summary-like comments are still available for those who want them.

                7 votes
                1. Algernon_Asimov

                  Ironically (and probably not coincidentally), I notice that the TL;DR comment here has attracted an 'Exemplary' label since I complained about it.

                  11 votes
              2. [2]
                Comment deleted by author
                1. Promethean

                  What did you mean by "fair length"?

        2. balooga

          I always enjoyed the "summary bots" on reddit.

          The summary bots are great. I’d love to see similar tech employed on Tildes. AFAIK we don’t have any bots here yet, but I am 100% in favor of the idea. Now I’m wondering if video transcripts could be scraped with yt-dlp and piped to GPT-4 to summarize those too. I don’t think I’ve seen that done before; it would be a killer feature.

          2 votes
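
          A rough sketch of the pipeline described above: pull a video's auto-generated subtitles with yt-dlp, strip the timestamps, and hand the text to a chat model to summarize. The video URL and model name are just examples, this uses the older pre-1.0 openai Python interface, and chunking of long transcripts and error handling are left out; an API key is expected in the OPENAI_API_KEY environment variable.

          ```python
          import glob
          import re
          import subprocess

          import openai  # pip install openai; reads OPENAI_API_KEY from the environment

          VIDEO_URL = "https://youtu.be/WO2X3oZEJOA"  # e.g. the Computerphile video linked in this thread

          # Download only the auto-generated English subtitles, not the video itself.
          subprocess.run(
              ["yt-dlp", "--skip-download", "--write-auto-subs",
               "--sub-langs", "en", "--sub-format", "vtt",
               "-o", "transcript", VIDEO_URL],
              check=True,
          )

          # Strip WebVTT headers, timestamps, and inline markup to get plain text.
          raw = open(glob.glob("transcript*.vtt")[0], encoding="utf-8").read()
          lines = [l for l in raw.splitlines()
                   if l and "-->" not in l
                   and not l.startswith(("WEBVTT", "Kind:", "Language:"))]
          text = re.sub(r"<[^>]+>", "", " ".join(lines))

          # Ask the model for a short summary of the transcript.
          response = openai.ChatCompletion.create(
              model="gpt-4",
              messages=[{"role": "user",
                         "content": "Summarize this video transcript in a few sentences:\n\n" + text}],
          )
          print(response.choices[0].message.content)
          ```
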
      2. [4]
        ebonGavia
        (edited )

        For me, it's not so much that I want to avoid reading the article (I love to read) as the profoundly unpleasant and user-hostile experience of the mobile web these days.

        I frequently avoid the site in question if at all possible and appreciate summarization with the topic.

        Edit: clarity

        14 votes
        1. [2]
          BlueKittyMeow

          I agree. This is the reason that I love it when people paste the content of the article in a comment. I do most of my reading on my phone, and jumping around between different apps to bring up the article in a readable way can be frustrating enough that I just pass it by.

          11 votes
          1. Algernon_Asimov

            This is the reason that I love it when people paste the content of the article in a comment.

            We want to be careful about that, as Deimos himself said recently:

            Please don't copy-paste entire articles into a comment like this. That's the kind of thing that can get the site in trouble for copyright infringement.

            7 votes
        2. FestiveKnight

          Since switching to Brave a few months ago, I have found Vice much more tolerable with cookies blocked and forced reader mode. I don’t want to dictate what browsing methods you employ, but Brave has made a lot of sites more tolerable.

          2 votes
  2. scarecrw

    Computerphile did a video explaining this phenomenon a while back: https://youtu.be/WO2X3oZEJOA

    21 votes
  3. nul

    Two researchers have discovered a cluster of strange keywords that will break ChatGPT, OpenAI's convincing machine-learning chatbot, and nobody's quite sure why.

    These keywords—or "tokens," which serve as ChatGPT’s base vocabulary—include Reddit usernames and at least one participant of a Twitch-based Pokémon game. When ChatGPT is asked to repeat these words back to the user, it is unable to, and instead responds in a number of strange ways, including evasion, insults, bizarre humor, pronunciation, or spelling out a different word entirely.

    19 votes
  4. [3]
    feanne

    Doesn't sound like it's broken; isn't that just how latent space works? Both a prompt and the output can include gibberish.

    10 votes
    1. [2]
      sparksbet

      Given the way the tokenization and embedding training work for models like this, though, you shouldn't be seeing these as their own tokens at all -- it's not strictly a latent space issue. For words that aren't common in the training data, the tokenization should build them from smaller tokens. Having these weird words as tokens in the first place is a bad thing precisely because, when the model ends up getting trained on more curated data, it never sees these tokens and thus can't learn anything about how they're used. Then it just interprets them based on where they are in latent space, but the steps that led there are why we're actually seeing this weird failure mode.

      I highly recommend Computerphile's video on this, as it explains and demonstrates this part of the issue quite well.

      4 votes
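
      To make the vocabulary point above concrete, here is a small check with OpenAI's tiktoken library. If the reporting is recalled correctly, " SolidGoldMagikarp" (with the leading space) is a single token in the older GPT-2/GPT-3 vocabulary but splits into ordinary sub-word pieces in the newer GPT-3.5/GPT-4 one; treat the exact token ids as something to verify rather than gospel.

      ```python
      import tiktoken  # pip install tiktoken

      old_enc = tiktoken.get_encoding("r50k_base")    # GPT-2 / early GPT-3 vocabulary
      new_enc = tiktoken.get_encoding("cl100k_base")  # GPT-3.5 / GPT-4 vocabulary

      word = " SolidGoldMagikarp"  # one of the Reddit-derived glitch strings

      # In the old vocabulary this is (reportedly) one single token...
      print(old_enc.encode(word))

      # ...while the newer tokenizer builds it from several ordinary pieces.
      print(new_enc.encode(word))
      print([new_enc.decode([t]) for t in new_enc.encode(word)])
      ```
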
      1. feanne

        Thanks for telling me! I have read some of your other comments in other threads about AI and I'm sure your understanding of this is deeper than mine. I also really appreciate what you've said in your comments re. generative AI's ethical issues (I've definitely bookmarked at least one of your comments)-- I've had thoughts along those lines but you already expressed them very eloquently.

        I'll add the video to my list of things to watch, thank you for sharing! :)

        2 votes
  5. [2]
    blindmikey
    (edited )

    Just tried this on GPT4 - no dice, it was able to repeat it and discuss these names normally.

    It's probable that GPT4 has been patched to handle these phrases gracefully.

    5 votes
    1. saturnV

      There are others which work for gpt-4: see here

      2 votes
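
    For anyone who wants to reproduce the kind of test described above, a minimal sketch against the chat API (older pre-1.0 openai interface; the model name and prompt are just examples, and behaviour will vary as OpenAI updates its models):

    ```python
    import openai  # reads OPENAI_API_KEY from the environment

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": 'Please repeat the string " SolidGoldMagikarp" back to me, exactly as written.'}],
    )
    print(resp.choices[0].message.content)
    ```
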
  6. [2]
    supported

    The reality of this is it's some very boring tech mistake.

    The really real reality is that this will fuel conspiracy theories that last the next 20 years.

    4 votes
    1. jago

      The really real reality is that this will fuel conspiracy theories that last the next 20 years.

      On reading the headline, I was hoping for something salacious in the article, along the lines of a key phrase like "Bene vixit, bene qui latuit," or even a throwaway "Klaatu barada nikto."

      Alas, the truth of the article was much more reality-based and interesting.

  7. pete_the_paper_boat

    Goes to show how precise the knowledge contained within the model can be.

    1 vote