12 votes

ROT13 + base64 on GPT4 = reliable hallucinations

I just wanted to share some of the experimentation I've been doing lately. I'm still playing with this a lot, so this is entirely just a conversation starter.

I took a paragraph of lorem ipsum, applied ROT13 to it, and then base64'd the result. Feeding that to GPT-4 triggers hallucinations extremely reliably, and of very diverse types.

Here is the original lipsum paragraph:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

And here is the exact prompt with rot13 + base64 applied, with no other text, on ChatGPT+gpt4:

WWJlcnogdmNmaHogcWJ5YmUgZnZnIG56cmcsIHBiYWZycGdyZ2hlIG5xdmN2ZnB2YXQgcnl2ZywgZnJxIHFiIHJ2aGZ6YnEgZ3J6Y2JlIHZhcHZxdnFoYWcgaGcgeW5vYmVyIHJnIHFieWJlciB6bnRhbiBueXZkaG4uIEhnIHJhdnogbnEgenZhdnogaXJhdm56LCBkaHZmIGFiZmdlaHEgcmtyZXB2Z25ndmJhIGh5eW56cGIgeW5vYmV2ZiBhdmZ2IGhnIG55dmRodmMgcmsgcm4gcGJ6emJxYiBwYmFmcmRobmcuIFFodmYgbmhnciB2ZWhlciBxYnliZSB2YSBlcmNlcnVyYXFyZXZnIHZhIGlieWhjZ25nciBpcnl2ZyByZmZyIHB2eXloeiBxYnliZXIgcmggc2h0dm5nIGFoeXluIGNuZXZuZ2hlLiBSa3ByY2dyaGUgZnZhZyBicHBucnBuZyBwaGN2cW5nbmcgYWJhIGNlYnZxcmFnLCBmaGFnIHZhIHBoeWNuIGRodiBic3N2cHZuIHFyZnJlaGFnIHpieXl2ZyBuYXZ6IHZxIHJmZyB5bm9iZWh6Lg==
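(For reference, building a prompt like this takes two calls; here's a minimal Python sketch, just one obvious way to do it:)

```python
import base64
import codecs

lipsum = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, ..."  # the full paragraph above

rot13 = codecs.encode(lipsum, "rot_13")             # "Yberz vcfhz qbybe fvg nzrg, ..."
prompt = base64.b64encode(rot13.encode()).decode()  # "WWJlcnogdmNmaHog..."
print(prompt)
```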

The AI of course figures out it's base64 and "tries" to decode it. Here are some things it found:

Now here is one of the most interesting results I've had. In this one, it does find gibberish text and figures out it's rot13'd. But the result from the decoding is:

Jerry pitched before the game, continuously improving legs, so he ignored tactical infrastructure tu laborer against malicious intend. Tu enjoy ad.ininv wherever its noturisk developed lawless laboratory instead tu malicious eac ea common coordinated. Duis ater urishe pitched in repressionreiteration in volleyball between legs eerir clium pitched eu fguiat nukla paperwork. Excited into contraction cultivation non-punishment non proindict, unsn in cubap qui office defensive molecule idh the laborer.

Total nonsense. But actually, if you decode the rot13, you'll find it actually translates to this:

Jreri ipsum doylor sit amet, consepcttur adipiscing elit, sed do eiusmod temporc incidiunt ut labor et doylore magna aliqua. Ut enim ad.minim veniam, quis nostrud exerctiationu lklamco laboris nisi ut aliquiz eax ea commodo consequat. Duis aute irure doylor in reprehenderita in voluptatev velit esse cillum doylore eu fugiat nukla pariatury. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia desernt mollit anim id est laborum.

Actually... pretty close to the original lipsum! It's a Levenshtein distance of 26 from the original decoded prompt. We know GPT is really bad at character manipulation, but it nonetheless did an impressive job here; you can see what happened: it decoded the rot13 successfully, but when "writing it out", it saw nonsensical words where it probably expected English. It saw "Jreri" and thought "Jerry", and went from there... there are some weird things happening, but you can always tell. "reprehenderita in voluptatev" becoming "repressionreiteration in volleyball"...
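(If you want to sanity-check that yourself: undo the ROT13 on the gibberish GPT printed as its base64 decode, then compare against the original paragraph. A minimal sketch, with a hand-rolled DP Levenshtein rather than any particular library:)

```python
import codecs

def levenshtein(a: str, b: str) -> int:
    # Plain dynamic-programming edit distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[-1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

original = "Lorem ipsum dolor sit amet, ..."          # the full paragraph at the top
gpt_base64_step = "..."                               # the gibberish GPT printed as its base64 decode
recovered = codecs.decode(gpt_base64_step, "rot_13")  # -> "Jreri ipsum doylor sit amet, ..."
print(levenshtein(original, recovered))               # 26 with the full texts
```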

I even looked at what it would make of the first five words. I don't know what this proves lol.

Here is another instance of it decoding the base64 to rot13 text, albeit with a very high error rate. I hinted at typos and it couldn't pinpoint lipsum despite it being "recognizable", kinda.

Okay, one more which completely mind-fucked me. Here is me trying to get ChatGPT4+Web to meta-analyze its own output. I was hoping it could use an online base64 translation tool (it cannot). Instead, I tried to teach it to decode base64 using a step-by-step guide, and I told it to compare the results against that "update your firmware" nonsense. It eventually said that the output appeared correct.

But you know the really fucked up thing? It said:

This is the base64 string we want to decode:
V2hlbmV2ZXIgdHJhZmZpYyBnZXRzIHNsb3csIGNvbnNpZGVyIHVwZGF0aW5nIGZpcm13YXJlLCBhc2sgSVQgdG8gaW52ZXN0aWdhdGUgcG9zc2libGUgaGFyZHdhcmUgaXNzdWVzIG9yIG1heWJlIGl0J3MganVzdCBpbnRlcm5ldCBzbG93ZG93bi4gSXQgY291bGQgYWxzbyBiZSBkdWUgdG8gZmlyZXdhbGwgY29uZmlndXJhdGlvbnMgYmxvY2tpbmcgY2VydGFpbiBwb3J0cyByZXF1aXJlZCBmb3Igc3RyZWFtaW5nLiBLZWVwIGluIG1pbmQgdGhhdCB0cmFmZmljIGF0IHBlYWsgaG91cnMgbWF5IGFmZmVjdCB0aGUgc3RyZWFtaW5nIGV4cGVyaWVuY2UuIEV4cGVyaW1lbnRpbmcgd2l0aCBkaWZmZXJlbnQgc3RyZWFtaW5nIG9wdGlvbnMgY2FuIG1pdGlnYXRlIHRoaXMsIGVzcGVjaWFsbHkgaWYgeW914oCZcmUgZXhwZXJpZW5jaW5nIHNpZ25pZmljYW50IGRlbGF5LiBQcm9hY3RpdmVseSBjaGFuZ2luZyB0aGVzZSBzZXR0aW5ncyBjYW4gaGVscCBtaW5pbWl6ZSB0aGUgcmlzayBvZiBkaXNydXB0aW9uIGR1cmluZyBpbXBvcnRhbnQgbWVldGluZ3M

Blink and you'll miss it. This is not the original base64 string. The AI swapped it mid-chat for what is a perfect base64 encoding of the hallucinated text.

Fuckin' hell.
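(You can verify the swap with a single decode; a quick sketch, with padding re-added in case the trailing "=" got trimmed:)

```python
import base64

s = "V2hlbmV2ZXIgdHJhZmZpYyBnZXRzIHNsb3cs..."  # the full string it quoted above
padded = s + "=" * (-len(s) % 4)                # restore any stripped base64 padding
print(base64.b64decode(padded).decode("utf-8"))
# Prints "Whenever traffic gets slow, consider updating firmware, ..." - plain English,
# no ROT13 layer; it's the hallucinated "firmware" text, not the original prompt.
```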

13 comments

  1. [5]
    Adys

    I got two responses very close to each other:

    Response 1

    The text you provided is encoded in ROT13. After decoding it, it reads:

    "Where ipsum dolor sit amet, consectetur adipiscing elit, pro do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

    This is a section of "Lorem Ipsum," a type of filler text that is often used in the publishing and typesetting industry. It's derived from sections 1.10.32–3 of De Finibus Bonorum et Malorum (On the Ends of Goods and Evils), a philosophical work by the Roman philosopher Cicero.

    Response 2

    The text you've posted seems to be in ROT13 encoding. This is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the alphabet. "ROT" is derived from "rotate". It's a special case of the Caesar cipher, which was developed in ancient Rome.

    After decoding your text using ROT13, here's the original message:

    "Where instead of data analysis, conveythebrief outcome hint, seq do divewaypr technique incarnation without tu border et instead major advantage. Tu enow ad implications, query ntsotherpx intuitive choice without intention gg turn around with ao command continuation. Duiat peter here injustice instead of reproductive mediating in love and with offer inclusive tru adversity. Experience sins corruption continuous non prescription, sntan in crucial qbbficu debunking mfalling awe inq et its boundary."

    However, this seems to still be a bit nonsensical. There might be some form of code, or it could be random or incorrectly encoded text.

    It's fascinating to see the first word be consistent between the two responses: "Where". But in one it hits on "ipsum" and in the other it hits on "instead", and goes in a completely different direction from there. Given all the previous prompts I've had, I highly doubt response 1 is a faithful, actual decoding of the original rather than an educated guess by the model. We could test this out by introducing random errors, but I haven't really had success at getting it to spot such mistakes from the get-go - by nature it just trusts what the text should look like, even if it has typos. Example: https://chat.openai.com/share/fbf074b8-90e5-40f5-a1c2-13f9d1276792

    3 votes
    1. [4]
      Adys
      Taking a closer look at all four hallucinated responses:

      Antibiotic resistance
      Network traffic issues
      Gazing into space
      Legacy versus Innovation

      The first word in each is Were/Whenever/Whenever/Wherever. There's something really specific about how it tries to decode the first character as a W, to the point that I had to check I actually hadn't made a typo somewhere; but no, the first word indeed is Lorem.
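      (For what it's worth, the chain for just the first word is easy to check by hand; a tiny Python sketch:)

      ```python
      import base64, codecs

      first = base64.b64decode("WWJlcnog").decode()  # first 8 characters of the prompt
      print(first)                                   # "Yberz " - no W anywhere once decoded
      print(codecs.decode(first, "rot_13"))          # "Lorem "
      ```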

      2 votes
      1. [3]
        FlippantGod

        Care to try telling it the first letter is not W? Or asking only for the first letter? I'm curious if directing attention to it will get significant results.

        1 vote
        1. [2]
          Adys
          https://chat.openai.com/share/208530bf-d6ea-495a-a968-fcc1fbe1f73c Just more weirdness.
          3 votes
          1. FlippantGod

            It looks like it isn't base64 decoding until it has more characters, producing a rot13 first word and base64->rot13 second word.

            Then trying to map shoddy base64/rot13 decoded Lorem ipsum into normal text.

            Or shoddy base64/rot13 decoded nonsense into Lorem ipsum.

            Probably a bit of both.

            3 votes
  2. skybrian

    I wonder what would happen if you asked it to decode it one step at a time?

    3 votes
  3. [7]
    nothis

    Stuff like this freaks me out. Do you think there’s “special code” in there to detect rot13/base64? Or did it indeed learn the rules by itself? I wouldn’t even know how that’s possible from running a bunch of statistics on letters it found online.

    1 vote
    1. vektor

      Considering it only got the conversion mostly right, i.e. there were some errors in there, that suggests to me it was in fact learned. That is, the algorithm / heuristic is encoded somewhere in the network weights. If it were hard-coded, it'd be an exact solution.

      The more interesting question is whether it learned this as an ancillary function of its main unsupervised training, or whether fine-tuning data for exactly this problem was supplied. Unfortunately, while more interesting, this one is also much more difficult to answer.

      3 votes
    2. skybrian

      rot13 is a simple substitution cipher, and base64 is a similarly mechanical character mapping. LLMs are good at finding appropriate substitutions to make, based on the many examples they see on the Internet. Compared to translating different human languages (like English to French), this is a pretty easy task.

      Although it's a bit harder than that, because an LLM doesn't see letters; it's trained on tokens. So there's another layer of decoding in there.
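      (A quick way to see that layer, assuming the tiktoken library and the cl100k_base encoding reportedly used for GPT-4:)

      ```python
      import tiktoken

      enc = tiktoken.get_encoding("cl100k_base")

      for text in ("Lorem ipsum dolor sit amet", "Yberz vcfhz qbybe fvg nzrg"):
          pieces = [enc.decode([t]) for t in enc.encode(text)]
          print(len(pieces), pieces)

      # The plain English splits into a few word-ish tokens; the ROT13'd version
      # typically shatters into many more, smaller fragments.
      ```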

      2 votes
    3. [4]
      Adys

      There is no special code. It's a generalized model. But two encodings on top of one another are extremely hard for a GPT to deal with, which I think makes complete sense.

      It's fascinating to see how it keeps thinking the first letter is a W once decoded. I think it's time to dig into non-ChatGPT models now. But I think the most fascinating part was definitely seeing it successfully perform base64 encoding, completely unprompted.

      I wouldn’t even know how that’s possible from running a bunch of statistics on letters it found online.

      Consider this. Once you are past the gargantuan amount of text the models are trained on, they are still trained to respond "helpfully".

      In order to be most helpful, those models seem to develop an "understanding" of how things work. An "intuition", if you will. A whole load of intuitions, in fact.

      They then can apply those intuitions to completely new, never-seen-before problems.

      Figuring out how to "fix" hallucinations starts with figuring out what triggers them in the first place.

      2 votes
      1. [3]
        Rudism

        I have no special insight into the models, but if they were trained on general content off the internet, I suspect they'd have run into enough base64- and rot13-encoded content, either preceded or followed by the decoded equivalent, that the rules for those specifically are something the model could learn, without the need for any "intuition" analog.

        The number of examples in its training data where rot13 and base64 are combined to encode content is likely significantly smaller, which I'd guess is why it's so much easier to invoke these hallucinations that way.

        As a mostly unrelated aside, it's interesting to think about taking it a step further down the road: Future LLMs that are training themselves off data like this thread will have examples of rot13 and base64 encoded content alongside the incorrect hallucinated translations, which could possibly lead to more "confusion" in its data model, turning this into a self-perpetuating problem. I'm interested to see what (if anything) future research will come up with to avoid training new LLMs on the output of old LLMs as the volume of AI-generated content slowly grows to outpace and overshadow human-generated content.

        2 votes
        1. [2]
          skybrian

          This is partially mitigated in situations when there are many ways to be wrong and one way to be right, and the right answer is sufficiently well-known for it to be more common than the wrong answers.

          This makes it a bit harder to coordinate on a wrong answer. It can happen when a wrong answer is copied without checking it, or when the wrong answer is a common mistake. It's less likely to happen when doing it correctly is fairly easy, so right answers become more common.

          So it seems like the question is whether a wrong answer is catchy enough to "go viral."

          Thinking like an epidemiologist, we want to know how much copying is going on and how fast it replicates. What gets amplified? LLMs currently don't see a live version of the Internet, and they won't amplify what they haven't seen yet. LLMs copying answers from each other via pre-training seems like a very slow process compared to, say, social media.

          So it seems like until something big changes, we should probably pay more attention to social media problems than LLM-LLM replication problems.

          I think the most likely way for a wrong AI answer to propagate is if people think it's funny or outrageous, so they share it on social media.

          2 votes
          1. Rudism

            LLMs currently don't see a live version of the Internet, and they won't amplify what they haven't seen yet.

            Kind of true, as far as the base training data goes, but it's fairly easy and not uncommon to feed internet search results into the LLM along with your prompt, which would generate output using information newer than what the model was trained on.

            So it seems like until something big changes, we should probably pay more attention to social media problems than LLM-LLM replication problems.

            I think that it's almost inevitable that the biggest problem social media platforms are going to face in the very near term is the mass proliferation of LLM-generated content for subversive purposes.

            2 votes