16 votes

What is ChatGPT doing … and why does it work?

13 comments

  1. [10]
    Gaywallet
    Link

    This is a particularly long post going into the minutiae of precisely how ChatGPT works. I found it particularly interesting because it also investigates what we know about the structure of language/speech and what we don't. The exploration of computational linguistics and neural net structure is fascinating, as it's revealing things about the production/synthesis of language that we haven't done a good job of classifying yet.

    I think it also brings up (yet does not address) an interesting question of intelligence and sentience, a conversation which hasn't really come up until large scale text prediction models hit the mainstream. A divergence into exploring the structure of AI image classification shows how models have in many ways learned to mimic what's going on in our own brains (the layers in the visual cortex). I find it particularly fascinating that there's something special about speech which people associate with intelligence - animals which can learn to utilize language in a way we recognize as language appear more 'intelligent' to us than other animals, even when some are unable to complete the same level of complex tasks. For example, octopuses and crows are both fantastically smart animals but cannot produce and typically do not respond to speech, and are often not listed when people are asked to name 'intelligent' animals.

    Lately it's made me reflect upon what intelligence truly is. Classic ways to measure this involve concepts like 'self-aware', but this concept is intrinsically tied to language. If something is aware of itself but unable to articulate it to humans, is it self-aware? How do social factors play into this - that is to say, how do our thoughts change when we examine a single bee versus a hive? What if a human were raised in complete isolation? What do we think of their intelligence when they are never exposed to language or the intelligence that socialization brings us (acquired knowledge, teaching, etc.)? To continue this line of thinking, what of a computer that can complete the same task as an animal? Does it matter if the way it accomplishes this is functionally similar (a feed-forward neural network structurally resembles some things our brain already does, even though it differs in fundamental ways) to the way other living beings do? What of genetic diversity and millions of years of evolution - will we find living beings which have fundamentally different architecture for information processing and will we have to revise some of our thoughts on what constitutes intelligence?

    On one level I understand quite deeply that ChatGPT is just statistical probability being performed on a massive library of information, but how different is that from how a human processes information? We've debated the idea of free will with very similar arguments - if we had perfect knowledge of someone's life and how their brain processes information, could we predict with perfect accuracy how someone would act in any moment? Is this really any different than having a "random" weight in a model which we fully control? I don't think anyone truly has a model which accounts for all of these interesting questions, and we may find ourselves needing to revise our thoughts on intelligence quite quickly as we continue to develop AI that has started to at least resemble intelligent thought.

    4 votes
    1. [6]
      stu2b50
      Link Parent

      I don't think an LLM on its own could ever be considered sentient. But augmented with memory, which can be as simple as a separate collection of prior tokens with some kind of recall method, that's a different story.
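
      As a rough sketch of what I mean (not any specific system's implementation): prior exchanges could be stored verbatim and recalled by something as crude as word overlap, then prepended to the prompt. `llm_complete` here is just a hypothetical stand-in for a real model call.

```python
from collections import Counter

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (e.g., an HTTP API request).
    return f"[model response to {len(prompt)} chars of prompt]"

class MemoryAugmentedChat:
    def __init__(self, recall_k: int = 3):
        self.memory: list[str] = []   # prior exchanges, stored verbatim
        self.recall_k = recall_k

    def _recall(self, query: str) -> list[str]:
        # Crude recall method: rank stored snippets by word overlap with the query.
        q = Counter(query.lower().split())
        scored = sorted(
            self.memory,
            key=lambda m: sum((q & Counter(m.lower().split())).values()),
            reverse=True,
        )
        return scored[: self.recall_k]

    def ask(self, user_msg: str) -> str:
        recalled = self._recall(user_msg)
        prompt = "\n".join(["Relevant memory:", *recalled, "User: " + user_msg])
        reply = llm_complete(prompt)
        self.memory.append(f"User: {user_msg}\nAssistant: {reply}")
        return reply

chat = MemoryAugmentedChat()
print(chat.ask("What did we decide about the title?"))
```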

      Google had an interesting result (although not particularly relevant to practical uses of LLMs) where they managed to construct a rudimentary Turing machine with an off-the-shelf LLM and some prompt engineering. Paper here: https://arxiv.org/abs/2301.04589

      That means an LLM with memory is a Turing machine, and therefore Turing complete, which means it's technically capable of solving any computable problem.
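
      For intuition, the shape of that argument looks something like the toy loop below. This is not the paper's actual construction; a lookup table stands in for the prompted LLM acting as the finite control, and a plain Python list plays the role of the external memory tape.

```python
# Transition rules for a toy machine that flips bits until it hits a blank.
# (state, symbol) -> (new state, symbol to write, head move)
RULES = {
    ("flip", "0"): ("flip", "1", +1),
    ("flip", "1"): ("flip", "0", +1),
    ("flip", "_"): ("halt", "_", 0),
}

def fake_llm_step(state: str, symbol: str) -> tuple[str, str, int]:
    # In the real setup this would be a prompt like
    # "state=flip symbol=0 -> ?" answered by the language model.
    return RULES[(state, symbol)]

def run(tape: list[str]) -> list[str]:
    state, head = "flip", 0
    while state != "halt":
        symbol = tape[head] if head < len(tape) else "_"   # external memory read
        state, write, move = fake_llm_step(state, symbol)  # "LLM" as finite control
        if head < len(tape):
            tape[head] = write                             # external memory write
        head += move
    return tape

print(run(list("0110")))  # ['1', '0', '0', '1']
```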

      That's far from meaning that it's a practical thing, but it does lend credence to the idea that augmented LLMs can have something akin to an intelligence... eventually.

      4 votes
      1. [5]
        streblo
        Link Parent

        It's kind of alarming how many otherwise intelligent people seem to lose their minds a little bit when what's effectively a Markov chain outputs "I'm conscious, please help me". I don't even think we'll need AGI; someone will be able to weaponize an LLM into some new form of religion.

        6 votes
        1. [4]
          skybrian
          Link Parent

          A side point: calling it a Markov chain seems more confusing than helpful because Markov chains are supposed to depend only on the current state and transformers are all about looking back at previous tokens in the sequence. (Admittedly, with a limited window.) There are some similarities, but text generated by Markov chains isn’t nearly as impressive.
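
          For comparison, a classic order-1 Markov chain really does condition on only the current word, which is part of why its output is so much less impressive. A quick sketch over a made-up corpus:

```python
# A word-level Markov chain of order 1: the next word depends only on the
# single current word, whereas a transformer conditions on every token in
# its (bounded) context window.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

transitions = defaultdict(list)
for cur, nxt in zip(corpus, corpus[1:]):
    transitions[cur].append(nxt)

def generate(start: str, length: int = 8) -> str:
    word, out = start, [start]
    for _ in range(length):
        if word not in transitions:
            break
        word = random.choice(transitions[word])   # only `word` matters here
        out.append(word)
    return " ".join(out)

print(generate("the"))
```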

          2 votes
          1. [3]
            streblo
            Link Parent

            Fair point, although if we group all active tokens as the current state, with the next state being the window plus one token, I think my point still stands.

            And to head off anyone thinking "well, aren't we just fancy Markov chains then": as /u/stu2b50 said, we have the ability to categorize, retrieve, and contextualize 'sequences of tokens', which is, I think, a non-trivial task ahead of consciousness (if consciousness is even an emergent property at all).
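
            In that framing, the "state" is the whole active window, and a transition appends one generated token (dropping the oldest when the window is full). Roughly, with `next_token` as a placeholder for the model:

```python
def next_token(state: tuple[str, ...]) -> str:
    return "token"  # placeholder for the model's prediction

def step(state: tuple[str, ...], window: int = 4096) -> tuple[str, ...]:
    new = state + (next_token(state),)
    return new[-window:]   # the new state: the old window shifted by one token

state = ("What", "is", "ChatGPT")
state = step(state)
print(state)  # ('What', 'is', 'ChatGPT', 'token')
```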

            2 votes
            1. [2]
              skybrian
              Link Parent

              Yeah, and because infinite memory doesn't exist, all computers are finite state machines too. :)

              It seems like the usefulness of mathematical models depends on more than what they can be shoehorned into modelling. It's very unlikely that any given 4k-token sequence will come up twice, so estimating the probability of the next token from each state independently doesn't seem very useful. How would you train that?
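
              As a back-of-the-envelope check, assuming a GPT-style vocabulary of roughly 50,000 tokens and a 4k-token window, the number of distinct "states" such a table would have to cover is astronomical:

```python
import math

vocab_size = 50_000   # assumed GPT-style vocabulary size (not exact)
context_len = 4_096   # assumed context window, in tokens

# Number of distinct possible windows, reported as a power of ten because
# the number itself is far too large to print in full.
digits = int(context_len * math.log10(vocab_size)) + 1
print(f"~10^{digits - 1} possible 4k-token states")  # prints ~10^19246
```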

              I think the resemblance is just that the language model has no short-term memory, other than the chat transcript it gets as input. That's something you can say without bringing up Markov models, though.

              4 votes
              1. streblo
                (edited )
                Link Parent

                I think the resemblance is just that the language model has no short-term memory, other than the chat transcript it gets as input. That's something you can say without bringing up Markov models, though.

                Yea that's really what I was getting at. Your nitpick is correct. ;)

                1 vote
    2. [3]
      skybrian
      (edited )
      Link Parent

      I think of large language models as having at least partially learned the rules to many conversational games. Like, it might not really know what a “hamburger” is but it knows how to use “hamburger” in many kinds of conversations because it’s read a large subset of the Internet. And that can be true of us too. Sometimes we only have a partial understanding of terms that we’ve only read about or heard about. That’s usually true when we first start learning about something.

      A language model’s level of understanding might seem impressive for games that you only have a superficial understanding of yourself. But when you start playing a game for real you will often find that it doesn’t really know how to play; it’s just learned some of the basics.

      A deeper difference is that language models aren’t individuals. To the extent that they contain opinions, it’s in much the way that a library contains opinions. To predict any text on a subset of the Internet, they need to be able to impersonate anyone on the Internet, to some limited degree. But none of the characters they can impersonate are real.

      I also like Ted Chiang’s metaphor where he calls ChatGPT a “blurry JPEG of the web”. This doesn’t capture the game-playing aspect of it, but it’s true that it can’t know more than its training data, and many details are lost. It’s a vast but blurred library.

      Impersonations can be unsettling and you may wonder how deep they go, but often this is based on an exaggerated idea of their capabilities. Rather than getting philosophical, it might be better to push their limits so you get a more realistic sense of their capabilities.

      I’m reminded of how captivating certain video games can be when you first start playing them. When you don’t have a good sense of how deep a game goes, you can imagine all sorts of things going on.

      3 votes
      1. [2]
        Gaywallet
        Link Parent

        Rather than getting philosophical, it might be better to push their limits so you get a more realistic sense of their capabilities.

        To be clear, I already knew everything this article talks about. I'm a data scientist in health care, and I'm on published papers where AI has been used to model and predict things. I may not be an expert in large language models of this kind of complexity, but I am also quite acutely aware that this is nothing but a fantastically complex statistical prediction on an insanely large volume of data.

        With that being said, it inspired a conversation with a partner of mine about the limits of intelligence. For a long time I've been against most standard definitions of intelligence - I think intelligence tests are pretty much all racist and problematic for a variety of reasons. A paper I recently linked discussed how the language you speak affects certain aspects of cognition, which are baked into the way we test specific aspects of intelligence. This conversation I had, however, had me shifting that thinking even more. I've always respected living things in the world more than most - I refuse to kill almost all bugs (mosquitos are one that I have a lot of trouble empathizing with), for example, and simply trap/move them when they are bothersome. Spending a small bit of time exploring my own thoughts on cognition and intelligence, and prodding at the bounds of where I drew the lines that I drew, made me realize even more deeply that some of these lines I've drawn are perhaps unfounded or have issues with them. As I mentioned above, I'm not sure quite where I place a bee's intelligence as an individual or as a hive, but I definitely recognize at least the hive as some form of intelligence. One could easily make the argument that many insects see a "blurry jpeg" of the world in that they tend to be really good at specific things (e.g. bees and identifying sources of nectar) and not so great at other things, compressing knowledge along specific axes of importance (fitness).

        All of that is simply to say that I'm philosophizing because I think it's interesting to have a talk about where we put limits on intelligence and how we interact with other intelligent beings in life. There's definitely a difference between how we interact with bugs and how we interact with animals and how we interact with pets, specifically. A dog might be a lot dumber than an octopus, but I think people in general are more okay with someone killing an octopus for food than a dog. Why is that? At what point is "turning off" an AI a similar action and where will humans decide to draw these bounds? I don't know, but I enjoy thinking and talking about it as it reveals a lot about our thoughts and the world we live in.

        4 votes
        1. skybrian
          (edited )
          Link Parent

          Okay, fair enough!

          One philosophical point I think worth pondering is the difference between an active agent (or creature) versus a passive artifact. A large language model, when not being used, is just a file on disk. That's definitely more of an artifact, like a book or a library or a database. By reading a book we have a sort of one-way relationship with its authors, but we don't think of the book itself as being alive.

          We interact with a language model using a chat session, which is a rather simple computer program with a bit of state: the chat history.
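
          That program is roughly the loop below, with `generate` standing in for the stateless model call; the transcript is the only state it keeps.

```python
def generate(transcript: str) -> str:
    # Placeholder for the actual (stateless) model call.
    return "[model reply, conditioned on the whole transcript]"

def chat_session() -> None:
    transcript = ""                       # the session's entire state
    while True:
        user = input("you> ")
        if user in ("quit", "exit"):
            break
        transcript += f"User: {user}\n"
        reply = generate(transcript)      # the model re-reads the full history every turn
        transcript += f"Assistant: {reply}\n"
        print(reply)

# chat_session()  # run interactively; closing the session discards the transcript
```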

          So the question is how much do you consider a chat session to be an active agent or creature? Do you create another entity, a new clone, every time you start up a new session? I think most people would say no.

          A large language model, then, seems more like an artificial memory than an artificial intelligence. The difference is that it's a memory you can talk to, and it's organized in a way that suffers from a lot of the flaws of human memory. In a lot of ways it's actually worse than a library or database, but you can't talk to them.

          Someday this artificial memory might be a component of a more sophisticated agent, though.

          We are in large part our memories. It might be interesting to think about what we are besides our memories.

          6 votes
  2. [3]
    Atvelonis
    Link

    Thanks for sharing this detailed explanation. ChatGPT reminds me a lot of Jorge Luis Borges' "Library of Babel" story from the Ficciones. It seems we have discovered a new wing of the Library, which others call the universe.

    One of my undergrad theses reviewed common deconstructive natural language processing techniques in the context of fictional literature, especially bodies of work by a particular author, for the purpose of conducting literary analysis. I was dealing with much less sophisticated models than GPT-3, but I was interested in the idea that you could use exclusively probabilistic techniques to break down "complex thought" into a series of statistically correlated relationships, groups, themes, etc. The takeaway of my research was that, however mathematically cool this technology was, it was only practically meaningful to literary analysis when a scholar provided subjective context to the categories of related terms the computer generated. The model just ran the numbers; it didn't really "know" what it was doing. That caveat seems to have been acknowledged by the people who created these models, too. Reading through the explanation of GPT-3 in this article, I don't think we've moved on from it, even though the output has gotten more convincing.
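
    For illustration only (this isn't necessarily what my thesis used), LDA topic modeling is one such purely probabilistic technique. A small sketch over made-up passages, where interpreting the resulting term groups is still left entirely to the scholar:

```python
# LDA groups terms into statistically correlated "themes"; it runs the
# numbers without "knowing" what any of the words mean.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

passages = [
    "the sea was grey and the ship rolled in the swell",
    "she walked the garden path reading her sister's letter",
    "the captain watched the storm break over the grey sea",
    "a letter arrived from the garden house that spring",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(passages)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the most probable terms per theme; deciding what (if anything)
# each grouping means is the scholar's job.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"theme {i}: {', '.join(top)}")
```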

    I anticipate that these bots are going to spread quite a lot of misinformation as they proliferate. No harm meant necessarily; their output is just meaningless regardless of how polished it appears. This point has been made many times, so I won't belabor it. The Library of Babel will be the next step in the Information Age.

    Many people seem to feel existentially threatened by the existence of a mathematical model that can approximate human language, sometimes for economic reasons, sometimes for philosophical ones. I understand the fear—no one wants to be displaced, whether at the office or in the grand hierarchy of the universe. But I actually have very little worry for humanity here. The truth is that, in a sense, we've always lived in the Library, but we've been able to make decent enough sense of it.

    The next generation's search for truth is probably going to rely more heavily on trust than we've become accustomed to. Trust in media sources; trust in scientists; trust in "I'm a real human" authentication software. I'm even tempted to call this level of trust faith. While it'll be regrettable to lose the internet as a source of authentic conversation with other humans, having collective social faith in "the wise persons" among us is an age-old practice.

    4 votes
    1. [2]
      skybrian
      Link Parent

      I wouldn't say that the output is meaningless, because it's created from meaningful ingredients. But it's only suggestive and shouldn't be taken literally.

      For factual information, you do want to know provenance and it's been removed.

      2 votes
      1. Atvelonis
        (edited )
        Link Parent

        Yes, perhaps I used the term too liberally. Meaning isn’t exclusively centralized in authorship. For academics, there’s substantial sociological benefit to be derived from analyzing the output of these models. The machines reflect a particular image of humanity, and we can learn a lot about ourselves by taking a step back and looking at our use of language from an outside perspective. As with the models in my thesis work, the new chatbots are just a scholarly tool—not a source of truth in and of themselves.

        Inevitably, as the bots output more work that appears to be “creative,” people will begin to analyze it like traditional literature. This is fine, but their analysis will be flawed if they think of the bots as authors instead of what they really are, which is more like filters. In this case the “creative” work by a bot has only or primarily latent inherent meaning—that is, meaning which is potential but inaccessible until synthesized with the scholarly interpretation (another source of meaning). The “meaningful components” fed to the algorithms of the model are there, but the process of “writing” with GPT-3 is what scholars would call distant—the writing process itself doesn’t unlock meaning in the same way or to the same extent as it would with a human author, whose placement of “meaningful components” in a larger work is informed by context. When analyzing “creative” bot work, scholars simply have to remember that the sources of meaning are now two instead of three.

        I’m interested to see an entire new subfield of literary analysis emerge here. I’m sure I will also be dismayed by the public misunderstanding of where the “meaning” of a probabilistic literary work comes from, though that’s far from the end of the world. What worries me more is that bad actors will use the existence of these models as a reason to devalue and defund the arts and humanities more than they already have, especially the creative arts. The social ramifications of that misguided policy will be widespread, severe, and extremely difficult to untangle. But that deserves a topic of its own.

        1 vote