9 votes

Anyone know of research using GPTs for non-language tasks

I've been a computer scientist in the field of AI for almost 15 years. Much of my time has been devoted to classical AI: things like planning, reasoning, clustering, induction, and logic. This has included (though it has rarely been my focus) machine learning tasks, lots of Case-Based Reasoning in particular. For whatever reason, though, the deep learning trend never really interested me until recently. It felt like people were claiming huge AI advancements when all they had really found was an impressive way to store learned data (I know this is an understatement).

Over time my opinion on that has changed slightly, and I have been blown away by the boom that is happening with transformers (GPTs specifically) and large language models. Open source projects are creating models comparable to OpenAI's behemoths with far less training data and far fewer parameters, which is making me take another look at GPTs.

What I find surprising, though, is that people seem to have only experimented with language. As far as I understand the inputs/outputs, the text is tokenized into integer IDs before prediction anyway. Why does it seem like (or rather, why does the community act like) the technology can only be used for LLMs?

For example, what about a planning domain? You could specify actions in a domain in such a way that tokenization would be trivial, with far fewer tokens than raw text. Similarly, you could generate a near-infinite amount of training data if you wanted via other planning algorithms or simulations. Is there some obvious flaw I'm not seeing? Other examples might include behavior and/or state prediction.
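
To make that concrete, a direct encoding could be as simple as giving every grounded action in the domain its own token ID, with no natural language involved at all. A rough sketch in Python (the toy blocks-world domain, action names, and IDs are made up purely for illustration):

```python
# Hypothetical sketch: a tiny vocabulary of grounded planning actions,
# encoding a plan as token IDs with no natural language in the loop.
# The domain and action names are invented for illustration.
from itertools import product

objects = ["blockA", "blockB", "blockC"]

# Grounded actions: every (action, args) combination in a toy blocks world.
grounded = (
    [("pickup", (b,)) for b in objects]
    + [("putdown", (b,)) for b in objects]
    + [("stack", pair) for pair in product(objects, objects) if pair[0] != pair[1]]
)

# One token per grounded action, plus sequence markers.
vocab = {"<start>": 0, "<end>": 1}
vocab.update({f"{name}({','.join(args)})": i + 2 for i, (name, args) in enumerate(grounded)})

def encode(plan):
    """Map a plan (a list of grounded-action strings) to token IDs."""
    return [vocab["<start>"]] + [vocab[step] for step in plan] + [vocab["<end>"]]

print(encode(["pickup(blockA)", "stack(blockA,blockB)"]))
# -> something like [0, 2, 8, 1], depending on vocabulary order
```

The whole vocabulary for a domain like this is a few dozen symbols, versus the tens of thousands of subword tokens an LLM tokenizer carries around.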

I'm not saying that a standard GPT architecture is a guaranteed success for plan learning/planning out of the box... but it seems like it should be viable, and no one is trying?

10 comments

  1. [3]
    unkz

    What you are talking about sounds basically like reinforcement learning, which is an active field of research using transformers, as well as many other deep learning models.

    For example, from 2021 with 517 citations:

    https://arxiv.org/abs/2106.01345

    We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
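
    Concretely, the "conditional sequence modeling" framing just means flattening each trajectory into an interleaved (return-to-go, state, action) sequence and training a causally masked transformer on it. A rough sketch of that input format (names and shapes are mine, not the authors' code):

    ```python
    # Illustrative sketch of the Decision Transformer input format: each
    # timestep contributes (return-to-go, state, action), and the model
    # autoregressively predicts the next action. Not the authors' code.
    import torch
    import torch.nn as nn

    state_dim, n_actions, d_model = 4, 3, 32

    # One embedding per modality, projecting everything into a shared space.
    embed_rtg = nn.Linear(1, d_model)
    embed_state = nn.Linear(state_dim, d_model)
    embed_action = nn.Embedding(n_actions, d_model)

    def returns_to_go(rewards):
        """R_t = sum of rewards from timestep t to the end of the episode."""
        out, running = [], 0.0
        for r in reversed(rewards):
            running += r
            out.append(running)
        return list(reversed(out))

    # A toy 3-step trajectory.
    states = torch.randn(3, state_dim)
    actions = torch.tensor([2, 0, 1])
    rtg = torch.tensor(returns_to_go([1.0, 0.0, 1.0])).unsqueeze(-1)  # (3, 1)

    # Interleave into (R_1, s_1, a_1, R_2, s_2, a_2, ...) -- one token each.
    tokens = torch.stack(
        [embed_rtg(rtg), embed_state(states), embed_action(actions)], dim=1
    ).reshape(1, -1, d_model)  # (batch=1, seq_len=9, d_model)

    # `tokens` then goes through a causally masked transformer trained to
    # predict a_t from everything up to and including s_t; at test time you
    # set the first return-to-go to the return you *want* and decode actions.
    ```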

    That’s just one example, there are many others in the literature.

    6 votes
    1. Beenrak

      Interesting, that's exactly the kind of research I was looking for. I think I focused my searches too heavily on specific sub-topics (planning), but RL would be another area. That will probably give me a lot more work similar to what I'm looking for than the LLM stuff going on right now.

      Thanks!

      1 vote
  2. [6]
    skybrian

    I'm not familiar with the literature, but searching on "LLM planning" shows that there are some papers to read.

    There's also been some research into getting LLMs to understand images and video. Search for "multimodal language models" for that. And there's been research into generating music.

    1. [5]
      Beenrak

      Well, my comment is less so on using LLMs for planning, and more so on using GPTs without language.

      You can get ChatGPT to do minimal planning for you, but why use a language model for something that isn't language?

      1. [4]
        skybrian

        I don't know how it's done in the planning domain, but often, input data is in a text format that LLMs can handle. There's a lot of grunge work to get data into the right format to do analysis on.

        I believe people are using LLMs to scrape websites and stuff like that. It might be cheaper and safer to ask the LLM to write a script, though?

        1. [3]
          Beenrak

          Sure, I could translate a plan into text and then use the LLM, but isn't that making the problem WWAAYY harder for the learner than it needs to be?

          With language, the model would represent an action as a sequence of characters that could potentially vary. However, if we tokenize the action directly (skipping the language bit), we can skip ever needing to learn grammar, vowel rules, etc.
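
          Roughly what I mean, as a sketch (the subword split shown is hypothetical and real tokenizers will differ, but the contrast is the point):

          ```python
          # Illustration only: the same grounded action as text vs. as one domain
          # token. The subword split is hypothetical; real tokenizers differ.
          action_as_text = "(stack blockA blockB)"
          # A plausible subword split would be something like
          #   ["(", "stack", " block", "A", " block", "B", ")"]  -> ~7 tokens,
          # and the model first has to learn that " block" + "A" names one object.

          # Direct tokenization: the grounded action is a single symbol in a small,
          # closed vocabulary, so there is no spelling or grammar to learn at all.
          action_vocab = {"stack(blockA,blockB)": 17}  # ID made up for illustration
          action_as_tokens = [action_vocab["stack(blockA,blockB)"]]
          print(len(action_as_text), "characters vs.", len(action_as_tokens), "domain token")
          ```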

          1 vote
          1. [2]
            skybrian

            You're asking what other people do, and I'm speculating based on stuff I happened to read on the Internet. The short answer is I don't know what they do.

            1. Beenrak

              Ok, sorry to be argumentative -- just trying to make sure we were talking about the same problem. Thanks for your input!

              2 votes
  3. Greg

    If I’m understanding your post correctly (and I may well not be, so feel free to tell me if so!) I think terminology might be part of the issue? Although strictly speaking “GPT” could be used to refer to any pre-trained transformer, the connotation is generally that you’ll be talking about an LLM if you say that - so it’s less that the technology is only being used for LLMs, more that if a transformer model is being used for non-LLM purposes it’s more likely to be referred to as “a transformer model” than “a GPT”, even if it is indeed generative and pre-trained.

    If you go completely outside the LLM space there are examples like diffusion transformers which operate on tokenised image patches rather than tokenised text. Approaching it from the other side, some projects do start from GPT-2 (or similar) as a base because the language processing capability is still important to them, but they then modify it for anything from speech synthesis to brainwave decoding.
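
    For a sense of what "tokenised image patches" means in practice, here's a minimal ViT/DiT-style patchify sketch; the sizes are arbitrary and the projection is untrained, so treat it as an illustration rather than any particular model's code:

    ```python
    # Minimal "patchify" sketch: an image becomes a sequence of patch tokens
    # much like text becomes a sequence of word tokens. Illustration only.
    import torch
    import torch.nn as nn

    image = torch.randn(1, 3, 64, 64)  # (batch, channels, height, width)
    patch_size, d_model = 16, 128

    # Cut the image into non-overlapping 16x16 patches and flatten each one.
    patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch_size * patch_size)
    # patches: (1, 16, 768) -- a 4x4 grid of patches, each a 768-dim vector

    # Linearly project each patch to the model dimension: from here on it is
    # just a token sequence, and the transformer doesn't care it was an image.
    tokens = nn.Linear(3 * patch_size * patch_size, d_model)(patches)
    print(tokens.shape)  # torch.Size([1, 16, 128])
    ```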