21 votes

Does generative AI have a natural limit without a major innovation?

I was musing about this recently, as the latest models have become more capable. The core of gen AI is the model, which is trained on a massive dataset. To date, gen AI has improved because the models have become larger and more efficient, the data they are trained on has become better, and the software/harnesses around them have improved to help query them.

As I see it, surely the bottleneck will soon become the data they are trained on? Imagine a scenario where a model could consume an infinite amount of training data, with no limit on training time or quality. The sum of human skill/knowledge is then the limiting factor. Gen AI should (in theory) never be able to outperform or push the boundary of the sum of humanity at time of training.

Or, counterpoint, is there enough randomness and speed to iterate that gen AI can actually step change and improve if training times/cost were less prohibitive? Most companies/models today will save good output and feed it back into the next iteration, but right now that's taking months. What if that took minutes?

What do you think?

Is gen AI going to take us to general intelligence?
Will gen AI get to a place where its "intelligence" and reasoning is actually better than the sum of Humanity?

35 comments

  1. [2]
    arqalite

    I'm far from an LLM researcher, but I work with AI on the daily and have deployed multiple applications using models from all major frontier labs.

    I don't think LLMs will get us to AGI. It might get very close, but I do think that a sufficiently trained human will always outperform the very best LLM there is in a given task.

    Where LLMs will outperform us (and maybe already do for some narrow cases) though, is in cross-functional tasks across multiple domains, since they can distill knowledge from so many fields at once while people generally specialize in one or two fields.

    I do think we need a breakthrough in model architecture to get past this barrier. It will probably happen eventually, but not right now.

    It's interesting to see the attempts at recursive self-improvement and how they develop in the future, though.

    15 votes
    1. kaffo

      Sensible take.

      It does seem that gen AI is getting better at having more, and more broad models. Way back when the hype started in like 2023/2024 I wondered if we'd see extremely good, but specialised models. Maybe they could talk to each other, or they could work together in some environment. But seems they've gone the general route and it's actually working out reasonably well.

      Re self improvement, I'm sure it's being attempted. It's got to be in similar veins to reinforcement learning where they give the models a reward metric. But it must be slow as hell right now with how expensive training is.

      3 votes
  2. [8]
    NonoAdomo

    Is gen AI going to take us to general intelligence?

    No. That's not an end state for LLM/Generative AI. The AI companies want you to think that it is, but they're honestly just using marketing smoke and mirrors to talk about how cool their products are.

    The first hurdle is for everyone to agree on what "intelligence" is. What does it mean? Clearly, us awesome humans have it, but are there other species that have it? I could go and write a doctoral thesis level post on this, but the problem I wish to highlight is that we don't have an agreed upon definition on what intelligence is.

    The second hurdle is that LLMs are a prediction model. They can only spit out responses based on what they were trained to do. Most LLMs are trained with whatever they could get their hands on (acquired legally or illegally) but they don't come up with new ideas. Every response is a prediction based on what they expect the answer to be. There is no "logic" or "reasoning" with this on the part of the software (again, things we have no clear philosophical definitions for). To these AI companies' credit, they tuned the training process so well that they give impressive answers that appear like intelligence to the layperson.

    The third and final hurdle is understanding what the achievement state of a "general intelligence" is. Let's take an easy route, and simply say: General Intelligence will be when we can replicate the human brain and how intelligence exists in humans. Well, the challenging part there is that we honestly don't know what that means from a medical position either. We know the high levels: Brains have neurons and neurons talk to one another. (Hence the term you might hear of "Neural net" for various simple models), but we don't understand how human consciousness exists within our brains. It works, we can even see it happening in other animals but it's such a challenging web of chemicals and electrical signals that we don't understand it. Someday we will, but right now we don't and LLMs/Generative AI is not the path there.

    Will gen AI get to a place where its "intelligence" and reasoning is actually better than the sum of Humanity?

    To put it simply, no. The best we can hope for is equal to, but that seems unlikely as well. Don't get me wrong, it's likely going to get to like, 95%-99% range but it can never get better than the comparison of where it started. The only reason why I can't confidently give it 100% is that AI as we know it now currently works by guessing what it should respond next. It's gotten DAMN good at guessing, but stuff that we can reason as true (like 2+2=4) is not what LLMs do. It responds that 2+2=4 because it read enough places that this is true. It does not see two rocks in one hand and two rocks in another count to four total rocks when brought together like humans do. When you ask how it got the answer, it will give you a pretty good bit of stuff that looks like reasoning, but it's again just taking the most likely answer. This is why for the longest time LLMs struggled with stuff that's simple for us like "How many instances of the letter r are in the word strawberry?" There are tons of these edge cases and the AI companies try their best to shut these down every time the public loudly discovers one to show that the models are learning and growing.
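
    A rough sketch of why the strawberry question is awkward for a token-based model: code that sees letters can simply count them, while a model only receives token IDs. The subword split and the IDs below are made up for illustration and are not the output of any real tokenizer.

```python
word = "strawberry"

# Code that sees individual characters can just count them.
letter_count = word.count("r")

# Hypothetical subword split, for illustration only (not a real tokenizer's output).
hypothetical_tokens = ["str", "aw", "berry"]

# A model receives one opaque ID per token, so the letters are not directly visible.
hypothetical_ids = {tok: i for i, tok in enumerate(hypothetical_tokens)}
model_input = [hypothetical_ids[tok] for tok in hypothetical_tokens]

# Recovering the count from tokens requires knowing how every token is spelled.
count_via_tokens = sum(tok.count("r") for tok in hypothetical_tokens)

print(letter_count, model_input, count_via_tokens)
```

    The point of the sketch is only that the letter count is not present in `model_input`; it has to be recovered from what the model has learned about each token's spelling.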

    Now none of this is to disparage LLMs or GenAI. This is a remarkable achievement in computational history. When I started to learn about computers and software as a kid, I wanted to learn about how to do those cool things in science fiction like AI and robots. Turns out, as I got into actually learning about AI, I also learned that the ethics are really REALLY complicated and it didn't take long to see how these models would be abused by everyone to do morally ambiguous things, which is why I ultimately didn't feel comfortable pursuing a career in it.

    I hope these answers help!

    11 votes
    1. [4]
      R3qn65

      The second hurdle is that LLMs are a prediction model. They can only spit out responses based on what they were trained to do. Most LLMs are trained with whatever they could get their hands on (acquired legally or illegally) but they don't come up with new ideas.

      I get where you're coming from and this isn't 100% wrong, but it's a simplification to the point that it's starting to be a little bit wrong. Models have, at this point, produced multiple novel mathematical proofs. Yes, that's built on other work and other ideas, but so is the entirety of human innovation. "They can only combine existing ideas" isn't really an indictment, since that's basically how human creativity functions too.

      The third and final hurdle is understanding what the achievement state of a "general intelligence" is. Let's take an easy route, and simply say: General Intelligence will be when we can replicate the human brain and how intelligence exists in humans. Well, the challenging part there is that we honestly don't know what that means from a medical position either. We know the high levels: Brains have neurons and neurons talk to one another. (Hence the term you might hear of "Neural net" for various simple models), but we don't understand how human consciousness exists within our brains. It works, we can even see it happening in other animals but it's such a challenging web of chemicals and electrical signals that we don't understand it. Someday we will, but right now we don't and LLMs/Generative AI is not the path there.

      For what it's worth, most researchers do not consider fully understanding and replicating the human brain to be necessary for general intelligence. Most have a functionalist definition of general intelligence, not structuralist. If we met intelligent alien life, presumably it would not function exactly the same way as the human brain, but that would not preclude it being generally intelligent.

      16 votes
      1. [3]
        Blakdragon

        Models have, at this point, produced multiple novel mathematical proofs

        Honestly hadn't heard of this before - can you share more?

        2 votes
        1. [2]
          R3qn65

          You bet!

          A few months ago OpenAI's model disproved one of the Erdős conjectures. It's probably the most significant result thus far. But there have been others as well.

          Those two results were comprehensible to humans. (Mathematicians, that is. Not me.) But there's a very interesting open question right now about what it will mean when models are proving things mathematically that nobody really understands. That last link is probably the most worth reading - it's a fascinating look at what math really means, not just what it is, if that makes sense.

          8 votes
          1. tauon

            To pile onto that, ErdosBench(mark) is a thing now, comparing multiple models’ performance on (previously not published) adaptations of Erdős problems.

            3 votes
    2. delphi

      I get the mechanisms you're referring to, I know how backpropagation and reinforcement learning work, I understand pretraining as a concept, and you're spot on in the rest of your analysis, but I have to ask - "they just regurgitate text and can't have any original insight" - is that really true?

      Like, get this. I've absolutely seen the model do "original" things before, even if they were just deterministic flukes. I can absolutely get the model to string together words and sentences in a way that - aside from the library of Babel, grumble grumble - no human being has ever said. Now granted, I can also do that with a random number generator, but therein I think lies the point.

      If we can get it to do that, we can get it to do that in code, or maths, or poetry, or whatever. Will the output be coherent, good, any of those? Doubtful, doubtful, but it will absolutely be "original". This is pedantic, sure, but I kind of reject the notion that "LLMs can't be meaningfully creative" when the mechanisms in place are in concept so close to a human synthesising knowledge from their learned experiences. And god knows humans are capable of writing nonsense as well.

      5 votes
    3. F13

      I would say that LLMs are getting closeish, if you really squint, to something that could arguably be called reasoning.

      Yes, they know 2+2=4 because they have internalized enough data that basically says that. But also, good models can "reason", even if they were never told 2+2=4, if they were told things like 1+1=2, 1+2=3, 1+1+1+1=4, etc etc. There really truly is a sense of reasoning going on, using their multi-dimensional relationship model of one token to another.

      You could argue that a dimension of an LLM might be an object's "blueness", even if it's not codified as such, based on the relative weights of other blue and non-blue things represented in that dimension, and similarly it might encode an emotional concept like "sadness". As such, it could tell you with an amount of logic whether a blue thing is likely to be associated with sad things, even if it was never trained on that directly, based on how often blue things tend to be also sad things.

      2 votes
    4. kaffo

      Thanks for the detailed reply, I agree it's very interesting and very in-depth.

      On your point of "what is intelligence". I, like probably many others, have been thinking about it and there's a good reason we've not got agreement.
      I suspect, in my opinion, if we knew the "source code" of how something like an individual sheep worked then we would look at it in a very different way. When Humans don't understand something, it's put on a pedestal (sometimes worshipped!) and I think we do that with consciousness and intelligence to some degree.
      That's not to say it's not extremely impressive, especially as a biological evolution, but I think that once we fully understand how something works, we have a natural tendency to demote it.

      That said. I'm convinced that gen AI is not intelligent, but it is able to mimic intelligence. Which is confusing to a lot of users who don't understand what they are talking to.
      How do we decide what is "intelligent" like you said? Who knows! I thought about it somewhat and I haven't come to a conclusion. But I think that at least a "thing" has to be able to make its own decisions and those decisions must have some kind of reasoning behind them based on both external input and also their own internal memories and thoughts.
      I don't think models (or agents) today meet these criteria. They mimic it well, especially well sometimes, but it's essentially the same as a broken clock being right twice a day.

      I agree with much you've said in the rest of your comment, I can see us getting a long way in the right direction with gen AI. But it's not taking over the world quite yet.

  3. [7]
    pete_the_paper_boat

    As I see it, surely the bottleneck will soon become the data they are trained on?

    Have the diminishing returns since GPT 3 not been clearly visible?

    8 votes
    1. R3qn65

      Have the diminishing returns since GPT 3 not been clearly visible?

      Most labs and think-tanks believe capabilities since GPT-3 have either continued to advance linearly or have even accelerated. Even skeptics generally insist that progress is "only" linear. I'm sure there are some who hold that it's slowing, but that's far from a mainstream opinion.

      Not saying it can't still be your opinion, of course. But that's the consensus for context.

      19 votes
    2. [3]
      V17

      As a regular user since about ChatGPT release I don't think so. ChatGPT 4 was a huge step forward, and so was o1, the change to "reasoning" models that are now standard. Since then the gains have been seemingly small, but also I haven't tested any of the frontier models that are hidden behind the higher tier subscriptions or in the case of Anthropic currently paused, and in the grand scheme of things the time since "reasoning" models proliferated has been incredibly short, we're just used to really fast development.

      12 votes
      1. [2]
        vord

        And the important question:

        How much is improvement to the actual model, how much is just the non-AI framework around it, and how much is just pumping nitrous in the fuel line?

        3 votes
        1. kru

          Most of the gains over the past cycle have come from better tooling. I think this is widely recognized. But it's a tit-for-tat thing. Tools get made. Models get better at generic tool use. Tools get better/standardized. Models get better at using those tools. Rinse. Repeat.

          I liken it to the development of software for creatives. Back in the days of yore, if you wanted to edit a digital photograph (not that those were easy to come by in the 80s/90s, heh), you were doing direct pixel manipulation. Then Photoshop (and similar) came out and there was a suite of nifty editing tools - but you had to learn how best to use them. Then those tools got better/more advanced and you learned the new usages. Then the tools got even better, and you learned more to keep up. Then the tools started being able to use themselves and here we are.

          4 votes
    3. updawg

      I regularly think about how incredible some of the things are that Claude does for me and how I could never have expected to be doing this when I was using GPT-4. I know I've been seeing a lot of pessimism around Claude the last few weeks, but my projects shifted and it is even better for me than when I was singing its praises in the past.

      7 votes
    4. tauon

      Without delivering concrete proof here, I am fairly certain most major American labs, and for sure (like, confirmed by them) some of the Chinese labs known for “distillation” work (Moonshot, Zhipu/Z.ai, DeepSeek, Alibaba’s Qwen), are already using synthetic training data, which is to say data originally produced by an LLM (or a specific/deterministic code-driven process), and then (eventually) fact-checked and/or refined by a human.

      It’s worked pretty well so far in the cases that were published, for example Moonshot’s Kimi K-model series:

      [Step] 4. Simulate Usage of the Synthetic Agents: The team simulated multi-turn tool-use scenarios in order to generate “trajectories” – a fancy way of saying the detailed set of steps documenting the inputs and steps models take to accomplish their goals. Some of these scenarios simulated “users” – fake people with diverse communication styles – interacting with these agents, while others simulated autonomous usage.

      Edit: This is not to say I believe synthetic training data, for LLMs specifically, will get us to “AGI”/further-than-human intelligence. I’m sure there’s an inherent quality ceiling we’ll encounter somewhere.

  4. [2]
    post_below

    To clarify the vocab: Gen AI = LLM powered agents = LLM fine tuned for reasoning and tool use running in a harness that provides tools and other functionality.

    Boiling it down there are two steps:

    • Pre training. The giant dataset, tokenizing it (converting it into numbers) and generating embeddings (mathematical relationships between the tokens). This step is constrained by the available data like you said.
    • Post training (or fine tuning). This step turns the LLM, which can't really do anything except output plausible text in response to input, into a tool that can do useful work. It's where it learns to be an assistant, to use tools, do multi-step reasoning, write code that mostly works, develop an em-dash kink, etc.

    The above compresses a bunch of important sub steps for brevity.
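
    As a very rough illustration of the pre-training vocabulary above, here is a toy sketch of "tokenizing" and of "mathematical relationships between tokens". Everything in it is an assumption made for illustration: real tokenizers work on subwords, and real embeddings are learned dense vectors, not the neighbour counts used here.

```python
from collections import Counter
import math

corpus = "the cat sat on the mat the dog sat on the rug"

# Tokenizing: map each distinct word to an integer ID (real tokenizers use subwords).
words = corpus.split()
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
token_ids = [vocab[w] for w in words]

def context_vector(target):
    """Crude stand-in for an embedding: count the words adjacent to `target`."""
    counts = Counter()
    for i, w in enumerate(words):
        if w == target:
            for j in (i - 1, i + 1):
                if 0 <= j < len(words):
                    counts[words[j]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# "cat" and "dog" appear in the same contexts here, "mat" does not.
sim_cat_dog = cosine(context_vector("cat"), context_vector("dog"))
sim_cat_mat = cosine(context_vector("cat"), context_vector("mat"))
print(token_ids, sim_cat_dog, sim_cat_mat)
```

    In this tiny corpus "cat" and "dog" end up closer to each other than "cat" and "mat", purely because of where they appear, which is the flavour of relationship the pre-training step captures at vastly larger scale.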

    Innovation can happen in various parts of both steps, so there's still a lot of room for improvement. There are undoubtedly better ways to do everything involved, much of it has been replaced with better methods multiple times already.

    Model size is likely to become a limiting factor, both because of the limit of what exists in terms of training data and because bigger models are more computationally expensive to train and to run. But that's assuming better ways of getting, vetting and tagging pre-training data aren't discovered. I'd assume that, yes, eventually there will be a ceiling. In terms of compute, the tech is going to keep getting more efficient and the hardware will keep getting better so likely any limits imposed by compute will be temporary.

    Will recursive self improvement hit an event horizon where LLMs will start improving themselves so fast they start rocketing towards AGI? Probably not with the current state of the art. When models generate their own training data they end up entrenching and exaggerating their flaws, and there are a lot of flaws. Some amount of artificial training data is fine (especially if it comes from a better model), but 100% artificial training isn't viable at this point.
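
    A toy analogy for that feedback loop, not a claim about how any lab actually trains: the "model" below is just a Gaussian, and each generation is fitted only to samples drawn from the previous generation. The function name, sample size and generation count are arbitrary choices for the sketch.

```python
import random
import statistics

def self_training_spread(generations=200, samples_per_generation=20, seed=0):
    """Refit a Gaussian to its own samples repeatedly and record the spread."""
    rng = random.Random(seed)
    mean, spread = 0.0, 1.0  # generation 0 stands in for the "real" data
    history = [spread]
    for _ in range(generations):
        # Generate synthetic data from the current model only.
        data = [rng.gauss(mean, spread) for _ in range(samples_per_generation)]
        # "Train" the next model on nothing but that synthetic data.
        mean = statistics.fmean(data)
        spread = statistics.pstdev(data)
        history.append(spread)
    return history

history = self_training_spread()
print(f"initial spread {history[0]:.3f}, final spread {history[-1]:.3f}")
```

    With small samples the fitted spread tends to shrink generation after generation, so the diversity of the original data is gradually lost. Mixing real data back in at each step is the obvious thing to try in the toy, which loosely mirrors the point that some artificial data is fine but 100% is not.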

    Even if LLMs were to achieve the ability to recursively self improve without ensloppifying themselves, there's no room in the math for the kind of awareness or understanding we'd associate with AGI. The models don't have a conceptual understanding of reality, they only appear to. They would need to invent new technology to get there, not just iterate on existing LLM tech.

    However, will LLM tools contribute to whatever sort of AGI is someday created? It's hard to imagine they won't.

    I can imagine a future world model with pre-training on a much wider dataset that strives to tokenize reality, as opposed to just language and other creative output, having a more realistic path to AGI. Especially if it was fine tuned with some sort of feedback mechanism that could approximate real world cause and effect. Maybe you'd need sensory feedback. But that's speculating on technology that doesn't exist yet. Right now world models are mostly focused on improving robotics. As far as I know, no one has tried to make a super-sized general world model. It would take the resources of one of the frontier labs to attempt it.

    My perspective is that AGI is still roughly comparable to stable fusion power. There's no reason to believe it can't be done, but it will most likely be "just around the corner" for years and years.

    7 votes
    1. kaffo

      Thanks for the detailed reply.

      Very interesting take on the "world model" idea, that makes a lot of sense in terms of giving the model some context of the real world as opposed to just our language.

      I do agree with the take that gen AI won't lead to general AI but will help pave the way. Though I suspect there will be a lot of media coverage along the way (not that we don't get plenty of it already!) about how gen AI is actually already general AI and has thoughts and feelings.

      1 vote
  5. [2]
    V17

    This is literally a billion dollar question. Nobody really knows. Imo the answers are

    Is gen AI going to take us to general intelligence?

    Yes, though no idea how different the technology is going to be from the one we have now - it could be just incremental development from LLMs gradually taking us someplace else, not necessarily a paradigm change. I think it can be as little as a decade away, depending on whether/when we manage to get to recursive improvements, using AI to improve itself.

    (to be clear this worries me a lot, and I wish it didn't happen, but I think it will)

    Will gen AI get to a place where its "intelligence" and reasoning is actually better than the sum of Humanity?

    "sum of Humanity" is very strong, I wouldn't bet on that specifically, though I guess it depends on definition. I think there's a big difference between a theoretical best possible sum of humanity, our potential, and a realistic sum of humanity, humanity that is uncooperative, tribalistic, irrational and full of conflicts. The latter, of course, seems more likely to be beaten.

    4 votes
    1. kaffo

      Interesting, one of the few people who thinks we will get to general AI "soon"!
      In your opinion how far away do you think current models are from "general AI" in terms of capability?

  6. TurtleCracker

    Lots of great replies here, so I’ll keep mine short.

    My issue with framing LLMs as a path to “real” AI/AGI is that they can’t effectively learn in real time. If you ask it something, it does it wrong, and you correct it then it won’t answer that question correctly for the next person. This is a key thing that human intelligence is capable of.

    4 votes
  7. [3]
    Staross

    I think the lack of online learning/catastrophic forgetting is a major limitation currently. To be truly smart, a model should be able to learn new information in a reliable way, without a finicky and costly separate training procedure. Probably the training procedure where all the data is learned at once is an issue in itself (that's not how we learn).

    https://en.wikipedia.org/wiki/Catastrophic_interference

    3 votes
    1. [2]
      kaffo

      Ah, I didn't know it had a proper term, but that's been on my mind. It's very valid.
      The context is a weird format for memory, especially for gen AI. Since it essentially drives the output and it also has perfect "recall" of everything in the context, the output would always be something "silicon based" and unnatural to us in my opinion.

      I suspect there's a format for memory that we haven't thought of yet. The current implementations of "memory" all suck, and they have no real signs of getting better. Especially since at the end of the day, all they do is modify the context.

      2 votes
      1. Greg

        I strongly suspect that allowing models to recursively fine tune their own weights is a better allegory for experiential memory (as much as such a term can apply to a system that probably isn’t yet capable of experiencing), possibly with something like DeepSeek’s engram lookups as short term/factual memory. Although perhaps not having those lookups would be more organic… just relying on an imperfect fine tuning process that modifies the “brain” as a whole does seem a better analogy for organic remembering.

        A lot of the really cool stuff just isn’t being done right now because it’s not particularly likely to be commercially beneficial. A model that modifies itself and remembers imperfectly but holistically is fascinating, but likely less useful than the way we do it now, and/or the way DeepSeek are looking at doing it.

        Which is a shame, because I much prefer fascinating, but research grants are limited in a way that AI company budgets don’t seem to be.


        [Edit] Having now thoroughly nerd sniped myself with this, I'm seeing something like a system that runs a couple of lightweight LoRA training steps using the entire context window as the training data after every input or output is completed, merging those back to the underlying model weights each time to get a new overall model state. Probably literally less than 20 training steps, I'd imagine, on a tensor bootstrapped directly from the existing state of the weights at that point in time (although a smallish LoRA layer would converge quickly anyway), because most of the same context window is going to be passed back in to the next brief training session, and the next, and the next, with things being repeated and "thought about" until they fall past the limits of the window. The context window remains in place as well, to serve as working memory of the actual conversation, but it's capped at a shorter length than modern systems are capable of, on the basis that we're trying to push the model into making stronger use of its "experiential" memory baked into the weights. I imagine each fine tuning pass would also need to apply some kind of exponential decay function to the context window, maybe breaking it into shorter chunks of conversation (couple of sentences each) and skewing the training sampler heavily towards selecting more recent ones - things that are "fresh in the model's mind" are more likely to be "dwelled upon" and encoded into long term memory, altering the "brain" and "mind" as a whole (although altering them far more strongly in the small targeted areas related to that memory), but things from earlier in the conversation might also be sampled with lower probability, "popping back into the model's mind" and similarly reinforcing as memories. This seems far closer to a true continuity of experience for the model, again at least in as much as that term can make sense at all here.
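
        To make the "exponential decay over the context window" part concrete, here is a minimal sketch of just the sampling step, with no LoRA or actual training involved. The function names, the 0.7 decay factor and the example chunks are all placeholders I made up for illustration.

```python
import random

def recency_weights(num_chunks, decay=0.7):
    """Weight chunk i by decay**age, where the newest chunk has age 0."""
    return [decay ** (num_chunks - 1 - i) for i in range(num_chunks)]

def sample_training_chunks(chunks, k, decay=0.7, seed=None):
    """Pick k chunks (with replacement), skewed heavily toward recent ones."""
    rng = random.Random(seed)
    weights = recency_weights(len(chunks), decay)
    return rng.choices(chunks, weights=weights, k=k)

# Conversation split into short chunks, oldest first.
context_chunks = [
    "oldest exchange",
    "older exchange",
    "recent exchange",
    "newest exchange",
]
weights = recency_weights(len(context_chunks))
batch = sample_training_chunks(context_chunks, k=8, seed=1)
print(weights, batch)
```

        Recent chunks dominate each small training batch, but older ones still have a nonzero chance of being drawn, which is the "popping back into the model's mind" behaviour described above. Each batch would then feed the brief fine-tuning pass, which this sketch leaves out entirely.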

        Of course this is absurdly expensive, compute intensive, slow, potentially buggy (garbage in, garbage out, apart from anything else), risky (it would invalidate basically all guardrails and alignment training), storage intensive (especially because you likely couldn't safely share a model tuned like this between users without horrible data leaks and just straight up confusion between threads of conversation), and really just not viable as a user facing idea in any way. But damn it would be fun as an experiment to run with a single-session model on dedicated hardware and a small cohort of "friends and colleagues" of the model to converse with it.

        3 votes
  8. [2]
    Greg
    Link

    Gen AI should (in theory) never be able to out preform or push the boundary of the sum of humanity at time of training.

    Why not? Genuine question, I’m interested to hear it in your words, because this is a very big assumption that seems to have slipped in kind of unexamined. I’m being a little annoyingly Socratic here, but I do think it’s fascinating to trace how people think about these things.

    To be clear, this isn’t my way of obliquely saying I think models are going to go full AGI on current or near future tech or anything, it’s more just that I think it’s an interesting axiom to have built the question on when it doesn’t match up with what I see of even 2025-era models.

    3 votes
    1. kaffo
      Link Parent

      I considered a rant in the post but decided against it; I didn't want to put too much opinion in the top level. Maybe I should have put it in a collapsed section or just my own comment.

      Anyway yes, thanks for asking. My opinion is that models and/or agents, given the right training and data, then have the capability to produce content which exceeds that boundary. But I would put it down to a combination of randomness and any software (the agent) to capture the high quality output and drop the low quality stuff.
      Also I don't believe the step change would ever be large. We've proven that turning up the randomness on these models just produces more noise. I think there's a sweet spot where it'll start to produce content within +/- 5 or 10% of the threshold. You could capture the content above the line using some metric (which would be difficult with the kind of boundary-pushing content we are talking about), then feed that back into the next training set.
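
      Very roughly, the capture step could look like the sketch below. The `generate` and `score` callables are hypothetical stand-ins (a real version would need an actual model and a quality metric, which is the hard part), and the toy generator just returns a number within 10% of a baseline of 1.0.

```python
import random

def keep_above_threshold(candidates, score, threshold):
    """Keep only the outputs whose score beats the current boundary."""
    return [c for c in candidates if score(c) > threshold]

def one_round(generate, score, threshold, n_samples, rng):
    """Sample n outputs, drop the low-quality ones, and return the rest
    as candidate data for the next training set."""
    candidates = [generate(rng) for _ in range(n_samples)]
    return keep_above_threshold(candidates, score, threshold)

# Toy stand-ins: "quality" is just a number near the baseline of 1.0.
def toy_generate(rng):
    return 1.0 + rng.uniform(-0.1, 0.1)

def toy_score(output):
    return output

kept = one_round(toy_generate, toy_score, threshold=1.0,
                 n_samples=100, rng=random.Random(0))
```

      Only the outputs above the line survive, and none of them can be more than 10% past it, which is the "small jump" point.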

      So yeah, I think it's possible, but I don't think it's reliably pushing the boundary nor is it a large jump.

      What I would like to see is models getting more specific and less general. Iterate training on a model that only does math, or law, or software engineering, etc. Give it focus, cut out the context it doesn't need and see if it can seriously push the boundary.

  9. [2]
    tildes-user-101
    Link

    Definitely interested to hear answers from folks that understand the process better than I do. IMO Fable was a genuine generational leap over the previous models (based purely on the scope of the projects I was able to undertake with it compared to Opus and how much more reliable its output was), and my guess is that it was in the post-training step.

    My very limited understanding is that the biggest models have already been trained on almost all publicly available data so I don’t see big leaps coming from there. Which is why my guess is that Fable was a post-training tuning/harness improvement. So I would love to better understand how models will continue to improve and grow more useful going forward.

    2 votes
    1. kaffo
      Link Parent

      That's interesting to know! Thanks for sharing, especially now that Fable has been locked down.

  10. skybrian
    Link

    Now that AI chatbots are typically an LLM augmented with tools, there’s no natural limit beyond what computers can do. The tools could do anything the LLM can’t do on its own.

    AI research is moving rapidly and putting any bound on what researchers might come up with is very hard.

    1 vote
  11. [3]
    delphi
    Link

    "GenAI" is a really useless term for what the question here actually is, and without getting into it I really don't like how the term has come to be used, especially in circles that don't much think about the topic and use it as a shorthand for "the product I don't like". Yes, obviously you mean LLMs and image diffusion models, but stay with me here.

    Let's say that we get full, real Artificial Intelligence. Cmdr Data, Durandal, GLaDOS, Nick Valentine. A computer that is indistinguishable from a human in terms of their inner life. Would this not still be generative? Would these systems, human by any philosophical definition, generate their output? Don't humans do that now? I don't think it's a meaningful distinction.

    As for your question, I personally do not think that LLMs are or can ever be conscious, but I'm not an expert. I don't think LLMs get us to the point of Strong AGI. It's certainly worth examining: I think the research that Anthropic did a while back, where they injected thought vectors into an LLM's reasoning space and it could retrieve the general "shape" of these ideas, was fascinating. While I'm pretty cynical about this, I'll err on the side of caution and say that, sure, maybe, in some way, whatever's going on inside any given model may approximate the same mechanisms that in humans eventually cause sentience to emerge.

    But are we weeks, months or even years away from OpenAI releasing Consciousness-as-a-Service? I don't think so.

    1. creesch
      (edited )
      Link Parent

      especially in circles that don't much think about the topic and use it as a shorthand for "the product I don't like".

      The company I work for and many others are all in on AI use and explicitly use the term genAI. To me the term is one used by those in management who are suffering from corporate FOMO.

      1 vote
    2. kaffo
      Link Parent

      No, you have a point, it's not a good name. I mean "AI" isn't a good name for LLMs right now either, it's all marketing.

      Though, I'm afraid it might be one of those things we're stuck with for now until the "next thing" comes along. But yeah, it will likely be the case that whatever is next is better at "generating" than generative AI.

      1 vote
  12. LumaBop
    Link

    In the general case, it seems likely. However I think there are some domains where it’s possible LLMs/agents will be able to improve indefinitely. By the way, this is informed speculation, not an evidence-backed claim (and in general I’m an LLM sceptic).

    There are certain domains and types of problems where constructing a solution is hard, but verifying a solution once found is relatively simple. To give a concrete example: solving integrals is hard in general, but verifying a solution involves a relatively easier differentiation process; contrast this with a problem such as finding the shortest tour of a large number of locations (e.g. “find the shortest route, starting at Paris, which visits every European capital exactly once and returns to Paris”) - even if I told you a route, it’s not trivial to confirm that it is indeed the shortest.

    For domains concerned with problems in the “hard to solve, easy to check” category, it seems at least in principle possible that, if agents are paired with a suitable “checking” (verification) tool, they could always have a good learning signal to continuously improve (since all agent output can be accurately labelled as “good” or “bad”). So, hypothetically, recursively training models on prior model output would allow continuous improvement in problem solving abilities. That’s the opposite of what is understood to happen in the general case where LLMs are trained on their own output, which is model collapse.
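
    To make the “checking” tool concrete for the integral example, here is a minimal sketch. It assumes a numerical spot check is good enough: it differentiates a candidate antiderivative with a central difference at a few sample points and compares against the integrand. The function name, step size and tolerance are my own choices for illustration, and a real verifier would more likely use symbolic differentiation, since a spot check is evidence rather than a proof.

```python
import math

def is_antiderivative(F, f, points, h=1e-5, tol=1e-6):
    """Label a candidate F as good (True) or bad (False) by checking
    that F'(x) matches f(x) at each sample point."""
    for x in points:
        derivative = (F(x + h) - F(x - h)) / (2 * h)  # central difference
        if abs(derivative - f(x)) > tol * max(1.0, abs(f(x))):
            return False
    return True

# Integrand: x * cos(x). Finding the antiderivative needs integration by
# parts, but checking a proposed answer only needs differentiation.
def integrand(x):
    return x * math.cos(x)

def good_candidate(x):
    return x * math.sin(x) + math.cos(x)

def bad_candidate(x):
    return x * math.sin(x)

sample_points = [0.5, 1.0, 2.0]
labels = {
    "good_candidate": is_antiderivative(good_candidate, integrand, sample_points),
    "bad_candidate": is_antiderivative(bad_candidate, integrand, sample_points),
}
```

    The labels are exactly the “good”/“bad” signal described above: the checker never has to know how to integrate, only how to differentiate.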

    Certainly several sub-fields of maths and computer science are “hard to solve, easy to check”, so I wonder if LLM ability may not hit a ceiling in those domains.

  13. Weldawadyathink
    Link

    One thing that works very well is synthetic training data. I believe this is much of what has powered the last 1-2 years of LLM innovation. It's actually a pretty simple concept. For a coding-type example, take an existing codebase that is not AI generated. Take your "dumb" AI and have it remove a feature. Even the AIs that were not good at coding could do that pretty well. Also have it generate a user query requesting that the feature be implemented. Again, small models that are bad at coding can do this easily. Now you have a training problem that includes a codebase without a feature, a user request to implement it, and a codebase with the feature. Everything except the final state was AI generated, and can be done easily with very old model technology (Haiku, GPT 3.5 turbo, etc). And this process can be easily automated to generate a ton of training data.
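
    As a sketch of the shape of that pipeline (not how any lab actually does it): `remove_feature` and `write_request` below are hypothetical callables standing in for calls to the small model, and the `TrainingExample` fields are names I made up.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class TrainingExample:
    starting_code: str  # codebase with the feature removed (model-generated)
    user_request: str   # request to implement the feature (model-generated)
    target_code: str    # the original, human-written codebase

def make_example(
    original_code: str,
    feature: str,
    remove_feature: Callable[[str, str], str],
    write_request: Callable[[str], str],
) -> TrainingExample:
    """Build one synthetic training problem from a human-written codebase.
    Only the target (the final state) comes from the original code."""
    return TrainingExample(
        starting_code=remove_feature(original_code, feature),
        user_request=write_request(feature),
        target_code=original_code,
    )

# Toy stand-ins for the "dumb" model, for illustration only.
def toy_remove(code: str, feature: str) -> str:
    return code.split("\ndef " + feature)[0]

def toy_request(feature: str) -> str:
    return f"Please add a {feature} function."

codebase = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
example = make_example(codebase, "sub", toy_remove, toy_request)
```

    Run that over many codebases and features and you get the "ton of training data", with the human-written code always serving as the answer key.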

    I kinda assumed the same back in the GPT 3.5 turbo era. But techniques like this seem to have worked to get us past that.