This is a fairly recent talk that Yann LeCun, who is a legendary figure in the machine learning world, gave at UC Berkeley. It's a bit in the weeds, but even if you have to ignore some of the technical parts, I think, given all the recent buzz in this area, you'd still find the content interesting.
In it, LeCun argues that autoregressive generative models (like GPT), in addition to... well, practically every other machine learning model that is mature today, are dead ends on the way to general intelligence. He proposes new research directions he considers more promising for that goal.
Just at the start of the talk, it's really weird how he ignores evolution. Humans aren't blank slates that learn everything completely from scratch. We have uncountable hours of 'training' from previous humans, and from whatever we were before we were humans (Homo erectus? fish?). Really, all we are is a bit of fine-tuning. So any criticism of machine learning that starts by comparing it to human learning seems fundamentally flawed to me.
The EBM architecture seems interesting, but I will comment that the 'world model' paradigm, and in particular the hierarchical world model idea, seems flawed in the same way symbolic AI seems flawed: my intuition says that these kinds of world-modelling behaviors will be emergent, rather than a result of the architecture. But at the same time it's definitely a more "intellectually satisfying" kind of architecture. In the end the proof is in the pudding, I guess.
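Roughly what I picture when I say 'world model' architecture, as a toy sketch. Everything below is made up for illustration (the dynamics, the energy function, the random-search planner), not LeCun's actual proposal; it's just to make the control flow concrete.

    import numpy as np

    rng = np.random.default_rng(0)

    def world_model(state, action):
        # Stand-in for a learned predictor of "what the world looks like after the action".
        return state + 0.1 * action

    def energy(predicted_state, goal):
        # Low energy = the predicted outcome is compatible with the goal.
        return float(np.sum((predicted_state - goal) ** 2))

    def plan(state, goal, n_candidates=256):
        # Acting = searching for the action whose predicted outcome has the lowest
        # energy, i.e. inference is itself an optimization, not one forward pass.
        candidates = rng.normal(size=(n_candidates, state.shape[0]))
        scores = [energy(world_model(state, a), goal) for a in candidates]
        return candidates[int(np.argmin(scores))]

    print(plan(np.zeros(3), np.ones(3)))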
Current machine learning models can't really build on prior 'training' like that, though. Part of what was exciting about LLMs like GPT was that they're comparatively good at zero- or few-shot learning. Heavy emphasis on "comparatively", though. And LeCun does not believe in autoregressive models because of how their errors compound exponentially with output length.
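To spell out the compounding argument as I understand it (the per-token numbers are made up, and the independence assumption is exactly the part people argue about):

    # If each generated token independently derails with probability e, the chance
    # that a length-n continuation stays on track is (1 - e) ** n.
    for e in (0.01, 0.05):
        for n in (10, 100, 1000):
            print(f"per-token error {e:.2f}, length {n:4d}: P(no error) = {(1 - e) ** n:.4f}")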
Whereas most other models are, quite honestly, very bad at both. If you take a large model trained on, say, image detection, and try to fine-tune it to do a completely different image-related task, it will fail quite utterly. So that ability of humans and animals to make use of prior cognition to learn novel tasks relatively quickly is something that really hasn't existed in machine learning yet. Very large LLMs have shown some ability to do this, but they are still new, and there are concerns about their scalability.
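For what it's worth, by "fine-tune" I mean the usual transfer-learning recipe, roughly like the sketch below. The torchvision ResNet and the 10-class head are arbitrary examples on my part, and the dataset and training loop are omitted.

    import torch.nn as nn
    from torchvision import models

    # Load a trunk pretrained on ImageNet classification...
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    # ...freeze its features...
    for p in model.parameters():
        p.requires_grad = False
    # ...and swap in a fresh head for the new task (say, 10 classes).
    model.fc = nn.Linear(model.fc.in_features, 10)
    # The new head (and optionally some unfrozen layers) then gets trained on the new data.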
And in the end, LeCun is really only comparing the two quite abstractly, in terms of learning power, as a point of comparison.
Evolution comes up in the Q&A as well, so you will probably be doubly unsatisfied.
Apologies for the double reply, but I've been thinking more on the second point, on emergent modelling vs. explicit modelling. From LeCun's angle, I think the issue with having the networks learn these processes as emergent behavior is that the more you want the network to be emergent, the more you have to move "upwards" in terms of training data.
That is to say, for a model to implicitly learn something like the EBM's world model, you'd need to train it on the most baseline form of data (in this case, with your input being the state of the world, the output being an action, and the objective being to optimize the state of the world after the action is taken), which would turn back into reinforcement learning. The issue is that reinforcement learning has incredibly low information density. In practice, it has been brittle and reliant on huge numbers of iterations.
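A toy version of the loop I mean, just to show how thin the feedback is: the input is the world state, the output is an action, and the only training signal is one scalar describing how good the world looks afterwards. All the dynamics and numbers below are invented.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.normal(size=4)                     # "policy" parameters

    def act(state, theta):
        return float(theta @ state)                # output: an action

    def reward(next_state):
        return -float(np.sum(next_state ** 2))     # the entire feedback: one scalar

    state = rng.normal(size=4)
    for step in range(3):
        action = act(state, theta)
        next_state = state - 0.1 * action * state  # toy stand-in for the world
        state = next_state
        print(f"step {step}: reward = {reward(next_state):.3f}")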
To bring it back to animals: animals don't seem to need to do this, or at least not run the complete loop, which suggests there's another way. That's important, since going at it from pure RL may just be computationally impractical forever.
By having discrete networks in a graph, you can avoid that by training each component to do its job more efficiently. LeCun would probably say to aim to have as many of those components as possible, if not all of them, trained with self-supervised learning.
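Something like the following is what I picture by "training each component to do its job": the encoder gets a dense self-supervised signal (here, predicting the next observation in latent space), and a separate downstream module is trained on top afterwards. The module shapes and the latent-prediction objective are my own invention for illustration, not LeCun's actual architecture.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    predictor = nn.Linear(16, 16)      # predicts the next latent state
    head = nn.Linear(16, 4)            # downstream module, trained separately later

    # 1) Self-supervised pretraining of the encoder: a dense signal, no labels needed.
    opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()))
    obs, next_obs = torch.randn(256, 32), torch.randn(256, 32)
    for _ in range(10):
        loss = nn.functional.mse_loss(predictor(encoder(obs)), encoder(next_obs).detach())
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 2) The downstream component reuses the frozen encoder and trains only its own weights.
    for p in encoder.parameters():
        p.requires_grad = False
    task_opt = torch.optim.Adam(head.parameters())
    # ...task-specific training of `head` on top of encoder outputs would go here.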
It may be that something akin to evolutionary reinforcement learning over several hundred million years is how organisms developed these emergent systems, which, by the time they get to a human child, are then fine-tuned on that child's observations. But hundreds of millions of years is a lot of time, even if gradient descent should be better than evolution as an optimization method. We may be able to side-step all of that by explicitly encoding the systems.
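On "gradient descent should be better than evolution": a toy comparison on the same objective, a gradient step versus a mutate-and-select step. Nothing here is meant to model biology or any real training setup; it just shows how much faster the slope-following update converges on a simple loss.

    import numpy as np

    rng = np.random.default_rng(0)
    loss = lambda w: float(np.sum(w ** 2))     # toy objective
    grad = lambda w: 2 * w

    w_gd = rng.normal(size=50)
    w_ev = w_gd.copy()
    for _ in range(100):
        w_gd = w_gd - 0.1 * grad(w_gd)                       # gradient step
        mutants = w_ev + 0.1 * rng.normal(size=(20, 50))     # random mutations
        w_ev = min([w_ev, *mutants], key=loss)               # keep the best
    print(f"gradient descent: {loss(w_gd):.2e}   mutate-and-select: {loss(w_ev):.2e}")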
I think I agree with you. We're already encoding a lot of knowledge/assumptions about cognition in the LLM architectures. I suspect advances will be more a result of emergent behavior due to increasing compute/data than of a cognition-inspired architecture, but honestly I have no idea.