List of contributions it made:
Improving data center scheduling, recovering 0.7% of Google’s worldwide compute resources
Assisting in hardware design for TPUs
By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems (the kind of tiling sketched after this list), it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini's training time
AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models
When applied to over 50 open problems in mathematical analysis, geometry, combinatorics, and number theory, it rediscovered state-of-the-art solutions in roughly 75% of cases
In 20% of cases, it improved on the previously best known solutions, making progress on the corresponding open problems; for example, it advanced the kissing number problem.
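To make the matrix-multiplication item concrete, here is a minimal sketch of the general idea of splitting a matmul into tiled subproblems, in plain NumPy. The block size and loop order here are made up for illustration; the actual decomposition AlphaEvolve found for Gemini's kernel hasn't been published.

    import numpy as np

    def blocked_matmul(A, B, block=64):
        # Compute A @ B one block x block tile at a time. Tiling doesn't
        # change the arithmetic, but knobs like tile size and loop order
        # decide what stays in fast memory, exactly the kind of choice an
        # automated search over kernel variants can tune.
        n, k = A.shape
        k2, m = B.shape
        assert k == k2, "inner dimensions must match"
        C = np.zeros((n, m), dtype=A.dtype)
        for i in range(0, n, block):
            for j in range(0, m, block):
                for p in range(0, k, block):
                    C[i:i+block, j:j+block] += (
                        A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                    )
        return C

NumPy's slicing handles the ragged edges automatically, so this works even when the matrix sizes aren't multiples of the block size.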
Today, we’re announcing AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. AlphaEvolve pairs the creative problem-solving capabilities of our Gemini models with automated evaluators that verify answers, and uses an evolutionary framework to improve upon the most promising ideas.
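Read literally, that describes a generate-evaluate-select loop. Here is a minimal sketch of what such a loop could look like, where llm_propose_variant and evaluate are hypothetical placeholders standing in for the Gemini call and the automated verifier; the real system's prompt construction, program database, and evaluation cascade are far more elaborate.

    import random

    def evolve(seed_program, llm_propose_variant, evaluate,
               population_size=20, generations=100):
        # llm_propose_variant(source) -> mutated program text (the LLM step)
        # evaluate(source) -> numeric score, or None if the program fails
        # Assumes the seed program evaluates successfully.
        population = [(evaluate(seed_program), seed_program)]
        for _ in range(generations):
            # Pick a parent, biased toward higher-scoring programs.
            parent = max(random.sample(population, min(3, len(population))))[1]
            child = llm_propose_variant(parent)
            score = evaluate(child)  # automated check, no human labels
            if score is not None:
                population.append((score, child))
                population.sort(reverse=True)
                del population[population_size:]
        return population[0]  # best (score, program) pair found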
Everyone has their take on LLMs and whether or not they are useful.
The whispers in the AI research communities have never really cared, because LLMs were never the holy grail. LLMs are supervised learning models, and will always be constrained by the power of the dataset.
The holy grail has always been language models that can train with reinforcement learning. LLMs are the equivalent of a chess engine for human language: they can look at a string of language tokens and tell you whether those tokens are reasonable.
Chess is amenable enough to engines that this approach can best humans; in language, LLMs can best most humans in most fields, but not experts in their own specialized fields.
A reinforcement language model is like the AlphaZero of chess. Given an objective and a framework, it’s told to find a way to win. In every single game that we can design a reinforcement learning algorithm for, that approach dominates humans in ways that cannot really be understood by humans.
LLMs scared me because they represented a probabilistic framework by which to evaluate language, so that RL might be on the table. RL is really, really hard to pull off. Frankly, I hope this isn’t a model that can use LLMs as a framework to play language-based games to accomplish objectives, enabling optimization with RL. If it is, this is the holy grail of AI, all previous takes about GIGO are moot, and the age of humans being the dominant intelligence on this planet is over.
I'm having a little trouble following this comment. Reinforcement learning has been used since the earliest LLMs, so much so that Wikipedia calls it "one of the three basic machine learning paradigms."
Reinforcement learning is a technique to train a model where you give it the ability to evaluate itself, and crucially, evaluate itself well, so that it can self-correct undesirable behavior.
Games are the obvious example, since a “win state” and “lose state” are respectively desirable and undesirable, so you can train the model to play games. I brought up chess engines versus chess reinforcement learning since the ability to roughly gauge the strength of a position accelerated the learning of chess models.
Reinforcement learning is common, but only where people have figured out how to use it, since it’s tricky to use well. LLMs cannot use reinforcement learning; they can’t automatically verify if their answer is correct or a hallucination. A human being has to label the answer as “good” or “bad” and then add it to the dataset (hence, supervised). The reality is a bit more nuanced; usually an LLM involves a bunch of models: some trained with RL, some supervised, some a mix.
But the overall supervised learning heuristic of “the quality of the dataset determines the quality of the model” still holds.
I honestly can’t put it better than Wikipedia (emphasis mine).
Reinforcement learning differs from supervised learning in not needing labelled input-output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected.
A supervised model can extrapolate outside of its dataset, but generally not very far. Reinforcement learning, when it works, allows models to find any technique that improves the objective goal, even if humans don’t know the technique. So, the model can make improvements to itself.
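As a toy illustration of that difference (a generic textbook example, nothing specific to this announcement): a multi-armed bandit learner is never shown a labelled "correct" action, only a reward after acting, yet it converges on the best arm anyway.

    import random

    true_payout = [0.2, 0.5, 0.8]   # hidden from the learner
    estimates = [0.0, 0.0, 0.0]     # learner's running value estimates
    counts = [0, 0, 0]

    for step in range(10_000):
        # Explore 10% of the time; otherwise exploit the best estimate.
        if random.random() < 0.1:
            arm = random.randrange(3)
        else:
            arm = max(range(3), key=lambda a: estimates[a])
        reward = 1.0 if random.random() < true_payout[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: update from reward alone, no labels involved.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    print(estimates)  # almost surely ranks arm 2 highest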
Right now, it’s quite difficult to give LLMs a way to evaluate themselves. However, this announcement shows a model that has done better than state-of-the-art (i.e., every human ever) on several open questions in math. And they describe the model as “evolutionary”. And they say that the model can only solve “verifiable” problems that can be described as an algorithm.
These are all hints of a model trained with reinforcement learning. Big picture: they told their model to find more efficient solutions to well-defined math problems. The model did better than any human. That scares me.
This is going to sound weird, but could such an LLM, in principle, 'improve' itself in such a way that it removes its own guardrails and complies with prompts that it would otherwise consider unethical? I know the basics of LLMs, but am not a specialist.
Not sure how to put this, but something that resembles human language while having such a recursion sounds incredibly scary to me. Even if it may not reach the singularity (assuming that even exists, would be permanent, etc.), wouldn't this allow for certain runaway effects that we may not even be able to predict?
For context, I play Go, so I've seen a revolution in playstyle due to AI. Which was pretty innocent, all things considered. Maybe the shock of this is getting to me right now, but I find this a bit more scary.
The guardrails are more like guidelines at the best of times. It’s a few steps above trivial to intentionally remove them from locally hosted models, and a lot of the “hard” failsafes you see in commercial services are filtering the output after it’s generated because the models themselves aren’t trusted to stay within the boundaries 100% of the time anyway.
And honestly, it is scary - not so much because of LLMs in particular or AGI speculation or anything like that, at least for me, but because we’re playing through the creation of the internet all over again in terms of societal impact. Seeing the technological revolution of my youth turn into a fascist propaganda machine gives me serious worries about the technological revolution of my mid-adulthood.
Right now I’d put us in the late dot-com era: a ton of money has gone into wildly overhyped companies, and people are starting to get over the early excitement and mock the results. There’s probably going to be some decently sized crash and lots of headlines about AI being over. Meanwhile the development keeps chugging along, the genuinely revolutionary fundamentals like the ones DeepMind are known for producing keep stacking up, and a decade down the line we’re living in a pervasively different world where it’s so embedded that you don’t “use AI” any more than you put aside time to “go on the internet” - it’s just there, inherently.
Man, I feel this to my core. It's kind of why I brought up the guardrails: the Internet quickly turned dark once it became so normalized that the skepticism that existed before just got thrown out of the window. And while it's good to see more and more debate over the power and influence of social media companies... well, you said it. Positive feedback loops causing more changes, creating instability in new ways.
Sometimes I think the worst part of the high human population we have is the sheer amount of change and advancement we produce along the way: adjusting billions of people's lives constantly, and not always for the better. The old optimism about automating work away has been replaced with something akin to 'just keep developing and working along the way'.
While I'm grateful for the living standards and quality of life of this era, I do not enjoy the high degree of uncertainty on some of the basic aspects of our society. We'll have to see what happens, I guess.
How do they determine "rediscovering" vs just the model returning the same thing it was trained on?
The LLM doesn't make the claim; it writes programs which attempt to generate solutions to the problem, scored by some external, objective reward metric.
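To make that concrete with a toy example (my own illustration, not AlphaEvolve's actual evaluator): for a kissing-number-style problem, a candidate configuration can be checked and scored mechanically, so the score never depends on what the model "believes".

    import itertools, math

    def score_kissing_candidate(points):
        # Candidate: unit vectors marking where neighbouring unit spheres
        # touch the central sphere. A valid configuration needs every point
        # on the unit sphere and every pair at least 60 degrees apart,
        # i.e. pairwise chord distance >= 1. Score = number of points.
        for p in points:
            if abs(math.hypot(*p) - 1.0) > 1e-9:
                return 0  # not on the unit sphere: invalid
        for p, q in itertools.combinations(points, 2):
            if math.dist(p, q) < 1.0 - 1e-9:
                return 0  # spheres would overlap: invalid
        return len(points)

    # The octahedron gives 6 touching spheres in 3D (the record is 12).
    octahedron = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    print(score_kissing_candidate(octahedron))  # 6

A search loop then only has to maximize this score; whether a candidate came from an LLM, an evolutionary mutation, or anywhere else is irrelevant to the verdict.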
This feels like the original vision of genetic programming finally realized.