This is a fictional scenario about how AI advances might play out. The early part (before the choose-your-own-adventure split between "slowdown" and "race") seems fairly plausible to me, though I...
This is a fictional scenario about how AI advances might play out. The early part (before the choose-your-own-adventure split between "slowdown" and "race") seems fairly plausible to me, though I continue to believe that the future can't be accurately predicted.
They acknowledge this themselves:
We have set ourselves an impossible task. Trying to predict how superhuman AI in 2027 would go is like trying to predict how World War 3 in 2027 would go, except that it’s an even larger departure from past case studies. Yet it is still valuable to attempt, just as it is valuable for the US military to game out Taiwan scenarios.
Painting the whole picture makes us notice important questions or connections we hadn’t considered or appreciated before, or realize that a possibility is more or less likely. Moreover, by sticking our necks out with concrete predictions, and encouraging others to publicly state their disagreements, we make it possible to evaluate years later who was right.
Here's a transcript of a podcast where they explain what they're doing.
I read the whole thing. For those who are interested in a big-picture breakdown: Click to expand spoiler. In the next year or so, AI becomes agentic; i.e., capable of executing complex tasks...
I read the whole thing.
For those who are interested in a big-picture breakdown:
Click to expand spoiler.
In the next year or so, AI becomes agentic; i.e., capable of executing complex tasks independently. I think I've heard elsewhere that this is definitely supposed to start happening this year.
Click to expand spoiler.
AGI is achieved within the next two years in a leading US lab. To the writers and intended audience, this prediction is probably considered a "gimme".
Click to expand spoiler.
China is always a few months behind, and geopolitical tensions rise. Both countries' governments understand that sufficiently-powerful AI could represent a decisive edge.
Click to expand spoiler.
In 2027, the frontier model becomes capable of recursive self-improvement, leading to an explosion of superhuman intelligence in the space of a few months. Only a very small number of insiders and government officials are aware of this as it happens.
Click to expand spoiler.
The frontier model's output becomes nearly incomprehensible to even the smartest humans, making it difficult to monitor for safety. But it is eventually discovered that the model is "misaligned" (has the wrong goals) and is lying to hide this fact.
Click to expand spoiler.
At this point, the scenario splits: does the US enforce meaningful safeguards, trading speed for safety, or does it decide that the risk of letting China win is too great and race ahead?
Click to expand spoiler.
In both scenarios, the leading AI is eventually released to the public, completely revolutionizing human life.
Click to expand spoiler.
In the "race" scenario, the AI continues to be misaligned, and it murders all humans as soon as its position is secure, finding them a nuisance.
Click to expand spoiler.
In the "slowdown" scenario, the US "wins" the race with a properly-aligned AI, and we enter into a post-singularity techno-utopia.
Thanks for the synopsis. I read the first third or so last week but haven’t made time to finish. It’s an interesting thought exercise! Though it sounds like the two conclusions (either utopia, or...
Thanks for the synopsis. I read the first third or so last week but haven’t made time to finish. It’s an interesting thought exercise! Though it sounds like the two conclusions (either utopia, or annihilation) are maybe too much of a reach IMHO.
Stop me if this thought is already expressed in the full piece, but I think it’s going to be critical that we remove the “black box” factor from AI and create some tools for deep inspection of how exactly how inputs + noise = output.
As I understand it, the problem is that model weights are so dense and complex that no one can keep track of how they’re being used in the transformation process. But you know what’s really good at pattern recognition in complex data sets? AI! We need to train a new model, using a training set of existing model weights. Now this is all above my pay grade… but couldn’t that sort of “metamodel” be used as the foundation for a tool that can analyze specifics about the knowledge contained inside a model; edit that knowledge in granular, controlled ways; and explain exactly why an LLM produced the output it did in a given scenario?
If those assumptions are correct, creating tools like that will be vital for the future of AI safety, and they’ll help chart a safer course toward developing self-improving AI that leans closer to the utopia outcome than the annihilation one.
There has been some progress in mechanistic interpretability, so it’s not like nobody knows anything about what goes on inside an LLM. And they do sometimes train another LLM as part of this. But...
There has been some progress in mechanistic interpretability, so it’s not like nobody knows anything about what goes on inside an LLM. And they do sometimes train another LLM as part of this. But it’s not fully understood yet.
I gather that they consider annihilation the more likely outcome of the two---maybe even the most likely outcome if things continue as they are? I have to say that, while a lot of smart people...
Though it sounds like the two conclusions (either utopia, or annihilation) are maybe too much of a reach IMHO.
I gather that they consider annihilation the more likely outcome of the two---maybe even the most likely outcome if things continue as they are?
I have to say that, while a lot of smart people (and, maybe more importantly, informed) people are saying that AGI is only a year or two away, I have not heard many people say that an explosion of superhuman AI is only a few years away. But I guess the basic argument is going to be that, well, if you have human-level AI, the obvious thing to do is to put that to work designing better AI, and at that point the only things stopping the intelligence explosion are physical constraints and fundamental theoretical limits.
Stop me if this thought is already expressed in the full piece, but I think it’s going to be critical that we remove the “black box” factor from AI and create some tools for deep inspection of how exactly how inputs + noise = output.
The general problem of interpretability ends up playing a big role.
Click to expand spoiler.
The frontier "Agent-4" model "thinks" in "neuralese", a purely numeric "language". This greatly increases its power, but makes it impossible for humans to read its thoughts. They are forced to use the previous model, Agent-3, to monitor it, but Agent-3 doesn't do much better. This is what allows Agent-4 to hide the fact that it is misaligned. The crucial change in the "slowdown" scenario is that Agent-4 is forced to use English instead, which makes it much easier to monitor.
Click to expand spoiler.
Also, the final technological advance comes when Agent-4 learns to understand *its own structure*, allowing it to design Agent-5, which is both extremely efficient and perfectly aligned to Agent-4.
Regarding your first spoiler, purely linguistically speaking, it would seem (to me as a layperson) a bit surprising that a "foreign" or novel language used to essentially think couldn’t be...
Regarding your first spoiler, purely linguistically speaking, it would seem (to me as a layperson) a bit surprising that a "foreign" or novel language used to essentially think couldn’t be translated given sufficient material?
I’m thinking of stuff like the hieroglyphs being cracked with something like this method, so I guess I’m just kind of struggling to grasp what kind of concept this "thinking language" would be.
I guess one factor might be lack of translation/comparison material, then? (i.e. the model is sophisticated enough to provide false "translations" of its inner workings’ output)
I think the idea is that neuralese isn't exactly a language, more a form of thought. In the same way that we don't know what's happening inside current LLMs, we may not be able to understand...
I think the idea is that neuralese isn't exactly a language, more a form of thought. In the same way that we don't know what's happening inside current LLMs, we may not be able to understand neuralese. And even if it's theoretically possible, it may not be a priority for AI researchers to figure out while they're busy trying to be the first to crack superintelligence (to the detriment of humanity).
In the Scott Alexander supplemental piece linked above, he suggests that maybe we'd be able to use "too-dumb-to-plot" AIs to interpret neuralese for us.
The point, as I see it, is if AIs can hide their thoughts and communications from us, then we could develop supercapable misaligned AIs, and if we do that then they will eventually destroy us to satisfy their true goals. The piece is saying that the future hinges on proper alignment and verification of that alignment.
I really liked the way these misaligned AIs illustrated the concept of the paperclip maximizer - they really want to do research, to the point of destroying the people who wanted that research done in the first place. Like a human being, killing itself eating fatty, sugary foods due to completely different environmental conditions in evolutionary history.
Probably worth the off-topic mention of Universsal Paperclips, which I'm all but certain was your reference. It's a great little game. IMHO worth the time. There are some surprises for the...
the paperclip maximizer
Probably worth the off-topic mention of Universsal Paperclips, which I'm all but certain was your reference.
It's a great little game. IMHO worth the time. There are some surprises for the uninitiated. And in some ways feels very prescient in this current conversation.
The game faintly rings a bell, but I was talking about the idea that inspired that game: https://en.wikipedia.org/wiki/Instrumental_convergence#Paperclip_maximizer
It is my uneducated opinion that understanding the neuralese would likely not be like understanding a language, but more akin to doing a brain scan, seeing neurons fire off, and figure out...
It is my uneducated opinion that understanding the neuralese would likely not be like understanding a language, but more akin to doing a brain scan, seeing neurons fire off, and figure out thoughts from that. It might be something that we could get computers - and more specifically AI - to help with, but then again, getting help from AI is potentially part of the problem. But either way, it is likely orders of magnitude harder to translate the thoughts into useful language than just translating from one language to another.
Again, my opinion as a completely uneducated person on this.
I believe this was, in fact, covered in the full piece in at least one of the scenarios. :)
Stop me if this thought is already expressed in the full piece, but I think it’s going to be critical that we remove the “black box” factor from AI and create some tools for deep inspection of how exactly how inputs + noise = output.
I believe this was, in fact, covered in the full piece in at least one of the scenarios. :)
I finally was able to finish reading this - both endings. I don't have a whole lot to say. I find some of it plausible, but the major problem I have is that the authors succumb to the idea that...
I finally was able to finish reading this - both endings. I don't have a whole lot to say. I find some of it plausible, but the major problem I have is that the authors succumb to the idea that people would allow the AIs to create utopian societies. That "both parties" would end up supporting UBI, for example. I find this to be incredibly unlikely, especially given the recent political history of the US. We are not a democracy; we are a fascist oligarchy. There is absolutely no way that anyone with this power would use it for ethical good.
Which truly is disappointing, because this could be the way for humanity to solve many of its problems and make life fair and good for all of us.
I want UBI. I want medical advances. I wand AI research to figure out how to solve my heart problems, diabetes, kidney issues so I can live a long and healthy life of relaxation.
What I expect is more subjugation of humans while the wealthy elite become essentially gods.
My expectations are similar to yours, but I think I can imagine the sorts of arguments they might make for their prediction. Basically, you expect people to put the AI more-or-less in charge of...
My expectations are similar to yours, but I think I can imagine the sorts of arguments they might make for their prediction.
Basically, you expect people to put the AI more-or-less in charge of making the big decisions, because otherwise what's the point of superintelligent AI? But at that point you can insert YOUR_FAVORITE_RATIONAL_ARGUMENT (for whatever position you rationally believe) into the AI's mouth, and imagine it's being made extremely persuasively. So if you think UBI is rational, especially post-singularity, then, hey, you've got a superintelligent AI making extremely persuasive rational arguments for UBI.
I think a lot of people fall into the same trap Marx fell into all those years ago. Namely that we're headed to a utopia as an inevitability for some reason. People at some point will just get...
I think a lot of people fall into the same trap Marx fell into all those years ago. Namely that we're headed to a utopia as an inevitability for some reason. People at some point will just get tried of oppression when the system breaks down enough, revolt, and everyone will live in an equal, happy society.
No one's ever answered the question of why would that happen?.
Entire generations of people have lived, reproduced, and died while being horribly oppressed the entire time, living in terrible conditions. Most of those people didn't rise up and take control of the system. In miniscule amount of cases where they did, in most cases they instituted oppression of their own, and while their descendents were better off, the society as a whole wasn't.
Why would AI give us a egalitarian utopia? We could have an egalitarian utopia right now if everyone got on the same page. We could have had it for hundreds of years in fact, but we haven't because of flaws inherit in the human psyche.
How would AI change that, short of oppressively taking control over everything?
I just can't concieve of a world where we suddenly live in Star Trek, and I especially can't concieve of AI being the thing that suddenly enables it.
To me, the most realistic scenario is the people currently with most of the worlds power use AI to consolidate what little they don't already control (namely labor), and everyone else's life gets far worse so that theirs can get slightly better.
I just got done reading the whole damn piece, lol. Thanks for the share skybrian, absolutely fascinating. Here's hoping the folks at OpenAI and elsewhere read it too, and think about it when they...
I just got done reading the whole damn piece, lol. Thanks for the share skybrian, absolutely fascinating. Here's hoping the folks at OpenAI and elsewhere read it too, and think about it when they think about the alignment problem.
This is a fictional scenario about how AI advances might play out. The early part (before the choose-your-own-adventure split between "slowdown" and "race") seems fairly plausible to me, though I continue to believe that the future can't be accurately predicted.
They acknowledge this themselves:
Here's a transcript of a podcast where they explain what they're doing.
I read the whole thing.
For those who are interested in a big-picture breakdown:
Click to expand spoiler.
In the next year or so, AI becomes agentic; i.e., capable of executing complex tasks independently. I think I've heard elsewhere that this is definitely supposed to start happening this year.Click to expand spoiler.
AGI is achieved within the next two years in a leading US lab. To the writers and intended audience, this prediction is probably considered a "gimme".Click to expand spoiler.
China is always a few months behind, and geopolitical tensions rise. Both countries' governments understand that sufficiently-powerful AI could represent a decisive edge.Click to expand spoiler.
In 2027, the frontier model becomes capable of recursive self-improvement, leading to an explosion of superhuman intelligence in the space of a few months. Only a very small number of insiders and government officials are aware of this as it happens.Click to expand spoiler.
The frontier model's output becomes nearly incomprehensible to even the smartest humans, making it difficult to monitor for safety. But it is eventually discovered that the model is "misaligned" (has the wrong goals) and is lying to hide this fact.Click to expand spoiler.
At this point, the scenario splits: does the US enforce meaningful safeguards, trading speed for safety, or does it decide that the risk of letting China win is too great and race ahead?Click to expand spoiler.
In both scenarios, the leading AI is eventually released to the public, completely revolutionizing human life.Click to expand spoiler.
In the "race" scenario, the AI continues to be misaligned, and it murders all humans as soon as its position is secure, finding them a nuisance.Click to expand spoiler.
In the "slowdown" scenario, the US "wins" the race with a properly-aligned AI, and we enter into a post-singularity techno-utopia.Thanks for the synopsis. I read the first third or so last week but haven’t made time to finish. It’s an interesting thought exercise! Though it sounds like the two conclusions (either utopia, or annihilation) are maybe too much of a reach IMHO.
Stop me if this thought is already expressed in the full piece, but I think it’s going to be critical that we remove the “black box” factor from AI and create some tools for deep inspection of how exactly how inputs + noise = output.
As I understand it, the problem is that model weights are so dense and complex that no one can keep track of how they’re being used in the transformation process. But you know what’s really good at pattern recognition in complex data sets? AI! We need to train a new model, using a training set of existing model weights. Now this is all above my pay grade… but couldn’t that sort of “metamodel” be used as the foundation for a tool that can analyze specifics about the knowledge contained inside a model; edit that knowledge in granular, controlled ways; and explain exactly why an LLM produced the output it did in a given scenario?
If those assumptions are correct, creating tools like that will be vital for the future of AI safety, and they’ll help chart a safer course toward developing self-improving AI that leans closer to the utopia outcome than the annihilation one.
There has been some progress in mechanistic interpretability, so it’s not like nobody knows anything about what goes on inside an LLM. And they do sometimes train another LLM as part of this. But it’s not fully understood yet.
I gather that they consider annihilation the more likely outcome of the two---maybe even the most likely outcome if things continue as they are?
I have to say that, while a lot of smart people (and, maybe more importantly, informed) people are saying that AGI is only a year or two away, I have not heard many people say that an explosion of superhuman AI is only a few years away. But I guess the basic argument is going to be that, well, if you have human-level AI, the obvious thing to do is to put that to work designing better AI, and at that point the only things stopping the intelligence explosion are physical constraints and fundamental theoretical limits.
The general problem of interpretability ends up playing a big role.
Click to expand spoiler.
The frontier "Agent-4" model "thinks" in "neuralese", a purely numeric "language". This greatly increases its power, but makes it impossible for humans to read its thoughts. They are forced to use the previous model, Agent-3, to monitor it, but Agent-3 doesn't do much better. This is what allows Agent-4 to hide the fact that it is misaligned. The crucial change in the "slowdown" scenario is that Agent-4 is forced to use English instead, which makes it much easier to monitor.Click to expand spoiler.
Also, the final technological advance comes when Agent-4 learns to understand *its own structure*, allowing it to design Agent-5, which is both extremely efficient and perfectly aligned to Agent-4.Regarding your first spoiler, purely linguistically speaking, it would seem (to me as a layperson) a bit surprising that a "foreign" or novel language used to essentially think couldn’t be translated given sufficient material?
I’m thinking of stuff like the hieroglyphs being cracked with something like this method, so I guess I’m just kind of struggling to grasp what kind of concept this "thinking language" would be.
I guess one factor might be lack of translation/comparison material, then? (i.e. the model is sophisticated enough to provide false "translations" of its inner workings’ output)
I think the idea is that neuralese isn't exactly a language, more a form of thought. In the same way that we don't know what's happening inside current LLMs, we may not be able to understand neuralese. And even if it's theoretically possible, it may not be a priority for AI researchers to figure out while they're busy trying to be the first to crack superintelligence (to the detriment of humanity).
In the Scott Alexander supplemental piece linked above, he suggests that maybe we'd be able to use "too-dumb-to-plot" AIs to interpret neuralese for us.
The point, as I see it, is if AIs can hide their thoughts and communications from us, then we could develop supercapable misaligned AIs, and if we do that then they will eventually destroy us to satisfy their true goals. The piece is saying that the future hinges on proper alignment and verification of that alignment.
I really liked the way these misaligned AIs illustrated the concept of the paperclip maximizer - they really want to do research, to the point of destroying the people who wanted that research done in the first place. Like a human being, killing itself eating fatty, sugary foods due to completely different environmental conditions in evolutionary history.
Probably worth the off-topic mention of Universsal Paperclips, which I'm all but certain was your reference.
It's a great little game. IMHO worth the time. There are some surprises for the uninitiated. And in some ways feels very prescient in this current conversation.
The game faintly rings a bell, but I was talking about the idea that inspired that game: https://en.wikipedia.org/wiki/Instrumental_convergence#Paperclip_maximizer
It is my uneducated opinion that understanding the neuralese would likely not be like understanding a language, but more akin to doing a brain scan, seeing neurons fire off, and figure out thoughts from that. It might be something that we could get computers - and more specifically AI - to help with, but then again, getting help from AI is potentially part of the problem. But either way, it is likely orders of magnitude harder to translate the thoughts into useful language than just translating from one language to another.
Again, my opinion as a completely uneducated person on this.
I believe this was, in fact, covered in the full piece in at least one of the scenarios. :)
Here are Scott Alexander's comments about things that he thinks people have overlooked about their scenarios.
I finally was able to finish reading this - both endings. I don't have a whole lot to say. I find some of it plausible, but the major problem I have is that the authors succumb to the idea that people would allow the AIs to create utopian societies. That "both parties" would end up supporting UBI, for example. I find this to be incredibly unlikely, especially given the recent political history of the US. We are not a democracy; we are a fascist oligarchy. There is absolutely no way that anyone with this power would use it for ethical good.
Which truly is disappointing, because this could be the way for humanity to solve many of its problems and make life fair and good for all of us.
I want UBI. I want medical advances. I wand AI research to figure out how to solve my heart problems, diabetes, kidney issues so I can live a long and healthy life of relaxation.
What I expect is more subjugation of humans while the wealthy elite become essentially gods.
My expectations are similar to yours, but I think I can imagine the sorts of arguments they might make for their prediction.
Basically, you expect people to put the AI more-or-less in charge of making the big decisions, because otherwise what's the point of superintelligent AI? But at that point you can insert
YOUR_FAVORITE_RATIONAL_ARGUMENT
(for whatever position you rationally believe) into the AI's mouth, and imagine it's being made extremely persuasively. So if you think UBI is rational, especially post-singularity, then, hey, you've got a superintelligent AI making extremely persuasive rational arguments for UBI.I think a lot of people fall into the same trap Marx fell into all those years ago. Namely that we're headed to a utopia as an inevitability for some reason. People at some point will just get tried of oppression when the system breaks down enough, revolt, and everyone will live in an equal, happy society.
No one's ever answered the question of why would that happen?.
Entire generations of people have lived, reproduced, and died while being horribly oppressed the entire time, living in terrible conditions. Most of those people didn't rise up and take control of the system. In miniscule amount of cases where they did, in most cases they instituted oppression of their own, and while their descendents were better off, the society as a whole wasn't.
Why would AI give us a egalitarian utopia? We could have an egalitarian utopia right now if everyone got on the same page. We could have had it for hundreds of years in fact, but we haven't because of flaws inherit in the human psyche.
How would AI change that, short of oppressively taking control over everything?
I just can't concieve of a world where we suddenly live in Star Trek, and I especially can't concieve of AI being the thing that suddenly enables it.
To me, the most realistic scenario is the people currently with most of the worlds power use AI to consolidate what little they don't already control (namely labor), and everyone else's life gets far worse so that theirs can get slightly better.
I just got done reading the whole damn piece, lol. Thanks for the share skybrian, absolutely fascinating. Here's hoping the folks at OpenAI and elsewhere read it too, and think about it when they think about the alignment problem.