From the article:
For three years, I worked at one of the organizations you might expect—I will not say which, for reasons that will become apparent, though the cognoscenti will likely guess. I left in early 2024, and I have spent the months since reading and rereading a novel that was first published in 1872, searching for something I sensed but could not name. What I found there was so uncanny, so precisely calibrated to our present moment, that I began to suspect Dostoevsky of a kind of prophecy.
[...]
I will argue that Dostoevsky understood, with terrifying precision, the psychological and social dynamics that emerge when a small group of people convince themselves they have discovered a truth so important that normal ethical constraints no longer apply to them. He understood the particular madness of the intelligent, the way abstraction can sever conscience from action. He understood how movements that begin with the liberation of humanity end with its enslavement. And he understood—this is the critical point—that the catastrophe comes not from the cynics but from the believers.
[...]
I am not suggesting that AI safety is a conspiracy in any sinister sense. I am observing that the social dynamics of the movement share structural features with Dostoevsky's revolutionaries. The same information asymmetries, the same reliance on reputation rather than verification, the same exploitation of these features by bad actors.
One of the recurring features of the AI labs is the vast gulf between public statements and internal reality. The safety teams are presented externally as powerful internal voices shaping company direction. Internally, their actual influence varies from minimal to nonexistent, depending on the organization and the moment. The people who could reveal this gap have strong incentives not to. Their careers, their equity stakes, their social standing within the community—all depend on maintaining the illusion.
[...]
What is the central point? It is that neither "safety" nor "acceleration" is actually what drives the behavior of the people in power. What drives them is more immediate: the competitive dynamics of the industry, the career incentives of individual researchers, the political pressures from governments and investors, the personal relationships and rivalries that shape decision-making. The ideological debates are largely epiphenomenal—they provide vocabulary and justification, but they do not determine outcomes.
[...]
The AI researchers have the same mobility. If Anthropic becomes too restrictive, there's OpenAI. If OpenAI becomes too chaotic, there's Google. If the Bay Area becomes too weird, there's London. The ability to exit reduces the incentive to push for change from within—why fight a difficult political battle when you can simply leave? But it also means that the problems are never confronted, only redistributed. The researcher who leaves Lab A because of safety concerns may find the same concerns at Lab B, because the same people, with the same assumptions, are building the same systems.
[...]
The labs compete on certain dimensions: time to market, benchmark performance, talent acquisition. But they do not compete on the fundamental question of whether to build increasingly powerful AI systems as quickly as possible. On that question, they are aligned. The uniparty consensus is that AI development should proceed, that the benefits outweigh the risks, that the people currently in charge are the right people to be in charge.
[...]
The same dynamic applies to the AI companies themselves. Google, OpenAI, Anthropic, Meta—they all fund safety research, internally and externally. They have genuine reasons to want such research to succeed, but they also have genuine reasons to want it to succeed in ways that do not threaten their core business. The researchers funded by these companies are not bought in any crude sense, but they operate in an environment where certain conclusions are easier to reach than others.
[...]
This is, I think, the most accurate description of what has gone wrong in the AI industry. The possession is not by ideology but by capability—by the extraordinary power to build things that work, to solve problems that seemed unsolvable, to extend human reach in ways that feel like magic. This capability is genuinely intoxicating. I felt it myself. The experience of making a neural network do something it was not supposed to be able to do is genuinely thrilling.
But the capability is developing faster than the wisdom to direct it. We can build systems that generate human-quality text, that create images from descriptions, that engage in extended reasoning. We cannot reliably make these systems do what we want, or predict what they will do, or understand why they do what they do. We are possessed by our own creations in the most literal possible sense: they act on us as much as we act on them.
[...]
A striking feature of Demons is how little intentions matter. The characters have various intentions—good, bad, mixed, confused—but the outcomes seem almost independent of them. The catastrophe happens not because anyone intended it but because the system had that catastrophe as its attractor.
This is perhaps the most discomfiting lesson for the AI industry. There is intense focus on the intentions of the developers: are they trying to benefit humanity? Are they adequately safety-conscious? Are they motivated by altruism or greed? These questions matter, but Dostoevsky suggests they may not matter as much as we assume.
[...]
The collective behavior of the industry points toward something like a goal, even if no individual endorses that goal. The goal seems to be: maximize capability, externalize risk, capture value, and maintain optionality for as long as possible. This is not anyone's explicit objective, but it is what the system is actually doing.
Thanks for posting this. Another awesome link.
My fundamental disagreement with the author is this:
I want to be careful here, because describing these dynamics in detail would identify individuals in ways that might cause harm.
They've written almost a novella on what is, principally, an ethical topic. Over and over they've castigated the AI researchers who have substituted reason for an ethical sense, who reach insane conclusions because they lack a functioning moral compass to tell them "hold on a minute, here." The author is worried these people will destroy the world.
In that context, naming them would do so much harm that it's off the table?
The author's lack of skin in the game defangs their entire... everything.
(I'm only about 3/4ths done, so I'll edit this comment if there's an argument addressing this.)
Edit: there is not. Further reaction to this in a subsequent comment.
A more esoteric comment. Despite what follows, engaging with the author’s frame is, basically, the highest sign of respect I can show.
This is from the conclusion of the OP.
Stavrogin's confession fails because it is an attempt to achieve through speech what can only be achieved through being. He wants the relief of confession without the transformation of repentance. He wants to be seen as someone who has faced his crimes without actually facing them—without allowing the knowledge of what he has done to change who he is.
I am aware that this essay may be a similar failure. I have written twenty thousand words about the psychology of AI development and the lessons of Dostoevsky, and the writing itself has been absorbing, intellectually stimulating, even pleasurable in moments. Have I actually faced anything? Have I allowed the knowledge I claim to possess to change who I am?
I do not know. The question cannot be answered from the inside.
What I can say is that writing this has been an attempt—perhaps a failed attempt, but an attempt—to make articulate something I have felt but have not been able to express. The AI industry is in a situation of profound moral seriousness, and the discourse surrounding it is not adequate to that seriousness. The rationalist frameworks, the policy discussions, the technical papers—all of these have their place, but they do not capture what is actually happening.
What is actually happening is that a small group of people, shaped by particular histories and situated in particular social positions, are making decisions that may affect every human being who will ever live.
The author’s reading of Dostoevsky here is fairly conventional. If you ask AI about the chapter, it says something along the lines of “Stavrogin’s motivation for the confession is not genuine contrition but an experiment in aestheticized suffering. He seeks a burden so heavy that it might force him to feel something.”
From the relevant scene in Demons:
“This document comes straight from the needs of a heart which is mortally wounded,—am I not right in this?” he said emphatically and with extraordinary earnestness. “Yes, it is repentance and natural need of repentance that has overcome you, and you have taken the great way, the rarest way. But you, it seems, already hate and despise beforehand all those who will read what is written here, and you challenge them. You were not ashamed of admitting your crime; why are you ashamed of repentance?”
So again, fairly straightforward. But what if what Stavrogin is refusing to face isn’t introspection, as the OP discusses…
Stavrogin:
Enough. Tell me, then, where exactly am I ridiculous in my manuscript? I know myself, but I want you to put your finger on it. And tell it as cynically as possible, tell me with all the sincerity of which you are capable.
…but actual concrete consequences? This is how the chapter ends:
”No, not that penance, I am preparing another for you!” Tikhon went on earnestly, without taking the least notice of Stavrogin’s smile and remark.
“I know an old man, a hermit and ascetic, not here, but not far from here, of such great Christian wisdom that he is even beyond your and my understanding. He will listen to my request. I will tell him about you. Go to him, into retreat, as a novice under his guidance, for five years, for seven, for as many as you find necessary. Make a vow to yourself, and by this great sacrifice you will acquire all that you long for and don’t even expect, for you cannot possibly realize now what you will obtain.”
Stavrogin listened gravely.
“You suggest that I enter the monastery as a monk.”
“You must not be in the monastery, nor take orders as a monk; be only a lay-brother, a secret, not an open one; it may be that, even living altogether in society....”
“Enough, Father Tikhon.” Stavrogin interrupted him with aversion and rose from his chair.
[there are a couple lines after this. Stavrogin calls Tikhon a damned psychologist and leaves in fury.]
In this reading, the author of the OP has failed. Not because they confronted these ideas without being changed by them, but because when the time came to truly repent — to do something concrete at great personal cost — to name the names of the people they think are threatening the existence of humanity, they, like Stavrogin, rose from their chair and fled.
(reading through it as I'm writing)
The AI safety community has developed elaborate frameworks for thinking about existential risk, but these frameworks assume a kind of normal moral psychology that cannot be assumed in the people making the key decisions. Expected value calculations do not help when the person doing the calculating is incapable of feeling that the values in question are real.
...man, this actually describes a lot of powerful positions in our society, where the people who end up in charge are not the ones who can make sound moral decisions.
I have watched similar dynamics play out in AI safety organizations. The people who leave are not merely disagreed with; they are reconceptualized as having been flawed all along. Their previous contributions are reinterpreted in light of their eventual departure. The group's self-conception requires that anyone who rejects it must have been mistaken from the beginning.
You can see this dynamic on social media, where political opinions are treated as markers of being morally right or wrong, taken even further to an extreme. It makes disagreement not just more uncomfortable than it already is, but downright terrifying.
Consider the actual topology. Researcher A at OpenAI dated Researcher B at Anthropic; they met at a house party in the Mission thrown by Researcher C, who left DeepMind last year and now runs a small alignment nonprofit. Researcher D at Google and Researcher E at Meta were roommates in graduate school and still share a group house with three other ML researchers who work at various startups. The safety lead at one major lab and the policy director at another were in the same MIRI summer program in 2017. The CEO of one frontier lab and the chief scientist of another served on the same nonprofit board.
This is not corruption in any conventional sense. It is simply how small, specialized communities work. The number of people with the technical skills and intellectual orientation to do frontier AI research is measured in hundreds, perhaps low thousands. They attend the same conferences (NeurIPS, ICML, the various safety workshops). They post on the same forums (LessWrong, the Alignment Forum, Twitter/X). They read each other's papers, cite each other's work, argue in each other's comment sections. Many of them live within a few miles of each other in the Bay Area or London.
This fits well, if you ask me, with my point above: sometimes you need the opinion of an outsider, to make sure you, or any group, aren't going down a wrong path somewhere. I hadn't considered this dynamic within AI research before, and I don't know enough about the internal world, but I can certainly see it there. It almost reminds me of how aristocracies in Europe used to justify their own existence with, well, their own existence. Basically treating it as self-evident. That might be a rather extreme example, but still.
I want to be careful here, because describing these dynamics in detail would identify individuals in ways that might cause harm. But the general pattern is visible enough to anyone who pays attention. The AI safety community has its aristocracy—the founders of the field, the authors of the canonical texts. It has its ambitious climbers, its fallen stars, its heretics, its gossips. The social machinery is remarkably similar to what Dostoevsky describes in Demons, adjusted for a San Francisco context.
...Oh. Didn't expect the same word to be used in such a manner. <_<; Kind of makes me hope that the author's description is geared towards that conclusion rather than it being a proper description. Though so far I don't get the vibe that this is for attention or something similar, unfortunately.
Open Philanthropy has given more than a hundred million dollars to AI safety research. This is, in one sense, admirable—they have identified a problem they believe is important and they are trying to do something about it. But their funding creates dependencies. Researchers who want to continue their work must remain in Open Phil's good graces. This is not necessarily corrupting—Open Phil seems to be relatively hands-off—but it creates structural incentives that shape the discourse in ways that are difficult to perceive from the inside.
This is why separation of powers is so, so important. It prevents those kinds of dependencies, and I've actually been wondering if we need something akin to that for economics, given the current issues in our world, but that's a story for another time I suppose.
I felt this dynamic when I was considering leaving. The decision was not just professional; it was about identity and belonging. My entire social world was the AI industry. My friends, my romantic partners, my sense of purpose—all of it was bound up with the work. Leaving meant not just changing jobs but changing who I was.
Honestly? Props to the author for leaving. It's easy to think you would do the same in such a situation, but when your entire identity is intertwined with it, that becomes easier said than done.
I do not know how to translate this into the AI context. I am suspicious of easy answers. But I am also increasingly convinced that the purely technical and policy approaches to AI risk are insufficient—that they treat symptoms while ignoring the underlying disease.
I want to highlight this because it's important to admit we don't have all the answers, and admitting that in a text like this requires a lot of courage imo!
Perhaps the AI industry is possessed in this sense. Not by ideology, not by any single vision, but by the spirit of acceleration itself—the drive toward "more" and "faster" that has no end point and no criterion for success except continued motion.
I hadn't thought about it like this before, but it makes sense when you consider how much of a bubble it is. People keep investing in it not only because of the sunk cost fallacy, but out of pure inertia, because the structures end up having, well, that structure.
Confession can be self-serving. It can be a way of claiming moral credit for acknowledgment while avoiding the costs of action. It can even be a form of action-substitution—the feeling of having done something when in fact one has only talked about doing something.
Never thought about confessions like that before, but yeah, they can very much be like that. Venting is similar, actually: you may feel like you've provided relief for yourself - and you sure did, in that moment - but that doesn't mean the problem itself has been tackled, even if it feels like it has.
...
I may need to read through this again. Perhaps my initial thoughts will be very different from what I'll think about this later but nonetheless, thanks for sharing. If nothing else, this has been a good read. I'd like to write down more thoughts immediately but I feel like I need to digest this first.
why fight a difficult political battle when you can simply leave? But it also means that the problems are never confronted, only redistributed. The researcher who leaves Lab A because of safety concerns may find the same concerns at Lab B, because the same people, with the same assumptions, are building the same systems.
Damn, I just got called out. Maybe I should stop running and put some skin into the game.
It's also appropriate to leave when you've fought the internal political battles with all the power at your disposal, and lost them. As the author indicates, there are structural and monetary forces arrayed which no individual or internal group can oppose effectively, and so it's better to expose the struggle and fight from outside the system.