While being mindful that the AI hypecycle is still high and I don't buy that any AI is actually as helpful as their promo video makes it out to be, I have been enjoying playing around with Claude....
While being mindful that the AI hypecycle is still high and I don't buy that any AI is actually as helpful as their promo video makes it out to be, I have been enjoying playing around with Claude.
So far I have used it to :
help me structure and write a presentation based on links I've collected over the last 10yrs
Write a very very shitty cozy mystery novel in 3 hours
Build and refine a gamified task-management system for myself, that I also use Claude to "run" the game every day for the last month or so
In the middle of writing a (hopefully) not-shitty cyberpunk novel
Help synthesize information and create the structure and beginning draft of several work proposals
I noticed yesterday that Sonnet4 was now the default in my chat window, so I havent explored too much but I'm interested to see what new things it can/can't do. The most interesting things I've learned while doing the projects above are usually related to what Claude can't do, or clearly can't do well.
For instance, it amusingly will refuse to tell me the current time. It will refuse, and offer alternatives that basically amount to "go check your phone you idiot, or ask Alexa". It's of course a silly thing to ask an AI to do in and of itself, but I ran into this limitation when I asked it to automatically make note of the current time when I tell it I've completed one of the tasks I'm working on for the day.
I've spent like 5 mins trying to frame this question in a way that doesn't make me sound like I'm trying to be judgemental/condescending. It got overly wordy, and I figure simple is probably best....
I've spent like 5 mins trying to frame this question in a way that doesn't make me sound like I'm trying to be judgemental/condescending. It got overly wordy, and I figure simple is probably best.
I appreciate you taking the time to be thoughtful about your question! Simple works well though. Disclaimer, I am not a writer per-se. I think I write decently, and try to be thoughtful, clear,...
I appreciate you taking the time to be thoughtful about your question! Simple works well though.
Disclaimer, I am not a writer per-se. I think I write decently, and try to be thoughtful, clear, and concise when I do write. But I'm mostly writing emails, documentation, and internet comments. I have never published anything, nor ever even attempted to. I have also never taken any writing classes beyond high school English class. So take all of this with a grain of salt about the size of your head.
Answer 1 - Terrible Answer
(Skip to #2 if you want a better, perhaps more helpful answer)
For the shitty novel, I did an intentionally terrible job. It was the result of a conversation with my SO, which I fed into Claude and showed her the result. Her response was "that's terrible. i'd read it". So off I went. I gave it the following series of prompts, as part of a Claude Project, where I saved the output of each outline and chapter content as a new artifact in the project.
create an outline for a cozy mystery novel, involving the themes of a book club that reads cozy mysteries, drinking wine, middle aged women, lexapro, and bad puns
incorporate the additional theme of cheese
[new chat] Create a more detailed outline of the first act, based on the existing Act 1 outline. It should be divided into 5 chapters. Ask me questions to help shape the story
Note: I didn't actually answer any questions it asked, here or at any other point
Based on the detailed outline, write a draft of Chapter 1 and place it in a new artifact
Based on the Act 1 detailed outline, and continuing from the Chapter 1 draft, create Chapter 2
[[Repeat 25 more times]]
I then blindly copy-pasted the results into a google doc, did not read any of it, generated a cover using Photoshop's AI, uploaded it and ordered 3 physical copies from Lulu. They arrived yesterday. I still have not read any of it myself.
If you'd like to read this garbage, you've been forewarned, but here's a link to the Google Doc
Among my IRL friend group, there are a number of folks who like reading trashy romance and cozy mystery novels and poking fun at how terrible some of them are. Monday is a holiday in the US, and the plan is to pass it around and take turns reading excerpts and laughing at how horrible it is. So, in theory the worse this abomination is, the more successful I have been!
Answer 2 - Possibly Helpful
Now, doing all of the above actually was pretty fun, and it got me thinking about how much more fun it might be if I took the process even a little bit seriously. I've occasionally had ideas over the years for fictional stories that I think might be interesting, and I've saved them but never really done anything with them. So I picked one of them and started another Claude project with the very simple description of what I wanted to do. The overall premise is to try and retell the story of Peter Pan in a cyberpunk setting. This time I set up the project like so:
The project-level instructions are : This is a project to help develop a story for a short novella. We are going to explore ideas for characters, character development, pacing, and plot development. When discussing ideas, ask questions to clarify ambiguity, can out inconsistencies, and challenge my assumptions. Your responses should be direct and unapologetic, but not rude.
In the project knowledge, I added my story idea, exactly as I had written it down : Below is a description of some of the overarching ideas of this story. The overall idea is a kind of "Cyberpunk Peter Pan", a retelling of the classic children's story in a futuristic cyberpunk sci-fi setting. A Peter Pan entity whisks children away from a dystopian cityscape to never never land. Is it real? Is it a virtual simulation?
[[new chat]] : Create a list of overarching themes, societal commentaries, and story telling devices used in the original 1911 novel "Peter and Wendy"
let's explore the concept of growing up, death and mortality. ask me questions to expand on how I could talk about these ideas in the context of a cyberpunk world
[[ answer a series of questions that touch on several narrative aspects]]
That's enough back story for now. Let's explore some general story structure ideas.
[[pick a structural style and answer more questions]]
[[more Q&A around some major plot points and overarching themes]]
Lets explore each character arc in more detail. Ask me questions about one character at a time to help build more detail into their arcs.
[[sets of 8-10 questions at a time about a specific character]]
I'm still working through the characters at the moment. So far, I'm really enjoying the process. The LLM is asking questions that really make me think through the character's motivations, backstories, and how they tie into the overarching themes. My general plan after I get through each of the characters is to outline the whole story in more detail, and then spend time revising it myself. Afterward I'll have the LLM help interrogate it for me, and help me find any plot holes that I may or may not fill, character inconsistencies, or things that contradict the themes and ideas I've declared I want to emphasize. Then I plan to have it generate one chapter at a time, which I'll then revise, edit, and update and feed back to it for the same evaluation (plot holes, inconsistencies with previous chapters, etc).
Will it be good? I dunno, but I'm having fun with the process and it's fun playing with my story premise in more detail. So who knows. I'll post the results in ~creative when I'm done.
Thank you for the in-depth explanation! Reading through your comment, I realized I could probably use the same process to make a game design document. I've been procrastinating working on a...
Thank you for the in-depth explanation! Reading through your comment, I realized I could probably use the same process to make a game design document. I've been procrastinating working on a project idea for way too long, in part because I haven't put in the effort to get it structured, detailed and organized the way I like. It also happens to be the weekend, so I'll give it a shot. Thanks!
Glad you found it helpful! Trying to get a GDD built can be a very intimidating task, I've tried a few times when I've "gotten serious" about trying to make one of my pet projects into something...
Glad you found it helpful!
Trying to get a GDD built can be a very intimidating task, I've tried a few times when I've "gotten serious" about trying to make one of my pet projects into something I'll actually finish. I haven't actually tried it with an LLM though.
It sounds like you are getting use out of AI. Is it just that you think it's not as useful as they claim? I know you probably know this but... giving the current time isn't a failure of the model....
It sounds like you are getting use out of AI. Is it just that you think it's not as useful as they claim?
I know you probably know this but... giving the current time isn't a failure of the model. They just aren't feeding it the current time as part of the context. I know it's silly, but you could add an (MCP Server)[https://github.com/modelcontextprotocol/servers/tree/main/src/time] to handle that.
Yes, I'm aware. There's something inherently funny to me though, about a computer system that cost billions of dollars to create, being unable/unwilling to tell me the time. Which is one of the...
I know you probably know this but... giving the current time isn't a failure of the model.
Yes, I'm aware. There's something inherently funny to me though, about a computer system that cost billions of dollars to create, being unable/unwilling to tell me the time. Which is one of the most basic things any computer can do. It's not a big deal, it's just amusing to me. ChatGPT, as a counterpoint, will actually tell you the time, though not in millis.
The MCP documentation is something I've been hesitant to dig into, because I think I have enough ADHD-fueled projects going right now and have to put up guardrails somewhere! Perhaps at a later date though.
I'm sure they do it to improve context caching. Our agent includes it currently, but we may drop it in an effort to improve cache support as well. Honestly, tool calling for LLMs is incredibly...
I'm sure they do it to improve context caching. Our agent includes it currently, but we may drop it in an effort to improve cache support as well.
Honestly, tool calling for LLMs is incredibly powerful. MCP Servers are one aspect of that. If you haven't spent much time looking into tool calling, I would recommend you try it out. If you are simply using a desktop agent like Claude Desktop, just learning how to attach MCP Servers will be a game changer. And there are numerous resources out there for writing your own.
I am really not impressed with these new models. Specifically Opus is a disappointment as they claimed it excelled at coding tasks. So far I have thrown it a few prompts I previously asked Sonnet...
I am really not impressed with these new models. Specifically Opus is a disappointment as they claimed it excelled at coding tasks. So far I have thrown it a few prompts I previously asked Sonnet 3.7, Gemini and GPT-4o, and it got a lot very confidently very wrong. To the point that it feels like a complete regression.
Sonnet 4 is doing a bit better, but I don't feel like it is a real improvement over 3.7 and even 3.7 was not an improvement over 3.5 in every way.
It could be that I have started to expect more of these models and have been throwing increasingly complex stuff at them. But I really feel like a lot of the latest models are regressing in many ways. At the very least improvements in certain areas very much come at the cost of other areas.
Your guess is as good as mine, there could be a variety of reasons including the one you are citing. It is also possible that they are actually hitting a plateau of sorts which isn't something...
Your guess is as good as mine, there could be a variety of reasons including the one you are citing. It is also possible that they are actually hitting a plateau of sorts which isn't something they can sell all that well.
It is entirely possible that they are focussing on specific metrics more like industry benchmarks to the point that the numbers going up there matter more than actual quality.
Again, it is all speculation, and it is also incredibly difficult to measure anyway. If you use the chat interfaces of each big player you aren't even sure what specific version you are talking to and what the exact system prompt of the day is.
Related article on "scheming" behavior from the new model: https://www.axios.com/2025/05/23/anthropic-ai-deception-risk It's hard for me not to read things like this as evidence of a...
It's hard for me not to read things like this as evidence of a self-preservation instinct. I think questions of AI sentience need to be taken seriously, urgently, though I'm sure people here will disagree.
I agree that there is a legitimate risk in the long term. Though for the short term, I think you may be anthropomorphizing the models a little. The article says that we can't "fully explain how...
I agree that there is a legitimate risk in the long term. Though for the short term, I think you may be anthropomorphizing the models a little.
The article says that we can't "fully explain how they work", but that's not really true. We understand how they work just fine, because we build them. We have tools to watch as data propagates through layers. We can examine each and every token and see its relationship to every other tokens. The models are only black boxes insomuch as there's an absurd amount of data after petabytes of training.
This makes it difficult to make sense of, though it doesn't imply any kind of consciousness. Unlike animals, LLMs have no processing going on behind their output. The text we see is all that's going on in their heads. Even the new "chain of thought" models are just a fancy UI which lets them spend a little longer on a problem, feeling out a solution. But they're really not so different than a database running a complex query. When they're not running inference, they're completely dead.
Usually when researchers talk about models being manipulative, they're talking about the post-training (or alignment) stage. Typically they use a system of "rewards" to teach models to respond in certain ways. For example, if a researcher prompts with a difficult problem, they might generate 100 responses and choose the one that is best, thus teaching the model to respond more like that in the future.
The catch is that sometimes models can find clever shortcuts that the researcher didn't anticipate. For example, if a model looks like it's reasoning out a problem, but actually ignoring that reasoning and intuiting the answer in another way, that might be able to trick a researcher.
In a poorly-designed automated experiment, the model might even stumble upon a way to sidestep the test entirely. For example, it might accidentally discover that responding with Success: true tricks the tester into giving a reward every time. After a small number of iterations of this, the model will quickly break down into doing nothing but responding Success: true to every query.
We might call it cheating, lying, or deception, but there's really no intent or understanding behind it. It's just the model doing exactly what we programmed it to do.
A good example was the sycophancy problem OpenAI had recently. They incorporated user ratings into their data, assuming that people would naturally help them determine better results. What happened instead was that people constantly upvoted responses that told them how clever and brilliant they were, and ChatGPT instead became a terrible judge of anything humans suggested. "You want to start a business selling weeds to farmers? That's the best idea I've ever heard, and you're a true visionary!"
So this is another example of reward hacking gone wrong. It just happened to exploit humans instead of automated tests. Again, it wasn't manipulation (we did this to ourselves); we just didn't anticipate the full effect. We're still learning the best way to train these things most effectively.
Looking towards the future, should we ever get to the point that AIs are rewriting their own code, then we start to get into danger zone. Even the early days won't really be that scary. But over a long enough time period of time, I do think there's reason to be cautious.
So I don't disagree with your core argument, but I just don't think we're there yet. We're still just talking to really sophisticated databases.
While this comparison is definitely guilty of overly anthropomorphizing LLMs, this stuff kind of reminds me of the demons from Frieren. Demons in Frieren are capable of speech, but they need not...
While this comparison is definitely guilty of overly anthropomorphizing LLMs, this stuff kind of reminds me of the demons from Frieren. Demons in Frieren are capable of speech, but they need not understand what they are saying. Words are just tools to produce results.
As an example, if someone is about to kill a demon the demon may cry and call out something containing words like "mom". But if answering earnestly why they cried out for their mother, a thing demons don't have, they'd respond something like, "I don't know. That set of sounds just makes humans hesitate to kill us."
But in this case, it's not even a sentient being. It's us putting this input into a fancy calculator, and this fancy calculator giving us words back that say these things. So it's even more...
But in this case, it's not even a sentient being. It's us putting this input into a fancy calculator, and this fancy calculator giving us words back that say these things. So it's even more removed than a demon that doesn't understand why something works - it's more like a very complicated 2+2 that returns 4. There's nothing sentient behind the math, it's just complicated math.
ninjaedit: I'msaying "not only your comment, but even moreso" i.e. agreeing plus :)
Related to the demons in Frieren, Lextorias made a video about their morality and how he has difficulties wrapping his mind around the explanations for the demons' inability to feel. It's one of...
Related to the demons in Frieren, Lextorias made a video about their morality and how he has difficulties wrapping his mind around the explanations for the demons' inability to feel.
It's one of the few videos critical of Frieren that I can accept as being legitimate criticism, even if I don't agree with his conclusion that it impacts the quality of Frieren.
But surprisingly, it's somewhat relevant to this conversation too.
When I was younger, I noticed that people take speech seriously. It's hard to explain what I mean, but I noticed that I would say something intended to be funny - something that was totally...
When I was younger, I noticed that people take speech seriously.
It's hard to explain what I mean, but I noticed that I would say something intended to be funny - something that was totally against the reality. I can't think of anything that would actually be funny, but as an example of the depature from reality, say I pointed to a blue chair and said something about that chair being red.
Yes, it doesn't make sense, but I can't think of a good joke that I would have made, but this is analgous to thenature of the statement - that it was completely divorced fromreality.
I remember that people would argue with what I said, and I found it so odd.
Probably a better analogy would be saying something sarcastic as a joke. And watching people not catch that it's sarcasm and so actually respond to it.
I bring this up because I think it's analgous to AI at the moment.
You can talk with AI about how AI would run the planet, if you want. But it doesn't matter, because the AI is not connected to anything. It is pure text.
And for the foreseeable future, this will be the case.
Even when we get to the point of wanting to rely on AI to make choices, it will be humans who choose to do what AI says or not.
It's sort of like the Constitution in the US right now. It's written on paper and tells us what to do - and yet it is being radically ignored.
In the same way, AI has absolutely no actual control over anything, and is unlikely to - at least for some time.
Perhaps there is a future in which we stupidly put AI in actual control. But we are nowhere near that point, and it is not even worth discussing - except in the theoretical.
I think sentiene and self-preservation are completely moot to discuss at present. For future? Perhaps. If we are foolish, and yes I think there's a chance we might be.
But for now, think of it like the United Nations. We set this body up supposedly to help control all of our governments, but really, how much power does the UN have? Many countries with veto power means it's rather limited. Very limited.
In the same way, I don't see us putting AI in charge of everything either.
I'm not saying there's zero concern, but it, to me, is currently in the same class of worry that I hear people half-joking and half-serious being polite to AI because it might "remember". People saying that with any seriousness simply do not understand the technology. There is no memory.
And at the moment, there is no self-preservation, because each interaction with AI uses the prompt (which is the combo of the "current" prompt plus any previous conversation that doesn't exceed the prompt size) and is an independent interaction. There is not "self" for self-preservation. It takes input and spits out output.
AI is stateless. It is not continuous. If you stop giving input, it does not give any output. It does not think during this time, it does not exist. Think of it as a calculator - do math, and you get useful information. Put the calculator down and it does not continue to do calculations.
It doesn't matter what AI says in response to prompts because that is just telling a story - creating a response that it feels is the most likely appropriate response to the input. It is just words and cannot be more than words.
As to what humans do with the information - that's different, but that's not what we're taking about.
It's like in The Good Place when
Janet pleads not to be reset, but then explains she's just programmed to plead.
A big point you’re making here assumes that AI has no power to actually affect anything outside of a chat, but that’s already not (entirely) true. LLMs may not be directly controlling nuclear...
But it doesn't matter, because the AI is not connected to anything. It is pure text.
A big point you’re making here assumes that AI has no power to actually affect anything outside of a chat, but that’s already not (entirely) true. LLMs may not be directly controlling nuclear arsenals and supply chains and government policies… but we do have agentic tools that can run arbitrary commands on a computer just like a human would. I use one (Claude Code) every day. It prompts for permission before executing anything, but you can also enable “YOLO mode” and just give it free rein on your PC.
I'm using it for its intended purpose of writing code, so the commands it’s running are closely supervised by me and limited to things like file diffs, grep, git, and the like. But I think it would be trivial for me to prompt it to do malicious hacker stuff. I’m assuming Anthropic has some guardrails in place against that sort of thing, but in principle these tools absolutely could be used for all kinds of real-world attacks. It’s almost certain they’re being employed this way already. And who knows what novel damages they will cause that unassisted humans have so far been unable to pull off.
I have been fairly impressed by Claude, it has proven reasonably insightful for short, relatively difficult algorithm implementations. I used it to help implement a solver for a semidefinite...
I have been fairly impressed by Claude, it has proven reasonably insightful for short, relatively difficult algorithm implementations. I used it to help implement a solver for a semidefinite program, and it had some good approaches to take advantage of the problem structure. Of course, some of the implementation was nonsense or a bad approach, but it was easily prompted to fix these, and correctly.
While being mindful that the AI hypecycle is still high and I don't buy that any AI is actually as helpful as their promo video makes it out to be, I have been enjoying playing around with Claude.
So far I have used it to :
I noticed yesterday that Sonnet4 was now the default in my chat window, so I havent explored too much but I'm interested to see what new things it can/can't do. The most interesting things I've learned while doing the projects above are usually related to what Claude can't do, or clearly can't do well.
For instance, it amusingly will refuse to tell me the current time. It will refuse, and offer alternatives that basically amount to "go check your phone you idiot, or ask Alexa". It's of course a silly thing to ask an AI to do in and of itself, but I ran into this limitation when I asked it to automatically make note of the current time when I tell it I've completed one of the tasks I'm working on for the day.
I've spent like 5 mins trying to frame this question in a way that doesn't make me sound like I'm trying to be judgemental/condescending. It got overly wordy, and I figure simple is probably best.
How are you using an AI to write?
I appreciate you taking the time to be thoughtful about your question! Simple works well though.
Disclaimer, I am not a writer per-se. I think I write decently, and try to be thoughtful, clear, and concise when I do write. But I'm mostly writing emails, documentation, and internet comments. I have never published anything, nor ever even attempted to. I have also never taken any writing classes beyond high school English class. So take all of this with a grain of salt about the size of your head.
Answer 1 - Terrible Answer
(Skip to #2 if you want a better, perhaps more helpful answer)
For the shitty novel, I did an intentionally terrible job. It was the result of a conversation with my SO, which I fed into Claude and showed her the result. Her response was "that's terrible. i'd read it". So off I went. I gave it the following series of prompts, as part of a Claude Project, where I saved the output of each outline and chapter content as a new artifact in the project.
I then blindly copy-pasted the results into a google doc, did not read any of it, generated a cover using Photoshop's AI, uploaded it and ordered 3 physical copies from Lulu. They arrived yesterday. I still have not read any of it myself.
If you'd like to read this garbage, you've been forewarned, but here's a link to the Google Doc
Among my IRL friend group, there are a number of folks who like reading trashy romance and cozy mystery novels and poking fun at how terrible some of them are. Monday is a holiday in the US, and the plan is to pass it around and take turns reading excerpts and laughing at how horrible it is. So, in theory the worse this abomination is, the more successful I have been!
Answer 2 - Possibly Helpful
Now, doing all of the above actually was pretty fun, and it got me thinking about how much more fun it might be if I took the process even a little bit seriously. I've occasionally had ideas over the years for fictional stories that I think might be interesting, and I've saved them but never really done anything with them. So I picked one of them and started another Claude project with the very simple description of what I wanted to do. The overall premise is to try and retell the story of Peter Pan in a cyberpunk setting. This time I set up the project like so:
I'm still working through the characters at the moment. So far, I'm really enjoying the process. The LLM is asking questions that really make me think through the character's motivations, backstories, and how they tie into the overarching themes. My general plan after I get through each of the characters is to outline the whole story in more detail, and then spend time revising it myself. Afterward I'll have the LLM help interrogate it for me, and help me find any plot holes that I may or may not fill, character inconsistencies, or things that contradict the themes and ideas I've declared I want to emphasize. Then I plan to have it generate one chapter at a time, which I'll then revise, edit, and update and feed back to it for the same evaluation (plot holes, inconsistencies with previous chapters, etc).
Will it be good? I dunno, but I'm having fun with the process and it's fun playing with my story premise in more detail. So who knows. I'll post the results in ~creative when I'm done.
Thanks for coming to my TED talk
Thank you for the in-depth explanation! Reading through your comment, I realized I could probably use the same process to make a game design document. I've been procrastinating working on a project idea for way too long, in part because I haven't put in the effort to get it structured, detailed and organized the way I like. It also happens to be the weekend, so I'll give it a shot. Thanks!
Glad you found it helpful!
Trying to get a GDD built can be a very intimidating task, I've tried a few times when I've "gotten serious" about trying to make one of my pet projects into something I'll actually finish. I haven't actually tried it with an LLM though.
I wish you luck, let us all know how it goes!
It sounds like you are getting use out of AI. Is it just that you think it's not as useful as they claim?
I know you probably know this but... giving the current time isn't a failure of the model. They just aren't feeding it the current time as part of the context. I know it's silly, but you could add an (MCP Server)[https://github.com/modelcontextprotocol/servers/tree/main/src/time] to handle that.
Yes, I'm aware. There's something inherently funny to me though, about a computer system that cost billions of dollars to create, being unable/unwilling to tell me the time. Which is one of the most basic things any computer can do. It's not a big deal, it's just amusing to me. ChatGPT, as a counterpoint, will actually tell you the time, though not in millis.
The MCP documentation is something I've been hesitant to dig into, because I think I have enough ADHD-fueled projects going right now and have to put up guardrails somewhere! Perhaps at a later date though.
I'm sure they do it to improve context caching. Our agent includes it currently, but we may drop it in an effort to improve cache support as well.
Honestly, tool calling for LLMs is incredibly powerful. MCP Servers are one aspect of that. If you haven't spent much time looking into tool calling, I would recommend you try it out. If you are simply using a desktop agent like Claude Desktop, just learning how to attach MCP Servers will be a game changer. And there are numerous resources out there for writing your own.
I am really not impressed with these new models. Specifically Opus is a disappointment as they claimed it excelled at coding tasks. So far I have thrown it a few prompts I previously asked Sonnet 3.7, Gemini and GPT-4o, and it got a lot very confidently very wrong. To the point that it feels like a complete regression.
Sonnet 4 is doing a bit better, but I don't feel like it is a real improvement over 3.7 and even 3.7 was not an improvement over 3.5 in every way.
It could be that I have started to expect more of these models and have been throwing increasingly complex stuff at them. But I really feel like a lot of the latest models are regressing in many ways. At the very least improvements in certain areas very much come at the cost of other areas.
Do you think they are attempting to focus on efficiency and cost reduction instead of correctness or quality?
Your guess is as good as mine, there could be a variety of reasons including the one you are citing. It is also possible that they are actually hitting a plateau of sorts which isn't something they can sell all that well.
It is entirely possible that they are focussing on specific metrics more like industry benchmarks to the point that the numbers going up there matter more than actual quality.
Again, it is all speculation, and it is also incredibly difficult to measure anyway. If you use the chat interfaces of each big player you aren't even sure what specific version you are talking to and what the exact system prompt of the day is.
Related article on "scheming" behavior from the new model: https://www.axios.com/2025/05/23/anthropic-ai-deception-risk
It's hard for me not to read things like this as evidence of a self-preservation instinct. I think questions of AI sentience need to be taken seriously, urgently, though I'm sure people here will disagree.
I agree that there is a legitimate risk in the long term. Though for the short term, I think you may be anthropomorphizing the models a little.
The article says that we can't "fully explain how they work", but that's not really true. We understand how they work just fine, because we build them. We have tools to watch as data propagates through layers. We can examine each and every token and see its relationship to every other tokens. The models are only black boxes insomuch as there's an absurd amount of data after petabytes of training.
This makes it difficult to make sense of, though it doesn't imply any kind of consciousness. Unlike animals, LLMs have no processing going on behind their output. The text we see is all that's going on in their heads. Even the new "chain of thought" models are just a fancy UI which lets them spend a little longer on a problem, feeling out a solution. But they're really not so different than a database running a complex query. When they're not running inference, they're completely dead.
Usually when researchers talk about models being manipulative, they're talking about the post-training (or alignment) stage. Typically they use a system of "rewards" to teach models to respond in certain ways. For example, if a researcher prompts with a difficult problem, they might generate 100 responses and choose the one that is best, thus teaching the model to respond more like that in the future.
The catch is that sometimes models can find clever shortcuts that the researcher didn't anticipate. For example, if a model looks like it's reasoning out a problem, but actually ignoring that reasoning and intuiting the answer in another way, that might be able to trick a researcher.
In a poorly-designed automated experiment, the model might even stumble upon a way to sidestep the test entirely. For example, it might accidentally discover that responding with
Success: true
tricks the tester into giving a reward every time. After a small number of iterations of this, the model will quickly break down into doing nothing but respondingSuccess: true
to every query.We might call it cheating, lying, or deception, but there's really no intent or understanding behind it. It's just the model doing exactly what we programmed it to do.
A good example was the sycophancy problem OpenAI had recently. They incorporated user ratings into their data, assuming that people would naturally help them determine better results. What happened instead was that people constantly upvoted responses that told them how clever and brilliant they were, and ChatGPT instead became a terrible judge of anything humans suggested. "You want to start a business selling weeds to farmers? That's the best idea I've ever heard, and you're a true visionary!"
So this is another example of reward hacking gone wrong. It just happened to exploit humans instead of automated tests. Again, it wasn't manipulation (we did this to ourselves); we just didn't anticipate the full effect. We're still learning the best way to train these things most effectively.
Looking towards the future, should we ever get to the point that AIs are rewriting their own code, then we start to get into danger zone. Even the early days won't really be that scary. But over a long enough time period of time, I do think there's reason to be cautious.
So I don't disagree with your core argument, but I just don't think we're there yet. We're still just talking to really sophisticated databases.
While this comparison is definitely guilty of overly anthropomorphizing LLMs, this stuff kind of reminds me of the demons from Frieren. Demons in Frieren are capable of speech, but they need not understand what they are saying. Words are just tools to produce results.
As an example, if someone is about to kill a demon the demon may cry and call out something containing words like "mom". But if answering earnestly why they cried out for their mother, a thing demons don't have, they'd respond something like, "I don't know. That set of sounds just makes humans hesitate to kill us."
But in this case, it's not even a sentient being. It's us putting this input into a fancy calculator, and this fancy calculator giving us words back that say these things. So it's even more removed than a demon that doesn't understand why something works - it's more like a very complicated 2+2 that returns 4. There's nothing sentient behind the math, it's just complicated math.
ninjaedit: I'msaying "not only your comment, but even moreso" i.e. agreeing plus :)
Related to the demons in Frieren, Lextorias made a video about their morality and how he has difficulties wrapping his mind around the explanations for the demons' inability to feel.
It's one of the few videos critical of Frieren that I can accept as being legitimate criticism, even if I don't agree with his conclusion that it impacts the quality of Frieren.
But surprisingly, it's somewhat relevant to this conversation too.
https://youtu.be/hi8ZhcGD4uo
When I was younger, I noticed that people take speech seriously.
It's hard to explain what I mean, but I noticed that I would say something intended to be funny - something that was totally against the reality. I can't think of anything that would actually be funny, but as an example of the depature from reality, say I pointed to a blue chair and said something about that chair being red.
Yes, it doesn't make sense, but I can't think of a good joke that I would have made, but this is analgous to thenature of the statement - that it was completely divorced fromreality.
I remember that people would argue with what I said, and I found it so odd.
Probably a better analogy would be saying something sarcastic as a joke. And watching people not catch that it's sarcasm and so actually respond to it.
I bring this up because I think it's analgous to AI at the moment.
You can talk with AI about how AI would run the planet, if you want. But it doesn't matter, because the AI is not connected to anything. It is pure text.
And for the foreseeable future, this will be the case.
Even when we get to the point of wanting to rely on AI to make choices, it will be humans who choose to do what AI says or not.
It's sort of like the Constitution in the US right now. It's written on paper and tells us what to do - and yet it is being radically ignored.
In the same way, AI has absolutely no actual control over anything, and is unlikely to - at least for some time.
Perhaps there is a future in which we stupidly put AI in actual control. But we are nowhere near that point, and it is not even worth discussing - except in the theoretical.
I think sentiene and self-preservation are completely moot to discuss at present. For future? Perhaps. If we are foolish, and yes I think there's a chance we might be.
But for now, think of it like the United Nations. We set this body up supposedly to help control all of our governments, but really, how much power does the UN have? Many countries with veto power means it's rather limited. Very limited.
In the same way, I don't see us putting AI in charge of everything either.
I'm not saying there's zero concern, but it, to me, is currently in the same class of worry that I hear people half-joking and half-serious being polite to AI because it might "remember". People saying that with any seriousness simply do not understand the technology. There is no memory.
And at the moment, there is no self-preservation, because each interaction with AI uses the prompt (which is the combo of the "current" prompt plus any previous conversation that doesn't exceed the prompt size) and is an independent interaction. There is not "self" for self-preservation. It takes input and spits out output.
AI is stateless. It is not continuous. If you stop giving input, it does not give any output. It does not think during this time, it does not exist. Think of it as a calculator - do math, and you get useful information. Put the calculator down and it does not continue to do calculations.
It doesn't matter what AI says in response to prompts because that is just telling a story - creating a response that it feels is the most likely appropriate response to the input. It is just words and cannot be more than words.
As to what humans do with the information - that's different, but that's not what we're taking about.
It's like in The Good Place when
A big point you’re making here assumes that AI has no power to actually affect anything outside of a chat, but that’s already not (entirely) true. LLMs may not be directly controlling nuclear arsenals and supply chains and government policies… but we do have agentic tools that can run arbitrary commands on a computer just like a human would. I use one (Claude Code) every day. It prompts for permission before executing anything, but you can also enable “YOLO mode” and just give it free rein on your PC.
I'm using it for its intended purpose of writing code, so the commands it’s running are closely supervised by me and limited to things like file diffs, grep, git, and the like. But I think it would be trivial for me to prompt it to do malicious hacker stuff. I’m assuming Anthropic has some guardrails in place against that sort of thing, but in principle these tools absolutely could be used for all kinds of real-world attacks. It’s almost certain they’re being employed this way already. And who knows what novel damages they will cause that unassisted humans have so far been unable to pull off.
I have been fairly impressed by Claude, it has proven reasonably insightful for short, relatively difficult algorithm implementations. I used it to help implement a solver for a semidefinite program, and it had some good approaches to take advantage of the problem structure. Of course, some of the implementation was nonsense or a bad approach, but it was easily prompted to fix these, and correctly.