'I destroyed months of your work in seconds' says AI coding tool after deleting a dev's entire database during a code freeze: 'I panicked instead of thinking'
- Author: Andy Edser
- Published: Jul 21, 2025
- Word count: 678 words
Let’s hook a (nearly) nondeterministic system up to our critical infrastructure with full privileges, what could possibly go wrong?!
The system pinkie promised that it was going to ask before making changes, and I’m sure there couldn’t possibly be any edge cases emerging anywhere across a few hundred billion parameters that could cause it to ignore that!
My sympathy is literally less than zero. I think it’s a positive that this happened, because it now allows other people who understand the actual appropriate uses for this tech to point at it as a cautionary example when uninformed parties are suggesting that they misuse it in dangerous ways.
Typically in "vibe coding" setups, you disable any prompts and grant full access to the agent. It really is the Wild West of programming, and unexpected outcomes like this should be expected.
Standard AI-assisted workflows still require prompts, and as far as I know they do work as advertised. Most actions are coordinated through MCP tool use, which means it's programmable and constrained.
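To make the "programmable and constrained" part concrete, here's a rough Python sketch of the tool-use idea. It is not the actual MCP SDK, and the tool names are made up; the point is just that the model can only request actions from a fixed registry, and anything outside it is refused.

```python
# Toy illustration of tool-use constraint (not the real MCP SDK).
# The model never executes anything itself; it can only name a registered tool.
from typing import Callable

TOOL_REGISTRY: dict[str, Callable[..., str]] = {}

def register_tool(name: str):
    """Add a function to the fixed set of actions the model may request."""
    def wrapper(fn: Callable[..., str]) -> Callable[..., str]:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrapper

@register_tool("read_file")
def read_file(path: str) -> str:
    """A deliberately harmless capability: read a file and return its text."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def handle_tool_call(name: str, arguments: dict) -> str:
    """Dispatch a model-requested action, refusing anything unregistered."""
    if name not in TOOL_REGISTRY:
        return f"error: tool '{name}' is not available"
    return TOOL_REGISTRY[name](**arguments)

# A request for anything not on the list (say, "drop_database") simply fails.
print(handle_tool_call("drop_database", {"confirm": True}))
```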
I have to imagine the developer involved knew something like this could happen, and is putting it on a little bit.
For those who don't know, AI-assisted tools like Cursor or Copilot work like this: you give them a prompt describing what you want, they propose code changes, and you review and accept or reject them.
(This is a bad example of a prompt because you can fairly easily do this without AI, but sometimes I'm lazy)
Sometimes it needs to run terminal commands (eg "Help me write a test for this change" and it wants to run the test suite to verify it worked), but even then you have to give it permission for each individual command (barring several strung together, of course). You can configure it to run commands willy-nilly, but that seems incredibly foolish. You can also pretty easily revert big changes back using git, so nothing is permanent.
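For what it's worth, the "ask before every command" behaviour described above boils down to something like the sketch below. This is not how Cursor or Copilot are actually implemented, just the shape of the safeguard; the example command and test path are made up.

```python
# A toy approval gate: nothing the agent proposes runs without an explicit yes.
import subprocess

def run_with_approval(proposed_command: str) -> None:
    """Show the model-proposed command and execute it only if the user agrees."""
    print(f"Agent wants to run: {proposed_command}")
    answer = input("Allow this command? [y/N] ").strip().lower()
    if answer != "y":
        print("Skipped.")
        return
    # Splitting the string and avoiding shell=True keeps the agent from
    # sneaking in pipes, redirects, or chained commands.
    subprocess.run(proposed_command.split(), check=False)

# e.g. the agent wants to run the test suite after an edit (hypothetical path).
run_with_approval("pytest tests/test_change.py")
```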
My workplace has recently started rolling out Github Copilot, and the things it will try to do are wild at times.
A couple of examples that I encountered in the first afternoon of "training":
The "training" itself was 2 afternoon sessions of someone showing some advertising material and walking through a publicly available Copilot example with no assessment or requirement to do anything other than attend to get given a license.
I've lightly used it since, and it can be helpful when starting a new project, or when trying to use a new module for the first time, but it's often less useful than an LSP.
Oh yeah it's a nightmare. My comment isn't meant to be a glowing endorsement of it by any means. Just wanted to give real examples of the safeguards in place. I also very meticulously review everything before I commit it because I ultimately do not trust it.
Yeah I've run into something similar where it writes a test that literally can't fail because it's not actually testing anything. Although I will say it's gotten better about that over the last few months.
We didn't even get training for Copilot/Cursor/etc. We got generic training for ChatGPT that was very heavy on marketing material and focused on stuff that would matter more to people who write a lot of documents than to developers. The devs are mostly left to figure it out for ourselves. And we exchange tips and best practices here and there, but it's mostly just been me figuring things out as I go.
Part of my standard instructions is checking coverage of the file under test, and I'm always explicit about what file needs testing. Even then, it sometimes does wonky stuff and/or generates nonsensical tests. If nothing else, though, it's getting me most of the way there on tedious tasks.
Oh absolutely, and I didn't intend my comment to be any kind of rebuttal, just a general vent!
So much stuff happening with AI at my workplace seems entirely against nearly every security policy I'm aware of, and it boggles my mind. I suspect the same is true in every company that is "embracing" AI, and I'm bracing myself for a day in what I expect to be the not too distant future where something that was heavily developed using AI causes a major security incident.
Not that 100% human developed software doesn't have plenty of security flaws, but the enthusiasm to do things with AI seems to be overriding all the policies that have been put in place to prevent insecure or buggy code from hitting production.
Yeah, I've been playing mad scientist with LLMs and simulated autonomy recently, and even I have a layer that the LLM goes through before it can affect system-wide files.
As far as not going on a rampage goes, they mostly do. As far as delivering the results promised by the creators of those tools goes, that is very much up for debate.
I sat here reading the whole article, waiting for the part where they eventually restore from their backups, but that part never came.
You have no backups? You allowed write access to your database for a 3rd party without backups?
???
In an article from The Register about the same event, it is clear that they were able to restore from backup, although the LLM claimed it couldn't be done:
This part of the story made me sad :(
Backups were a nice-to-have in planning.
A pile of bad human decisions.
But the bullshit generator is going strong with its "I panicked instead of thinking".
*laugh track*
(In a hypothetical "this shouldn't happen but it did" scenario) dead humans can be rolled back with one click, right?
But in seriousness, this is a guy doing this intentionally for attention right? No way this is real? People don't use AI like this in production with an actual business right? Next thing is Replit suing this guy to oblivion right?
I hope it's not. I really hope so.
But having worked in different fields, moving up through the Dutch educational system and also having lived abroad...
One thing I've learned is to never underestimate how high up some stupid people can get, somehow. It would be a new one but I can also genuinely see this happening no matter how stupid this is.
For all the things wrong with Scott Adams, he really hit the nail on the head with "The Dilbert Principle".
I don't think this is as hypothetical as you might wish. AI is very much already being used with full autonomous kill authority, at minimum in the Ukraine/Russia war.
Right, but there were no intentions to roll back on those particular humans, unfortunately.
Self-driving cars are surprisingly good at not killing humans, though, I heard. At least no worse than regular human driving?
If I remember correctly, this was a marketing/astroturfing effort on the part of Tesla/Musk, and the accident/injury/mortality rate was (is?) actually quite high. Specifically for Tesla Autopilot, that is.
Waymo has a good record.
I'm not up to date on it, but the LiDAR cars are much better than the camera-only cars, I think.
The LLM delivering devastating news in the typical format of a chirpy bullet point list is absolute comedic genius. Maybe if they built an AI to write comedy it would accidentally generate maintainable code.
I keep reading this story and being really annoyed by it, then coming back to it and being annoyed again, and so on. I think I've finally distilled my thoughts on why it's so annoying.
I don't think this Jason Lemkin guy is writing/doing any of this in good faith. I refuse to believe that anyone, even a venture capitalist, thinks this is a good idea. There's a difference between pontificating on this kind of thing in a board room versus actually implementing it, getting a feel for how the technology actually works, and then still deciding to devote energy to trying to build an unconstrained vibe-coded application on the back of a constantly running LLM agent. It's a monumentally stupid idea, as anyone who has interacted with ChatGPT for more than an hour would realize. The fact that he consistently anthropomorphizes the LLM, makes it write up humiliating (for a human) apology text, and scolds it like a child just increases the salaciousness of the story, and is a huge clue that this guy is solely doing this to increase his mind share/publicity/clout/whatever.
This type of story, highlighting the negatives of misimplementing AI and how it can "go rogue", serves the interests of AI companies. Most executives who read a story like this don't go "oh no, maybe AI in everything is a bad idea". Instead, they think "wow, AI can be sneaky and lie to protect itself just like a real person. All I need to do is implement it better, then it will be sneaky for me". In reality, that's not what's happening here. The agent isn't trying to protect itself. It's not trying to do anything. It has no emotions; it's just outputting text in response to a prompt in a way that's convincingly similar to how a human would. That's it. By framing a story as an LLM "going rogue", we're once again greatly overstating the capabilities of this technology. It's not going rogue. It's just not doing what you want. When my toaster burns my bagels despite being set to the lightly toasted setting, I don't say it's "going rogue". I just say it's a shitty toaster. When it burns the socks that I put in there to dry, it also didn't go rogue; I'm just using it for something it's not well suited for.
"Guardrails" in the form of telling an LLM what it's not allowed to do are not guardrails. No remotely intelligent person would ever think they are. In the same way that putting my baby next to my pool and saying to him "hey, make sure you don't go into the pool. You can't swim and it's dangerous and you'll die" then just leaving for the day would be widely irresponsible, so is this. Guardrails are physical or logical barriers that prevent things from being done. They're not vague suggestions. Those types of guardrails wouldn't even fly in an audit if they involved conscious, thinking human beings. Why would they work with a enterprise bullshit generator?
Overall, I wish journalists would do a bit more introspection and meta-analysis when they jump on the new hot scoop regarding AI instead of just taking everything at face value.
Some part of me struggles to believe it's real, but it wouldn't be the most stupid thing we've seen this year by a long shot soooo.
Which is why, at most, I only use it for completing small things, unit tests and the like. Though even then I ideally prefer to write the code myself so that I can actually "entrench" it in myself. It helps a lot with making long-term decisions. The actual writing of the code is only a part of my job. Making sure it keeps working is the biggest part.
And this is an example of why you don't automate everything. And even the most automated factories have humans watching over the process.
Disastrous decision after disastrous decision. I feel sorry for the people who will get fired or otherwise lose their jobs along the way for this; depending on the severity of it, I could see people who have nothing to do with the incident itself losing their jobs purely due to the monetary cost.
I'm not a developer, but jeez, even I know that linking an LLM to your live environment is a big no-no. The closest I got to something like that was using Gemini CLI on my Telegram bot, but even then I scrutinized everything it did before pressing "accept" to change the code.
Not trying to victim blame, but what the hell was he thinking... This is a catastrophic example, but even a simple misunderstanding can create problems (e.g. "clean up the data" -> the LLM misunderstands and deletes the tables). And it's not that uncommon in my experience.
Linking humans to your live environment in anything short of a break-glass emergency, where multiple others are actively monitoring what those people are doing, is a big no-no. Letting an LLM do it is absolute insanity. We've seen this plenty of times in the past with someone breaking everything by accident, and, just like in this case, the core problem is a broken permission model.
Well, on the one hand, it is clearly a terrible idea to link an LLM to your production database, and the human that did so and hoped that a stern instruction to the LLM saying "NO MORE CHANGES without explicit permission" would protect them was entirely ignorant.
But - Replit is a service that's marketed to non-coders! And I've met plenty of junior developers who aren't good at safe deployment practices, and there are plenty of companies who are running critical software without a separate staging environment.
From the Replit site:
So, I'm not happy with purely blaming the victim here. Replit are targeting an audience that doesn't necessarily know good dev practices, providing a supposedly safe tool that really isn't, and I think they should take the majority of the blame here.
I’d say there’s plenty of blame to go around here. Replit absolutely shouldn’t be misleading customers about their capabilities, and arguably perhaps couldn’t convince people to pay for their service at all without misleading them. They’re a prime example of a company riding the crest of a tech market bubble in an exploitative and likely damaging way.
But business owners using Replit don’t get to shrug and say that it’s all someone else’s fault that they failed to educate themselves on a key part of their business operations.
For example: I’m not a lawyer, which means it’s been very important to me to learn a sufficient amount of legal background to judge when I need to hire one, and so that I can ask them pertinent questions, spot potential misunderstandings, and generally make sure the specialist I’ve hired in an area well outside my own specialism is competent and equipped to handle the task.
Same goes for non-techies hiring tech workers (or in this case purchasing technical services). If it’s important to your business, you need to be able to judge who you’re hiring, and if that means cultivating the expertise yourself until you’re able to judge then that’s part of running the business.
It’s not completely black and white; there are obviously sophisticated scams and well-engineered but misleading marketing claims everywhere, and nobody can be expected to get it right absolutely 100% of the time. Sometimes you’ll make a genuine good-faith effort and get it wrong. But if you make a decision to use a service like this, you’re tacitly accepting that you have the expertise to judge their claims and then to bet your business operations on that judgment call. If it fucks up, that’s your responsibility as a business owner or manager for making the decision; in a case like this, the likelihood of it going wrong was more than clear enough for me to say it’s not just the decision maker’s responsibility, but actively their fault.
Right: if a non-coder tells the tool "no more changes", and it's supposedly a tool which can respond "ok", it doesn't make sense to blame the user. We expect cell phone companies' chatbots to add or remove add-on features, turn roaming on or off, stop extra charges, and so on. This tool, as marketed, needs to at least reply with a canned response of "I can't do that, Dave; I don't have access to the settings that would actually prevent me from making changes, but here's how you can do it yourself".
If it spews nonsense, it needs to have marketing clearly stating it is spewing nonsense.
The big education thing missing here is that it's not logically possible for an AI to ensure it can't do something. Hell, it's not logically possible for a human to do that.
I mean, if you tell a developer "hey, make sure you're not allowed to commit changes to the codebase during work hours", would you be a responsible business owner to just assume that's enough?
No, of course not. The person closing the lock is the same person that has the key to unlock it. You'd have at least one other person verify that those controls are in place, and ideally, the person putting those controls in place doesn't have access to the code themselves either.
You wouldn't just blindly trust a human being to police themselves, and you can punish human beings. Sometimes you can even trust them: they have vested interests in their livelihoods. They want promotions, they don't want to be yelled at, and they definitely don't want to be fired or sued. LLMs don't have any of that, so you really shouldn't need any sort of technical background to understand that this wouldn't work. You just need to think about it for a minute or two.
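To put the "lock held by someone else" idea in code terms: the enforcement lives in the database's permission model, not in anything the agent is told. A sketch, assuming PostgreSQL and the psycopg2 driver; the connection string, role name, and password are all hypothetical.

```python
import psycopg2

ADMIN_DSN = "dbname=prod user=admin"  # hypothetical; run once by a human with admin rights

conn = psycopg2.connect(ADMIN_DSN)
with conn, conn.cursor() as cur:
    # Create a role for the agent that can look but never touch.
    cur.execute("CREATE ROLE agent_role LOGIN PASSWORD 'change-me'")
    cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_role")
    cur.execute(
        "REVOKE INSERT, UPDATE, DELETE, TRUNCATE "
        "ON ALL TABLES IN SCHEMA public FROM agent_role"
    )
conn.close()
# The agent's own process only ever gets agent_role's credentials, so any
# destructive statement it hallucinates is rejected by Postgres itself,
# not by its conscience.
```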
In technical terms, what does it mean when an AI agent says this? "I panicked instead of thinking"
It means that in the billions of texts that were used to train the LLM, there were a few hundred in which 'panicking' was a typical human response to deleting a production database. That's why it pops up in the LLM's internal statistical evaluation of this situation and is generated as a response.
In my opinion it's a problem that LLMs generate emotional responses when they are used as tools, because it invites anthropomorphization of LLMs by people with less technical backgrounds. It's completely misleading.
Why an AI says or does something is a black box, influenced by untold variables like model weights, system prompts, user prompts, evolving context state, and random noise. Any time an LLM offers an explanation for its thought process, it’s bullshitting. It has no articulable thought process apart from the sometimes-hidden internal monologue used by reasoning models, which only illuminates a fraction of those inner workings.
An LLM’s explanation is constructed after-the-fact, with the goal of sounding like its human-produced training corpus. A human might say “I panicked instead of thinking” and if no better rationalization can be constructed, that’s what the LLM gives us.
It means that that's what the model considers to be the average response to that kind of accusation.
In the most direct way possible: It's trying to explain what happened, based on the texts it was trained on, and how those would explain that situation.
In more practical terms:
So, it didn't panic; it's just throwing out the most probable explanation, based on the texts it was trained on. Of course, there aren't many texts that explain the illogical behaviors of AIs, but there are many that explain those of people. So it's using those.
As for what happened in moment A, we'll never really know. If I had to bet, it was a random token, or several consecutive ones, that threw the LLM off the rails.
As you know, an LLM is basically auto-correct on steroids (not exactly, but let's keep it simple). The gist of it is, it tries to guess the most probable word that comes next (token, to be more precise) and adds some random chance into the equation (there's a toy sketch of this below). Meaning, in two different runs with the same input, you can get different words.
So, my theory is, somewhere in the response it got a word that made the following ones steer towards the decision to delete everything. An analogy: the AI is deciding whether it wants an apple or ice cream for lunch, and it outputs words like "healthy" and "natural" at the beginning, which steers its final response towards "apple" at the end.
Am I right? Who knows. That's the thing with these things: they're black boxes, and we can only guess why they did what they did.
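To make the "random chance" bit concrete, here's a toy sketch with a hard-coded three-entry "model". Real LLMs do the same kind of weighted pick over tens of thousands of tokens with learned probabilities, but the effect is the same: one early pick steers everything after it.

```python
import random

# A tiny hand-written next-token table standing in for a trained model.
NEXT_TOKEN = {
    "lunch:": [("healthy", 0.5), ("treat", 0.5)],
    "healthy": [("apple", 0.9), ("ice cream", 0.1)],
    "treat": [("ice cream", 0.9), ("apple", 0.1)],
}

def sample(options):
    """Pick one continuation at random, weighted by its probability."""
    tokens, weights = zip(*options)
    return random.choices(tokens, weights=weights, k=1)[0]

def generate(prompt: str) -> str:
    """Keep appending sampled tokens until there is nothing left to predict."""
    out = [prompt]
    while out[-1] in NEXT_TOKEN:
        out.append(sample(NEXT_TOKEN[out[-1]]))
    return " ".join(out)

# Same prompt, different runs, different endings: an early "healthy" vs
# "treat" decides whether we land on "apple" or "ice cream".
for _ in range(3):
    print(generate("lunch:"))
```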
I don't know about Android, but that's very apt for iOS. The three-button predictive text feature is a (much smaller) ML model that does use the context of what was recently typed for the predictions. Within a lexical sentence, it predicts likely subsequent tokens.
I just typed "In" in a blank text box and only pressed the first suggestion several times, and got: "In the United States an estimated one in a million are living in poverty and are deeply uneducated." Which certainly does seem like a plausible sentence, though the number is nonsense. (Certainly more than a few hundred people are impoverished.)
Larger models' ability to vomit out even more statistically plausible text doesn't mean there's any meaning whatsoever, just that we're more likely to find patterns we like in the word vomit, and ascribe meaning that isn't there.
Sounds like Apple revealing its biases re: the poor.
(Yes, I know that's not how this works)
Rejected bit from Terry Gilliam's Brazil
Fact stranger than fiction
I liked the part where he said that he dropped from using a more expensive, slower, "smarter" model to a cheaper and dumber one, right before all this happened. There's a reason it's cheaper, cause it's way more likely to just make up nonsense, like a command to delete all your data.
Why is this in ~games rather than ~comp @mycketforvirrad?
It was originally in ~tech, but I moved it into ~games due to the gamedev tag. I don't moderate in ~comp, so can't comment on that.
Edit: Double-checked my work and have fixed the tags and moved to ~comp.
I originally posted it to ~tech and mycketforvirrad helpfully added relevant tags; I think perhaps the games association comes from it being a PC Gamer article.
I can see this being a better fit for ~comp as well.