Who’s liable when your AI agent burns down production? How Amazon’s Kiro took down AWS for thirteen hours and why the ‘human error’ label tells you everything wrong about the agentic AI era.
Link information
- Title: Who's liable when your AI agent burns down production?
- Author: JP Caparas
- Published: Feb 21 2026
- Word count: 164 words
From the article:
"The delete heard in a datacentre in China"
This is a good essay about the hazards implicit in agentic AI processes, broadly applicable beyond Amazon. It discusses the human tendency toward automation bias, and it highlights the way Amazon has deflected responsibility for design failures in a way that U.S. law frowns upon, citing a couple of examples (Knight Capital's $440 million loss from a test algorithm accidentally left running, and Boeing's 737 MAX crashes).
I will delete this in the future.
I know an Amazon employee, and I can confidently say there have been at least five major outages caused by LLM tools deleting a production stack or otherwise fucking up a configuration. These are documented in internal Correction of Error documents (called COEs internally).
The default permissions are expansive, and there are no good methods for gating certain tools or commands. The internal guidance encourages highly autonomous agents with too many permissions. They've ratcheted down access to external tooling, and the internal tools end up like this.
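For concreteness, the missing piece is a default-deny gate between the agent and its tools, so that anything destructive escalates to a human instead of executing. A minimal sketch (all names here are hypothetical, nothing Amazon actually ships):

```python
# Hypothetical sketch of gating agent tool calls: read-only tools run
# freely, anything that looks destructive is blocked pending human review.
# ToolCall/run_tool and the marker list are illustrative, not Kiro's API.
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str      # e.g. "shell", "cloud_api"
    command: str   # the concrete action the agent wants to run

READ_ONLY_TOOLS = {"describe_stack", "list_resources", "read_logs"}
DESTRUCTIVE_MARKERS = ("delete", "terminate", "destroy", "drop")

def gate(call: ToolCall) -> bool:
    """Return True only if the call is safe to execute autonomously."""
    if call.tool in READ_ONLY_TOOLS:
        return True
    if any(marker in call.command.lower() for marker in DESTRUCTIVE_MARKERS):
        print(f"BLOCKED, needs human approval: {call.tool}: {call.command}")
        return False
    return True

def run_tool(call: ToolCall) -> None:
    if gate(call):
        print(f"executing: {call.tool}: {call.command}")

# The failure mode from this thread: the agent decides a server "doesn't
# exist" and tries to remove the whole stack.
run_tool(ToolCall("cloud_api", "delete_stack --name prod-billing"))
run_tool(ToolCall("describe_stack", "prod-billing"))
```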
Why delete?
Understandable, given that it could reveal a connection to someone who could lose their job, be prosecuted, and/or be sued into penury by a giga-corporation for violating an NDA.
Heavily redact at the very least.
Don’t worry, I’ll write my own Gen Z reinterpretation of your comment.
Jeffy AI oof… big oof… AI bot… 1, 2, 3, 4, 5… not 6… 1, 2, 3, 4, 5, no server… 404… remove the server… outage… customer infrastructure… remove the server…
Preventative measures… 🙅‍♂️ impossible… permissions… all of them… outage… 1, 2, 3, 4, 5…
Mature tech companies often have a "blameless postmortem" process where they create a full history of an incident and make recommendations at multiple zoom levels about how to fix the system so that nothing like that happens again. As the name indicates, the focus is on fixing the system, not blaming workers for mistakes. (That's reserved for situations where there is clear malice.)
I expect that process will continue to work well with more AI automation. "Someone accidentally deleted the production database" isn't a new problem and the safeguards you need are similar.
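The safeguard is the same one you'd aim at a fat-fingered human: destructive operations on protected resources need an out-of-band approval. A rough sketch (the tag scheme and change-ticket check are made up for illustration, not any particular cloud's API):

```python
# Destructive operations on protected resources require an approved
# change ticket; the agent (or human) can't delete on its own say-so.
from typing import Optional

PROTECTED_TAGS = {"env:prod", "termination-protection:on"}

def safe_delete(resource_id: str, tags: set, change_ticket: Optional[str] = None) -> None:
    if tags & PROTECTED_TAGS and change_ticket is None:
        raise PermissionError(
            f"refusing to delete {resource_id}: protected resource, "
            "an approved change ticket is required"
        )
    print(f"deleting {resource_id} (ticket: {change_ticket or 'n/a'})")

# Works the same whether a human or an agent asks for the delete.
safe_delete("db-prod-01", {"env:prod"}, change_ticket="CHG-4711")
try:
    safe_delete("db-prod-01", {"env:prod"})  # an agent acting on its own
except PermissionError as err:
    print(err)
```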
Legal liability isn't a good lens to use when fixing the system.
Except that the outage was blamed on a customer's employee, who would have had neither access to Amazon's "blameless postmortem" process nor any internal discussion of how to apply best practices and make systemic changes.
Kiro is a product that was released for public use without appropriate guardrails, and a liability framework is applicable.
I don't have access to this article, but looking at the Financial Times article (archive) that it's apparently based on, that doesn't seem right? It sounds like the "user error" was that of an Amazon employee, so this is basically an Amazon self-own.
Also, if a customer has any way to take out an Amazon service, using AI or not, that is definitely an Amazon bug and the outage would be worth doing a post-mortem on.
Thank you for the correction. Amazon posted its own rebuttal to the FT article, which doesn't really exculpate them on the underlying point.
Even if the risks for agents are identical to those of user error, there are changes in governance and systems design needed to mitigate and contain the blast radius of known failure modes. I ran across a playbook (paid content) that provides a good framework for what should have been done before turning the Kiro agent loose to act without supervision.
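The common thread in that kind of framework is that the agent never holds standing, broad write access; each task gets a narrow, short-lived grant, which caps the blast radius by construction. A rough sketch, with hypothetical names rather than any vendor's SDK:

```python
# Sketch of blast-radius containment: each task gets a narrowly scoped,
# short-lived grant; destructive verbs are never included by default.
# Grant/issue_grant/allowed are illustrative names, not a real SDK.
import time
from dataclasses import dataclass

@dataclass
class Grant:
    resources: tuple      # only the stacks this task may touch
    actions: tuple        # only the verbs this task may use
    expires_at: float     # hard expiry, seconds since the epoch

def issue_grant(task: str, resources: tuple, ttl_s: int = 900) -> Grant:
    print(f"audit: issuing {ttl_s}s grant for task {task!r}")
    return Grant(resources=resources, actions=("read", "update"),
                 expires_at=time.time() + ttl_s)

def allowed(grant: Grant, action: str, resource: str) -> bool:
    return (time.time() < grant.expires_at
            and action in grant.actions
            and resource in grant.resources)

grant = issue_grant("upgrade-billing-service", ("stack/prod-billing",))
print(allowed(grant, "update", "stack/prod-billing"))   # True
print(allowed(grant, "delete", "stack/prod-billing"))   # False: not granted
print(allowed(grant, "update", "stack/prod-payments"))  # False: out of scope
```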
Blame the intern.
Amazon has very good blameless postmortems; I was quite surprised.
Unfortunately, the company pits teams against each other, which can really hurt communication.
Teams are encouraged to push post-mortems onto other teams. It's a key part of the politics, especially at the senior level and up. Even the best orgs are subtly toxic due to the perverse incentives of stack ranking, etc.
I'm just going to reiterate my concern from this comment that my main marketable skill is going to be "legal meat shield for the AI", and point out that the "from when I said it to when it happened" time was under 10 days. (Of course, I was talking there about being the "head" of an AI-driven company, but I was also concerned I'd wind up doing PLEASE from How I Met Your Mother.)