The article mentions that less than 1% of Windows installations were affected, but I think the more important metric is what percentage of people were affected. That 1% of machines represents a far larger number of people, because so much critical infrastructure was impacted. For every one computer tied to a 911 call center, how many people could not receive timely medical care? Were there any deaths as a result? What about the hospitals impacted? Was patient care affected long term?
If only 1% of machines were affected, but those machines were core components of society, then CS still needs to be held accountable for such a negligent failure.
From CrowdStrike's side, nothing glitched before they hit the button, and not everything magically fell over. A lot of stars had to align for this to happen.
I'm all for holding companies accountable for putting critical infrastructure on identical systems, but at the same time, we've known how to deploy software to a massive install base for decades. You don't do it all at once, you don't ignore users' update settings on deployment, and you don't do it on a Friday.
CrowdStrike's process ignored all of this, and far fewer stars needed to align because of it. They are absolutely at fault for that alone.
Edit:
Just to go into more detail on this: there's basically no way for a customer to be responsible in the face of an error like this.
Your systems NEED AV software, and some systems NEED (for a given value of need) to be Windows. You're not going to go to two separate providers for that, but the responsible thing to do is to tier your failover server so it doesn't update at the same time as the main one. That way, if something critical fails, your system fails over to the unpatched server and you're good to go.
The whole issue is that their deployment either didn't allow this or completely ignored it, which is fucking insane to me.
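To make the tiering idea concrete, here's a rough sketch, purely hypothetical: the node names, the "ring" field, and the delay values are made up for illustration, not any real vendor's API.

```python
# Hypothetical illustration of tiered update rings for a primary/failover pair.
# None of these names map to a real product; it's just the shape of the idea.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    ring: str          # "early" updates immediately, "delayed" waits 24h
    healthy: bool = True

PRIMARY = Node("app-primary", ring="early")
FAILOVER = Node("app-failover", ring="delayed")

def apply_update(node: Node, update_id: str, hours_since_release: float) -> None:
    """Only apply the update once the node's ring has passed its delay window."""
    delay = {"early": 0, "delayed": 24}[node.ring]
    if hours_since_release < delay:
        print(f"{node.name}: holding {update_id} (ring '{node.ring}', {delay}h delay)")
        return
    print(f"{node.name}: applying {update_id}")

def active_node() -> Node:
    """Traffic goes to the primary unless it's down, then to the still-unpatched failover."""
    return PRIMARY if PRIMARY.healthy else FAILOVER

# If the early-ring update bricks the primary, the delayed failover is still
# on the previous version and takes over:
apply_update(PRIMARY, "sensor-update-291", hours_since_release=1)
apply_update(FAILOVER, "sensor-update-291", hours_since_release=1)
PRIMARY.healthy = False   # the bad update takes the primary down
print(f"serving from: {active_node().name}")
```

The point is just that the primary and the failover are never on the same update at the same moment, so a bad update can only ever take out one of them.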
Considering every machine running their software fell over almost instantly, I strongly suspect they don't have automated integration tests. The CI/CD pipeline should spin up a VM with the software installed, and if something as obvious as a failure to boot happens, the "smoke test" should fail and the deployment should be halted.
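Something like this is the shape I'd expect, as a minimal sketch. The Hypervisor class is a stub standing in for whatever real VM tooling (libvirt, Hyper-V, cloud images) the pipeline would actually use, and the image and file names are placeholders, not anything from CrowdStrike's actual setup.

```python
# Rough sketch of a boot smoke test a CI pipeline could run before any mass rollout.
# Hypervisor is a stub standing in for real VM tooling; here it's stubbed so the shape runs.

import sys

class Hypervisor:
    """Placeholder for a real hypervisor / cloud API wrapper."""
    def spin_up(self, image: str) -> "Hypervisor":
        print(f"booting clean VM from {image}")
        return self
    def install_update(self, channel_file: str) -> None:
        print(f"installing candidate update {channel_file}")
    def reboot_and_wait(self, timeout_s: int) -> bool:
        print(f"rebooting, waiting up to {timeout_s}s for the guest to come back")
        return True   # a real check would poll the guest for a successful boot
    def destroy(self) -> None:
        print("tearing VM down")

def boot_smoke_test(channel_file: str) -> bool:
    vm = Hypervisor().spin_up(image="windows-with-sensor-installed")
    try:
        vm.install_update(channel_file)
        return vm.reboot_and_wait(timeout_s=300)   # "does it even boot" is the whole test
    finally:
        vm.destroy()

if __name__ == "__main__":
    ok = boot_smoke_test(sys.argv[1] if len(sys.argv) > 1 else "candidate-channel-file.sys")
    if not ok:
        print("smoke test failed: halt the deployment")
        sys.exit(1)   # the non-zero exit is what actually blocks the release stage
```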
I believe the argument around this is that, because of how the change was tagged, it didn't go through that process, since supposedly that kind of change shouldn't have been able to cause this type of failure (I could be wrong, I haven't followed it that closely).
It still doesn't make sense to mass deploy any change, though.
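For what it's worth, "don't do it all at once" isn't exotic. A bare-bones staged rollout looks roughly like this; the stage percentages, soak time, and helper functions are arbitrary stand-ins, not anyone's actual pipeline:

```python
# Bare-bones staged rollout: ship to a small slice of hosts, check health,
# only then widen. deploy_to and health_check are stubs; the stage fractions
# and soak time are arbitrary examples.

import random
import time

FLEET = [f"host-{i:04d}" for i in range(10_000)]
STAGES = [0.001, 0.01, 0.10, 0.50, 1.00]     # fraction of the fleet per stage

def deploy_to(hosts: list[str], update_id: str) -> None:
    print(f"deploying {update_id} to {len(hosts)} hosts")

def health_check(hosts: list[str]) -> bool:
    # Stub: a real check would look at crash telemetry / heartbeats from these hosts.
    return all(True for _ in hosts)

def staged_rollout(update_id: str, soak_seconds: int = 0) -> None:
    shuffled = random.sample(FLEET, len(FLEET))
    done = 0
    for fraction in STAGES:
        target = int(len(shuffled) * fraction)
        deploy_to(shuffled[done:target], update_id)
        done = target
        time.sleep(soak_seconds)              # let the batch run before judging it
        if not health_check(shuffled[:done]):
            print(f"rollout of {update_id} halted at {fraction:.1%} of the fleet")
            return                            # everyone past this point never sees the bad update
    print(f"{update_id} rolled out to the whole fleet")

staged_rollout("sensor-update-291")
```

Even the crudest version of this stops a bad update at the first batch of hosts falling over instead of the whole fleet.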