Supposedly pre-market trading already has Crowdstrike down 21%
I've been up for the past 6 hours now after getting a beautiful 45 minutes of sleep. Been pulled into four major incident calls, written up some quick documents on fixes for various environments, and trying to walk people through how to fix our thousands of servers because I don't want to stay up any longer lol
Absolutely gobsmacked this made it through any level of QA since it is affecting all flavours of Windows, not just specific combinations. And didn't we learn from the big Meta outage (and literally every other Friday outage) that you don't push shit on a Friday??
At least I can probably take the rest of the day off...
[insert that's a big list meme here]
911 services of eleven (so far) different states, American, United and Delta flights grounded along with dozens of others across the world plus several major airports grounding all flights, US DOJ, DC and NYC mass transit systems, Oracle, Nokia, broadcasting stations, banks, railways, Singapore's stock exchange, Paris Olympics, entire supermarket chains, hospitals cancelling procedures, Mercedes, McLaren, Aston Martin, and Williams F1 teams, universities, law firms, pharmacies, casinos, train networks, petrol stations, stadiums, fire alarm systems...
emphasis mine on some of the more interesting/frightening examples
Here's a comment from Hacker News that made me realize how serious this is (via user jmcgough in this post)
Took down our entire emergency department as we were treating a heart attack. 911 down for our state too. Nowhere for people to be diverted to because the other nearby hospitals are down. Hard to imagine how many millions if not billions of dollars this one bad update caused.
The Associated Press coverage mentioned that surgeries are being postponed because anesthesia is off the table without equipment to manage and monitor it. I'm sure radiology equipment is also affected. Never mind charting...
Saw that and found it humorous.
Gotta have something to balance out some of the horrors that are happening. Hearing from some of my old healthcare worker friends that things like their medication systems are locked out, can't get patient charts to know what anyone should be getting anyway ...oh... and the in-room panic buttons and heart monitors that alert staff when a patient codes (flatlines) are all linked to the same system and don't work. So unless someone is literally within earshot of a patient's room and hears the monitor start to scream...
An update to a configuration file (here called a channel file) issued at 04:09 UTC on 19 July 2024 conflicted with the Windows sensor client, causing affected machines to enter the blue screen of death with the stop code PAGE_FAULT_IN_NONPAGED_AREA.[10][11][12]
Was this something that was easy to catch? Or was this an advanced error that no one could have seen? I am a complete layman on this topic. I’m just trying to gauge how preventable this was or if there was no chance of seeing it coming.
It’s so absurdly easy it makes it extra weird it got out.
The bare minimum catch here is “huh we deployed this on a windows machine and it nuked itself”.
The best explanation I can think of is one patch got tested and a different one deployed but that should also be trivial to detect and stop.
It’s why this whole thing is so mind blowing because it MUST involve serious policy, procedure, and culture issues for this to even occur
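For the "one patch got tested and a different one deployed" theory, here is a minimal sketch of the kind of check that would catch it: record a digest of the artifact that passed testing and refuse to ship anything whose bytes differ. The artifact contents and function names are made up for illustration; nothing here reflects how Crowdstrike's pipeline actually works.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_release(tested_digest: str, candidate: bytes) -> bool:
    """True only if the bytes about to ship match what was tested."""
    return sha256_of(candidate) == tested_digest


# Hypothetical artifacts, for illustration only.
tested_build = b"channel-file-v1"
approved_digest = sha256_of(tested_build)

print(verify_release(approved_digest, b"channel-file-v1"))  # same bytes as tested
print(verify_release(approved_digest, b"channel-file-v2"))  # different bytes
```

The point is only that "tested" and "deployed" can be tied together by content, not by filename or version label.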
Yeah, a simple release verification would’ve caught this. There is someone alleging on Reddit that they have proof that Crowdstrike developers have the ability to build and deploy from their laptops; if that’s true then it’s concerning for sure.
I hope it’s not because this just shows that if a single one of those laptops was compromised you could deploy kernel level malware to every crowdstrike machine
I don't think that's necessarily hard to believe? It seems possible a developer could override safeties to create and push a release candidate without oversight. Ideally there would be some alarms going off to get someone to review the rush-release. Getting access to the right engineer's laptop has always been a nice attack vector.
Got a call from the very top at 12:30 AM and have been in triage mode ever since. I'm less useful in this regard and mostly here as extra hands and testing, but jesus what a massive fuckup this is going to be.
Even with the "fix" being quick this is jacked on so many levels:
Kernel drivers strike again.
Memory UNSAFE language drivers strike again.
"Oops all crashes" pushed to PRODUCTION, possibly ignoring any configurations admins had up to prevent that.
Pushed ON A FUCKING FRIDAY.
They're HYPER lucky this is an "easy" fix, but even still that was hours upon hours of outage for half the fucking globe. The cost from this is incalculable, and just shoved down every C level's throat what a fucking mess all this single point of failure cloud stuff can be.
This doesn't have anything to do with cloud stuff. The outage was caused by locally running code; if Crowdstrike was entirely on prem you'd have the same issue.
This is an issue of auto-updating software with no possible way to test or oversee the update process.
No matter what, your EDR platform is going to be running the same software on all of your agents across the enterprise. Crowdstrike isn't alone in this, but not even having the option of controlling how they're updated is crazy.
If this software had been designed in the pre-cloud era, when internet access could be costly and not always on, the silent update feature would not be a thing and there would not be an issue today.
Cloud was about externalizing server based infrastructure to dedicated companies. If anything, if you were entirely cloud based you would not have this issue.
The internet existed before “the cloud” was popular. If you look at the major impacts of these outages, they must be connected to the internet to work by definition.
This really has nothing to do with “the cloud” and if anything mostly impacts on prem setups and edge devices - so, the opposite of “the cloud”.
I will say if they had any sort of staggered deployment process (which any sizable company big enough to shard its infrastructure should have) they could have drastically reduced the blast radius of this.
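To make the blast radius point concrete, here is a rough sketch of a wave-based rollout that halts as soon as a wave looks unhealthy. The wave sizes, the `deploy` and `is_healthy` callbacks, and the host names are all assumptions for the sake of the example, not anything a real vendor is known to use.

```python
from typing import Callable, Iterable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    wave_sizes: Iterable[int],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy in waves and stop after the first wave with an unhealthy host.

    Returns the hosts that received the update, i.e. the blast radius.
    """
    updated: List[str] = []
    start = 0
    for size in wave_sizes:
        wave = hosts[start:start + size]
        if not wave:
            break
        for host in wave:
            deploy(host)
            updated.append(host)
        if not all(is_healthy(h) for h in wave):
            break  # halt: later waves never receive the bad update
        start += size
    return updated


fleet = [f"host-{i}" for i in range(1000)]
touched = staged_rollout(
    fleet,
    wave_sizes=[10, 90, 900],
    deploy=lambda host: None,       # stand-in for pushing the update
    is_healthy=lambda host: False,  # simulate an update that breaks every host
)
print(len(touched))  # 10
```

With an update that breaks every machine it touches, the damage stops at the first wave of 10 instead of reaching all 1000 hosts.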
I'm the major incident manager for my corp and have been dealing with this since early this morning.
It's pretty bad.
The fix is easy but it's a local fix. Meaning that our stores need to have a manual fix applied by the staff themselves. They're being guided by our techies but good lord..
The fix is easy but it's a local fix.
Oh yeah. When you have that kind of problem spread out across... everyone (edit, everyone on Windows) running this software (which to be fair I had not heard of until today but apparently it is widely used) and every machine needs a human physically present to be able to fix? No remote fix?
Ooooohhhh.... ouch.
another edit: Well, I'll call this a win for anyone running their machines on a Hypervisor platform. At least they can run a fix remotely. Um... unless the Hypervisor they are using on bare metal is... Windows based??? OOOOHHHH, ouch again.
For servers you can reattach the disk to a new VM, take out the offending file, stick it back into the original VM and boot.
The fun part comes when everyone else is doing this and storage latency across the cloud providers has shot through the roof.
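Once the broken VM's disk is attached to a helper VM, the "take out the offending file" step might look roughly like this. It is only a sketch: the directory and file pattern below are placeholders, and the real ones should come from the vendor's remediation guidance. It defaults to a dry run so you can see what would be deleted first.

```python
import tempfile
from pathlib import Path
from typing import List


def remove_matching(directory: Path, pattern: str, dry_run: bool = True) -> List[Path]:
    """Find files in `directory` matching `pattern`; delete them unless dry_run."""
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


# Demo against a temporary directory standing in for the mounted disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bad-0001.sys").write_bytes(b"")  # placeholder "offending" file
    (root / "keep.sys").write_bytes(b"")      # unrelated file that must survive
    removed = remove_matching(root, "bad-*.sys", dry_run=False)
    remaining = sorted(p.name for p in root.iterdir())
    print([p.name for p in removed])  # ['bad-0001.sys']
    print(remaining)                  # ['keep.sys']
```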
I didn't have to do that but my direct colleagues did. We gave them and a couple of onsite techs emergency access to the vault. That sucked but it was better than being helpless.
The US folks got hit the hardest because they actually had to wake at 4. We were relatively fine waking up at normal hours and riding the coattails of Australia having done most of the investigative stuff.
Still not an easy day and I'm sure they've had (or are having) a stiff drink to start the weekend.
I'd run out of fingers and toes if I had to count the number of hospital systems I work with that use Crowdstrike. It's lightweight, easy to manage, and a reasonably cost-effective part of mitigation strategies for the ransomware plague.
A non-trivial number of surgeries, treatments, and patient visits are going to get postponed because of this. As /u/Eji1700 said, I'm baffled at how this update made it from DEV to TST, let alone PRD. The ugliest part is that since Crowdstrike Falcon is a vendor-managed client, the customers don't get to put its updates on a managed patching schedule with a test environment or spaced rollout. Let this be a lesson to us all.
Update via The Guardian:
CrowdStrike president George Kurtz said the problem was caused by a “defect found in a single content update for Windows hosts”.
He wrote on X:
CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed.
We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website. We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.
That may be one of the most laughable uses of the passive voice I've ever seen.
Some random dev: "I'm gonna get chewed out if completing my PR is late again, so let's just do a little push to prod to get it checked off. What's the worst that could happen?"
More seriously - I have to wonder, given the description of the incident, if the faulty file is even human generated. It sounds like it's a content update bundle of some sort. Perhaps it's being generated by the code deployment system and the bug is actually in the file generation/validation/deployment? It's possible the hole is in the test process itself - that the components of the update were being properly individually tested, but that something with the bundling and deployment was missed in the procedures.
Whatever the actual details turn out to be, this is one of those case studies that will be used as an example in every IT and software dev class from now till the end of time.
I was in talks with CS not more than a month ago, and opted for Arctic Wolf + Egress to replace DarkTrace and ESET.
Bullet dodged in this instance.
It's showing the world just how fragile the interconnected "cloud" is though. One vendor can take down half the planet's first world services with a bad update.
Echoing everyone else, how this managed to get through testing cycles to global roll out beggars belief.
Luckily since my company only runs Linux and Mac, we've dodged this bullet ourselves. That said, I feel really bad for everyone who's directly affected by this. German news is reporting that some hospitals are cancelling all elective surgeries today because of this issue.
EDIT TO ADD:
My wife works as a data center technician at Microsoft, and weirdly enough she didn't even know this was happening until I told her. I guess she's lower-level than the stuff this affects.
Microsoft has made big strides in making Windows a lot more stable than it was when I was young, but it's this kind of thing that still gives me the impression that people who use it for mission-critical applications are completely nuts.
The main reason I don't use it at home these days is because I feel that Windows Update is currently so intrusive that it has essentially turned it into a managed system, and it requires a degree of trust in Microsoft that I am very far from giving them.
I very much appreciate that Apple forbids kernel extensions these days, and locks system folders down with SIP. Security happens at a design level, not by bolting on third-party malware. Third party software should never, ever be allowed to render a machine unbootable or have OS-level privileges.
I have enough daily rage about corporate "security" software on my development machine, so I'm glad Apple keeps it locked inside userland, at least.
Linux absolutely does allow this kind of access. Writing a broken kernel module that consistently panics the kernel when loaded is a rite of passage for budding kernel devs. In this case, however, Linux provides a separate, safer interface (eBPF) which Crowdstrike uses, rather than a full-privilege kernel module.
(More generally, most Linux systems in the enterprise are not being operated by non-technical end users, so the need for heuristic security software like antivirus is a lot lower; John from sales isn't going to blindly open email attachments on the server.)
This is true. In fact, a similar thing happened to RedHat / Rocky 9.4 not too long ago, the difference being it was RedHat's fault and was fixed with a kernel patch, and it was not as widespread because you had to upgrade to 9.4 pretty early to have run into it.
When it first broke here people were calling it a Microsoft bug because it only affected Windows machines. It took a hot sec for everyone to realize it was actually Crowdstrike at fault.
I work at a hotel and our systems have all been nuked since a half hour after I came in. This was especially difficult because we had to take in a bunch of guests who were stuck in my city because of the airline outage, and then they all get fucked by our outage as well.
This is insane! I've heard 8 airports are closed because of this so far. Some airlines already stopping operations. I feel sorry for the poor dev that introduced this bug.
Honestly the dev is going to eat shit but they're the last person who should.
The fact this EVER got into production is insane, let alone this massively.
Shit rolls down hill though. Dev gonna get reprimanded. CEO will get a bonus for getting out ahead of the issue and talking everyone down, as long as the shares recover.
This is enough shit I'm not sure the company makes it through as is. Granted the CEO probably still gets a golden parachute or some nonsense, but this is really quite special in its cascading and immediate effects.
The company will 100% make it through this. Crowdstrike is probably the market leader in EDR and very well respected in that arena. If this was a consistent pattern I might agree with you, but there are absolutely zero widely deployed software companies that have been around for any period of time that haven't had some sort of major outage or incident. SolarWinds is still around, Palo Alto is still around, and Crowdstrike will still be around.
Palo Alto and solarwinds didn’t take down half the globe in a way that will still be affecting companies for the next week, if not month.
There are a fuck ton of systems that now need to be physically rebooted and oops that outsourced IT team won’t be doing that.
They might be fine but this isn’t in the same category in the slightest
Palo Alto and SolarWinds were arguably worse, because they were cybersecurity incidents, not operational incidents, and both of them could and did result in attackers launching attacks inside private networks to get persistence, laterally move through the network, and exfiltrate confidential data, the effects of which are still being felt. Having to reboot computers is more painful in the short term, but definitely less damaging.
Yep, and you can see this in the stock price. Did it go down? Yes. Is it at 0? Not even close. They’re actually still above where they were 6 months ago.
They’ll lose customers for sure, but they’ll be around.
Looking at how drastically the share price dropped when markets opened but then climbed every single minute so far since then I reckon it will probably be at most a few days until the share price is back where it was, or even higher (because now they’ve had a bunch of free publicity worldwide)
What stock market data are you looking at? On Yahoo finance, it appears that it has stabilized today on the exact same price it was going for during after hours trading. It definitely hasn’t been climbing constantly all morning.
Yeah you’re absolutely right. I commented about half an hour after the market first opened, but the trend did not continue. It’s actually hilarious in retrospect just how perfectly poorly timed my comment was. Here are some screenshots showing the point at which I made the comment, compared to how it went for the rest of the day.
Lawyer here. Damages from this incident are a big unknown and haven't finished playing out. Just casually reading reddit I have seen claims that hospitals and 911 systems were down and patients died. Someone said that their city water purification system was down. Flights were cancelled and airlines are going to have to provide rescheduled flights at no additional cost. Global shipping and freight were impacted and resetting those schedules is not easy and not free.
Was Crowdstrike negligent? What do the contracts say about risk? Crowdstrike and investors don't yet know the company's costs for legal liability. They also don't know if customers will leave and what sales they will lose going forward.
The accelerationist deep inside me is lowkey smug about this. Maybe, just maybe, this will push for the adoption of more open source software in business critical applications.
Open source wouldn't really have fixed this. There are plenty of examples of massive issues caused by open source packages as well (heartbleed, for example).
The main cause of this sort of thing having such a wide impact is massive numbers of machines all running the same software. Unfortunately there's really no way around that. It's not like my company is going to write their own EDR platform.
It's just one of those things that have to be factored in when deploying IT systems. All critical functions need a continuity plan in case absolutely nothing works, because sometimes, nothing works.
I haven't used Crowdstrike, but don't enterprises have the ability to create patch windows? I used a competitor's product, and we could schedule patches to hit dev before prod, which would have caught this.
Yeah, every company I've been with built and deployed its own Windows patches. I believed this was the norm until this morning...
Insane that CrowdStrike works by automatically and silently updating all computers in an enterprise immediately. That's a huge red flag. There should always be a test batch.
It's kind of the norm for this type of security software. Most software patches in an enterprise are handled by a centralized patch management system and are tested on the hardware that enterprise uses before someone manually kicks off a deploy. EDR software usually automatically updates itself with updated data because cyber threats emerge very quickly, and a zero day can be used to exploit a lot of machines very very quickly. There have been countless attacks that could have been prevented by up to date EDR software, and up until now, there hasn't really been a widespread issue with that kind of software causing massive outages.
I imagine that risk analysis may yield a different outcome after taking this incident into account though.
Right, I supposed that there would have been a practice of testing it quickly on a batch of machines with a 1-3 hour window for approval before deploying it out wide. I think everyone has a story where a single typo caused an issue with a release at some point. This kind of thing can happen, so it's good to have a just-in-case test step. It seems like dangerous practice to me to simply launch to all machines at once without a precursory test, which we're seeing the result of here.
Yeah. I understand the need to quickly get updates out to security software. But... some environments require every change to be pre-authorized and tested. So it seems like a feature to group your devices and force them to update in sequence with a delay between and the ability to halt updates would be a must have in an enterprise offering.
Move fast and break things is great when the things aren't people, airplanes, or stock markets.
Falcon is a threat sensor - it uses data bundles similar to an antivirus. The impression I've gotten from descriptions of the problem is that this is less update-update with new code and is more equivalent to a bad antivirus threat definition package. These generally aren't something you want to delay. It does call into question Falcon's parsing of the files if it can't handle an error without BSOD, though.
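On the parsing point, here is a userland sketch of what "handle a bad definition file without falling over" can look like: validate the header and the declared length before reading any record, and keep the last known good data if the new file is malformed. The file format (`MAGIC`, the header and record layouts) is entirely made up for illustration and is not Falcon's channel file format.

```python
import struct
from typing import List, Optional, Tuple

MAGIC = b"DEMO"                 # hypothetical 4-byte file signature
HEADER = struct.Struct("<4sI")  # signature, then record count
RECORD = struct.Struct("<II")   # two unsigned 32-bit fields per record

Records = List[Tuple[int, int]]


def parse_content(blob: bytes) -> Optional[Records]:
    """Return parsed records, or None if the blob is malformed."""
    if len(blob) < HEADER.size:
        return None
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        return None
    if len(blob) != HEADER.size + count * RECORD.size:
        return None  # declared count does not match the bytes present
    return [
        RECORD.unpack_from(blob, HEADER.size + i * RECORD.size)
        for i in range(count)
    ]


def load_with_fallback(new_blob: bytes, last_good: Records) -> Records:
    """Use the new content only if it parses; otherwise keep the old content."""
    parsed = parse_content(new_blob)
    return parsed if parsed is not None else last_good


good = HEADER.pack(MAGIC, 2) + RECORD.pack(1, 2) + RECORD.pack(3, 4)
print(parse_content(good))                      # [(1, 2), (3, 4)]
print(parse_content(bytes(40)))                 # None
print(load_with_fallback(bytes(40), [(0, 0)]))  # [(0, 0)]
```

In a kernel driver the stakes are higher because a bad read takes the whole machine down, which is exactly why the bounds checks have to come before the reads.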
I get that it is similar to updating a fingerprint dictionary. But I've worked in environments where those had to be authorized as well. Anything involving a high CVE remote exploit was instantly tested in dev and rolled to prod shortly after. If necessary and not already in place, we'd put an F5 in front of customer facing services or other mitigations while we rolled the updates through the process.
Anything less critical would make it through in 1-3 days.
Yeah, the company I work at was crippled by this, we were instructed to turn off all machines until further notice. Thankfully an update seems to have resolved the issue.
My laptop was affected but I have the added bonus of somehow having bitlocker enabled on my machine.... None of my accounts show this machine has bitlocker and I've never set it up, I don't really know what I'll do, I guess just wipe the drive and cry.
I think it was on by default on most W10 machines too. Maybe not on a retail install, but basically all OEMs have it enabled. Microsoft stores the recovery key in your Microsoft account, which is one reason they push you away from a local account. As long as you used the Microsoft happy path you should be able to recover everything.
And if you aren’t using a Microsoft account: you have backups right?
Crowdstrike is an EDR platform. They deploy agents on all endpoints, so servers, workstations, kiosks, signage IoT devices that support it, which is why this thing is such a cluster. I drove past...
Crowdstrike is an EDR platform. They deploy agents on all endpoints, so servers, workstations, kiosks, signage IoT devices that support it, which is why this thing is such a cluster.
I drove past a billboard this morning that was BSODed
oh yeah I know that Crowdstrike is on tons of machines, but I forgot that people have Windows laptops for work that could be affected. All the other systems that failed are the sorts of things...
oh yeah I know that Crowdstrike is on tons of machines, but I forgot that people have Windows laptops for work that could be affected. All the other systems that failed are the sorts of things that I expected, I was just confused why someone would have it on their personal laptop.
Incidentally, my work uses Crowdstrike, though only on employee laptops, not on any servers. We just lucked out this time because we only use Macbooks and Linux.
My company has several thousand employees working remote and maybe a thousand within driving distance of the corporate headquarters. I was on the phone with many different IT agents yesterday...
My company has several thousand employees working remote and maybe a thousand within driving distance of the corporate headquarters. I was on the phone with many different IT agents yesterday trying to get my bitlocker key to no avail. I’m sure this is a common story for many companies right now.
The real kicker for me and my entire team is we’re contractors paid hourly. I don’t think any of us will be paid for the days we can’t work due to this outage (edit: we are being paid). I’m sure many hourly workers across the world are suffering the same fate. This won’t ruin me, but anyone living paycheck to paycheck that misses several days of pay because of this deserves some form of restitution.
I was under the impression that hourly and daily contractors remaining idle due to factors beyond their control are paid for days sitting idle??? How does that sound fair to me to be on stand by...
I was under the impression that hourly and daily contractors remaining idle due to factors beyond their control are paid for days sitting idle??? How does that sound fair to me to be on stand by not making money? Yikes 😬
If you can't work they just won't schedule you. It obviously depends on your contract, but that's kind of the nature of the beast with contracting work. You don't work, you don't get paid.
If you can't work they just won't schedule you. It obviously depends on your contract, but that's kind of the nature of the beast with contracting work. You don't work, you don't get paid.
You’re correct, I let my anxiety get the best of me without being certain. We have been compensated for yesterday and fortunately IT has been working around the clock to get laptops fixed via...
You’re correct, I let my anxiety get the best of me without being certain. We have been compensated for yesterday and fortunately IT has been working around the clock to get laptops fixed via phone calls.
Luckily, my org doesn't use Crowdstrike. Because I'm on vacation and outta town. Small org, so I'm the only guy who does any IT. Feel bad for the IT folks who are affected, though. Good luck everyone.
Luckily, my org doesn't use Crowdstrike. Because I'm on vacation and outta town. Small org, so I'm the only guy who does any IT.
Feel bad for the IT folks who are affected, though. Good luck everyone.
It's a doozy. I'm currently repairing Azure VMs for a client that fell over, and most of my company is bluescreening with 40 minute waits for the service desk.
It's a doozy. I'm currently repairing Azure VMs for a client that fell over, and most of my company is bluescreening with 40 minute waits for the service desk.
I'm feeling dumb because I don't get this. Or rather, I understand the basic idea- a piece of software used by TONS of businesses like major airlines got a bug/glitch, and is shutting everything...
I'm feeling dumb because I don't get this.
Or rather, I understand the basic idea- a piece of software used by TONS of businesses like major airlines got a bug/glitch, and is shutting everything down.
But... I guess the question is, who's actually using this software? It's called CrowdStrike and it runs on Microsoft, right? That means Windows OS, correct? Because I have a different brand of laptop, but it still runs Windows (11, I believe) and I haven't had any issues in the past 24 hours. So is it just the big businesses that are running the software? I'll be wary of any update requests, certainly... But as the Internet freaks out, I'm not seeing any of it (so far).
It's used by IT professionals to monitor/manage security of computers under their control, whether it be IT employees of companies/corporations or MSPs (Managed Service Providers) that manage the IT for other companies.
It's software that is meant for deployment at scale and managing security of devices at scale, so it's not like an anti-virus software you would download for your PC as an individual user like many people did back in the day with McAfee or Norton etc. This is why it has impacted so many computers, but not your individual computer because it's not something that a regular PC user would install.
CrowdStrike is software that businesses install on their computers (Windows, Mac, and Linux) to monitor and prevent malware (in short). If you don’t have it installed, you’re in the clear. It’s not a Windows component—although only Windows computers are affected by this bad update.
What I find interesting is how this could have been done maliciously, i.e. not a hack as such, but a bad actor gaining access to a release and deliberately tampering with it to bring systems down, rather than an accident.
Quick question to Linux experts, could this happen there or is this intrinsically linked to the lack of true separation on Windows?
Also could it happen to serverless applications on the cloud?
Certainly possible for a broken update to cause a similar issue on Linux servers, but I have a hard time seeing how it could ever reach the scale of this. Linux servers are more diversified with many different distributions and versions. Updates are usually run with some level of oversight by administrators. The reason CrowdStrike could break so many systems so fast is because it can update itself automatically, so a single global update broke everything all at once. I am not aware of any Linux systems that would run kernel level updates on their own without any oversight. So even if a major Linux distribution released a broken update, it would likely be discovered pretty quickly before it was rolled out to thousands of servers worldwide at the same time.
I'm just going to drop an "I was here" comment for watching the dress rehearsal to a complete collapse of the Internet, wherever that may yet come from... (let me know if you want your name added XD)
Supposedly pre-market trading already has Crowdstrike down 21%
I've been up for the past 6 hours now after getting a beautiful 45 minutes of sleep. Been pulled into four major incident calls, written up some quick documents on fixes for various environments, and trying to walk people through how to fix our thousands of servers because I don't want to stay up any longer lol
Absolutely gobsmacked this made it through any level of QA since it is affecting all flavours of Windows, not just specific combinations. And didn't we learn from the big Meta outage (and literally every other Friday outage) that you don't push shit on a Friday??
At least I can probably take the rest of the day off...
Edit: Still coaching support teams on what to do. I did find that somebody has already started a wikipedia page for this :D - https://en.wikipedia.org/w/index.php?title=2024_CrowdStrike_incident&useskin=vector
[insert that's a big list meme here]
911 services of eleven (so far) different states, American, United and Delta flights grounded along with dozens of others across the world plus several major airports grounding all flights, US DOJ, DC and NYC mass transit systems, Oracle, Nokia, broadcasting stations, banks, railways, Singapore's stock exchange, Paris Olympics, entire supermarket chains, hospitals cancelling procedures, Mercedes, McLaren, Aston Martin, and Williams F1 teams, universities, law firms, pharmacies, casinos, train networks, petrol stations, stadiums, fire alarm systems...
emphasis mine on some of the more interesting/frightening examples
Here's a comment from Hacker News that made me realize how serious this is (via user jmcgough in this post)
The Associated Press coverage mentioned that surgeries are being postponed because anesthesia is off the table without equipment to manage and monitor it. I'm sure radiology equipment is also affected. Never mind charting...
Some lighthearted humour to counter all the countless hours being put in to fix this fuckduggery and the gravity of some of the outages:
https://www.skysports.com/f1/news/12040/13180880/global-it-outage-impacts-mercedes-f1-teams-pit-wall-screens-at-hungarian-gp
A certain irony that Crowdstrike is a sponsor for Mercedes F1...
I may just be a tad tired - going on 12 hours now, but the end miiiiight be in sight
Saw that and found it humorous.
Gotta have something to balance out some of the horrors that are happening. Hearing from some of my old healthcare worker friends that things like their medication systems are locked out, can't get patient charts to know what anyone should be getting anyway ...oh... and the in-room panic buttons and heart monitors that alert staff when a patient codes (flatlines) are all linked to the same system and don't work. So unless someone is literally within earshot of a patient's room and hears the monitor start to scream...
Was this something that was easy to catch? Or was this an advanced error that no one could have seen? I am a complete layman on this topic. I’m just trying to gauge how preventable this was or if there was no chance of seeing it coming.
It’s so absurdly easy it makes it extra weird it got out.
The bare minimum catch here is “huh we deployed this on a windows machine and it nuked itself”.
The best explanation I can think of is one patch got tested and a different one deployed but that should also be trivial to detect and stop.
It’s why this whole thing is so mind blowing because it MUST involve serious policy, procedure, and culture issues for this to even occur
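To make the "one patch got tested and a different one deployed" scenario concrete, here's a minimal sketch of the kind of gate that would catch it. Nothing here is CrowdStrike's actual pipeline; `sha256_of` and `safe_to_deploy` are made-up names, and the idea is simply to record a digest of the artifact that passed testing and refuse to ship anything that doesn't match it byte for byte.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def safe_to_deploy(tested_digest: str, candidate: bytes) -> bool:
    """True only if the candidate is byte-identical to the tested artifact.

    Hypothetical gate: tested_digest is whatever the test stage recorded.
    """
    return sha256_of(candidate) == tested_digest


# Illustrative data only, not a real update file.
tested_artifact = b"content-update-v1"
recorded = sha256_of(tested_artifact)

print(safe_to_deploy(recorded, tested_artifact))       # same bytes that were tested
print(safe_to_deploy(recorded, b"content-update-v2"))  # different bytes
```

This only proves that what shipped is what was tested; it says nothing about whether the test itself was any good.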
Yeah, a simple release verification would’ve caught this. There is someone alleging on Reddit that they have proof that crowdstrike developers have the ability to build and deploy from their laptops, if that’s true then it’s concerning for sure.
I hope it’s not because this just shows that if a single one of those laptops was compromised you could deploy kernel level malware to every crowdstrike machine
I don't think that's necessarily hard to believe? It seems possible a developer could override safeties to create and push a release candidate without oversight. Ideally there would be some alarms going off to get someone to review the rush-release. Getting access to the right engineer's laptop has always been a nice attack vector.
Got a call from the very top at 12:30 AM and have been in triage mode ever since. I'm less useful in this regard and mostly here as extra hands and testing, but jesus what a massive fuckup this is going to be.
Even with the "fix" being quick this is jacked on so many levels:
They're HYPER lucky this is an "easy" fix, but even still that was hours upon hours of outage for half the fucking globe. The cost from this is incalculable, and just shoved down every C level's throat what a fucking mess all this single point of failure cloud stuff can be.
This doesn't have anything to do with cloud stuff. The outage was caused by locally running code; if CrowdStrike was entirely on prem you'd have the same issue.
This is an issue of auto-updating software with no possible way to test or have oversight of the update process.
No matter what, your EDR platform is going to be running the same software on all of your agents across the enterprise. Crowdstrike isn't alone in this, but not even having the option of controlling how they're updated is crazy.
If this software had been designed in the pre-cloud era, when internet access could be costly and not always on, then the silent update feature would not have been a thing and there would not be an issue today.
Cloud was about externalizing server based infrastructure to dedicated companies. If anything, if you were entirely cloud based you would not have this issue.
The internet existed before “the cloud” was popular. If you look at the major impacts of these outages, they must be connected to the internet to work by definition.
This really has nothing to do with “the cloud” and if anything mostly impacts on prem setups and edge devices - so, the opposite of “the cloud”.
I will say if they had any sort of staggered deployment process (which any company big enough to shard its infrastructure should have) they could have drastically reduced the blast radius of this.
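For anyone wondering what a staggered deployment looks like mechanically, here's a rough sketch of one common approach: hash each host ID into a stable bucket from 0 to 99 and only update hosts whose bucket is below the current rollout percentage. The function names and host IDs are invented for illustration, and this isn't a claim about how any vendor does it.

```python
import hashlib


def rollout_bucket(host_id: str, buckets: int = 100) -> int:
    """Map a host ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(host_id: str, percent: int) -> bool:
    """True if this host belongs to the first `percent` percent of the fleet."""
    return rollout_bucket(host_id) < percent


# Hypothetical fleet; raising the percentage only ever adds hosts.
fleet = [f"host-{n:04d}" for n in range(1000)]
for percent in (1, 10, 50, 100):
    selected = sum(in_rollout(h, percent) for h in fleet)
    print(f"{percent:>3}% stage: {selected} of {len(fleet)} hosts")
```

Because the bucket depends only on the host ID, a host that got the update at the 1% stage is still included at 10% and 50%, so a bad update seen at 1% can be stopped before the other 99% ever receive it.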
I'm the major incident manager for my corp and have been dealing with this since early this morning.
It's pretty bad.
The fix is easy but it's a local fix. Meaning that our stores need to have a manual fix applied done by the staff themselves. They're being guided by our techies but good lord..
Oh yeah. When you have that kind of problem spread out across... everyone (edit, everyone on Windows) running this software (which to be fair I had not heard of until today but apparently it is widely used) and every machine needs a human physically present to be able to fix? No remote fix?
Ooooohhhh.... ouch.
another edit: Well, I'll call this a win for anyone running their machines on a Hypervisor platform. At least they can run a fix remotely. Um... unless the Hypervisor they are using on bare metal is... Windows based??? OOOOHHHH, ouch again.
For servers you can reattach the disk to a new VM, take out the offending file, stick it back into the original VM and boot.
The fun part comes when everyone else is doing this and storage latency across the cloud providers has shot through the roof.
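Rough sketch of the "take out the offending file" step once the broken VM's disk is attached to a rescue machine. Assumptions, loudly: the `C-00000291*.sys` pattern and the `Windows/System32/drivers/CrowdStrike` directory are what the publicly circulated remediation steps named, not something from this thread, so check current vendor guidance before deleting anything. The helper names are made up, and it defaults to a dry run.

```python
from pathlib import Path

# Assumption: file pattern taken from publicly circulated remediation steps.
DEFAULT_PATTERN = "C-00000291*.sys"


def find_channel_files(mount_root: Path, pattern: str = DEFAULT_PATTERN) -> list[Path]:
    """List matching files under the mounted volume's CrowdStrike driver dir."""
    driver_dir = mount_root / "Windows" / "System32" / "drivers" / "CrowdStrike"
    if not driver_dir.is_dir():
        return []
    return sorted(driver_dir.glob(pattern))


def remove_files(paths: list[Path], dry_run: bool = True) -> list[Path]:
    """Delete the given files unless dry_run is set; return what was targeted."""
    for path in paths:
        if not dry_run:
            path.unlink()
    return list(paths)
```

Usage would be something like `remove_files(find_channel_files(Path("/mnt/rescue")), dry_run=False)`, where `/mnt/rescue` is a hypothetical mount point. Note that on a case-sensitive filesystem the glob only matches that exact casing.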
Unless your disk is bitlocker encrypted, and your keys are backed up to another machine that also had crowdstrike installed...
Manually entering a 48 digit recovery key is fun.
I didn't have to do that but my direct colleagues did. We gave them and a couple of onsite techs emergency access to the vault. That sucked but it was better than being helpless.
I hope you also gave them access to the emergency tequila cabinet as well. My sympathies to all such on-the-ground IT folks today.
The US folks got hit the hardest because they actually had to wake at 4. We were relatively fine waking up at normal hours and riding the coattails of Australia having done most of the investigative stuff.
Still not an easy day and I'm sure they've had (or are having) a stiff drink to start the weekend.
I am just commiserating the pain with you. On a Friday!
I'd run out of fingers and toes if I had to count the number of hospital systems I work with that use Crowdstrike. It's lightweight, easy to manage, and a reasonably cost-effective part of mitigation strategies for the ransomware plague.
A non-trivial number of surgeries, treatments, and patient visits are going to get postponed because of this. As /u/Eji1700 said, I'm baffled at how this update made it from DEV to TST, let alone PRD. The ugliest part is that since Crowdstrike Falcon is a vendor-managed client, the customers don't get to put its updates on a managed patching schedule with a test environment or spaced rollout. Let this be a lesson to us all.
Update via The Guardian:
That may be one of the most laughable uses of the passive voice I've ever seen.
Minor unimportant question: I was under the impression that “brick” meant something more permanent than what this seems to be.
"Soft" bricking didn't use to be the common usage, and I too think it's a bit hyperbolic.
Apparently the driver files were not correctly formatted
If that is true, then it is wild it could make it past any sort of QA.
Some random dev: "I'm gonna get chewed out if completing my PR is late again, so let's just do a little push to prod to get it checked off. What's the worst that could happen?"
More seriously - I have to wonder, given the description of the incident, if the faulty file is even human generated. It sounds like it's a content update bundle of some sort. Perhaps it's being generated by the code deployment system and the bug is actually in the file generation/validation/deployment? It's possible the hole is in the test process itself - that the components of the update were being properly individually tested, but that something with the bundling and deployment was missed in the procedures.
Whatever the actual details turn out to be, this is one of those case studies that will be used as an example in every IT and software dev class from now till the end of time.
There's a reply in that thread that the files being sent to the poster are all different, so that's a good theory.
I was in talks with CS not more than a month ago, and opted for Arctic Wolf + Egress to replace DarkTrace and ESET.
Bullet dodged in this instance.
It's showing the world just how fragile the interconnected "cloud" is though. One vendor can take down half the planet's first world services with a bad update.
Echoing everyone else, how this managed to get through testing cycles to global roll out beggars belief.
Luckily since my company only runs Linux and Mac, we've dodged this bullet ourselves. That said, I feel really bad for everyone who's directly affected by this. German news is reporting that some hospitals are cancelling all elective surgeries today because of this issue.
EDIT TO ADD:
My wife works as a data center technician at Microsoft, and weirdly enough she didn't even know this was happening until I told her. I guess she's lower-level than the stuff this affects.
Microsoft would likely dogfood their own software and run Defender on their infrastructure instead of Crowdstrike.
At least they'd still be up by doing so.
Microsoft has made big strides in making Windows a lot more stable than it was when I was young, but it's this kind of thing that still gives me the impression that people who use it for mission-critical applications are completely nuts.
The main reason I don't use it at home these days is because I feel that Windows Update is currently so intrusive that it has essentially turned it into a managed system, and it requires a degree of trust in Microsoft that I am very far from giving them.
I very much appreciate that Apple forbids kernel extensions these days, and locks system folders down with SIP. Security happens at a design level, not by bolting on third-party malware. Third party software should never, ever be allowed to render a machine unbootable or have OS-level privileges.
I have enough daily rage about corporate "security" software on my development machine, so I'm glad Apple keeps it locked inside userland, at least.
Apple can afford to forbid third-party kernel extensions because they control their hardware.
It's not a Windows problem, it's a CrowdStrike problem.
Arguably both. As others have pointed out Apple and Linux do not allow the kind of access that made this mess possible.
Linux absolutely does allow this kind of access. Writing a broken kernel module that consistently panics the kernel when loaded is a rite of passage for budding kernel devs. In this case, however, Linux provides a separate, safer interface (eBPF) which Crowdstrike uses, rather than a full-privilege kernel module.
(More generally, most Linux systems in the enterprise are not being operated by non-technical end users, so the need for heuristic security software like antivirus is a lot lower; John from sales isn't going to blindly open email attachments on the server.)
This is true. In fact a similar thing happened to RedHat / Rocky 9.4 not too long ago, difference being it was RedHat's fault and was fixed with a kernel patch, and it was not as widespread because you had to upgrade to 9.4 pretty early to have run into it.
It doesn’t really have anything to do with Microsoft so I’m not surprised she wouldn’t know about it.
When it first broke here people were calling it a Microsoft bug because it only affected Windows machines. It took a hot sec for everyone to realize it was actually Crowdstrike at fault.
Also, Azure Central VMs went down for a few hours right before the Crowdstrike issue showed up. This is where some of the confusion stems from.
I work at a hotel and our systems have all been nuked since a half hour after I came in. This was especially difficult because we had to take in a bunch of guests who were stuck in my city because of the airline outage, and then they all get fucked by our outage as well.
This is insane! I've heard 8 airports are closed because of this so far. Some airlines already stopping operations. I feel sorry for the poor dev that introduced this bug.
Honestly the dev is going to eat shit but they're the last person who should.
The fact this EVER got into production is insane, let alone this massively.
Shit rolls down hill though. Dev gonna get reprimanded. CEO will get a bonus for getting out ahead of the issue and talking everyone down, as long as the shares recover.
This is enough shit I'm not sure the company makes it through as is. Granted the CEO probably still gets a golden parachute or some nonsense, but this is really quite special in its cascading and immediate effects.
The company will 100% make it through this. Crowdstrike is probably the market leader in EDR and very well respected in that arena. If this was a consistent pattern I might agree with you, but there are absolutely zero widely deployed software companies that have been around for any period of time that haven't had some sort of major outage or incident. Solar winds is still around, Palo Alto is still around, and crowdstrike will still be around.
Palo Alto and solarwinds didn’t take down half the globe in a way that will still be affecting companies for the next week, if not month.
There are a fuck ton of systems that now need to be physically rebooted and oops that outsourced IT team won’t be doing that.
They might be fine but this isn’t in the same category in the slightest
Palo Alto and solar winds were arguably worse, because they were cybersecurity incidents, not operational incidents, and both of them could and did result in attackers launching attacks inside private networks to get persistence, laterally move through the network, and exfiltrate confidential data, the effects of which are still being felt. Having to reboot computers is more painful in the short term, but definitely less damaging.
Absolutely. It's likely a handful of people probably get fired over this, but the company itself will 100% be just fine.
Yep, and you can see this in the stock price. Did it go down? Yes. Is it at 0? Not even close. They’re actually still above where they were 6 months ago.
They’ll lose customers for sure, but they’ll be around.
Looking at how drastically the share price dropped when markets opened but then climbed every single minute so far since then I reckon it will probably be at most a few days until the share price is back where it was, or even higher (because now they’ve had a bunch of free publicity worldwide)
What stock market data are you looking at? On Yahoo finance, it appears that it has stabilized today on the exact same price it was going for during after hours trading. It definitely hasn’t been climbing constantly all morning.
Yeah you’re absolutely right. I commented about half an hour after the market first opened, but the trend did not continue. It’s actually hilarious in retrospect just how perfectly poorly timed my comment was. Here are some screenshots showing the point at which I made the comment, compared to how it went for the rest of the day.
Lawyer here. Damages from this incident are a big unknown and haven't finished playing out. Just casually reading reddit I have seen claims that hospitals and 911 systems were down and patients died. Someone said that their city water purification system was down. Flights were cancelled and airlines are going to have to provide rescheduled flights at no additional cost. Global shipping and freight were impacted and resetting those schedules is not easy and not free.
Was CrowdStrike negligent? What do the contracts say about risk? CrowdStrike and investors don't yet know the company's costs for legal liability. They also don't know if customers will leave and what sales they will lose going forward.
The accelerationist deep inside me is lowkey smug about this. Maybe, just maybe, this will push for the adoption of more open source software in business critical applications.
Open source wouldn't really have fixed this. There are plenty of examples of massive issues caused by open source packages as well (heartbleed, for example).
The main cause of this sort of thing having such a wide impact is massive numbers of machines all running the same software. Unfortunately there's really no way around that. It's not like my company is going to write their own EDR platform.
It's just one of those things that have to be factored in when deploying IT systems. All critical functions need a continuity plan in case absolutely nothing works, because sometimes, nothing works.
I haven't used crowdstrike, but don't enterprises have the ability to create patch windows? I used a competitors product, and we could schedule patches to hit dev before prod, which would have caught this.
Yeah, every company I've been with built and deployed its own Windows patches. I believed this was the norm until this morning...
Insane that CrowdStrike works by automatically and silently updating all computers in an enterprise immediately. That's a huge red flag. There should always be a test batch.
It's kind of the norm for this type of security software. Most software patches in an enterprise are handled by a centralized patch management system and are tested on the hardware that enterprise uses before someone manually kicks off a deploy. EDR software usually automatically updates itself with updated data because cyber threats emerge very quickly, and a zero day can be used to exploit a lot of machines very very quickly. There have been countless attacks that could have been prevented by up to date EDR software, and up until now, there hasn't really been a widespread issue with that kind of software causing massive outages.
I imagine that risk analysis may yield a different outcome after taking this incident into account though.
Right, I supposed that there would have been a practice of testing it quickly on a batch of machines with a 1-3 hour window for approval before deploying it out wide. I think everyone has a story where a single typo caused an issue with a release at some point, this kind of thing can happen so it's good to have a just-in-case test step. It seems like dangerous practice to me to simply launch to all machines at once without a precursory test, which we're seeing the result of here.
Yeah. I understand the need to quickly get updates out to security software. But... some environments require every change to be pre-authorized and tested. So it seems like a feature to group your devices and force them to update in sequence with a delay between and the ability to halt updates would be a must have in an enterprise offering.
Move fast and break things is great when the things aren't people, airplanes, or stock markets.
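The "group your devices, update in sequence with a delay, and be able to halt" feature could be sketched roughly like this. It's a toy model, not a real CrowdStrike capability or API: `rollout_in_rings`, the callbacks, and the host names are all invented, and a real system would check health over a soak period rather than with one call.

```python
import time
from typing import Callable, Sequence


def rollout_in_rings(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    soak_seconds: float = 0.0,
) -> list[str]:
    """Update rings in order; halt after the first ring with an unhealthy host.

    Returns every host that received the update before the halt.
    """
    updated: list[str] = []
    for ring in rings:
        for host in ring:
            apply_update(host)
            updated.append(host)
        time.sleep(soak_seconds)  # delay so problems have time to surface
        if not all(is_healthy(host) for host in ring):
            break  # halt: later rings never receive the update
    return updated


# Hypothetical fleet: a canary ring, then dev, then production.
rings = [["canary-1"], ["dev-1", "dev-2"], ["prod-1", "prod-2", "prod-3"]]
crashed = {"dev-2"}  # pretend the update bluescreens this box
touched = rollout_in_rings(rings, lambda h: None, lambda h: h not in crashed)
print(touched)
```

In this run the production ring is never touched, which is the whole point: the damage stops at the ring where the failure first shows up.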
Falcon is a threat sensor - it uses data bundles similar to an antivirus. The impression I've gotten from descriptions of the problem is that this is less update-update with new code and is more equivalent to a bad antivirus threat definition package. These generally aren't something you want to delay. It does call into question Falcon's parsing of the files if it can't handle an error without BSOD, though.
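On the parsing point, here's a toy illustration of "validate the file before using it, and keep the last good definitions if it's malformed". Big caveats: the file layout (a 4-byte magic, a record count, fixed 16-byte records) is entirely made up, this is user-space Python rather than a kernel driver, and it says nothing about Falcon's real file format.

```python
import struct

# Made-up layout: 4-byte magic + little-endian uint32 record count,
# followed by `count` fixed-size records.
MAGIC = b"DEMO"
HEADER = struct.Struct("<4sI")
RECORD_SIZE = 16


def parse_definitions(blob: bytes) -> list[bytes]:
    """Parse a definitions blob, raising ValueError on any inconsistency."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for header")
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    if len(blob) != HEADER.size + count * RECORD_SIZE:
        raise ValueError("record count does not match file length")
    start = HEADER.size
    return [blob[start + i * RECORD_SIZE : start + (i + 1) * RECORD_SIZE]
            for i in range(count)]


def load_or_keep(blob: bytes, last_good: list[bytes]) -> list[bytes]:
    """Use the new definitions if they parse; otherwise keep the old ones."""
    try:
        return parse_definitions(blob)
    except ValueError:
        return last_good
```

The design choice is that a malformed update degrades to "stale definitions" instead of taking the whole machine down.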
I get that it is similar to updating a fingerprint dictionary. But I've worked in environments where those had to be authorized as well. Anything involving a high CVE remote exploit was instantly tested in dev and rolled to prod shortly. If necessary and not already in place, we'd put an F5 in front of customer facing services or other mitigations while we rolled the updates through the process.
Anything less critical would make it through in 1-3 days.
Crowdstrike doesn't allow for that, no.
That seems like a feature with a story to market. 🙂
Sure as shit they will in the very near future.
Yeah, the company I work at was crippled by this, we were instructed to turn off all machines until further notice. Thankfully an update seems to have resolved the issue.
My laptop was affected but I have the added bonus of somehow having bitlocker enabled on my machine.... None of my accounts show this machine has bitlocker and I've never set it up, I don't really know what I'll do, I guess just wipe the drive and cry.
oof... wtf it's on by default in Windows 11 and they don't tell you?!
I think it was on by default on most W10 machines too. Maybe not on a retail install, but basically all OEMs have it enabled. Microsoft stores the recovery key in your Microsoft account, which is one reason they push you away from a local account. As long as you used the Microsoft happy path you should be able to recover everything.
And if you aren’t using a Microsoft account: you have backups right?
Apparently so... I'm not happy - I'm going to try to get around it with the link from Lapbunny but I think I'm screwed
Wait, are there people who have CrowdStrike on their laptops? I didn't know they did anything at the individual user level like that.
I believe my entire company has Crowdstrike on every single Windows laptop.
Ohhh for like work laptop. I completely forgot about that usecase and assumed they meant their own personal computer.
Crowdstrike is an EDR platform. They deploy agents on all endpoints, so servers, workstations, kiosks, signage IoT devices that support it, which is why this thing is such a cluster.
I drove past a billboard this morning that was BSODed
oh yeah I know that Crowdstrike is on tons of machines, but I forgot that people have Windows laptops for work that could be affected. All the other systems that failed are the sorts of things that I expected, I was just confused why someone would have it on their personal laptop.
Incidentally, my work uses Crowdstrike, though only on employee laptops, not on any servers. We just lucked out this time because we only use Macbooks and Linux.
Our entire company has crowdstrike (falcon sensor) on our workstations
You can't get around it via Windows RE?
I'll definitely try it! thanks
My company has several thousand employees working remotely and maybe a thousand within driving distance of the corporate headquarters. I was on the phone with many different IT agents yesterday trying to get my BitLocker key, to no avail. I’m sure this is a common story for many companies right now.
The real kicker for me and my entire team is we’re contractors paid hourly.
I don’t think any of us will be paid for the days we can’t work due to this outage (edit: we are being paid). I’m sure many hourly workers across the world are suffering the same fate. This won’t ruin me, but anyone living paycheck to paycheck who misses several days of pay because of this deserves some form of restitution.
I was under the impression that hourly and daily contractors who are idle due to factors beyond their control are paid for the days they sit idle??? How is it fair to be on standby and not making money? Yikes 😬
If you can't work they just won't schedule you. It obviously depends on your contract, but that's kind of the nature of the beast with contracting work. You don't work, you don't get paid.
You’re correct, I let my anxiety get the best of me without being certain. We have been compensated for yesterday and fortunately IT has been working around the clock to get laptops fixed via phone calls.
Luckily, my org doesn't use Crowdstrike. Because I'm on vacation and outta town. Small org, so I'm the only guy who does any IT.
Feel bad for the IT folks who are affected, though. Good luck everyone.
The Crowdstroke bug lmao
I've seen it referred to as Generalstrike.
So glad my company doesn't have any Windows boxes...
It's a doozy. I'm currently repairing Azure VMs for a client that fell over, and most of my company is bluescreening with 40 minute waits for the service desk.
I'm feeling dumb because I don't get this.
Or rather, I understand the basic idea- a piece of software used by TONS of businesses like major airlines got a bug/glitch, and is shutting everything down.
But... I guess the question is, who's actually using this software? It's called CrowdStrike and it's on Microsoft, right? That means Windows OS, correct? Because I have a different brand of laptop, but it still runs Windows (11, I believe) and I haven't had any issues in the past 24 hours. So is it just the big businesses that are running the software? I'll be wary of any update requests, certainly... But as the Internet freaks out, I'm not seeing any of it (so far).
It's used by IT professionals to monitor/manage security of computers under their control, whether it be IT employees of companies/corporations or MSPs (Managed Service Providers) that manage the IT for other companies.
It's software that is meant for deployment at scale and managing security of devices at scale, so it's not like an anti-virus software you would download for your PC as an individual user like many people did back in the day with McAfee or Norton etc. This is why it has impacted so many computers, but not your individual computer because it's not something that a regular PC user would install.
CrowdStrike is software that businesses install on their computers (Windows, Mac, and Linux) to monitor and prevent malware (in short). If you don’t have it installed, you’re in the clear. It’s not a Windows component—although only Windows computers are affected by this bad update.
What I find interesting is how this could have been done maliciously, i.e. not a hack as such, but a bad actor gaining access to a release and deliberately tampering with it to bring systems down, rather than an accident.
Quick question to Linux experts, could this happen there or is this intrinsically linked to the lack of true separation on Windows?
Also could it happen to serverless applications on the cloud?
See the discussion here: https://tildes.net/~tech/1hp7/crowdstrike_code_update_bricking_windows_machines_around_the_world#comment-d8ed
Thank you. So it seems it would take a kernel issue to break Linux like this, rather than something from a 3rd party?
It's certainly possible for a broken update to cause a similar issue on Linux servers, but I have a hard time seeing how it could ever reach the scale of this. Linux servers are more diversified, with many different distributions and versions. Updates are usually run with some level of oversight by administrators. The reason CrowdStrike could break so many systems so fast is that it can update itself automatically, so a single global update broke everything all at once. I am not aware of any Linux systems that would run kernel-level updates on their own without any oversight. So even if a major Linux distribution released a broken update, it would likely be discovered pretty quickly, before it was rolled out to thousands of servers worldwide at the same time.
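To make the point about oversight a bit more concrete, here is a rough Python sketch of a staged (canary) rollout, which is the opposite of pushing one update to every machine at once. It is only an illustration, not how CrowdStrike or any Linux distribution actually ships updates; `staged_rollout`, `apply_update`, `is_healthy`, and the stage fractions are all made-up names and values.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: installs the update on one host
    is_healthy: Callable[[str], bool],     # hypothetical: e.g. "did the host boot and check in?"
    stage_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed wave sizes
) -> List[str]:
    """Update hosts in growing waves and halt as soon as a wave looks unhealthy.

    Returns the list of hosts that received the update, including any
    hosts in the wave that failed its health check.
    """
    updated: List[str] = []
    done = 0  # number of hosts updated in earlier waves
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated after this wave,
        # always advancing by at least one host and never past the end.
        target = min(len(hosts), max(done + 1, int(len(hosts) * fraction)))
        wave = hosts[done:target]
        for host in wave:
            apply_update(host)
            updated.append(host)
        # Check the whole wave before widening the rollout.
        if not all(is_healthy(host) for host in wave):
            return updated  # halt: the damage is limited to the waves so far
        done = target
        if done == len(hosts):
            break
    return updated
```

With 100 hosts and the default fractions, an update that fails its health check stops after the very first host instead of reaching all 100, which is roughly the kind of early discovery described above.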
I'm just going to drop an "I was here" comment for watching the dress rehearsal to a complete collapse of the Internet, wherever that may yet come from... (let me know if you want your name added XD)