Dave explains the CrowdStrike IT outage, focusing on its role as a kernel-mode driver. For my book on the spectrum, see: https://amzn.to/3XLJ8kY Get the s...
This is likely going to be a textbook case of how not to run a company in a dominant market position that caused worldwide system failures.
Makes you wonder if we should be allowing such consolidation in critical industries. This ain’t even about economics anymore. It’s more of an infrastructure and national security decision.
Or fucking supervise and train people properly… I don’t know. Sounds like management problems.
As someone who works in QA, yeah, they needed something to catch this. I saw someone mention somewhere, without a source, that they missed it because all of their test machines have their full suite of software installed, and in that scenario the computer wasn’t affected. So it seems their QA labs might need to be more in tune with the actual user base.
However, the fact that they were able to push this so quickly worldwide seems like a big process issue. I get that 0-day threats are how they justify it, but deploying to a small subset of customers before going global seems more reasonable.
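To make the “small subset first” idea concrete, here’s a minimal sketch of a staged (canary) rollout: push to a tiny slice of the fleet, watch for failures, then widen. Everything here is invented for illustration, including the host names, thresholds, and the placeholder health check; it is not a description of CrowdStrike’s actual pipeline.

```python
# Hypothetical staged (canary) rollout sketch. All names and thresholds are
# made up for illustration; a real deployment system would hook into real
# telemetry and crash reporting instead of a random "health check".
import random
import time

def healthy(host: str) -> bool:
    """Placeholder health check; simulates ~1% of hosts failing after the update."""
    return random.random() > 0.01

def staged_rollout(hosts: list[str], update_id: str,
                   stage_fractions=(0.01, 0.10, 0.50, 1.0),
                   max_failure_rate=0.002,
                   soak_seconds=0) -> bool:
    """Deploy update_id in expanding waves; halt if failures exceed the threshold."""
    deployed: set[str] = set()
    for fraction in stage_fractions:
        target_count = int(len(hosts) * fraction)
        wave = [h for h in hosts if h not in deployed][: max(target_count - len(deployed), 0)]
        for host in wave:
            print(f"pushing {update_id} to {host}")
            deployed.add(host)
        time.sleep(soak_seconds)  # let the wave "soak" before checking health
        failures = sum(1 for h in deployed if not healthy(h))
        if failures / max(len(deployed), 1) > max_failure_rate:
            print(f"halting rollout of {update_id}: {failures} failures "
                  f"out of {len(deployed)} hosts")
            return False
    return True

if __name__ == "__main__":
    fleet = [f"host-{i:04d}" for i in range(1000)]
    staged_rollout(fleet, "content-update-001")
```

The point of the early, tiny wave is that a crash-on-boot bug shows up after touching 1% of machines instead of 100%; the 0-day argument only justifies shrinking the soak time, not skipping the stages entirely.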
I heard somewhere that the update ignored the staging settings customers had set. So even if companies had it set to only roll out to a subset of their computers, it went everywhere.
Oof. Then that seems more on the ops side of things. Interesting. I can’t wait for them to never share what happened so we can all continue to speculate. 😂
That answered a lot of questions.
I hope they publicly state how they pushed a bad file, but I doubt it.
Seems like someone really didn’t pay attention to what they were doing, and they might have an internal problem with QA.
Their QA worked better than intended; they had tests fail worldwide and tons of results to work off of hahaha
I read somewhere (in the comments on that video) that CS ignored their own customer-configured staggered upgrade settings for some upgrades…
Apparently those settings are only for updates to the software itself, not for updates to the definition files.
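Roughly, the distinction being described looks like the sketch below: the customer-facing staging policy gates the sensor/program build, while definition/content updates skip the check and land everywhere at once. The class names, fields, and “ring” concept are all invented for illustration and are not CrowdStrike’s actual code or API.

```python
# Hypothetical illustration of staging settings that only apply to program
# (sensor) updates, not to definition/content updates. Invented for clarity.
from dataclasses import dataclass

@dataclass
class UpdatePolicy:
    sensor_ring: str      # e.g. "n-1" or "early-adopter"; gates sensor builds only
    defer_days: int = 7   # how long new sensor versions are held back

@dataclass
class Update:
    kind: str             # "sensor" (program update) or "content" (definitions)
    version: str

def should_install_now(update: Update, policy: UpdatePolicy) -> bool:
    if update.kind == "sensor":
        # Sensor builds respect the customer's staging ring / deferral window.
        return policy.sensor_ring == "early-adopter" and policy.defer_days == 0
    # Content/definition updates bypass the policy entirely -- treated as
    # time-critical threat intel, so every host gets them immediately.
    return True

policy = UpdatePolicy(sensor_ring="n-1", defer_days=7)
print(should_install_now(Update("sensor", "7.16"), policy))    # False: held back
print(should_install_now(Update("content", "latest"), policy)) # True: goes everywhere
```

If that is how the policy is scoped, it would explain why even customers with conservative staging settings still got hit by a bad content file.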
They don’t have a lack of quality assurance. They have a lack-of-quality assurance.