This is an unpopular opinion, and I get why: people crave a scapegoat. CrowdStrike undeniably pushed a faulty update that demanded a low-level, hands-on fix (booting each affected machine into recovery and removing the bad file). However, this incident lays bare the fragility of corporate IT, particularly at companies entrusted with vast amounts of sensitive personal information.

Robust disaster recovery plans, including automated processes to remotely reboot and remediate thousands of machines, aren't revolutionary. They're basic hygiene, especially for companies that would face serious consequences from a prolonged outage or a breach. Yet this incident highlights a systemic failure across many organizations. While CrowdStrike erred, the real culprit is a culture of shortcuts and misplaced priorities within corporate IT.
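
To make "basic hygiene" concrete, here is a minimal sketch in Python of the kind of orchestration I mean. The boot_into_recovery() and remove_bad_update() helpers are hypothetical placeholders for whatever out-of-band management or on-disk recovery tooling a given shop actually has; the point is the pattern of fanning remediation out across a fleet, not any particular vendor's API.

    import concurrent.futures

    # Hypothetical helpers -- stand-ins for real out-of-band management or
    # recovery tooling (a BMC, a recovery partition, an MDM agent, etc.).
    def boot_into_recovery(host: str) -> None:
        """Reboot the machine into its on-disk recovery environment."""
        raise NotImplementedError("depends on your management stack")

    def remove_bad_update(host: str) -> None:
        """From recovery, delete the faulty update file and reboot normally."""
        raise NotImplementedError("depends on your management stack")

    def remediate(host: str) -> tuple[str, bool]:
        """Attempt the two-step fix on one host; report success or failure."""
        try:
            boot_into_recovery(host)
            remove_bad_update(host)
            return host, True
        except Exception:
            return host, False

    def remediate_fleet(hosts: list[str], workers: int = 50) -> list[str]:
        """Fan remediation out across the fleet; return hosts that still need hands-on work."""
        failed = []
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            for host, ok in pool.map(remediate, hosts):
                if not ok:
                    failed.append(host)
        return failed

Nothing in that sketch is exotic. The hard part is having the recovery hooks in place before the bad day arrives, which is exactly the preparation many organizations skipped.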

Too often, companies throw millions at vendor contracts, lured by flashy promises while neglecting the due diligence needed to ensure those solutions truly fit their needs. This is exacerbated by a corporate culture in which CEOs, vice presidents, and managers are more easily swayed by vendor kickbacks, gifts, and lavish trips than by innovative ideas with measurable outcomes.

This misguided approach not only results in bloated IT budgets but also leaves companies vulnerable to precisely the kind of disruptions caused by the CrowdStrike incident. When decision-makers prioritize personal gain over the long-term health and security of their IT infrastructure, it’s ultimately the customers and their data that suffer.

  • LrdThndr@lemmy.world · 5 months ago

    How would it not have? You got an office or field offices?

    “Bring your computer by and plug it in over there.” And flag it for reimage. Yeah. It’s gonna be slow, since you have 200 of the damn things running at once, but you really want to go and manually touch every computer in your org?

    The damn thing’s even boot looping, so you don’t even have to reboot it.

    I’m sure the user saved all their data in OneDrive like they were supposed to, right?

    I get it, it’s not a 100% fix rate. And it’s a bit of a callous answer to their data. And I don’t even know if the project is still being maintained.

    But the post I replied to was lamenting the lack of an option to remotely fix unbootable machines. This was an option to remotely fix unbootable machines. No need to be a jerk about it.

    But to actually answer your question and be transparent: I’ve been doing Linux DevOps for 10 years now. I haven’t touched a Windows server since the days of the gymbros. I DID say it’s been a decade.

    • Brkdncr@lemmy.world · 5 months ago

      Because your imaging environment would also be down. And you’re still touching each machine and bringing users into the office.

      Or your imaging process over the WAN takes 3 hours, since it’s dynamically installing apps and updates rather than laying down a static “gold” image. Imaging gets even slower because your source disk is only an SSD, and it bogs down once you get 10+ machines imaging at once.
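
      For a rough sense of that scale (assuming the 200 machines mentioned above, 3 hours per image over the WAN, and imaging throughput topping out at about 10 concurrent jobs; all of these are assumed numbers):

        machines = 200        # fleet size from the comment above (assumed)
        hours_per_image = 3   # per-machine WAN imaging time (assumed)
        concurrency = 10      # point where imaging throughput tops out (assumed)

        batches = -(-machines // concurrency)   # ceiling division -> 20 batches
        print(f"~{batches * hours_per_image} hours of wall-clock imaging")   # ~60 hours

      That’s roughly two and a half days of nonstop imaging before anyone even starts restoring user data.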

      I’m being rude because I see a lot of armchair sysadmins who don’t seem to understand the scale of the CrowdStrike outage, what CrowdStrike even is beyond antivirus, or the workflow needed to recover from it.

      • John Richard@lemmy.world (OP) · 5 months ago

        Imaging environment down? If a sysadmin can’t figure out how to boot a machine into recovery and remove the bad update file, they have bigger problems. The fix in this instance wasn’t even re-imaging machines; it was merely removing a file. An ideal DR scenario would have a recovery image already on the system that can be booted into remotely, so there is minimal strain on the network. Besides, we don’t live in the dial-up age anymore.
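
        As a sketch of how small that per-machine fix actually is once you’re in a recovery environment (the CrowdStrike driver directory and the C-00000291*.sys channel-file pattern below are the widely reported ones, not something taken from this thread; treat them as assumptions):

          from pathlib import Path

          # Widely reported location and pattern of the faulty channel file -- an assumption here.
          DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")

          def remove_bad_channel_files(driver_dir: Path = DRIVER_DIR) -> int:
              """Delete the faulty channel file(s) and report how many were removed."""
              removed = 0
              for f in driver_dir.glob("C-00000291*.sys"):
                  f.unlink()
                  removed += 1
              return removed

          if __name__ == "__main__":
              print(f"Removed {remove_bad_channel_files()} file(s); reboot normally.")

        The point isn’t the script itself; it’s that the per-machine fix is a single file deletion, which is exactly the kind of step a pre-staged recovery image could run without anyone touching the machine.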