nixCraft 🐧

The CrowdStrike IT outage is a good reminder that if you don't have a disaster recovery (DR) plan in place, there will be consequences. There will be many meetings and discussions about the need for DR, but by the end of the year, it will likely be forgotten amidst the usual job cuts, new priorities, and questions about IT budgets. This cycle will continue until another IT outage strikes. I speak the truth and nothing else. If I'm wrong, correct me below. #sysadmin #IT

35 comments
mvyrmnd :PUA:

@nixCraft disaster recovery absolutely, but in an incident like this you also need its best friend, a Business Continuity Plan. What does the business do while you enact the DRP?

Extinction Studies

@mvyrmnd @nixCraft That's a Bizness Cruelty Plan. The workers will be laid off until security improves.

Bot4Sale

@nixCraft Also, don't use products that tell you the damage they're going to do.

BarkerJr

@nixCraft@mastodon.social We also need more virtualization. I can't think of any reason to run Windows Server on bare metal anymore. With a VM, a couple clicks and you roll back the #CrowdStrike update and you're up again

chris@strafpla.net

@bls @nixCraft “How many Oracle licenses do we need to buy for that?”

DougMerritt (log😅 = 💧log😄)

@nixCraft
You're right, and disasters are not always like Crowdstrike.

I was once at one of the better known Fortune 500 IT companies who had been prioritizing new features over bug fixes for many years, and as a result their reliability had deteriorated and was starting to impact their reputation and sales.

So far, nothing surprising, but the really surprising outcome was that someone managed to convince top management of the nature of the long term mistake, and they entered a 1.5 year cycle of doing literally nothing but bug fixes -- no new features handled at *all* -- which caused much weeping and gnashing of teeth and much general complaining. But they stuck with it.

And these days they're back to having a top tier reputation. As Fortune 500 goes; YMMV.

The point being that product bugs sometimes should be considered in the category of "outage" even when no single bug is severity 1.

Paul_IPv6

@nixCraft

<applause>

yes. DR plans really can save your ass. and much like backup files, testing your DR is a really good idea, just like testing restores regularly.

Scribe

@nixCraft@mastodon.social considering how often internal IT stuff with my employer will be broken for months if not years, I'm pretty sure the disaster recovery plan is just "fuck it we ball"

nixCraft 🐧

As CTO of a major fintech firm ACME Corp, this is my disaster recovery plan. Oh, you expected me to hire more staff and build an actual plan? That's adorable. Perhaps you'd also like a unicorn to handle our cybersecurity? 😉

MyYeeHaa

@nixCraft Amusingly, it's misspelled. Possibly indicating how much effort went into it?

nixCraft 🐧

@MyYeeHaa haha. sharp observation. it is good that some people still get humour

Gavin Jones

@nixCraft @MyYeeHaa looks like misspelled twice and corrected once. And hey, prompt engineering is hard work. /s

Davide

@nixCraft a lot of companies, not only small ones, shortcut most of the control steps. Quality and testing are used to sell but not to secure and improve the software. IMO, AI will not change that. Unfortunately, it is not only about software...

Sean

@nixCraft

“By failing to prepare, you are preparing to fail.”

While the closest he got to electronics was running about in thunderstorms, Ben Franklin’s advice is salient. Preparation is the difference between an emergency and a disaster

Kofi Loves Efia

@nixCraft not just a plan, but a physical thing you can find in the world. Your plan will do you no good if it is also BSOD'd/Encrypted. Also it has to be tested. Oh you plan to recover your 10 TB of data to a 2TB server across town over a 10mb back plane? Or recover from tapes you've overwritten twice a year for the last 10 years? Seen it seen it seen it
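
The restore scenario in the comment above can be sanity-checked with simple arithmetic. The sketch below is only illustrative: the function name `restore_estimate` is hypothetical, "10mb" is assumed to mean a 10 megabit-per-second link, and the estimate ignores protocol overhead, so a real restore would take longer.

```python
# Minimal sketch: does the data fit on the target, and how long would the copy take?
# Assumptions (not from the original comment): decimal units, "10mb" = 10 Mbit/s,
# and a link running at full line rate with no overhead.

TB = 10**12          # bytes in a decimal terabyte
MBIT = 10**6         # bits in a megabit
SECONDS_PER_DAY = 86_400


def restore_estimate(data_bytes, target_capacity_bytes, link_bits_per_sec):
    """Return (fits_on_target, transfer_time_in_days) for a restore."""
    fits = data_bytes <= target_capacity_bytes
    seconds = data_bytes * 8 / link_bits_per_sec
    return fits, seconds / SECONDS_PER_DAY


fits, days = restore_estimate(10 * TB, 2 * TB, 10 * MBIT)
print(f"fits on target: {fits}, transfer time: {days:.1f} days")
```

Under these assumptions the data does not fit on the target server, and the transfer alone would take roughly three months, which is the kind of gap a tested plan is supposed to expose.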

John Meadows

@nixCraft The eyes of CEOs glaze over when DR is mentioned, because it won't have a short-term impact on their stock options. #IT

Glen Turner (VK5TU)

@Jgmeadows @nixCraft That is nothing legislation can't solve. At the moment CIOs of critical services testify to Parliament upon a disaster aiming to keep their reputation. In the future they will be testifying before the Federal Court aiming to keep out of jail.

Nothing focuses the c-suite mind like years of jail. Just look at OH&S. Big shrug from building companies. Then NSW started sending their company directors to jail. There is still a lot wrong with the construction industry, but safety has improved a hundred-fold.

JohnW

@nixCraft

On a much smaller scale, I'm in the process of prepping to upgrade a VPS. The script provided has a "dry-run" implementation that lets you see the errors that need to be fixed before the final upgrade takes place.

Tech infrastructure really needs a method like this as well.

Zorro Notorious MEB 😡

@nixCraft A lot of organizations have disaster recovery plans but don't test them, have disaster recovery drills, or otherwise reinforce the need for practice and vigilance. They just write the plans and present them and go back to business as usual.

ocdtrekkie

@nixCraft It's complicated because this is... not a scenario you'd generally plan for. There's a good chance you wanted to protect your disaster recovery plan and put CrowdStrike on your backup infrastructure!

Bryce Belcher

@nixCraft You also need a secondary service, rather than relying on just one security service. If one service goes down and you have another in place, then in the best case the impact wouldn't be as severe as it has been.

The Keymaker

@nixCraft Managers are reactive, not proactive. If you want them to get excited about a disaster plan you have to burn down the building across the street. This is why this was going to happen no matter what -- humans suck.

Kevin Karhan :verified:

@TheKeymaker @nixCraft well, the problem is decision-makers who are at best #TechIlliterates, if not absolutely egoistic morons!

plasticine_era

@nixCraft Preach. ✊

(Never waste a good global outage to fight the "DR Cassandra" problem).

Rivetgeek

@nixCraft Our HA and DR allowed us to turn things around in a matter of hours. And our team already is planning on improving our DR in response to some discoveries we made during recovery.

T313C0mun1s7

@nixCraft here is where you are wrong, not in concept but this specific situation. These were Windows problems, I have yet to see a disaster recovery plan that includes having a second set of all Windows devices running everything on an alternative OS. Where choices were made for OS, when they were actually made, was for a reason. In the remainder of the cases the OS was just what shipped on the hardware, and those people are even less likely to make a plan that involves using some other OS.

kiwilinux

@nixCraft there will be some dead reckoning here. This is not just one person but a whole team. Who thought this was a good thing? Will it be a blood bath? Did they employ people on merit, those with the skills? This type of screw-up is unforgivable. But of course there's a certain tech journalist whose name has been banned on Mastodon; he says even Linux and Mac aren't immune to this. And we have had a close call with the xz hack.

gunstick

@nixCraft I only know about the band Disaster Area.
What kind of music does Disaster Recovery play?

((( Geekosaurus )))

@nixCraft Disaster recovery? That's for the flood that wipes out your DC.

Software issues are not the subject of DR plans, because the best DR plans include replicating software updates to the recovery site!
