Aphrodite ☑️ :boost_ok:

@calamari

Checklists are only as useful as the knowledge necessary to know why the checklist exists.

Pilots and surgeons train for extensive periods so they learn why they need to go through their checklists.

What happens far too often is checklists turning into ritual disconnected from the rationale.

Religion often has this problem. Many of the rituals of religion have roots in Something Deep From Back In The Day, but that link, with time, has since worn away.

Tim Hergert

@Aphrodite @calamari we've progressed from "cargo cult" to "checklist cult"

At least with the former, we got to build cool bamboo models of planes and control towers.

Aphrodite ☑️ :boost_ok:

@cjust @calamari

tbh the Adeptus Mechanicus of 40K make too much sense in that framing

they don’t know why tech works, they just know to do the rituals and they can make a thing

Tim Hergert

@Aphrodite @calamari I spent far too long one weekend looking into nuclear semiotics and have decided that the best thing that we could do for future generations is genetically engineer a cat species to glow in the presence of radiation rather than try to instill a nuclear priesthood.

I think that this same logic should be applied to software QA as well. I'm certain that we can bioengineer a cat to glow in the presence of a faulty AV update. Then we can change the checklist item to "□ IT department properly equipped with glowy cats"

TomDB 🦣

@Aphrodite @cjust @calamari the last paragraph seems to apply to modern day youth using AI to do homework as well.

Kayfox

@Aphrodite @cjust @calamari I was about to bring up the techpriests - following the Catechisms of Compliance but often not understanding why.

"Our PCI scan shows this software is vulnerable." Yes, because RHEL backports security fixes without bumping the upstream version number, and you're only checking the version number, not whether the vuln is actually there.
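
The gap Kayfox describes is between a scanner that only compares version numbers and one that knows which fixes were backported. Below is a minimal sketch of the two checks. It assumes a package record that lists the CVEs fixed by backports. The package name, version tuples and CVE label are hypothetical. Real distributions publish backport information through errata and security advisories, and this sketch does not model any particular scanner.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InstalledPackage:
    name: str
    # The upstream version a version-only scanner sees, e.g. (1, 0, 2).
    upstream_version: tuple[int, ...]
    # Hypothetical record of CVEs whose fixes the distro backported
    # without changing the upstream version number.
    patched_cves: set[str] = field(default_factory=set)


def naive_is_vulnerable(pkg: InstalledPackage, fixed_in: tuple[int, ...]) -> bool:
    """Version-only check: flags anything older than the upstream fix release."""
    return pkg.upstream_version < fixed_in


def backport_aware_is_vulnerable(
    pkg: InstalledPackage, cve: str, fixed_in: tuple[int, ...]
) -> bool:
    """Treats a backported fix for this specific CVE as remediation."""
    if cve in pkg.patched_cves:
        return False
    return pkg.upstream_version < fixed_in


# Hypothetical package: old upstream version, but the fix was backported.
pkg = InstalledPackage("examplelib", (1, 0, 2), {"HYPOTHETICAL-CVE-1"})
print(naive_is_vulnerable(pkg, (1, 1, 0)))  # True (a false positive)
print(backport_aware_is_vulnerable(pkg, "HYPOTHETICAL-CVE-1", (1, 1, 0)))  # False
```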

Sebastiaan Dammann

@cjust @Aphrodite @calamari That's because reviewing the checklists can then be (no offence intended) offloaded to cheap workers in 2nd and 3rd world countries who are judged by the checklists they sign off. There is no room for critical thinking or adapting to the particular situation. I see this happening daily.

Twirrim

@Aphrodite @calamari
I'll have to see if I can find the study, but a study was done at a US hospital to compare reported checklist completion with actual checklist completion.

They found that, out of many dozens of entries in the list, it was rare for any of them to actually be done. That included the one to double-check the patient's name before anaesthetic, to make sure they're about to operate on the right person!

Dweebish

@Twirrim @Aphrodite @calamari As someone who's had a lot of surgeries & procedures involving anesthetic in the past 5 years, I'm quite thankful that my identity has been checked and re-checked every time.

Andrew

@Aphrodite @calamari an even more important check box is change management. How can you have effective change management when updates are applied automatically? If compliance frameworks require automatic updates, then they're broken, and given what has just happened, I really hope they'll be fixed.

Sure, have EDR etc, but the updates need to be validated, then rolled out by the organisations.

Sadly, as the world just discovered, there is no silver bullet when it comes to security.

Jess👾

A lot of vendors make it intentionally difficult to even do manual validation and deployments these days. Windows, Chrome, Edge, Adobe, etc. all really want to auto update. One of the odd parts of this particular outage is that CrowdStrike updates are SUPPOSED to go out in stages where your test machines are on update N, staging machines are on update N-1, and prod machines are on N-2. So somehow they not only made a bad update, but they also violated their own release cadence by pushing it out to all machines no matter what version they're scheduled to be on.

@puck
@Aphrodite @calamari
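
In the staged scheme Jess describes, each ring of machines lags the newest release by a fixed number of versions. Below is a minimal sketch of that rule, assuming integer release numbers. The ring names `test`, `staging` and `prod` are hypothetical. It is not how CrowdStrike's own tooling works; it only shows why pushing release N to every ring breaks the cadence.

```python
# Hypothetical rings and lags, following the N / N-1 / N-2 scheme described above.
RING_LAG = {"test": 0, "staging": 1, "prod": 2}


def allowed_version(latest: int, ring: str) -> int:
    """Newest release number a machine in `ring` should be running."""
    return max(latest - RING_LAG[ring], 0)


def violates_cadence(pushed: int, latest: int, ring: str) -> bool:
    """True if pushing `pushed` would put the ring ahead of its schedule."""
    return pushed > allowed_version(latest, ring)


latest = 10
for ring in ("test", "staging", "prod"):
    # Simulate pushing the newest release to every ring at once.
    print(ring, allowed_version(latest, ring), violates_cadence(latest, latest, ring))
# test 10 False
# staging 9 True
# prod 8 True
```

Keeping the lag in one table means the rollout policy is data rather than logic. Each ring's schedule can then be checked before a push instead of after an outage.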

Andrew

@JessTheUnstill @Aphrodite @calamari Agreed, and I understand that vendors want auto-update. They need to be told where to stick that idea. In my experience, a lot of that is due to updates on MS Windows being hard to manage.

Interesting to hear about the CrowdStrike release cadence. I've never used it.

In my world, we manage what is released to servers and when.

Jess👾

You can do something similar with Windows. They have "update rings". These let you keep your systems on auto-update so IT doesn't have to manually faff with it, but you can have canaries before prod borks.

learn.microsoft.com/en-us/mem/
@puck
@Aphrodite @calamari

Andrew

@JessTheUnstill @Aphrodite @calamari Excellent!

However, that is for the software (incl drivers) that Microsoft supply. What about all the other random software you need to install?

Jess👾

I can't remember about Office or Adobe or Chrome whether they have things like that. It's been a few years since I worked at a Windows corp and interacted with endpoint engineering.
@puck
@Aphrodite @calamari

Andrew

@JessTheUnstill @Aphrodite @calamari Fair enough, I don't work in one either (thankfully). Interesting discussion though! Thank you.

Michael Potts (HMHackMaster)

@puck
Zscaler was very confused when I told them we would not be using their auto-update infra (even though theirs did allow for rings and stuff). We have an org-wide phased update process and we just included the Zscaler client.

My management didn't like the idea, as they were happy to transfer responsibility to Zscaler; then it wouldn't be their fault if it broke.

I won though...

@JessTheUnstill @Aphrodite @calamari

Michael Potts (HMHackMaster)

@puck
I think some orgs value that "vendor is responsible, so it's not my fault" too much. Sure, the manager's head isn't gonna roll over this incident but productivity died and that's gonna upset a ton of people in the org.

@JessTheUnstill @Aphrodite @calamari

Andrew

@hmhackmaster Excellent to hear about your success, and that you've been vindicated (yeah, different tool, but same context)!

And agreed, many orgs will try to transfer responsibility. Will be interesting to see how well that goes.

Michael Potts (HMHackMaster)

@puck I care more about uptime and reliability than the blame game. But I am also the kind of person who has a reputation for making reasonable decisions and assuming responsibility when these things go wrong.

If taking responsibility (and not dodging accountability) costs me my job then that's clearly a sign the org has lost confidence in me and it was time to move on anyways.

Hasn't happened to me yet though, and I have made some pretty big mistakes!

Grant Gould

@hmhackmaster @puck @JessTheUnstill @Aphrodite @calamari
Much like "if you haven't done your restore procedure, you don't have a backup procedure," if you haven't actually invoiced or sued a vendor for screwing up, you haven't actually transferred liability to your vendor.
Vendor accountability is 99% imaginary.

Michael Potts (HMHackMaster)

@nonnihil I think you are completely right from a business point of view, but from some upper-management person's viewpoint, "it's the vendor's responsibility" is the path that ensures their decision can't come back to bite them.
Whichever VP or CISO approved CrowdStrike for an org isn't gonna lose their job over this.

@puck @JessTheUnstill @Aphrodite @calamari

Jess👾

And honestly, there's no way that you CAN'T put some level of trust in your suppliers. Whether it's AWS or Google Workspace or Windows or Microsoft 365 or any of your anti-malware vendors or anything else, if they have a major outage, it's going to cripple your business for a while. They'll build terms into the contract about stability and reliability, but at the end of the day, if one of your critical suppliers fucks up, it's going to take you down. You pick the least bad of the options and pray.

@hmhackmaster
@nonnihil @puck @Aphrodite @calamari

Karl Baron

@Aphrodite @calamari When a company gets hacked and sued, they have to answer the question "were you negligent in protecting against this, or were you just unlucky?". Courts are incompetent at determining this, and companies are mostly actually negligent (because they don't want to pay for it), so we get these "best practices" checklists instead.

How do you legislate competence? Most companies can't even determine if the people they hire are competent!
