ps blaming "human error" for catastrophic systems failures is like blaming gravity for buildings falling over
@tef Reliability engineering involves a lot of repeatedly asking the question "and what if THAT fails?"

@ben It also involves frequently asking "Are the people doing the reliability engineering work getting enough sleep on the regular?"

@ben the point you're missing here is that "we build processes that incorporate human error as a given, or they collapse, much like how we build literal buildings knowing full well we are subject to the laws of gravity". your point of "ah, but isn't it always human error!" is missing how human error is used to excuse systematic failures of risk management. the load-bearing word here is "blaming". your processes should account for "an intern pressed the wrong key because their manager threatened to fire them". that's why we have things like building codes, building inspections, and certified professionals involved, rather than simply saying "someone forgot a brick lol, never mind"

@ben the other, perhaps more salient point here, is that when you take the approach of going "ahahah! what if!" consistently in risk management, you inevitably set up an antagonistic relationship with the people you're trying to help. it's alarm fatigue, and everything you say sounds like "what if the sun collapses!" there's always a point at which you throw your hands up and declare an act of god, or you end up with the ye olde example of a reinforced door bolted to walls made of plasterboard

@michaelgemar @tef there's a reason the process looks like this, and it's not because people don't know the risks of such deployments. safe software engineering practices are not cost-effective. (until today happens.)

@michaelgemar @tef @mawhrin The lens of human error stalls investigations and prevents learning.
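To make "processes that incorporate human error as a given" concrete, here is a minimal sketch in Python of the kind of guardrail the thread is gesturing at. It treats operator error as expected input rather than something to blame afterwards; the name drop_database and its parameters are hypothetical, illustrative only, not anything from the thread.

```python
import sys

def drop_database(env: str, *, dry_run: bool = True, typed_confirmation: str = "") -> None:
    """Destructive operation built to expect operator error.

    Inert by default (dry run), and the live path demands the operator
    type the environment name back, which catches wrong-terminal and
    muscle-memory mistakes before they become outages.
    """
    if dry_run:
        print(f"[dry-run] would drop the {env!r} database; pass dry_run=False to execute")
        return
    if typed_confirmation != env:
        # the guardrail sits in the process, not the person: a tired intern
        # under a threatening manager gets the same protection as anyone else
        sys.exit(f"refusing: retype the environment name ({env!r}) to confirm")
    print(f"dropping the {env!r} database ...")  # the real deletion would go here

drop_database("production")  # a mistyped or pasted command does nothing
drop_database("production", dry_run=False, typed_confirmation="production")
```

The design choice, under these assumptions: the safe path is the default and the dangerous path has to be spelled out twice, so the process absorbs the slip instead of the person taking the blame.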