What the Challenger Tragedy Can Teach CISOs and SOC Teams

“Big security failures happen because management is greedy and tends to underinvest in security… or engineers are lazy, stupid and/or overworked, so they cut corners.” Right?

Maybe sometimes. But it’s much more likely that well-meaning CISOs and super-smart security engineers fall prey to their own assumptions in a very human sociological trap called the “normalization of deviance.”

Take Challenger.

The standard view of the Challenger accident holds that callous NASA brass overruled by-the-numbers engineers and launched because management was afraid another delay would hurt NASA’s image. Sociologist Diane Vaughan spent nine years researching the question and determined just the opposite. In her book, The Challenger Launch Decision, Vaughan found that NASA managers rigorously followed their own best practices.

“Managers were, in fact, quite moral and rule-abiding as they calculated risk,” Vaughan writes. And yet they made the wrong decision. What happened, she says, was that managers (and engineers) had systematically deluded themselves over a period of years through a process she calls the “normalization of deviance.”

Here’s how N.O.D. works: Early in the shuttle program, the appearance of small leaks from the booster’s seals was an alarming event. NASA assigned a working group that determined the leaks would be manageable as long as they didn’t exceed a certain threshold. This is important: they moved their metric, redefining what had been a failure as the new acceptable standard. Small leaks were soon seen as “routine” during launches. The problem had been normalized. But as shuttle missions continued, the leaks kept getting bigger. Each time, NASA repeated the process, again determining that the seal failures were acceptable as long as they didn’t exceed certain, ever higher, thresholds. The fact that the shuttles kept flying and not exploding reinforced a false sense of security.

Organizations that suffer security disasters usually deserve to suffer the consequences. But if we think that we can prevent future massive security failures simply by pointing fingers at the people who caused the last one and classifying ourselves as fundamentally smarter or less susceptible to the same kinds of interpersonal and psychological forces, we are fooling ourselves. Humans are hard-wired to find patterns, even when none exist, and to try to make assumptions about the scary bear that just went behind those bushes. Preventing big security failures requires a special kind of daily vigilance: attacking assumptions with hard data. That habit needs to be actively taught and constantly reinforced. Only by admitting that, yes, it could happen to us too, and instrumenting security so that we can actually see what’s going on, can we take the steps to make security failures quantitatively (and progressively) less likely.
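One way to make threshold drift visible is to keep measuring against the original baseline, not only against whatever limit was most recently accepted. What follows is a minimal Python sketch of that idea, not a method from Vaughan's book or from any product. The metric (a weekly count of unpatched critical hosts), the function name `audit_threshold_drift`, the `headroom` factor, and all of the numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    baseline: float            # the limit originally agreed on
    current_threshold: float   # the limit after repeated "acceptance"
    drift_ratio: float         # current_threshold / baseline
    baseline_breaches: int     # readings that exceeded the original limit


def audit_threshold_drift(observations, baseline, headroom=1.2):
    """Replay readings against a fixed baseline and a ratcheting threshold.

    The ratcheting threshold imitates normalization of deviance: whenever a
    reading exceeds the accepted limit, the limit is raised to sit above it.
    The fixed baseline never moves, so it records how far things have slid.
    """
    threshold = baseline
    breaches = 0
    for value in observations:
        if value > baseline:
            breaches += 1
        if value > threshold:
            # The "normalizing" move: accept the reading and raise the limit.
            threshold = value * headroom
    return DriftReport(baseline, threshold, threshold / baseline, breaches)


# Hypothetical weekly counts of unpatched critical hosts.
weekly_counts = [8, 11, 13, 16, 20, 26]
report = audit_threshold_drift(weekly_counts, baseline=10)
print(report)
```

In this toy run, every reading above the moving threshold is simply absorbed into a higher limit, while the fixed baseline shows that most weeks were already out of bounds. Reporting both numbers side by side is one way to keep an accepted exception from quietly becoming the standard.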

*** This is a Security Bloggers Network syndicated blog from Verodin Blog authored by Verodin Blog. Read the original post at:
