When a cybersecurity event takes over, you have a dynamic, evolving problem and need to act fast. Time is money, and you can't afford to go down the wrong path in your incident response: the added delay of backtracking and starting over could be extremely costly. This is the kind of scenario we simulate globally across all of our IBM X-Force Command Cyber Range experiences, where our trained staff puts customers through intense, gamified challenges to practice what they would need to do in a real attack.
I know from running thousands of challenges in our cyber range that the biggest obstacles to a well-executed response to a cybersecurity incident that has escalated into a crisis are not technical; they are emotional, physical, psychological and cognitive. The strongest of these is confirmation bias: the tendency to focus on evidence that supports a hypothesis and ignore the evidence that doesn't. We're human, after all, and we like to take shortcuts that sometimes lead to mistakes.
Confirmation bias is especially problematic when you have limited information and time. During the boom event in a crisis, when you're feeling the pressure to act fast, it's understandable that a response team would look for information that supports its original hypothesis. What makes the situation worse is the physical reaction: the fight-or-flight instinct that kicks in during stressful situations is a major impediment to making good decisions. When a flush of cortisol hits the brain, the stress hormone impairs cognitive functioning for up to 20 minutes. Incident responders can't wait that long to make good decisions.
In the cyber range, and in our IBM X-Force Incident Response and Intelligence Services (IRIS) team, we practice a method called dual verification to help overcome confirmation bias and get to ground truth faster. Dual verification means one team tries to prove a single hypothesis and another team tries to disprove the same hypothesis. By establishing distinct people or teams to rapidly investigate multiple theories about a problem, we avoid herding as a group down the wrong track. It works, as we saw recently when we suddenly started detecting a destructive variant of ransomware among some of our customers.
Dual Verification Solves a Destructive Ransomware Mystery
Recently, IBM X-Force IRIS was called to investigate a series of ransomware infections among several customers. The destructive ransomware was showing up in an alarming pattern: We saw it in a handful of customers one day, then a few more customers the next and a few more the next. The ransomware appeared to be spreading, but the only thing the affected customers seemed to have in common was that they were IBM customers.
We had several theories about the problem, each of which would require a drastically different approach to solve. Our theories were:
- The ransomware was propagating between clients.
- IBM was somehow spreading the ransomware to its clients.
- It was a false positive.
We stood up teams for dual verification: smaller teams were each tasked with either confirming or refuting one of the three theories. So, one team set out to prove that the ransomware was propagating between clients, another was charged with disproving that theory, and so on. Later that day, we reconvened to go through each team's findings.
For the propagating ransomware theory, the teams found nothing to support it. The second theory: Was it us? No government agency or industry partner had seen similar infections spreading in this way. All of the affected companies were clients of IBM, so it was conceivable that we were somehow infecting them, but how?
Then something amazing happened. One of the most junior people on the team asked, “What about the rule change?”
This caused a lot of puzzled looks. What rule change? Well, it turned out that IBM had been rolling out a new rule to customers for detecting a particular strain of destructive ransomware. We weren't infecting customers with ransomware; the rule change was improving our detection rates. These were not new infections, but existing infections that we were getting better at detecting. Because the rule was deployed in phases, it looked like an infection was spreading when, in reality, we were just getting better at spotting a ransomware variant that had been hiding within our clients' infrastructure.
This was a remarkable discovery. With the dual verification method, IRIS teams parsed a complex problem and found the root cause within 12 hours, avoiding confirmation bias that could have sent us on a wild goose chase after a supposedly new ransomware variant.
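To see how a phased rollout can masquerade as a spreading infection, consider a minimal simulation sketch in Python. The client counts and rollout schedule here are hypothetical, invented purely for illustration; the point is only that detections climb day after day while the underlying number of infected clients stays flat.

```python
import random

random.seed(42)

# Hypothetical numbers: 200 clients, 30 of which already harbor the
# ransomware variant. No new infections occur during this window.
clients = list(range(200))
infected = set(random.sample(clients, 30))

# The new detection rule reaches a fresh batch of clients each day.
rollout_batches = [clients[i:i + 40] for i in range(0, len(clients), 40)]

covered = set()   # clients running the new rule so far
reported = set()  # infections already reported on earlier days

for day, batch in enumerate(rollout_batches, start=1):
    covered.update(batch)
    # Only infected clients that just gained the rule surface as
    # "new" detections today.
    new_hits = (infected & covered) - reported
    reported.update(new_hits)
    print(f"Day {day}: {len(new_hits)} new detections "
          f"({len(reported)} cumulative)")

# Detections trickle in day after day, the signature of a spreading
# outbreak, even though the number of infected clients never changed.
```

Run this and you get a steady drip of "new" detections each day, exactly the alarming pattern we saw, produced by nothing more than detection coverage expanding over time.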
Lessons From the Cyber Range: The Human Problem
In the cyber range, as we work with clients to help them simulate and perfect their incident response plans, we see examples all the time of human mistakes rendering even the most careful incident response plans moot.
In one of our gamified scenarios, the challenge is to respond to a breach in which a cybercriminal is holding your data for ransom. Usually, the teams in the range immediately start deliberating on whether or not to pay the ransom, skipping over an important step: considering the possibility that the data came from an old breach. In other words, they forget to confirm whether the data is legitimate before going down the path toward a decision.
This is an example of the human side of cybersecurity, where stress hormones, biases and lapses in judgment have enormous consequences. Some people overreact and immediately resort to shutting everything down. Others respond the opposite way, avoiding the problem rather than confronting it aggressively. I once witnessed a participant in the range, during a simulated scenario in which the phone keeps ringing with department heads asking for help and journalists demanding comments, simply take the phone off the hook and put it down on his desk. Customers have a duty to act, a duty to engage during a crisis, which is why testing and training your security and crisis culture is key to success.
We see again and again in the range that clients with first response, law enforcement or military experience perform best under pressure. They train consistently for crisis and understand the need for leadership, communication and common language. Plus, they have the requisite emotional intelligence — which is often more important than technical skills — to respond to threats effectively under pressure. They have the training to ride that adrenaline rush and clear their heads before acting.
There’s an acronym for response under pressure made popular by the U.S. Navy: stop, oxygenate and seek (SOS). Essentially, that means slow down, take a deep breath and seek more information before you take action. It’s hard to remember to do that sometimes, and it takes training and practice. But it works.