It’s Not Just the Numbers, It’s the Insights that Count in Quantitative Cyber Risk Analysis

July 30, 2020  Taylor Maze

The purpose of quantitative risk analysis is to provide a rigorous, defensible way to analyze cyber risk in your environment that is clear, concise, and easy to communicate to decision makers. When moving from a strictly qualitative view of risk to the world of FAIR™ and the RiskLens Platform, it can be easy to get caught up in the numbers – the excitement of having an economic value that can be leveraged when making resource prioritization decisions, communicating to the business, or even dealing with regulators. Recently, though, I was reminded that there is much more to FAIR analysis than the numbers and that sometimes the journey is more important than the destination.

Story of a FAIR Analysis 

An organization had previously suffered what they deemed a critical event: an outage of a subset of their network and the platforms/processes it supported, resulting from a lack of adherence to policy. With the recent incident fresh in their minds, they wanted to see what an event of the same caliber would mean to their larger, more critical network segment.

The team did a thorough analysis of the specific business functions, processes, and applications that would be impacted by such an outage. Multiple teams across the organization were brought in to discuss how the event would affect their productivity. The dollar losses in the analysis climbed, and it seemed obvious this would be a slam dunk to showcase why something needed to change in the environment to avoid such a fallout.

Taylor Maze is a RiskLens Risk Consultant.

But then as more people in the organization got involved, something became clear: a stark lack of clarity.

There was disagreement within and across teams about which systems and processes the segment supported, and even the architecture of the environment was called into question. By the end of the week the team had learned that the network segment was architected in such a way that the failure that had occurred previously was nearly impossible in this segment. What was far more likely in the face of a similar lack of adherence to policy was a minor degradation of all supported platforms and processes, due to an overload of the fail-over system. If the overload persisted long enough, it could potentially result in a partial outage of the network segment.

With the new information in hand, the team ran two separate analyses on the RiskLens Platform. The first was to show the total amount of loss expected if there were to be a full failure of the network segment. This analysis shed light on which specific business functions, processes, and applications were hosted on that segment and whether or not they were replicated elsewhere. What were previously considered “High Value Assets” were called into question and a method to determine what truly warranted replication was discussed.

A second analysis was conducted to model what the expected degradation of network processes and applications would mean to the organization. It was determined that the failure was unlikely to persist long enough to result in a full outage of any of the supported operations. However, if such an outage did occur, it could affect key support operations such as WiFi and certificate services, due to the high capacity they require. Given the criticality of these functions, even a slight chance was enough to warrant an in-depth review of what could be done to prevent this from occurring.

At the end of the day, the Annualized Loss Exposure (ALE), the output from running a FAIR analysis, was not the focus of the results presentation. There was no decisive moment in which the champion proudly proclaimed that their control improvement idea would save the organization $XX. There were, however, three important takeaways for the team:

Lessons Learned

1. FAIR is first of all a way to apply critical thinking to your organization’s processes

The FAIR model enabled the team to think critically about the scenario they were interested in analyzing. Without going through the exercise of understanding how often such a failure to adhere to policy occurs (Threat Event Frequency), how likely that failure is to result in an outage or disruption to network-supported operations (Vulnerability), and what it would mean to the organization when it does (Loss Magnitude), the team would still be in the dark about the current risk in their environment.
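To make those factors concrete, here is a minimal Monte Carlo sketch of how Threat Event Frequency, Vulnerability, and Loss Magnitude combine into an Annualized Loss Exposure. This is an illustration only, not the RiskLens Platform's actual engine, and every distribution parameter below is a hypothetical placeholder:

```python
import random

def simulate_ale(n: int = 10_000, seed: int = 42) -> dict:
    """FAIR-style Monte Carlo sketch: ALE from TEF, Vulnerability, and Loss Magnitude.

    All parameter values are hypothetical placeholders for illustration.
    """
    rng = random.Random(seed)
    annual_losses = []
    for _ in range(n):
        # Threat Event Frequency: policy non-adherence events per year (min, max, mode)
        tef = rng.triangular(1, 12, 4)
        # Vulnerability: probability a threat event becomes a loss event
        vuln = rng.triangular(0.05, 0.40, 0.15)
        # Loss Event Frequency: loss events per year
        lef = tef * vuln
        # Loss Magnitude: dollar loss per event (e.g., productivity loss during degradation)
        lm = rng.triangular(50_000, 500_000, 150_000)
        annual_losses.append(lef * lm)
    annual_losses.sort()
    return {
        "mean_ALE": sum(annual_losses) / n,
        "p90_ALE": annual_losses[int(0.9 * n)],
    }
```

In practice the distributions are calibrated from subject-matter-expert estimates, which is exactly why getting the right people in the room matters so much: bad inputs here produce a confidently wrong ALE.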

2. Get the right people in the room

One of the main reasons the “a-ha” moment did not occur until the end of the week was that the team did not have the right people in the room. A common objection I hear is that the more people there are in the room, the less comfortable they will feel speaking up. While that may be true, fewer people in the room also means assumptions are less likely to be challenged and new points of view less likely to be aired. Had we had another representative from the engineering team in the room at the beginning of the week, we likely would have learned of the incorrect assumption earlier. This brings me to my last takeaway.

3. Challenge, challenge, challenge till you get to the heart of the matter

Or, to put it more pointedly, embrace the suck. The job of a FAIR risk analyst is to ask the right questions of the right people, challenge assumptions, and get to the heart of the matter. This sometimes involves being the “bad guy” – disagreeing with participants or playing devil’s advocate. While that may not always feel great, and nobody relishes telling Tod from Finance that his data point isn’t relevant, it needs to be done in order to build the analysis on the most accurate information available at the time.

Had the representative from engineering not called out my diagram as wrong in the middle of a session, we never would have uncovered the truth. As it happened, the actual architecture of the network segment is far superior to what was originally assumed. But the team also learned that this design is the exception rather than the standard configuration. A review is now taking place to evaluate whether the same architecture should be implemented elsewhere. All because my new friend in engineering embraced the suck.

At the end of the engagement there was honest reflection on the current lack of insight and transparency and what that could mean if left unchecked. A decision was made to reevaluate key processes and to thoroughly vet all assumptions going forward. Though not focused on dollars and cents, the conversation was clear, concise, and easy to communicate to decision makers: something had to change.