Quantitative cyber risk analytics using FAIR is an inherently mathematical endeavor. Estimates for the factors of risk (like loss event frequency and loss magnitude) are expressed using probability distributions and are then used to create forecasts of loss exposure from one or more scenarios via Monte Carlo simulation.
These facts lead some would-be FAIR practitioners to cringe and apprehensively ask “just how much math do I need to know to do this stuff?”
For many people the mere mention of mathematics causes “panic, helplessness, paralysis, and mental disorganization,” to quote an early researcher of what is known in academia as “math anxiety.” At the root of the theory of math anxiety is the idea that dealing with mathematical processes and principles can cause an emotional response that prevents a person from performing math-related tasks. Beyond just an emotional response, neuroscientists have shown that an attack of math anxiety activates the same area of the brain where physical pain is registered. (That explains the pounding headache I got every time my alarm woke me up for college Calculus, a class I dropped four weeks into the semester.)
But fear not! In this two-part article I’ll be discussing the level of mathematical knowledge needed to perform each phase of the risk analysis process. As you progress from one paragraph to the next you should feel your shoulders loosen, your jaw unclench, and your brow unfurrow as the math anxiety melts away.
Scoping (Math Required: None)
The first phase of the analysis process is Scoping. In this phase we clearly define and communicate the scenario we’re analyzing. A properly scoped risk scenario statement requires an asset, a threat against that asset, and an effect the threat seeks to have on the asset. (You can also identify a method or vector to make your scenario more specific, though this isn’t a requirement.) For instance:
“Analyze the risk associated with cybercriminals (the threat) impacting the availability (the effect) of electronic patient health records (the asset) via ransomware attack (the method/vector).”
Identifying an asset, threat, effect, and method/vector and combining them into a coherent scenario statement that describes a loss event with clarity and specificity doesn’t require any mathematical knowledge at all. So far so good!
Data Collection (Math Required: Minimal)
The second phase of the analysis process involves collecting data and obtaining estimates for the variables of the FAIR model you wish to use in your analysis. Continuing with our example ransomware scenario, we may need to obtain an estimate for threat event frequency — an answer to the question “over the next year how many times will cybercriminals attempt to impact the availability of electronic patient health records via ransomware?”
The estimates used in FAIR have four main parameters (a fancy statistics term that you don’t actually need to use — you could call them elements, parts, pieces, etc., though those last two options may call to mind the terror of learning about fractions in third grade…) none of which require advanced math knowledge to understand and apply.
In order to make this estimate you will need to gather data from internal sources, external sources like industry reports, and subject matter experts inside or outside your organization. While you *could* do all kinds of fancy statistics stuff with that data, remember that what we’re trying to estimate is just a count of how many times cybercriminals will attempt to impact the availability of the ePHI via ransomware. View the data through that lens and you’ll find that the amount of math knowledge required is minimal. Indeed, reading comprehension, active listening skills, and emotional intelligence will prove more necessary as you review reports and interact with SMEs and other stakeholders during this data gathering phase.
Confidence Level (Math Required: Basic)
We don’t, however, want our estimate to span the theoretical minimum number of ransomware attempts (0) and the theoretical maximum number of attempts (infinity), so we do need to introduce a basic statistical concept: the 90% confidence interval.
If we were to make our range 0 to infinity we would be 100% confident that, a year from now, the actual number of ransomware attacks we experienced would fall within our range. But a range that wide doesn’t help anyone make well-informed decisions, so we need to narrow it down based on the information currently available to us. We still want the range to be accurate, but it needs to have a useful level of precision. We need to identify the range in which we are 90% confident.
Stated another way, suppose we could run 100 simulations of the next year. Our estimated minimum needs to be a number we think will be smaller than the actual number of ransomware attacks in 95 of those 100 simulations. Likewise, our estimated maximum needs to be a number we think will be larger than the actual number of attacks in 95 of the 100 simulations. With 5 simulations falling below the minimum and 5 above the maximum, 90 of the 100 simulations land within our range, meaning we have identified our 90% confidence interval.
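If it helps to see that counting logic concretely, here is a minimal Python sketch. The Gaussian-around-6 "true" process is purely an illustrative assumption for generating simulated years; it is not part of the FAIR method or this scenario's data:

```python
import random

random.seed(42)

# Hypothetical generator of "actual" yearly ransomware attempt counts.
# The Gaussian-around-6 choice is an illustrative assumption only.
def simulate_year():
    return max(0, round(random.gauss(6, 4)))

years = sorted(simulate_year() for _ in range(100))

# Pick the 6th-smallest year as the minimum (5 years fall below it) and
# the 95th-smallest as the maximum (5 years fall above it); roughly 90
# of the 100 simulated years then land inside the interval.
minimum, maximum = years[5], years[94]
inside = sum(minimum <= y <= maximum for y in years)
print(f"90% interval: [{minimum}, {maximum}], years inside: {inside}")
```

Running this shows an interval that captures about 90 of the 100 simulated years, which is exactly what a 90% confidence interval claims to do.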
After reviewing all of the data we could get our hands on and meeting with multiple subject matter experts, we’ve reached a consensus 90% confidence interval estimate of the number of ransomware attacks by cybercriminals seeking to impact the availability of ePHI:
Minimum: 2 times over the next year
Maximum: 30 times over the next year
Most likely: 6 times over the next year
Confidence: Medium, as three different data sources corroborate the conclusion that 6 (or a value close to 6) is considerably more likely than most of the other values within the range from 2 to 30
Setting the confidence in our most likely value at Medium does some fancy mathematical stuff to the probability distribution we’ve created, but I’ll translate that into practical terms for you. The confidence parameter controls the degree to which the distribution spikes at the most likely value we’ve selected. As we increase the height of the distribution at our most likely value, we increase the proportion of random values selected in our Monte Carlo simulations that will be close to the most likely value we’ve estimated.
We’re placing more emphasis on the estimated most likely value because we trust it more. Even though the true number of ransomware attacks over the next year may not turn out to be exactly 6, we expect the true value to be close to 6 to a far greater extent than we expect it to be 20 or 25. Changing the confidence parameter to medium (or high) is like telling the model “sure, 25 ransomware attempts is still within my 90% confidence interval, but based on all the data and expert opinion I’ve gathered it’s far more likely that the number of attempts will be around 6, so I want you to weight the distribution so that a number of attempts around 6 comes up in way (or way way) more simulations than a number of attempts like 25 or 30.”
If you are or want to be a math nerd, you should know that the four-part estimates we use in FAIR create a modified PERT distribution (as opposed to a normal distribution, lognormal distribution, etc.), but knowing what we call the special kind of curve we make for each estimate in the model is by no means required to be a competent and successful FAIR analyst.
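For the extra-curious, here is a rough Python sketch of one common way a modified PERT distribution is sampled, by rescaling a beta distribution. The mapping of Low/Medium confidence to shape (lambda) values 2 and 4 is an illustrative assumption on my part; actual FAIR tooling may use different values:

```python
import random

def pert_sample(minimum, most_likely, maximum, shape=4.0):
    """Draw one value from a modified PERT distribution.

    `shape` (often called lambda) controls how strongly the curve peaks
    at the most likely value; 4.0 is the conventional default.
    """
    span = maximum - minimum
    alpha = 1 + shape * (most_likely - minimum) / span
    beta = 1 + shape * (maximum - most_likely) / span
    return minimum + span * random.betavariate(alpha, beta)

random.seed(1)
# Our ransomware estimate: min 2, most likely 6, max 30.
# Mapping confidence levels to shape values 2 and 4 is an assumption.
low_conf = [pert_sample(2, 6, 30, shape=2) for _ in range(10_000)]
med_conf = [pert_sample(2, 6, 30, shape=4) for _ in range(10_000)]

def near_mode(xs):
    """Fraction of samples landing near the most likely value of 6."""
    return sum(4 <= x <= 8 for x in xs) / len(xs)

# Higher confidence (larger shape) concentrates more samples near 6.
print(near_mode(low_conf), near_mode(med_conf))
```

Notice that every sample stays inside the 2-to-30 range either way; raising the shape parameter only shifts more of the simulated years toward the most likely value, which is exactly the "spike at 6" behavior described above.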
Once we’ve made our estimates they get fed into a Monte Carlo simulation engine to compute thousands or tens of thousands of simulated years. In Part 2 of this article we’ll discuss just how simple Monte Carlo simulation is, as well as the minimal math skills needed to interpret and present the results of FAIR-based analyses.