People don’t like to think about risk.
It makes them nervous. Unfortunately, though, risk is a part of life. Whether or not we acknowledge it, we make thousands of risk decisions every day. Making these decisions heuristically - by "gut feel," when a human alone is doing the analysis - is generally necessary just to make it through the day without being paralyzed into inaction.
For example, people generally avoid explicitly considering unfortunate but necessary tradeoffs, such as the fact that you might die in a car crash on the way to work, but if you don't go to work, you won't be able to support your family. This situation requires regularly accepting a low-likelihood, high-severity event (a car crash) in order to mitigate the risk of a high-likelihood, moderate-and-increasing-severity event (losing your job if you don't go to work as required).
Acceptance of the former risk can be perfectly rational, and it is something every employed person whose job requires commuting by car does. Other situations, however, are grayer. Is a very dangerous job like mining or drilling for oil worth the extra pay? It might be, but determining this would require a detailed analysis of costs and benefits.
Unfortunately, in many cases, people don't even do a heuristic analysis and will accept major risks due to inertia ("this is what I've always done") or other illogical justifications. Furthermore, a lack of conscious analysis often leads people to take risks that are obviously preposterous. Many people text and drive, don't wear their seatbelts, and do other things that, in any rational accounting, don't make any sense at all.
All of the above holds true for managing cyber risk, in my experience.
One would think that, given the resources, sophistication, and money to be made (or lost), both private sector companies and the government would have very finely tuned risk calculators to determine the optimal outcome when it comes to cybersecurity decisions.
Unfortunately, they rarely do.
For example, as I wrote when discussing the Cyber Safety Review Board's analysis of the log4shell incident, the mere fact that government and industry regularly make risk decisions about their use of open source code appeared to be new information to the board.
Furthermore, I have found it uncommon - even in the security profession - for technology and business leaders to speak in terms of risk management when making cybersecurity decisions. This lack of a shared framework often makes discussions about what actions to take challenging: stakeholders often aren't even able to communicate their respective positions, let alone agree on one.
With that said, I thought it made sense to do a rundown of the basic risk management techniques. There are fundamentally four options when dealing with risk: avoid, accept, transfer, and mitigate. No one has ever convinced me that there are any others, and thus I will write as if they are mutually exclusive and collectively exhaustive (MECE).
All of this is conceptually inspired by the Project Management Institute as well as a post that Rob Black wrote on the topic a while ago, but I have sought to add my (hopefully) unique perspective.
1. Avoid
This is (anecdotally) perhaps the least-selected course of action in the security world, although I’m not sure why, considering that it’s possibly the “cleanest.”
Avoiding risk means just that: exiting a line of business, shutting down a product line, or not renewing a contract in order to eliminate risk inherent to the relationship or item. Companies, unfortunately, are loath to do things like this, as it often means forgoing revenue.
The reality is, though, that not all revenue is equally valuable.
In a pure business sense, revenue earned from an account outside of your ideal customer profile is probably diverting you from your desired product roadmap and consuming greater-than-average engineering and support resources.
If you apply this logic with a cybersecurity lens, you will almost certainly identify that some sources of revenue are "riskier" than others. This can be because of the product's architecture or age, the customer use case, the relevant regulatory compliance requirements, or some combination thereof.
In some situations, where the expected revenue is modest and the cybersecurity risk is high, it’s possible that the given product or line of business has net negative value.
This would be a good time to hit the risk avoidance button.
It will almost certainly be a hard conversation with the financially minded folks, but if you quantify your recommendation in terms of dollars (a simple sketch follows), it will be substantially easier. Just remember, though, that it should be a business decision in the end. If the relevant decision-maker disagrees with your assessment regarding avoidance, that person should sign on the dotted line accepting the risk or provide the resources to manage the risk in another way.
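To make that dollars-and-cents conversation concrete, here is a minimal sketch of the underlying math, where a product line's net value is its revenue minus the cost to serve it and the expected annual loss from cyber incidents. All figures below are hypothetical.

```python
# Sketch of the "net negative value" idea above: quantify whether a risky
# product line or account is worth keeping. All figures are hypothetical.

def net_annual_value(revenue: float, cost_to_serve: float, expected_cyber_loss: float) -> float:
    """Revenue minus the cost to serve the account and the expected annual loss from cyber incidents."""
    return revenue - cost_to_serve - expected_cyber_loss

# Hypothetical legacy product: modest revenue, heavy support burden,
# and a high expected annual loss driven by its aging architecture.
value = net_annual_value(revenue=250_000, cost_to_serve=180_000, expected_cyber_loss=120_000)
print(f"Net annual value: {value:,.0f} dollars")
# Net annual value: -50,000 dollars -> a candidate for hitting the risk avoidance button
```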
2. Accept
This means moving ahead despite the possibility of something bad happening. For example, it can be the right choice when:
Controls are more expensive than the associated risk. This is the most common situation where risk acceptance makes sense. Suppose mitigating a certain vulnerability would require a massive re-architecting effort equivalent to $100,000 in engineering costs. If, by your calculations, the Annualized Loss Expectancy (ALE) for exploitation of this vulnerability is only $1,000, then this investment would take 100 years to pay off (actually more, given the time value of money). Thus, this might not be the best investment to make. Obviously things get more ambiguous as the relative values change, but you need to run the numbers to make an informed decision (a worked example follows this list).
Value delivered justifies it. Sometimes there is no way to mitigate the risk, but you should still press ahead nonetheless. Security professionals often squirm when business leaders talk about revenue at stake as a justification for cybersecurity risk acceptance, but frankly, if it is a dollars-to-dollars comparison, then the only question is which path generates more value for the organization and its customers. When lives hang in the balance, such as in medical, industrial, or military applications, things get grayer. With that said, devices like insulin pumps, remotely-controlled production lines, and avionics systems themselves can save lives. Taking them offline or refusing to deploy them due to cybersecurity risk can itself cause harm, so consider that carefully in your calculations.
Business value would be irrelevant if the event occurs. While my favorite example is that of a nuclear attack - there is no point in an enterprise SaaS protecting against it, because no one would be buying enterprise SaaS afterward - there are more nuanced examples. For example, a software vulnerability allowing an attacker to completely interrupt your service (i.e. a 100% impact to availability) but that requires physical access to your data center (or that of your IaaS provider) isn’t really something you attempt to mitigate directly. If an attacker has physical access to your infrastructure and wants to interrupt your service, they probably won’t try to exploit a software vulnerability - they’ll just smash your servers instead. So focus on physical security and accept the cybersecurity risk in this case.
Acceptance also includes ignoring the risk and allowing the status quo to persist. Although you haven't affirmatively made a decision, you implicitly accept the risk through inaction. All other things being equal, this is probably the worst outcome. Without any record of the decision made or of the information available at the time, it's very easy to be influenced by hindsight bias. This can lead to excessively harsh punishments for whoever the scapegoat is or, conversely, an inability to hold accountable someone who knew or should have known about a big problem.
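To put numbers on the first scenario above ("controls are more expensive than the associated risk"), here is a minimal sketch. The $100,000 control cost and $1,000 ALE come from the example in the text; the split into single loss expectancy (SLE) and annual rate of occurrence (ARO) is an illustrative assumption.

```python
# Sketch of the risk acceptance math from the example above. The $100,000
# control cost and $1,000 ALE come from the text; the SLE/ARO split is assumed.

def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO (standard quantitative risk formula)."""
    return sle * aro

def simple_payback_years(control_cost: float, ale: float) -> float:
    """Years for the avoided annual loss to pay back the control cost
    (ignores the time value of money, so the real figure is even worse)."""
    return control_cost / ale

control_cost = 100_000  # re-architecting effort, in dollars
sle = 10_000            # assumed loss per exploitation event
aro = 0.1               # assumed: one exploitation expected every ten years

ale = annualized_loss_expectancy(sle, aro)  # -> $1,000, matching the example
print(f"ALE: {ale:,.0f} dollars; payback: {simple_payback_years(control_cost, ale):.0f} years")
# ALE: 1,000 dollars; payback: 100 years -> accepting the risk is likely the better call
```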
3. Transfer
In the security world, people generally think of risk transfer in terms of purchasing cyber insurance (which is a good example), but it can also take the form of contractual terms between a vendor and its customer (e.g. an availability Service Level Agreement [SLA]). This option is generally preferable if you are able to transfer the costs to someone else for less than it would cost to mitigate the risk yourself. Make sure you also include reputational costs (e.g. angry customers following an outage), rather than just the purely financial ones, if choosing this path (a rough cost comparison is sketched at the end of this section). Some other ways you can transfer risk are:
To a vendor, contractually. Most cybersecurity professionals are familiar with SLAs as they relate to service uptime (i.e. data availability), but what about the other components of the CIA triad (i.e. confidentiality and integrity)? As I have suggested previously, a vendor could be on the hook for paying you a set rate for every record that is exposed in a breach and/or corrupted by an attacker (or otherwise). Such an arrangement would align incentives between the parties to ensure that they are both focusing on minimizing overall cybersecurity risk, rather than indexing on other, less relevant metrics (e.g. maintaining compliance framework certifications, remediating CVEs of a given CVSS score in a given time, etc.).
To a customer, contractually. While this might seem like a tough sell, oftentimes it's not clear where the vendor's responsibility ends and where the consumer organization or individual user's begins. That's why shared security models are an important part of a security program; they explain the various stakeholder duties and responsibilities. For example, if the product in question relies on an external directory service (e.g. Azure Active Directory) for authentication, then it's on the customer security team to ensure that passwords are appropriately strong and unique - it's not something the original product's vendor can enforce. While this might seem like an obvious example, there are more nuanced ones, such as who is responsible for updating third-party software that the product runs on top of but is not packaged with. Things get gray quickly, so it's important to spell these out in writing.
To the government (or general public), through lobbying. I would be remiss if I did not mention this option, as distasteful as it may seem. I have written at length about how the U.S. federal government is very unclear with respect to allowable risk tolerances for private companies and appears to be driven primarily by the latest headline (the SolarWinds software supply chain attack, the log4shell vulnerability, etc.). Due to the lack of clarity in the world of cybersecurity regulation and legislation, there exists a strong incentive for companies to pursue carveouts that specifically advantage them by indemnifying certain actions. While I’ll avoid naming names, it’s generally the bigger organizations that have the necessary government affairs teams and budgets to do this. Importantly, this isn’t necessarily even a bad thing. Statutes like the Cybersecurity Act of 2015 allowed companies to share information about cyber threats with the government without fear of certain types of civil litigation. By transferring the data privacy risks of such information sharing to the general public, the public benefitted from the ability of the government to collect and disseminate additional types of cyber threat data.
Bottom line: risk transfer is more than just buying insurance; there are a lot of creative ways to do it.
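As a rough illustration of the "cheaper than handling it yourself" test mentioned at the top of this section, here is a minimal sketch comparing the expected annual cost of transferring a risk (premium plus whatever the contract does not cover, such as reputational damage) against mitigating it in-house. Every figure is a hypothetical assumption.

```python
# Sketch of the transfer-vs-mitigate comparison described above.
# All dollar figures and probabilities are hypothetical assumptions.

def expected_cost_of_transfer(premium: float, retained_loss: float, p_incident: float) -> float:
    """Premium plus the expected value of losses the contract does NOT cover
    (e.g. reputational damage from angry customers after an outage)."""
    return premium + retained_loss * p_incident

def expected_cost_of_mitigation(control_cost: float, residual_loss: float, p_incident: float) -> float:
    """Annualized control cost plus the expected residual loss that remains."""
    return control_cost + residual_loss * p_incident

p_incident = 0.05  # assumed annual probability of the incident in question

transfer = expected_cost_of_transfer(premium=20_000, retained_loss=150_000, p_incident=p_incident)
mitigate = expected_cost_of_mitigation(control_cost=40_000, residual_loss=50_000, p_incident=p_incident)

print(f"Transfer: {transfer:,.0f} dollars/year vs. mitigate: {mitigate:,.0f} dollars/year")
# Transfer: 27,500 dollars/year vs. mitigate: 42,500 dollars/year
# -> under these assumptions, transferring the risk is the cheaper option
```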
4. Mitigate
This is generally how security professionals earn most of their salary, but it shouldn't be the only way they do so. Mitigation means applying controls such as updating a vulnerable library, deploying a new firewall rule to block certain traffic, or creating an internal policy or procedure. Make sure that you are clear in your terminology here: I have heard people refer to "mitigating vulnerabilities" when in fact what they did was just confirm they weren't vulnerable in the first place. Mitigation is only an applicable course of action when there is risk to begin with. Some recommendations I would make are:
All other things being equal, use the cheapest control. Most CVEs are not exploitable in a given configuration or deployment, and these low-risk issues can remain unpatched if patching them is more than trivially costly (with some caveats). Figuring out which ones are a problem, though, can take a lot of effort. In some cases, it might make sense to simply upgrade a library to make the potential problem go away without conducting further investigation. If this is the easiest path forward, just do it. Conversely, if a library upgrade - or worse, a rip-and-replace - will consume a lot of effort, it probably makes sense to confirm the issue is actually a risk before doing anything else (see the sketch after these recommendations).
Look at the entire risk picture when deciding to apply controls. For example, there might be a small chance of a malicious actor exploiting a known software vulnerability in your product, but it could be blocked easily and reliably through a network policy or similar tool. If this is the case, it probably doesn’t make sense to re-architect your entire product or technology stack to remediate this flaw.
Focus on the alligator closest to the boat/biggest rock. In contrast to the above scenario, there might be a really big problem with serious consequences if an issue is left unaddressed. Think log4shell or heartbleed. In this case, it probably makes sense to drop everything - including working on lower-risk vulnerabilities - to “stop the bleeding.”
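To tie the recommendations above together, here is a minimal sketch of one way to compare candidate controls for the same risk by cost-effectiveness; the controls and dollar figures are hypothetical.

```python
# Sketch of the "cheapest adequate control" idea above: rank candidate controls
# for the same risk by how much risk reduction they buy per dollar.
# The controls and figures are hypothetical.

from dataclasses import dataclass

@dataclass
class ControlOption:
    name: str
    cost: float            # estimated cost to implement, in dollars
    risk_reduction: float  # estimated reduction in annualized loss, in dollars

candidates = [
    ControlOption("Upgrade the vulnerable library", cost=5_000, risk_reduction=9_000),
    ControlOption("Re-architect the affected service", cost=100_000, risk_reduction=9_500),
    ControlOption("Block the attack path with a network policy", cost=1_000, risk_reduction=8_000),
]

# Keep only controls whose risk reduction exceeds their cost, then prefer the
# best return per dollar - all other things being equal, the cheapest control
# that actually moves the needle wins.
worthwhile = [c for c in candidates if c.risk_reduction > c.cost]
worthwhile.sort(key=lambda c: c.risk_reduction / c.cost, reverse=True)

for c in worthwhile:
    print(f"{c.name}: {c.risk_reduction - c.cost:,.0f} dollars net benefit, "
          f"{c.risk_reduction / c.cost:.1f}x return per dollar")
```

Note that the expensive re-architecting option drops out entirely, which mirrors the point above about not rebuilding your technology stack when a cheap network policy reliably blocks the attack path.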
Conclusion
Although these four risk management techniques are nothing new, I think it makes sense to refresh on these fundamental tools. I have seen many a conversation go wildly off course because the parties weren't able to communicate in these terms. By clearly identifying proposed options as one of these four techniques, I have found cybersecurity risk conversations to be much more focused and action-oriented.