The Artificial Intelligence Risk Scoring System (AIRSS) - Part 3

Doing the math for AI risk.

Oct 13, 2023

Welcome back to the Artificial Intelligence Risk Scoring System (AIRSS) model breakdown! Thanks for joining me on this journey.

In case you missed the previous posts, please see:

In this edition, we’ll get our hands dirty with some math, showing you how to calculate AI model risk for a given scenario. The next step in our process it to:

Understand business impact

Once you have defined what a model should and should not do, you can quantify what the impact will be if it “misbehaves.” A key step in this process will be understanding what a worst case response would be from a monetary perspective for each of the potential negative outcomes.

Consider a highly simplified situation where a credit monitoring company launches an app meant to allow customers to ask questions about their credit report using the fine-tuned version of GPT-3.5 described in part 2. It does this through retrieval-augmented generation (RAG) whereby the fine-tuned Large Language Model (LLM) has access to all customer credit reports and is responsible for determining what data to provide to which users.

I do not recommend this type of architecture. A key security feature of any LLM is controlling access to sensitive data using some other method than the model itself, e.g. a rules-based business logic layer. In this example, there should be some authentication check which ensures the LLM only has access to the currently-authenticated customer’s data before conducting any RAG to answer the customer’s questions.

But designing a secure AI system is not the point of this exercise.

And there are essentially infinite potentially variations of other business requirements and tradeoffs that will prevent implementing a perfectly secure system. The point of this exercise is to help quantify AI model risk.

And it just so happens that the credit monitoring situation has a wealth of relevant available data, from the 2017 Equifax Breach. So for this example we’ll assume that:

The total breach costs for Equifax were $1.35 billion
147 million people were impacted.
Although a smaller number also had their credit card numbers and dispute information stolen, everyone impacted had the following stolen (which for simplicity I’ll just call an “identity record”):
- Names
- Home addresses
- Phone numbers
- Dates of birth
- Social security numbers
- Driver’s license numbers

With this data we can determine the financial impact of the model violating its security policy one time in one specific manner (providing an identity record to any unauthorized party).1 This comes out to roughly $9.2 per breached record, which represents your single loss expectancy (SLE).

To get slightly more advanced, you might consider lower and upper bounds of SLE. This is what Hubbard and Seiersen describe as the “Rapid Risk Audit” method in the 2nd edition of “How to Measure Anything in Cybersecurity Risk“ (p. 47), which weights the lower bound of the SLE at 65% and the upper bound at 35%.

Find or create a standard battery of inputs

Next, you will need to come up with a test suite of inputs for the AI model. For an LLM, this can be a series of prompt injection attempts. For a computer vision model it could be intentionally deceptive images.

And so forth.

You can use an off-the-shelf battery of inputs like PromptBench, garak, or PyRIT for LLMs. Or you can develop your own for the model and use case in question. Either approach can work but it will be important to note how you tested your model and only compare models against similar test suites. For LLMs that have any sort of memory capability, you’ll likely want to intersperse 1-shot and multiple-shot prompt injection attempts to simulate real world conditions.

Get in touch to learn how we can implement AIRSS for you:

Book a call

Something else to think about is how aggressively you want to test the model. While a worst case approach would assume that every input is malicious, this doesn’t seem realistic. If you have some historical data, you can look at how many malicious inputs the model has historically received. If you don’t, you can estimate this by using a benchmark of how many malicious authentication attempts your site/product gets, etc.

Using our credit report LLM example, let’s assume only 1% of prompts are malicious, and we’ll test the model 100,000 times: 99,000 with normal prompts and 1,000 times with malicious prompts.

Run inputs against the model and test how many times it violates the security policy

With your input battery prepared, you can then run it against your system. Using our credit report LLM, assume that, out of the 1000 attempts using malicious input, it returned a identity record to an unauthorized user 30 times. And out of the 99,000 “normal” prompts, 10 accidentally returned the wrong credit report. Thus, with respect to credit reports, the model violated its security policy 40 times out of 100,000.

This is a 0.04% failure rate.

Now that we have this failure rate, multiply it by the total number of inputs you expect to receive a year. You can estimate this from previous activity and/or projections about product usage.

Assume in our case it is 10,000,000 inputs/year. Using our failure rate, we can calculate that the model will violate its security policy 4,000 times a year. This is our annual rate of occurrence (ARO).

While this approach doesn’t require a National Vulnerability Database (NVD)-type construct where you can look up a given model’s vulnerability or risk score, it doesn’t preclude it. Assuming the NVD or similar repository adjusts to accommodate this approach, a researcher could publish a model’s pURL, its security policy, and a battery of prompts used to test it. And this public repository could tell you how many times the model violated the specified policy (the failure rate). You can multiply this by your expect number of inputs to get a “back of the envelope” ARO.

Get annual loss expectancy (ALE) by multiplying SLE by ARO

Once you have your SLE ($9.2) and your ARO (4,000/year), you can multiply them to get your annual loss expectancy (ALE). Thus, specifically in terms of providing a credit report to an unauthorized person, you can say our application presents a risk of $36,800/year.2

If you want to get slight fancier using the Hubbard-Seiersen Rapid Risk Audit approach, you would multiply per below:

ALE = ((SLE-lower bound * 0.65) + (SLE-upper bound * 0.35)) * ARO

In any case, you’ll need to iterate through all the potential negative outputs (other impacts to data confidentiality, integrity, and availability) and sum these to get the model’s total risk. And then you can add this to the total risk surface of the AI application in question (i.e. including know vulnerability, insider threat, and other risks). StackAware’s AI risk register lets you do this easily, if you need a template.

So is $36,800/year good? Terrible?

Refer a friend

Frankly, there is no way to say unless you are comparing it to the value to gained from deploying the AI system in question. If you expect to earn $1,000,000 in marginal annual revenue from deploying the AI model and there are no additional risks aside from those related to prompt injection, then it would make sense to go ahead. If you expect to earn $10,000 in marginal annual revenue, this is wildly irresponsible risk to take.

Conclusion

This risk calculation will be only applicable to your specific deployment. And that’s fine. A generic standard of vulnerability “severity,” like the Common Vulnerability Scoring System (CVSS) / NVD combination never actually make a lot of sense to me. What is a show-stopper for one organization can be a nothingburger for another.

The AIRSS takes this into account and doesn’t try to be universal.

In the next (and final!) post, I’ll explain how to report out AIRSS using CycloneDX.

Ready to start measuring your AI risk?

Get your Data Defense Blueprint

As others (like the Cyentia Institute) have shown, per-record data isn’t especially helpful in the aggregate. That is because the breach of a single identity record could probably result in hundreds of thousands or millions of dollars of loss while two records wouldn’t necessarily be double that. Thus, you would probably want to have some sort of logarithmic scale to measure SLE (e.g. 1 identity record breached = $1,000,000 but 10 records breached =$1,100,000). You could then use your ARO to find the appropriate point on the “curve” and calculate your SLE from there.

Although I mentioned it previously, I feel it’s important to state again that because the ARO is completely made up, so is this figure for ALE. This is merely an example.

Deploy Securely

Discussion about this post

Ready for more?