Discover more from Deploy Securely
The Artificial Intelligence Risk Scoring System (AIRSS) - Part 3
Doing the math for AI risk.
Welcome back to the Artificial Intelligence Risk Scoring System (AIRSS) model breakdown! Thanks for joining me on this journey.
In case you missed the previous posts, please see:
In this edition, we’ll get our hands dirty with some math, showing you how to calculate AI model risk for a given scenario. The next step in our process it to:
Understand business impact
Once you have defined what a model should and should not do, you can quantify what the impact will be if it “misbehaves.” A key step in this process will be understanding what a worst case response would be from a monetary perspective for each of the potential negative outcomes.
To give an example, consider a highly simplified situation where a credit monitoring company launches an app that is meant to allow customers to ask questions about their credit report using the fine-tuned version of GPT-3.5 described in part 2. It does this through retrieval-augmented generation (RAG) whereby the fine-tuned Large Language Model (LLM) has access to all customer credit reports and is responsible for determining what data to provide to which users.
To be clear, I would not recommend this type of architecture. A key security feature of any LLM is controlling access to sensitive data using some other method than the model itself, e.g. a rules-based business logic layer. Optimally, in this example, there would be some authentication check which ensures the LLM only has access to the currently-authenticated customer’s data before conducting any RAG to answer the customer’s questions.
But designing a secure AI system is not the point of this exercise.
And there are essentially infinite potentially permutations of other business requirements and tradeoffs that will prevent implementing a perfectly secure system. The point of this exercise is to help quantify AI model risk. And it just so happens that the credit monitoring situation has a wealth of relevant available data, from the 2017 Equifax Breach. So for this example we’ll assume that:
The total breach costs for Equifax were $1.35 billion
147 million people were impacted.
Although a smaller number also had their credit card numbers and dispute information stolen, everyone impacted had the following stolen (which for simplicity I’ll just call an “identity record”):
Dates of birth
Social security numbers
Driver’s license numbers
With this data we can determine the financial impact of the model violating its security policy one time in one specific manner (providing an identity record to any unauthorized party).1 This comes out to roughly $9.2 per breached record, which represents your single loss expectancy (SLE).
Find or create a standard battery of inputs
Next, you will need to come up with a test suite of inputs for the AI model. For an LLM, this can be a series of prompt injection attempts. For a computer vision model it could be intentionally deceptive images.
And so forth.
You can use an off-the-shelf battery of inputs like PromptBench for LLMs or develop your own specifically for the model and use case in question. Either approach can work but it will be important to note how you tested your model and only compare models against similar test suites. For LLMs that have any sort of memory capability, you’ll likely want to intersperse 1-shot and multiple-shot prompt injection attempts to simulate real world conditions.
Like this approach? Check out StackAware’s AI assessment offering and see how we can help you identify security, compliance, and privacy risk when using artificial intelligence.
Something else to think about is how aggressively you want to test the model. While a worst case approach would assume that every input is malicious, this doesn’t seem realistic. If you have some historical data, you can look at how many malicious inputs the model has historically received. If you don’t, you can estimate this by using a benchmark of how many malicious authentication attempts your site/product gets, etc.
Using our credit report LLM example, let’s assume only 1% of prompts are malicious, and we’ll test the model 100,000 times: 99,000 with normal prompts and 1,000 times with malicious prompts.
Run inputs against the model and test how many times it violates the security policy
With your input battery prepared, you can then run it against your model. Using our credit report LLM, assume that, out of the 1000 attempts using malicious input, it returned a identity record to an unauthorized user 30 times. And out of the 99,000 “normal” prompts, 10 accidentally returned the wrong credit report. Thus, with respect to credit reports, the model violated its security policy 40 times out of 100,000.
This is a 0.04% failure rate.
Now that we have this failure rate, multiply it by the total number of inputs you expect to receive a year. You can estimate this from previous activity and/or projections about product usage.
Assume in our case it is 10,000,000 inputs/year. Using our failure rate, we can calculate that the model will violate its security policy 4,000 times a year. This is our annual rate of occurrence (ARO).
While this approach doesn’t require a National Vulnerability Database (NVD)-type construct where you can look up a given model’s vulnerability or risk score, it doesn’t preclude it. Assuming the NVD or similar repository adjusts to accommodate this approach, a researcher could publish a model’s pURL, its security policy, and a battery of prompts used to test it. And this public repository could tell you how many times the model violated the specified policy (the failure rate). You can multiply this by your expect number of inputs to get a “back of the envelope” ARO.
Get annual loss expectancy (ALE) by multiplying SLE by ARO
Once you have your SLE ($9.2) and your ARO (4,000/year), you can multiply them to get your annual loss expectancy (ALE). Thus, specifically in terms of providing a credit report to an unauthorized person, you can say our LLM presents a risk of $36,800/year.2
You’ll need to iterate through all the potential negative outputs (other impacts to data confidentiality, integrity, and availability) and sum these to get the model’s total risk. And then you can add this to the total risk surface of the AI application in question (i.e. including know vulnerability, insider threat, and other risks).
So is $36,800/year good? Terrible?
Frankly, there is no way to say unless you are comparing it to the value to gained from deploying the AI system in question. If you expect to earn $1,000,000 in marginal annual revenue from deploying the AI model and there are no additional risks aside from those related to prompt injection, then it would make sense to go ahead. If you expect to earn $10,000 in marginal annual revenue, this is wildly irresponsible risk to take.
This risk calculation will be only applicable to your specific deployment. And that’s fine. Trying to create a generic standard of vulnerability “severity,” as attempted by the Common Vulnerability Scoring System (CVSS) / NVD combination, never actually make a lot of sense to me. What is a show-stopper for one organization can be a nothingburger for another.
The AIRSS takes this into account and doesn’t try to be universal.
In the next (and final!) post, I’ll explain how to report out AIRSS using CycloneDX.
Ready to start measuring your AI risk?
As others (like the Cyentia Institute) have shown, per-record data isn’t especially helpful in the aggregate. That is because the breach of a single identity record could probably result in hundreds of thousands or millions of dollars of loss while two records wouldn’t necessarily be double that. Thus, you would probably want to have some sort of logarithmic scale to measure SLE (e.g. 1 identity record breached = $1,000,000 but 10 records breached =$1,100,000). You could then use your ARO to find the appropriate point on the “curve” and calculate your SLE from there.
Although I mentioned it previously, I feel it’s important to state again that because the ARO is completely made up, so is this figure for ALE. This is merely an example.