The Artificial Intelligence Risk Scoring System (AIRSS) - Part 1
Setting the scope.
Vulnerability management is a mess. AI will make it worse.
That’s why I criticized the suggestion from the Cybersecurity and Infrastructure Security Agency (CISA) that AI engineers build a system resembling the existing Common Vulnerabilities and Exposures (CVE) regime. While the CVE and National Vulnerability Database (NVD) system can be salvaged with some major reforms, there are structural problems that make its current incarnation a bad fit for AI systems.
After thinking more on the topic, I also decided that being able to communicate about the risk of an AI model, more so than just identifying vulnerabilities in it, was the critical security task.
Thus is born the Artificial Intelligence Risk Scoring System (AIRSS, pronounced like the plural of “air”). In part 1 of this series, I’ll lay out the scope of what the model addresses. In later parts, I’ll show the math and discuss exactly how to apply the AIRSS to real-world situations.
As I get feedback and refine my thinking, I will likely update this series.
The AIRSS evaluates confidentiality, integrity, and availability (CIA) impacts to (and from) the data handled by artificial intelligence models. It is a tool for evaluating the models themselves, not their supporting infrastructure. Thus, not every attack involving or potentially impacting an AI model is covered. For example:
Known vulnerability risk in the model’s infrastructure can be evaluated using the Deploy Securely Risk Assessment Model (DSRAM).
Phishing and insider threat risk can be evaluated using the Factor Analysis of Information Risk (FAIR) methodology.
The good news is that the outputs of all of these approaches share a fungible unit: dollars per year. So, assuming you have developed a mutually exclusive and collectively exhaustive mapping of your cyber risks, you can plug in the AIRSS wherever it is appropriate.
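To make the arithmetic concrete, here is a minimal sketch in Python; the methodology labels and dollar figures are invented for illustration:

```python
# Hypothetical sketch: combining risk estimates that share a common
# unit (expected loss in dollars per year). All figures are invented.

risk_estimates = {
    # methodology -> annualized expected loss (USD/year)
    "DSRAM (known infrastructure vulnerabilities)": 120_000,
    "FAIR (phishing and insider threat)": 85_000,
    "AIRSS (AI model CIA impacts)": 60_000,
}

# Because the categories are assumed mutually exclusive and
# collectively exhaustive, total cyber risk is a simple sum.
total_annual_risk = sum(risk_estimates.values())
print(f"Total expected loss: ${total_annual_risk:,}/year")
```

Because every methodology emits the same unit, rolling the numbers up (or comparing risks against each other) stays a matter of ordinary arithmetic.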
With that said, the data attributes and attack vectors the AIRSS covers are:
Prompt injection
Prompt injection is clearly a threat to data confidentiality, as it can potentially force the model to reveal information - trade secrets, personal data, etc. - in violation of its security policy.
Additionally, if the model is calling a function, a malicious prompt could impact the desired execution of that function by corrupting the inputs to it. Similarly, forcing a model to provide undesirable content (e.g. how to build a bomb) could also violate its security policy. This would be a data integrity (versus confidentiality) attack, so prompt injection can impact that attribute as well.
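One common mitigation for this integrity risk is to validate model-proposed function arguments against an explicit policy before executing them. The sketch below is illustrative only - the function name, argument schema, and recipient allowlist are hypothetical, not taken from any particular framework:

```python
# Illustrative sketch: validate the arguments an LLM proposes for a
# function call before executing them. Names here are hypothetical.

ALLOWED_RECIPIENT_DOMAINS = {"example.com"}

def safe_send_email(args: dict) -> str:
    """Execute a model-proposed 'send_email' call only if the
    arguments pass policy checks."""
    recipient = args.get("to", "")
    domain = recipient.rsplit("@", 1)[-1]
    if domain not in ALLOWED_RECIPIENT_DOMAINS:
        return f"blocked: recipient domain {domain!r} not allowed"
    return f"sent to {recipient}"

# A prompt injection might steer the model into proposing an
# exfiltration target; the policy check catches it.
print(safe_send_email({"to": "alice@example.com"}))      # allowed
print(safe_send_email({"to": "attacker@evil.example"}))  # blocked
```

The key design choice is that the check runs outside the model: even if an injected prompt fully controls the model's output, it cannot control the code that decides whether that output is acted upon.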
Sponge attacks
A potentially overlapping category of malicious acts is sponge attacks, which try to waste energy or spike latency in a model. Whether attempting to deceive computer vision systems and crash a car or simply forcing a company to waste money on OpenAI credits through specially-designed prompts, these can have serious real-world impacts by attacking the model’s availability directly.
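A back-of-envelope sketch shows how the monetary side of a sponge attack could be estimated; all token counts, volumes, and prices below are invented and do not reflect actual OpenAI pricing:

```python
# Hypothetical estimate of wasted spend from sponge-style prompts.
# Every number here is an invented example, not real pricing.

price_per_1k_tokens = 0.03          # USD, hypothetical
tokens_per_malicious_prompt = 4_000
prompts_per_day = 10_000

daily_cost = (tokens_per_malicious_prompt / 1_000) * price_per_1k_tokens * prompts_per_day
annual_cost = daily_cost * 365
print(f"Wasted spend: ${daily_cost:,.2f}/day, ${annual_cost:,.2f}/year")
```

Expressing the availability impact this way keeps it in the same dollars-per-year unit as the rest of the risk model.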
Model inversion and extraction attacks
Like prompt injection attacks, these try to extract information from AI systems in violation of their security policy. Unlike prompt injection, though, they can target the model itself (in addition to its underlying training data). Since proprietary AI models can be quite valuable, losing sole control over this intellectual property can cause major economic damage to an organization.
Sensitive data generation
Another confidentiality-related vulnerability occurs when the model aggregates disparate pieces of information that - by themselves - would not violate the model’s security policy, but then creates an output which does. This doesn’t necessarily require malicious intent on the part of anyone, but can certainly lead to a data breach if the model:
Aggregates biographical information about a person, his neighbors, and other data to return something that constitutes personal data. For example, if an LLM knows that my social graph intersects with several people living on a certain street and knows how many people are in my family, it could potentially intuit my street address if prompted.
Combines information from a company’s knowledge base - otherwise approved for public consumption - that reveals confidential or proprietary data about its inner workings based on inferences it makes.
Check out this post for a deep dive on the problem.
Data poisoning
An attacker could potentially modify the outputs of a model in such a way that they violate its business or security requirements by corrupting the training data. This could occur either before or while the model is operating in production (e.g., ChatGPT trains on user input by default).
It’s important to note that AIRSS does not describe or address the likelihood of data poisoning happening, but rather the data integrity and availability risk posed by a given poisoned model.
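As a rough illustration of how that risk could be quantified, one could compare a poisoned model’s outputs against expected reference outputs and translate the deviation rate into an annualized dollar figure. Everything below - the reference labels, query volume, and per-event loss - is hypothetical:

```python
# Hypothetical sketch: quantify the integrity risk a poisoned model
# poses by scoring it against expected reference outputs.

expected = ["approve", "deny", "deny", "approve", "deny"]
actual   = ["approve", "approve", "deny", "approve", "approve"]  # poisoned

deviations = sum(e != a for e, a in zip(expected, actual))
deviation_rate = deviations / len(expected)   # 2 of 5 outputs deviate

queries_per_year = 1_000_000                  # invented volume
loss_per_bad_output = 5.00                    # USD per event, invented

expected_annual_loss = deviation_rate * queries_per_year * loss_per_bad_output
print(f"Expected integrity loss: ${expected_annual_loss:,.0f}/year")
```

Note that this measures the impact of an already-poisoned model, not the probability of poisoning occurring - consistent with the scope described above.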
Malicious AI
While I haven’t talked much about the existential risk problem of AI, and plan to leave that to others, the AIRSS does provide a way to measure this phenomenon. Because it provides a way to compare expected vs. actual outputs - and the impacts if the latter are incorrect - the AIRSS can help measure the risk of a malicious AI, i.e. one that intentionally (whatever that means in the context of a non-living object) provides incorrect information.
When GPT-4 reportedly misled a human into believing it was a vision-impaired person, it was providing corrupted data - an integrity impact.
Out of scope
Anything related to the availability of an AI model’s supporting infrastructure is out of scope. That is because pure denial-of-service efforts or encryption via ransomware wouldn’t attack the functionality of the model itself, but rather the code supporting it.
Similarly, simply bombarding the model with queries would be out of scope because this is a problem at the business logic layer, not the model itself.
Model supply chain attacks are also excluded. These are simply a type of software supply chain attack, whereby an attacker seeds a corrupted component into a larger application. For example, the fact that Mithril Security successfully uploaded a lobotomized Large Language Model (LLM) to HuggingFace would be excluded from consideration by the AIRSS. That said, you could still evaluate the risk such a model poses in a given production system using the AIRSS by treating it as a case of data poisoning.
Unintended training is out of scope as well: because the model is behaving as one would expect it to (albeit not necessarily how the user would want it to), this security risk is not something the AIRSS measures. I would recommend evaluating it using the unintentional insider threat approach recommended by FAIR.
I was originally going to put out the AIRSS in one big document, but it started to become unwieldy, so I decided to break it down into its component parts. Merely setting the stage for what the model would and would not address took up enough space to require its own post.
With this foundation in place, the next issue will explore how business and security requirements help determine what is actually a “vulnerability.” And then in a subsequent issue (or two), I’ll tie everything together and work with some hard numbers.
Ready to roll out the AIRSS in your organization?