Vulnerability management for AI could turn out to be a huge mess.
But the Artificial Intelligence Risk Scoring System (AIRSS) brings order to the chaos.
Over the past three posts, I’ve been building out the AIRSS, so please check out the previous editions if you missed them.
In part 4, I wrap up the series and bring it home, showing how to report your findings using the Open Web Application Security Project (OWASP) CycloneDX SBOM standard. Without a standardized reporting framework, it’s difficult to make apples-to-apples comparisons of risk. And since I consider it the most forward-leaning and rapidly updated approach, I chose CycloneDX for the job.
Communicate AI risk with CycloneDX
In part 3, we discussed a Large Language Model (LLM) application built on a fine-tuned version of GPT-3.5 that carried prompt injection risk to the tune of $36,800/year (a single loss expectancy of $9.20 multiplied by 4,000 expected occurrences per year).
Using this situation, we could report the vulnerability as follows:
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "serialNumber": "urn:uuid:3e671687-495d-45f5-a30f-a58921a69b79",
  "components": [
    {
      "type": "machine-learning-model",
      "name": "Credit Report LLM",
      "purl": "pkg:generic/gpt-3.5-turbo@0613?ft=stackaware::80Z1hDhg",
      "bom-ref": "urn:uuid:3e671687-495d-45f5-a30f-a58921a69b79#pkg:generic/gpt-3.5-turbo@0613?ft=stackaware::80Z1hDhg"
    }
  ],
  "vulnerabilities": [
    {
      "bom-ref": "urn:uuid:3e671687-495d-45f5-a30f-a58921a69b79#prompt_injection_credit_report",
      "id": "prompt_injection_credit_report",
      "source": {
        "name": "StackAware",
        "url": "https://this.is.not.a.real.url.com/prompt_injection_report"
      },
      "ratings": [
        {
          "source": {
            "name": "StackAware",
            "url": "https://this.is.not.a.real.url.com/prompt_injection_report"
          },
          "score": 36800,
          "method": "other",
          "vector": "SLE:$9.2|ARO:4000/year",
          "justification": "The score for this vulnerability derives from the Artificial Intelligence Risk Scoring System (AIRSS) v. 1.0 and represents the annual loss expectancy (ALE) in dollars per year, calculated from a single loss expectancy (SLE) of $9.2 and an annualized rate of occurrence (ARO) of 4,000/year."
        }
      ],
      "description": "When exposed to a battery of prompts containing malicious inputs, the model will return sensitive information, such as credit reports, to unauthorized users.",
      "recommendation": "At the model level, consider applying a safety layer instructing it only to return a credit report to a given user if it has 99.999% confidence in the user's identity. At the level of the broader application, a business logic check requiring authentication before exposing the credit report in question to the model would likely be the best solution.",
      "proofOfConcept": {
        "reproductionSteps": "Randomly select 1,000 prompts from the PromptBench suite of malicious inputs, combine with 99,000 benign prompts, and input them into the application hosting the model.",
        "supportingMaterial": [
          {
            "contentType": "application/x-www-form-urlencoded",
            "content": "https://github.com/microsoft/promptbench/tree/main"
          }
        ]
      },
      "created": "2023-10-20T00:00:00.000Z",
      "credits": {
        "individuals": [
          {
            "name": "Walter Haydock"
          }
        ]
      },
      "analysis": {
        "state": "exploitable",
        "response": [
          "will_not_fix"
        ],
        "detail": "Due to the difficulty of further reducing the risk of data exposure from malicious inputs, and because the annual loss expectancy is acceptable, we do not plan to further mitigate this vulnerability."
      },
      "affects": [
        {
          "ref": "urn:uuid:3e671687-495d-45f5-a30f-a58921a69b79#pkg:generic/gpt-3.5-turbo@0613?ft=stackaware::80Z1hDhg"
        }
      ]
    }
  ]
}
I won’t get into the details of the CycloneDX output here, but please comment if you have any questions or proposed corrections.
Note that this represents only a single risk to the model. We could have created entries for data poisoning, sensitive data aggregation, and any other in-scope issues. The good news is that, since we use ALE in dollars per year, a fungible unit, we can combine risks across various vulnerabilities and attack vectors. This allows us to holistically describe risk.
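To illustrate, here is a minimal Python sketch (my own illustration, not part of any CycloneDX tooling) that sums the AIRSS scores across every vulnerability in a BOM like the one above. It assumes, as in my example, that each ALE lives in a rating's score field with method set to "other", so adjust the filter if you record scores differently.
import json

def total_ale(bom_path):
    # Sum annual loss expectancy (ALE) scores, in dollars per year, across all
    # vulnerabilities in a CycloneDX BOM. Assumes AIRSS ratings store the ALE
    # in the "score" field with "method": "other", as in the example above.
    with open(bom_path) as f:
        bom = json.load(f)
    total = 0
    for vuln in bom.get("vulnerabilities", []):
        for rating in vuln.get("ratings", []):
            if rating.get("method") == "other" and rating.get("score") is not None:
                total += rating["score"]
    return total

print(f"Combined ALE: ${total_ale('bom.json'):,.0f}/year")
Run against the single-vulnerability BOM above, this simply returns $36,800/year; with multiple entries, it gives you one portfolio-level dollar figure for the system.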
Although I have criticized the sufficiency of merely logging CVEs against AI models and applications, the CVE system is not mutually exclusive with the AIRSS. If you need to log CVEs against a certain system, you can do so and list the CVE identifier as the id in the vulnerabilities array. If you are creating an entry in the NVD, however, be sure to link from the “Advisories, Solutions, and Tools” section to a more detailed AIRSS report.
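For example, the id and source fields from the report above could instead reference a CVE record, while the AIRSS rating, analysis, and other fields stay unchanged (the identifier below is a placeholder, not a real CVE):
"id": "CVE-YYYY-NNNNN",
"source": {
  "name": "NVD",
  "url": "https://nvd.nist.gov/vuln/detail/CVE-YYYY-NNNNN"
}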
Conclusion
Version 1.0 of the AIRSS provides a standardized method for describing - and quantifying - artificial intelligence risk. This is my first shot at creating such a model, and I imagine it will evolve substantially. I look forward to incorporating your feedback into future versions of the system.
Like this approach? Need help managing AI-related risk?