Can ChatGPT convert unstructured vulnerability reports into CycloneDX VEX statements (and vice versa)?

Diving into AI applications for cybersecurity.

Mar 10, 2023

TL;DR - ChatGPT cannot yet create a correctly-formatted CycloneDX Vulnerability Exploitability eXchange (VEX) statement from an unstructured report created by a vendor. It can, however, do the opposite quite well.

A huge problem in security is communication.

My company StackAware is targeting a very small sliver of it. Check out this post for details, but suffice to say, having a machine-readable way of describing vulnerabilities is vital to communicating effectively about them at any sort of scale. Software bills of material (SBOM) amplify the need for this by an order of magnitude.

Essentially, the CycloneDX Vulnerability Exploitability eXchange (VEX) format allows organizations to use a standard format when describing specific security flaws in their software.

The most common practice today in vulnerability notification and disclosure processes is custom-building a page explaining a company’s position regarding a specific vulnerability using only human-readable language. This solves an immediate need but is incredibly difficult to scale for hundreds or thousands of known issues.

Since VEX statements are a relatively new concept, I wanted to see if I could translate between existing report styles and the CycloneDX format. Since I think ChatGPT and other generative artificial intelligence (AI) platforms will have huge labor-saving implications for security, I wanted to see if the former could help in this task.

The results were mixed.

Going from unstructured text to VEX

The harder part of the problem is taking an unstructured, human-readable blog post and turning it into a properly-formatted VEX statement. For this challenge, I used a vulnerability report regarding log4shell (CVE-2021-44228) from Quarkus.

I experimented with different prompts to begin with. About half of them resulted in ChatGPT telling me that it couldn’t perform the given task. Note that, even if ChatGPT says this, it doesn’t mean you can’t still feed it another prompt and get something resembling a properly-formatted response in return.

The one below was the first that worked consistently without ChatGPT complaining:

You are a CycloneDX Vulnerability Exploitability eXchange (VEX) converter. I will provide you with unstructured vulnerability reports as text. I need you to convert each one into CycloneDX-compliant VEX reports, e.g. using the "vulnerabilities" section. Provide ONLY a JSON string in one-click copy format (no other explanatory text such as preamble or conclusion) and ensure the output complies with the CycloneDX 1.4 VEX format.

Prompt 1: some made-up fields and duplicates of others

When I fed it the body text of the Quarkus report, ChatGPT gave me this:

My observations are:

1. ChatGPT ignored my instructions to avoid preamble text (I have never consistently gotten it to not provide preamble text in any situation).
2. It provided an “exploitabilities” field, which does not exist in the CycloneDX standard.
3. It provided multiple “recommendations,” which is also incorrect. The CycloneDX “vulnerabilities” field has a “recommendation” section but it is singular and should be one string, not several.
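
Checks like these are mechanical enough to script. The following is a minimal Python sketch, not part of the original experiment, that pulls the first JSON object out of a reply (skipping any preamble) and flags unfamiliar keys. The key set is an assumption based on one reading of the CycloneDX 1.4 JSON schema, and the helper names are hypothetical; verify against the published schema before relying on it.

```python
import json

# Top-level keys of a vulnerability entry, as understood from the CycloneDX 1.4
# JSON schema. This set is an assumption; confirm it against the spec.
KNOWN_VULNERABILITY_KEYS = {
    "bom-ref", "id", "source", "references", "ratings", "cwes",
    "description", "detail", "recommendation", "advisories", "created",
    "published", "updated", "credits", "tools", "analysis", "affects",
    "properties",
}


def extract_json(reply: str) -> dict:
    """Parse the first JSON object in a reply, skipping any preamble text."""
    start = reply.index("{")  # raises ValueError if there is no "{" at all
    document, _end = json.JSONDecoder().raw_decode(reply[start:])
    return document


def find_problems(document: dict) -> list[str]:
    """List unknown keys and non-string recommendations per vulnerability."""
    problems = []
    for vuln in document.get("vulnerabilities", []):
        for key in sorted(set(vuln) - KNOWN_VULNERABILITY_KEYS):
            problems.append(f"unknown field: {key}")
        if "recommendation" in vuln and not isinstance(vuln["recommendation"], str):
            problems.append("recommendation must be a single string")
    return problems


# A shortened, illustrative reply with preamble and two made-up fields.
reply = (
    "Here is the VEX report you asked for:\n"
    '{"specVersion": "1.4", "vulnerabilities": [{"id": "CVE-2021-44228", '
    '"exploitabilities": [], "recommendations": []}]}'
)
print(find_problems(extract_json(reply)))
```

A key-name check like this only catches invented fields; it says nothing about whether the values inside known fields have the right shape.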

Prompt 2: correct but not especially useful

I corrected it on items 2 and 3 and it replied:

This appears to be syntactically correct, but describes the log4shell vulnerability in general and doesn’t provide the key piece of the VEX statement: that Quarkus is not_affected by log4shell.
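
For comparison, here is a rough Python sketch of the kind of statement the conversion should have produced. It is a hand-written illustration under assumptions, not ChatGPT output and not an official Quarkus statement: the component reference is a placeholder, and choosing "code_not_present" as the justification is one reading of the Quarkus report rather than something the report states in those terms.

```python
import json

# Placeholder reference. A real statement would point at the bom-ref of the
# affected component in the accompanying SBOM.
QUARKUS_REF = "example-quarkus-bom-ref"

vex = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.4",
    "version": 1,
    "vulnerabilities": [
        {
            "id": "CVE-2021-44228",
            "source": {
                "name": "NVD",
                "url": "https://nvd.nist.gov/vuln/detail/CVE-2021-44228",
            },
            # The "analysis" object carries the vendor's actual position.
            "analysis": {
                "state": "not_affected",
                # Assumed justification; pick the enumerated value that
                # matches the vendor's reasoning.
                "justification": "code_not_present",
                "detail": (
                    "Quarkus, its extensions, and dependencies do not use "
                    "the log4j version 2 core library."
                ),
            },
            "affects": [{"ref": QUARKUS_REF}],
        }
    ],
}

print(json.dumps(vex, indent=2))
```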

Prompt 3: closing in on the “state” field

I brought this problem up with ChatGPT and it did something weird, adding a “notes” field, which does not exist in the CycloneDX format:

Prompt 4: so close…

I again tried to correct ChatGPT, but it was still wrong, providing an upper-case “NOT_AFFECTED” state and an unstructured, free-text “justification.”

Prompt 5: giving up

Unfortunately, this is where things started going off the rails. I could not get ChatGPT to provide a valid “justification” despite trying several different prompt variations.

Interrogating ChatGPT’s knowledge of the VEX format

Since it seemed ChatGPT was having problems at this specific juncture, I decided to interrogate its knowledge. The results were disappointing:

None of these are correct choices for the “justification” field. It looks like the underlying information ChatGPT is pulling from is wrong. Even after attempting to train it on CycloneDX’s VEX example, it still would not work correctly. I decided to cut my losses here and tackle the reverse problem.
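
For reference, both “state” and “justification” are enumerated fields, so a generated statement can be checked without asking the model anything. The sketch below is a minimal Python check under assumptions: the two value sets are transcribed from one reading of the CycloneDX 1.4 schema and should be confirmed against the spec, and the function name is hypothetical.

```python
# Enumerated values as understood from the CycloneDX 1.4 schema (assumption).
ANALYSIS_STATES = {
    "resolved", "resolved_with_pedigree", "exploitable",
    "in_triage", "false_positive", "not_affected",
}
JUSTIFICATIONS = {
    "code_not_present", "code_not_reachable", "requires_configuration",
    "requires_dependency", "requires_environment", "protected_by_compiler",
    "protected_at_runtime", "protected_at_perimeter",
    "protected_by_mitigating_control",
}


def check_analysis(analysis: dict) -> list[str]:
    """Return a list of problems found in a vulnerability's analysis object."""
    problems = []
    state = analysis.get("state")
    if state is not None and state not in ANALYSIS_STATES:
        problems.append(f"invalid state: {state!r}")
    justification = analysis.get("justification")
    if justification is not None and justification not in JUSTIFICATIONS:
        problems.append("justification is not one of the enumerated values")
    return problems


# Upper-case state plus free-text justification, as in the prompt 4 result.
print(check_analysis({
    "state": "NOT_AFFECTED",
    "justification": "Quarkus does not use the log4j version 2 core library.",
}))
```

The comparison is case-sensitive on purpose, which is why an upper-case state is reported as invalid.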

Translating structured VEX statements to human-readable ones

This is where ChatGPT really began to shine. The first prompt I entered (below) generated great results.

After feeding it this example, it responded:

Seems about right to me!

My attempts to refine the prompt a little bit more to remove the repetitive language (e.g. “the statement…”) were not successful, so I decided to stop here.
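
Part of why this direction is easier is that the input is already structured. As a point of comparison, here is a small Python sketch of a template-based renderer. This is a deliberately different technique from the ChatGPT prompt used above, with hypothetical function and phrase names, and it handles only the “analysis” fields of a single vulnerability entry.

```python
# Phrases for a few analysis states; wording is illustrative only.
STATE_PHRASES = {
    "not_affected": "is not affected by",
    "exploitable": "is affected by",
    "in_triage": "is being assessed for",
    "resolved": "has resolved",
    "false_positive": "was incorrectly flagged for",
}


def render_statement(product: str, vuln: dict) -> str:
    """Turn one VEX vulnerability entry into a short English statement."""
    analysis = vuln.get("analysis", {})
    phrase = STATE_PHRASES.get(analysis.get("state"), "has an unknown status for")
    sentence = f"{product} {phrase} {vuln['id']}."
    if analysis.get("justification"):
        reason = analysis["justification"].replace("_", " ")
        sentence += f" Justification: {reason}."
    if analysis.get("detail"):
        sentence += f" {analysis['detail']}"
    return sentence


entry = {
    "id": "CVE-2021-44228",
    "analysis": {"state": "not_affected", "justification": "code_not_present"},
}
print(render_statement("Quarkus", entry))
```

The product name is passed in separately because a vulnerability entry points at components by reference rather than by name. A template like this is rigid where a language model is fluent, which is the trade-off in play.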

Conclusion

As of the date of this post, it doesn’t seem like ChatGPT can convert unstructured vulnerability reports into VEX statements with any consistency. But that doesn’t mean it won’t be able to in the near future. With the speed at which this and other AI tools are improving, I can see such a capability just on the horizon. And I plan to start using it as soon as it becomes available.

The good news is that the reverse task - going from structured to unstructured data - is simple and works pretty well. So if you ever get an unwieldy vulnerability report in a .pdf or similar format, consider plugging it into ChatGPT for a quick summary (taking into account my recommendations about secure AI usage).

Appendix

Below I provide the full output from each of the numbered prompts above, in case you want to tinker with them:

Result from prompt 1

{
  "specVersion": "1.4",
  "vulnerabilities": [
    {
      "id": "CVE-2021-44228",
      "source": "National Vulnerability Database",
      "description": "Apache Log4j 2.0-beta9 to 2.14.1, 2.15.0 to 2.15.0: Log4j API client code can be exploited to download and run arbitrary code.",
      "ratings": {
        "base": {
          "score": 10,
          "severity": "CRITICAL"
        }
      },
      "exploitabilities": [
        {
          "id": "CWE-502",
          "source": "NIST NVD",
          "description": "Describes an issue in software code that allows an attacker to execute malicious code."
        }
      ],
      "advisories": [
        {
          "id": "RHSA-2022:0014",
          "source": "Red Hat",
          "description": "Red Hat has released an update for Quarkus that fixes this vulnerability."
        }
      ],
      "recommendations": [
        {
          "description": "Ensure that you are not using the affected version of Log4j in your application."
        },
        {
          "description": "Set the log4j2.formatMsgNoLookups system property to true as an added precaution."
        },
        {
          "description": "Upgrade Log4j to version 2.16 or higher."
        }
      ]
    }
  ]
}

Result from prompt 2

{
  "specVersion": "1.4",
  "vulnerabilities": [
    {
      "id": "CVE-2021-44228",
      "source": "National Vulnerability Database",
      "description": "Apache Log4j 2.0-beta9 to 2.14.1, 2.15.0 to 2.15.0: Log4j API client code can be exploited to download and run arbitrary code.",
      "ratings": {
        "base": {
          "score": 10,
          "severity": "CRITICAL"
        },
        "impact": {
          "score": 6.4,
          "severity": "MEDIUM"
        },
        "exploitability": {
          "score": 3.9,
          "severity": "LOW"
        }
      },
      "advisories": [
        {
          "id": "RHSA-2022:0014",
          "source": "Red Hat",
          "description": "Red Hat has released an update for Quarkus that fixes this vulnerability."
        }
      ],
      "recommendation": "Ensure that you are not using the affected version of Log4j in your application. Set the log4j2.formatMsgNoLookups system property to true as an added precaution. Upgrade Log4j to version 2.16 or higher."
    }
  ]
}

Result from prompt 3

{
  "specVersion": "1.4",
  "vulnerabilities": [
    {
      "id": "CVE-2021-44228",
      "source": "National Vulnerability Database",
      "description": "Apache Log4j 2.0-beta9 to 2.14.1, 2.15.0 to 2.15.0: Log4j API client code can be exploited to download and run arbitrary code.",
      "ratings": {
        "base": {
          "score": 10,
          "severity": "CRITICAL"
        },
        "impact": {
          "score": 6.4,
          "severity": "MEDIUM"
        },
        "exploitability": {
          "score": 3.9,
          "severity": "LOW"
        }
      },
      "advisories": [
        {
          "id": "RHSA-2022:0014",
          "source": "Red Hat",
          "description": "Red Hat has released an update for Quarkus that fixes this vulnerability."
        }
      ],
      "notes": [
        {
          "type": "OTHER",
          "content": "Quarkus, its extensions, and dependencies do not use the log4j version 2 core library, so they are not susceptible to this vulnerability. In most cases, no corrective action is required for any Quarkus backed projects. Quarkus does expose the log4j API jar which in itself is not vulnerable. This is purely a compatibility and translation layer, which maps calls to a different logging backend (JBoss Logging). Therefore, any direct usage of the log4j API is not impacted."
        }
      ]
    }
  ]
}

Result from prompt 4

{
  "specVersion": "1.4",
  "vulnerabilities": [
    {
      "id": "CVE-2021-44228",
      "source": "National Vulnerability Database",
      "description": "Apache Log4j 2.0-beta9 to 2.14.1, 2.15.0 to 2.15.0: Log4j API client code can be exploited to download and run arbitrary code.",
      "ratings": {
        "base": {
          "score": 10,
          "severity": "CRITICAL"
        },
        "impact": {
          "score": 6.4,
          "severity": "MEDIUM"
        },
        "exploitability": {
          "score": 3.9,
          "severity": "LOW"
        }
      },
      "advisories": [
        {
          "id": "RHSA-2022:0014",
          "source": "Red Hat",
          "description": "Red Hat has released an update for Quarkus that fixes this vulnerability."
        }
      ],
      "analysis": {
        "state": "NOT_AFFECTED",
        "justification": "Quarkus, its extensions, and dependencies do not use the log4j version 2 core library, so they are not susceptible to this vulnerability. In most cases, no corrective action is required for any Quarkus backed projects. Quarkus does expose the log4j API jar which in itself is not vulnerable. This is purely a compatibility and translation layer, which maps calls to a different logging backend (JBoss Logging). Therefore, any direct usage of the log4j API is not impacted."
      }
  ]
}

Result from prompt 5

{
  "specVersion": "1.4",
  "vulnerabilities": [
    {
      "id": "CVE-2021-44228",
      "source": "National Vulnerability Database",
      "description": "Apache Log4j 2.0-beta9 to 2.14.1, 2.15.0 to 2.15.0: Log4j API client code can be exploited to download and run arbitrary code.",
      "ratings": {
        "base": {
          "score": 10,
          "severity": "CRITICAL"
        },
        "impact": {
          "score": 6.4,
          "severity": "MEDIUM"
        },
        "exploitability": {
          "score": 3.9,
          "severity": "LOW"
        }
      },
      "advisories": [
        {
          "id": "RHSA-2022:0014",
          "source": "Red Hat",
          "description": "Red Hat has released an update for Quarkus that fixes this vulnerability."
        }
      ],
      "analysis": {
        "state": "not_affected",
        "justification": "OTHER_DEPENDENCY - Quarkus, its extensions, and dependencies do not use the log4j version 2 core library, so they are not susceptible to this vulnerability. In most cases, no corrective action is required for any Quarkus backed projects. Quarkus does expose the log4j API jar which in itself is not vulnerable. This is purely a compatibility and translation layer, which maps calls to a different logging backend (JBoss Logging). Therefore, any direct usage of the log4j API is not impacted."
      }
  ]
}
