Reviewing Palantir's vulnerability management program
Major points for transparency, but room for improvement.
Palantir’s post about their Container Vulnerability Scanner (CVS) tool has been making the rounds on social media. And I think it’s a great document for highlighting a bunch of interesting things about the discipline of vulnerability management in general, both good and bad.
Overall, it’s pretty cool that the company is being so transparent about the way they tackle security. Frankly, many organizations don’t put this level of thought into how they attack the problem, and often do so in an ad hoc manner.
Even when they have a more systematic approach, many organizations release almost no information about their practices, fearing that transparency will:
open up a can of worms leading customers to ask increasingly difficult questions, and/or
potentially assist hackers in planning an attack.
And, if you post your policy and procedures on the internet, random bloggers might critique them!
With that said, I think there is both some excellent stuff and some room for improvement in Palantir’s approach.
The good stuff
A lot of Palantir’s focus (at least in the blog) is on minimizing and hardening their container images, which is generally a good use of time. Many open source container images are relatively “heavy” and have a lot of packages to allow them to be used in a variety of different scenarios. Additionally, by putting a lot of vulnerability remediation work on autopilot to the extent possible, Palantir avoids the need to individually prioritize and fix many individual issues. Finally, they also explicitly allow for risk acceptance in certain situations, which is vital to do as part of a risk management program (otherwise people will just do implicit risk acceptance, which is much harder to track and perform consistently).
Container hardening and attack surface reduction
Functionality that isn’t strictly necessary represents an avoidable security risk. If there is no need for a given piece of code to exist, then it shouldn’t. Otherwise, it provides a potential attack vector. Although modules not loaded into memory generally don’t present a security risk, it’s always possible that an attacker could leverage unused functionality in third-party software in unexpected (and bad) ways.
Having excess code present creates another issue: it often contains known security vulnerabilities that trigger scanner alerts and make it even harder to sort the signal from the noise. Palantir alludes to this by saying:
This approach ensures we can focus our limited resources and effort on applicable security defects, rather than on remediating ancillary package or dependency vulnerabilities that have no security impact on our products.
From a security perspective, less is more, and Palantir clearly gets it.
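As a concrete (and entirely hypothetical) illustration of what “less is more” looks like in practice, here is a minimal Python sketch that flags packages present in a Debian-based image but absent from a per-service allow-list. The image name, allow-list contents, and workflow are all invented; this is not Palantir’s tooling.

```python
# Hypothetical sketch: surface packages in a Debian-based image that a
# service doesn't actually need. Every extra package is attack surface
# and a future source of scanner noise.
import subprocess

# Invented per-service allow-list; deriving this is the hard part.
REQUIRED = {"ca-certificates", "libssl3", "tzdata"}

def installed_packages(image: str) -> set[str]:
    """List dpkg packages installed in an image (Debian-based only)."""
    out = subprocess.run(
        ["docker", "run", "--rm", image,
         "dpkg-query", "-W", "-f=${Package}\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.split())

def excess_packages(image: str) -> set[str]:
    """Installed packages that aren't on the service's allow-list."""
    return installed_packages(image) - REQUIRED

if __name__ == "__main__":
    for pkg in sorted(excess_packages("my-service:latest")):
        print(f"candidate for removal: {pkg}")
```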
Continuous updating and patching
Additionally, Palantir claims to automate the patching of container images, ensuring the latest updates are in place every time a container is torn down and redeployed. Sometimes, the easiest way to address a vulnerability scanner finding is to simply update the software in question. If this is the case (and it’s reasonable to believe that it is most of the time, thanks to Palantir’s Apollo continuous deployment platform), then a simple update solves the (immediate) problem. No need to evaluate CVE exploitability if you can just resolve the issue immediately with minimal effort!
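For illustration, the rebuild-on-redeploy pattern can be as simple as forcing a fresh pull of the base image on every build. This sketch is my own, not Apollo; the registry name is a placeholder, and a real pipeline would add testing and staged rollout.

```python
# Rough sketch of the rebuild-on-redeploy pattern (my own, not Apollo).
# --pull refreshes the base image and --no-cache forces package layers
# to be rebuilt, so upstream security fixes land on every cycle.
import subprocess

def rebuild_and_push(image: str, context: str = ".") -> None:
    subprocess.run(
        ["docker", "build", "--pull", "--no-cache", "-t", image, context],
        check=True,
    )
    subprocess.run(["docker", "push", image], check=True)

if __name__ == "__main__":
    # Run from CI/cron on each deploy cycle; the registry is a placeholder.
    rebuild_and_push("registry.example.com/my-service:latest")
```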
The only potential drawback here is that such an automated process could make a supply chain attack easier. Because updates aren’t manually applied and verified, it is possible that Palantir accidentally applies a patch containing malicious code to its environment. Presuming their software composition analysis (SCA) tool (JFrog Xray) has the ability to identify and block known malicious packages, this would be mostly a moot point (although I don’t know if it does in reality).
But if there were only a manual (not machine-readable or consumable by their SCA tools) public notification of malicious code being inserted into one of Palantir’s dependencies, then it’s possible that their Apollo system might automatically integrate the malicious code into their environment before security or engineering teams can react.
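A machine-readable gate is what would close this window. As a sketch (mine, not Palantir’s; the surrounding workflow is invented), an update pipeline could query a public advisory feed such as OSV.dev, which aggregates known-malicious-package advisories, before applying a candidate version:

```python
# Sketch of a machine-readable gate (mine, not Palantir's): before an
# automated update is applied, ask the OSV.dev advisory API whether the
# candidate version has known advisories, and block on malicious-package
# ones (OpenSSF assigns these "MAL-" prefixed IDs).
import requests

def advisories(name: str, version: str, ecosystem: str = "PyPI") -> list[str]:
    resp = requests.post(
        "https://api.osv.dev/v1/query",
        json={"package": {"name": name, "ecosystem": ecosystem},
              "version": version},
        timeout=10,
    )
    resp.raise_for_status()
    return [vuln["id"] for vuln in resp.json().get("vulns", [])]

def safe_to_apply(name: str, version: str) -> bool:
    """Refuse any update with a known malicious-package advisory."""
    return not any(i.startswith("MAL-") for i in advisories(name, version))
```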
If there is no public knowledge of the supply chain compromise, however, then those applying manual and automatic updates would be equally impacted.
Thus, overall, I would say this is a sound practice from a risk management perspective.
Supply chain integrity verification
According to the post, Palantir is also working toward SBOM generation and validation using Apollo. The former is almost trivial at this point due to the number of tools available, but the latter is more interesting. Confirming that you have the actual software deliverable you want (and not something masquerading as it) is important in stopping supply chain attacks such as dependency confusion. While this seems mostly aspirational at the moment, I will be interested to see where Palantir takes it.
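To make the two halves concrete: SBOM generation really is close to a one-liner with off-the-shelf tooling like syft, while validation ultimately reduces to confirming an artifact’s digest or signature against a pinned expectation. The sketch below is mine, not Apollo’s, and the file paths are placeholders.

```python
# Sketch of both halves (mine, not Apollo's; paths are placeholders).
# Generation is nearly a one-liner with a tool like syft; "validation"
# reduces here to checking an artifact digest against a pinned value.
import hashlib
import subprocess

def generate_sbom(image: str, out_path: str) -> None:
    """Write a CycloneDX SBOM for an image using the open-source syft CLI."""
    with open(out_path, "w") as f:
        subprocess.run(["syft", image, "-o", "cyclonedx-json"],
                       stdout=f, check=True)

def digest_matches(artifact_path: str, expected_sha256: str) -> bool:
    """Confirm the deliverable is the one you pinned, not an impostor."""
    h = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```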
Clear risk acceptance procedures
Palantir notes:
By default, any findings that have been suppressed must also define an expiration date for the suppression, which mandates periodic re-review and re-acceptance.
Making sure to revisit risk acceptance decisions is something many organizations fail at. Because of periodic changes to the threat environment, the sensitivity of data stored, and the presence or absence of other vulnerabilities, the risk posed by a single issue will change over time.
I’ve attempted to handle re-analysis of these decisions through pre-scheduled meetings in the past, and have generally seen that approach fall short due to the manual steps needed and the fact that more urgent problems often intervene. Auto-expiring risk acceptances (or “suppressions” in CVS language) is an excellent way to ensure continuous re-appraisal of previous decisions.
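The mechanism is simple enough to sketch in a few lines. The data model here is invented for illustration (CVS’s internals aren’t public), but the key property matches the quoted behavior: a suppression without a future expiration date simply stops suppressing.

```python
# Invented data model (CVS's internals aren't public), but the key
# property matches the quoted behavior: a suppression with no future
# expiration date simply stops suppressing.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Suppression:
    finding_id: str
    rationale: str
    approved_by: str
    expires: date  # mandatory: no open-ended risk acceptance

def active(suppressions: list[Suppression]) -> list[Suppression]:
    """Expired suppressions drop out automatically, forcing re-review."""
    today = date.today()
    return [s for s in suppressions if s.expires > today]

# Example: a 90-day acceptance that resurfaces on its own.
example = Suppression("CVE-2023-12345", "not reachable in our config",
                      "alice", date.today() + timedelta(days=90))
```

Because expiry is enforced by the query rather than by anyone remembering a calendar invite, re-appraisal happens by default.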
Needs improvement
A major problem with critiquing organizations that discuss their security practices publicly is selection bias. Many (I’d wager most) companies say and do the absolute minimum legally required (and sometimes less) regarding their data security and privacy posture.
Thus, the ones that do publish detailed information may highlight gaps in their programs that a majority of their industry peers share as well, gaps that simply aren’t visible elsewhere because of those peers’ silence on the matter.
With that caveat in place, I think there are some potential areas for improvement, which I lay out below.
Heavy compliance focus
Palantir states:
CVS is designed to meet or exceed the FedRAMP vulnerability management requirements across all of our production environments.
Unfortunately, FedRAMP has perverse incentives built into it, making it inadvisable to build a security program specifically to satisfy the standard. Palantir doesn’t have much choice here due to its customer base, but I would construct my program around security outcomes and then map it to compliance standards, rather than build it for them. Data confidentiality, integrity, and availability should be the top priority, with adherence to frameworks a secondary objective.
As one example of FedRAMP’s faults, the program disincentivizes organizations from finding vulnerabilities in their software by punishing them for doing so: a series of escalating reviews and remediation plans is triggered merely by the identification of a certain number of vulnerabilities. Given that the raw count of vulnerability findings is a mostly irrelevant metric, it’s unfortunate to see it get so much attention from the standard.
FedRAMP also requires the use of the Common Vulnerability Scoring System (CVSS) as a “risk rating.” Unfortunately, the CVSS standard itself clearly states that “CVSS Measures Severity, not Risk.” While admittedly the FedRAMP document I link to dates from before CVSS made this clarification, FedRAMP doesn’t seem to have amended its guidance. Most federal agencies also still use CVSS as a primary measure of risk, as a matter of habit and directive.
Reliance on CVSS
Mimicking FedRAMP’s language, the Palantir post also refers to “CVSS risk scoring,” and lays out a detailed series of remediation timelines based on the standard.
Palantir enforces strict SLAs for vulnerability remediation. While these SLAs represent the absolute maximum time permitted to address a vulnerability, we aim to fix them significantly faster in practice. We prioritize critical and high vulnerabilities for expedient mitigation and remediation when they are detected.
Beyond the standard’s own admission that it does not measure risk, CVSS scores are a generally flawed metric for evaluating security issues. Please review this post for the details of my stance; suffice it to say that building a program around a broken standard does not lay a solid foundation.
Furthermore, focusing on “critical and high” vulnerabilities (meaning CVSS 7.0+) is well documented to be inefficient and sub-optimal from a security perspective (including according to the organization that publishes the CVSS).
I’ll note that Palantir does not explicitly say their timelines are built on CVSS, but the qualitative terminology mirrors that of the standard, and the blog post mentions CVSS earlier. So I think it’s fair to assume that they use it.
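To see how coarse this approach is, here is roughly what a severity-band-driven SLA policy boils down to. The score cutoffs are the standard CVSS v3 qualitative bands; the day counts are invented for illustration and are not Palantir’s actual figures.

```python
# What a severity-band-driven SLA policy boils down to. The score cutoffs
# are the standard CVSS v3 qualitative bands; the day counts are invented
# for illustration and are not Palantir's actual figures.
def remediation_deadline_days(cvss: float) -> int:
    if cvss >= 9.0:   # Critical
        return 15
    if cvss >= 7.0:   # High
        return 30
    if cvss >= 4.0:   # Medium
        return 90
    return 180        # Low

# Note what the function never sees: exploitability, reachability, or
# the exposure of the affected asset.
```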
Dichotomy of first- and third-party code remediation timelines
Palantir in fact has two different timetables for vulnerability remediation: one for “underlying infrastructure, containers, or hosts” and one for “Palantir-developed software products.” To me this is strange, as it appears to create a dichotomy between third-party and first-party code, although I am not 100% clear on the distinction here.
Attackers do not care one whit who developed the code they are exploiting, so having different standards for the two would seem to create a potential gap in your security posture.
It seems the justification for having a longer timeline for first-party code is that it “may be significantly more complicated to remediate.” Thus, it may be that Palantir is making an implicit risk acceptance decision based on the fact that it costs more to fix their own code, which could be reasonable. But it would be great to see that logic made explicit (if my interpretation is in fact correct). In any case, that would be a very coarse-grained distinction to make.
It also wouldn’t make sense to me for someone to see a highly exploitable vulnerability in first-party code but ignore it in favor of less exploitable issues in third-party code that happen to have nominally tighter deadlines. People often say “just use common sense” in these situations, but that is generally not what happens. In the heat of the moment, and with the potential fear of recrimination should a breach occur, people usually just robotically follow what policies tell them to do. This can lead to unfortunate outcomes.
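A toy example (all values invented) shows how deadline-driven ordering and exploit-driven ordering can diverge:

```python
# Toy illustration (all values invented): sorting the queue by policy
# deadline puts the less exploitable third-party issue first, even though
# the first-party one is the likelier breach path.
findings = [
    {"id": "FP-1", "origin": "first-party", "exploit_likelihood": 0.90, "sla_days": 30},
    {"id": "TP-1", "origin": "third-party", "exploit_likelihood": 0.05, "sla_days": 15},
]

by_deadline = sorted(findings, key=lambda f: f["sla_days"])
by_exploitability = sorted(findings, key=lambda f: -f["exploit_likelihood"])

print([f["id"] for f in by_deadline])        # ['TP-1', 'FP-1']  (policy order)
print([f["id"] for f in by_exploitability])  # ['FP-1', 'TP-1']  (risk order)
```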
Information security teams responsible for risk acceptance
In our cloud environment, all suppressions must be manually reviewed and approved by the Information Security team.
This was probably the biggest issue for me. If you have read Deploying Securely for a while, you’ll know that I think cybersecurity teams should be advisors and implementors but not decision makers when it comes to risk. Business or mission leaders who have full visibility of all (not just cybersecurity) risk should be in the hot seat. Putting security teams in charge creates unhealthy pressure on them to make certain types of calls that they don’t usually have the necessary context to make.
I imagine some of you are getting ready to say “well Palantir has really important military and intelligence community customers, so security should be the number one priority.” If that describes you, then I’ll recommend you read this post, which was a response to someone with that sort of opinion.
Conclusion
In general, and assuming Palantir follows the guidance that it published recently, I would say that it is in the upper tier of companies from a vulnerability management perspective. Their transparency is to be lauded, and I think there are a lot of good things at work in their program. There is, however, room for improvement.
I may just think differently than they do, though, as I applied several times to the company but never received an offer.