Clarifying data security concerns with 3rd party AI models.
A consensus appears to have developed that running artificial intelligence (AI) models yourself is necessarily more secure than having a third party do it for you.
I’m not so sure.
All other things being equal, Software-as-a-Service (SaaS) is the most secure deployment model. Check out this article for the reasons why.
Additionally, many proponents of self-hosting AI models as a security measure seem to suggest that entrusting another organization with your sensitive data at all is always the wrong call. Since nearly every company (with some notable exceptions) does exactly that these days - at the very least, most organizations leverage Infrastructure-as-a-Service (IaaS) offerings from hyperscale cloud providers - this strikes me as an odd position.
With that said, AI adds a wrinkle into the equation due to the phenomenon of what I call “unintended training.”
I define unintended training as any situation in which an AI model trains on information that the provider of that information does not want it trained on. Examples include models trained on:
Material non-public financial information
Personally identifiable information
in cases where people not authorized to see the underlying information can nonetheless interact with the model itself.
Should a model be unintentionally trained on this data, it could conceivably regurgitate it - or something similar - to a party who shouldn’t have access.
Victims of this phenomenon have reportedly included several organizations.
It’s not entirely clear how big of a risk this vector represents, but since we are in the early days of generative AI, it is worth taking seriously. And when analyzing unintended training through a security lens, hosting a model yourself has the primary advantages of:
Controlling who else can use the model, whether its training was intentional or not.
Being able to “roll back” the model to a previous version if any unintended training that does occur is serious enough to warrant doing so.
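To illustrate the second advantage, a self-hosted deployment can snapshot model weights at each training run and restore a prior snapshot if unintended training is later discovered. The sketch below is a minimal, hypothetical version registry - all names and data in it are illustrative, not a real system:

```python
# Minimal sketch of the "roll back" advantage when self-hosting:
# keep a snapshot of each published model version, and restore a
# prior one if unintended training is discovered.

class ModelRegistry:
    def __init__(self):
        self._versions = []  # ordered list of (tag, weights) snapshots

    def publish(self, tag, weights):
        self._versions.append((tag, weights))

    def current(self):
        return self._versions[-1]

    def roll_back(self, tag):
        # Discard every snapshot published after the named one.
        idx = next(i for i, (t, _) in enumerate(self._versions) if t == tag)
        self._versions = self._versions[: idx + 1]
        return self.current()

registry = ModelRegistry()
registry.publish("v1", {"trained_on": ["public-data"]})
registry.publish("v2", {"trained_on": ["public-data", "mnpi"]})  # unintended training
tag, weights = registry.roll_back("v1")  # restore the clean snapshot
```

With most hosted SaaS models you get neither this control over the training history nor the ability to restore a clean snapshot; the provider decides when and how the model changes.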
With most commercially available options (e.g. GPT-4), neither of these is possible, although #1 is an option for some Microsoft offerings.
Aside from these two advantages of using non-SaaS AI models, however, I think the SaaS vs. IaaS analysis is identical to any other situation. From a purely cybersecurity perspective, SaaS is the better way to go (again, ceteris paribus). And as I have noted before, there are ways to mitigate the risk of unintended training when using SaaS. In any case, you’ll need to conduct a systematic risk analysis to make the right call on which approach is best for you.
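One such mitigation is scrubbing obviously sensitive tokens from inputs before they ever reach a third-party model. The sketch below is illustrative only - the two regex patterns are placeholders, not a complete detector for personally identifiable information:

```python
import re

# Hedged sketch of one mitigation for unintended training: redact
# recognizable sensitive strings from a prompt before sending it to
# a third-party model. Patterns here are illustrative, not exhaustive.

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
# -> Contact [EMAIL], SSN [SSN].
```

A production setup would pair something like this with data classification and human review, since regexes alone will miss most sensitive content.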
Since precision of language is important, though, I thought it made sense to clearly label the specific issue to which people are likely referring when they vaguely describe “data security” concerns related to SaaS AI models.
Unintended training is a risk that appears more serious when using an AI model operated by a 3rd party, but it is just one of many information security (and business) challenges organizations need to take into account when making build vs. buy decisions.
For those getting ready to say “but a contractual/regulatory/other legal requirement says I can’t use a SaaS-based model!”: I’m not a lawyer, but I think it’s sound to advise you to follow those requirements (assuming they actually exist and aren’t just made up, which I believe most such alleged requirements are, outside some narrow exceptions). By “purely cybersecurity perspective,” I am referring only to defending the confidentiality, integrity, and availability of data from malicious impact. As I mentioned, there are always multiple factors at play when making these decisions.