Protect yourself against unintended training risk in legal agreements
Confidentiality in the age of AI.
Today’s post is a collaboration with Jeff Cunningham of the law firm McAngus Goudelock & Courie. Although Jeff is a lawyer, this is strictly educational information and does not constitute legal advice.
Protecting confidential information is a key requirement for business operations. Whether it be:
Material nonpublic information (MNPI)
Protected health information (PHI)
Personal data (per the GDPR)
keeping certain things secret can be important. Non-disclosure agreements (NDAs), whether stand-alone or incorporated into larger contracts, are thus among the most common legal agreements.
But as Jeff wrote about on his Substack (which Walter highly recommends), artificial intelligence (AI) is changing the game substantially when it comes to this and other legal issues. So in this post, we’re going to provide some draft confidentiality language that addresses AI head on.
While NDAs are often somewhat broad and some might say this approach is unnecessary, we think it’s important to be transparent with customers and partners as to how their data is treated. As Zoom’s fiasco last summer showed, opacity and confusion about AI use can cause serious reputational damage.
And most importantly, by putting guardrails in place all parties will be clear about what constitutes acceptable data handling procedures and what does not.
Suggested NDA language
The text below represents only the relevant definitions necessary to set the stage for the final paragraph (which specifically addresses AI use). This is only a skeleton: we expect your agreements would be more detailed and lengthy based on your needs.
“Confidential Information” is all information provided by one party to another, whether of a technical, business, financial, or any other nature, disclosed in any manner, whether verbally, electronically, visually, or in a written or other tangible form, which is either identified or designated as confidential or proprietary or which should be reasonably understood to be confidential or proprietary in nature. The Party disclosing Confidential Information under this Agreement is the “Discloser,” and the Party receiving the Confidential Information is the “Recipient.”
Recipient agrees that it will not use or disclose any Confidential Information of Discloser for any purpose except as contemplated under this Agreement. Recipient will limit access to Discloser’s Confidential Information to those employees, consultants, vendors, agents, or attorneys (“Representatives”) who must have access to it in order to implement this Agreement and are under an obligation of confidentiality.
These two paragraphs set up the provision specifically addressing AI:
Recipient may process Confidential Information using machine learning or artificial intelligence (“AI”) models. Such processing is contemplated by this Agreement so long as Recipient obtains commercially reasonable assurances the AI model is not both:
trained on Confidential Information; and
lawfully available to any person or entity aside from Discloser, Recipient, or Representatives of Recipient.
Basically, this says that if you use AI to process Confidential Information, you need to make sure either:
The model is not trained on it; or
The model is trained on it but not available to anyone who isn’t allowed to see Confidential Information to begin with.
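The provision's two-prong structure (processing is barred only when the model is *both* trained on Confidential Information *and* available to outsiders) can be sketched as a small compliance check. This is an illustrative sketch, not legal analysis; the field names and the `AIToolUse` type are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AIToolUse:
    """Hypothetical model of an AI tool's handling of Confidential Information."""
    # Prong 1: is the model trained on the Confidential Information?
    trained_on_confidential: bool
    # Prong 2: is the model lawfully available to anyone other than
    # Discloser, Recipient, or Recipient's Representatives?
    available_outside_parties: bool

def is_permitted(use: AIToolUse) -> bool:
    # Processing is contemplated by the agreement unless BOTH prongs are true.
    return not (use.trained_on_confidential and use.available_outside_parties)

# Public model, training opted out: permitted
print(is_permitted(AIToolUse(trained_on_confidential=False,
                             available_outside_parties=True)))   # True
# Private model fine-tuned only for the parties: permitted
print(is_permitted(AIToolUse(trained_on_confidential=True,
                             available_outside_parties=False)))  # True
# Public model trained on the data: not permitted
print(is_permitted(AIToolUse(trained_on_confidential=True,
                             available_outside_parties=True)))   # False
```

Note the conjunction: either prong alone is acceptable, which is what lets a Recipient fine-tune a strictly private model or use a public model that does not train on its inputs.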
Which risks does this language address? Which ones doesn’t it?
The above provision specifically targets unintended training. For example, it would prohibit the Recipient from taking Confidential Information and using it to prompt ChatGPT unless the Recipient had opted out of training. Conversely, using the GPT application programming interface (API) would be fine, because it does not train on user data by default (unless the Recipient has for some reason opted in).
The language we provide does NOT address risks like the Recipient:
Training or fine-tuning its own proprietary model on your Confidential Information.
Giving Confidential Information to a third-party (AI) tool that retains it forever.
Exposing it through sensitive data generation (SDG).
The first issue is more of an intellectual property ownership issue than a confidentiality one. Clearly specifying what the Recipient can do with anything derived from Confidential Information, aside from disclosing it, will be critical to addressing this challenge.
The second issue isn’t AI-specific, and you should specify retention practices in the contract or in a security or data protection addendum.
The third issue, related to SDG, has nothing to do with the Recipient’s use of AI, but rather with how it exposes information that does not appear confidential to AI tools, either directly or indirectly by publishing it on the internet. Large language models (LLMs) have proven eerily capable of inferring sensitive items from scraps of publicly available data. So it’s quite conceivable a properly trained LLM could intuit Confidential Information even if the Recipient didn’t intend to expose it. This is particularly challenging with the use of anonymized client testimonials and other marketing materials.
For example, an LLM could de-anonymize a “ghost story” (case study that doesn’t name the company in question) posted on a marketing website, provided sufficient context.
Ensuring your counterparty has an AI policy that sensitizes employees to this risk can help mitigate it. But frankly, because of SDG we suspect the confidentiality of certain types of data - especially trade secrets - is going to become less and less defensible as time passes.
Technology moves fast, and legal practice often doesn’t catch up quickly enough. We hope the above provides a starting point for discussion when drafting confidentiality agreements that don’t obstruct the use of AI while still applying reasonable safeguards.
Need StackAware’s help evaluating your AI risk?
Are you an attorney looking for outside General Counsel support?