3 ways ISO 42001 can help AI-powered companies comply with California's training data transparency bill (AB-2013)
California asserts global jurisdiction over AI.
On September 28, 2024, California passed bill AB-2013, “Generative artificial intelligence: training data transparency.” Here is my analysis of the law and its impacts from a practical, business perspective.
As always, this is not legal advice.
Whom and what does AB-2013 impact?
The bill applies:
to generative AI systems released on/after January 1, 2022
if “made publicly available to Californians for use”
beginning January 1, 2026
Someone asked on the relevant LinkedIn thread whether this applies to “[o]nly CA companies or any company doing business in CA?” My answer is that this applies far more broadly than either of these categories because:
a developer under the bill is a “person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an artificial intelligence system or service for use by members of the public.”
the law doesn’t define “Californian.” Even assuming this term is limited to California legal residents, AB-2013 would still have global reach because it applies to anything “made publicly available” to them.
the bill applies “regardless of whether the terms of…use include compensation.”
So the short version is that this law applies to every open-source or publicly available commercial generative AI system on the planet that was released, or “substantially” modified (including by fine-tuning), in 2022 or later.
The bill defines generative AI as that which “can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence’s training data.”
The bill does not apply to systems:
used only for cybersecurity or physical safety [1]
solely used for aircraft operation in U.S. airspace [2]
only used for federal security, military, or defense purposes
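Putting the applicability criteria and exceptions together, you can think of the test as a simple checklist. Here is a rough sketch in Python reflecting my paraphrase of the statute’s conditions (not legal advice; the function and parameter names are my own):

```python
from datetime import date

# Rough applicability checklist for AB-2013, per my reading of the bill.
# Not legal advice; function and parameter names are my own.
def ab2013_applies(release_date: date,
                   publicly_available_to_californians: bool,
                   sole_purpose_security_and_integrity: bool,
                   sole_purpose_faa_aircraft_operation: bool,
                   sole_purpose_federal_defense: bool) -> bool:
    # Only systems released (or substantially modified) on/after January 1, 2022
    if release_date < date(2022, 1, 1):
        return False
    # Must be "made publicly available to Californians for use"
    if not publicly_available_to_californians:
        return False
    # The three sole-purpose exceptions
    if (sole_purpose_security_and_integrity
            or sole_purpose_faa_aircraft_operation
            or sole_purpose_federal_defense):
        return False
    return True

# Example: a public chatbot released mid-2023 with no exception applying
print(ab2013_applies(date(2023, 6, 1), True, False, False, False))  # True
```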
What is the penalty for non-compliance?
Not specified in the bill.
ChatGPT o1-preview suggested the following (edited for clarity, brevity, and accuracy):
Implementation relies on existing legal frameworks and general principles of statutory interpretation. Here’s how AB-2013 might work under California law despite the absence of specified penalties:
Implied Enforcement Through Existing Laws
Unfair Competition Law (UCL): Non-compliance could be considered an unfair or unlawful business practice under California’s Unfair Competition Law (Business and Professions Code § 17200 et seq.). This law allows for civil actions against entities engaging in unfair competition, which includes any unlawful, unfair, or fraudulent business act.
Consumer Legal Remedies Act (CLRA): If consumers are affected by a developer’s failure to comply, they might bring actions under the CLRA (Civil Code § 1750 et seq.), which protects against deceptive business practices.
Attorney General Enforcement
The California Attorney General may have the authority to enforce compliance through injunctions or other legal actions, compelling developers to adhere to the law’s requirements.
Private Right of Action
Although AB-2013 doesn’t specify a private right of action, courts may interpret the statute to allow individuals or entities harmed by non-compliance to sue for remedies such as injunctions or declaratory relief.
Administrative Enforcement
Relevant state agencies could promulgate regulations or guidelines to enforce the law, using their existing powers to ensure compliance even without specified penalties.
Judicial Remedies
Courts may impose equitable remedies, such as ordering a developer to comply with the disclosure requirements, if a lawsuit is brought before them.
What does AB-2013 require developers to do?
Publicly post the following information about generative AI system training data (a structured sketch of one possible format follows this list):
sources/owners of data
whether data sets are purchased/licensed
whether synthetic data was used for training
whether the data was cleaned, processed, or modified
how data sets relate to purpose of GenAI system
dates each data set was first used in development
number of data points (can be estimated if dynamic)
time period of collection (noting ongoing collection)
intellectual property protections for the training data
whether the data contains personal [3] (including aggregated [4]) information
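The bill doesn’t prescribe a format for this posting, but capturing it as structured data makes it easier to keep accurate over time. Here is a minimal sketch of one possible record layout in Python, with one field per required element above (the field names are my own; nothing in the bill mandates this structure):

```python
from dataclasses import dataclass

@dataclass
class DatasetDisclosure:
    """One record per training data set, with a field for each AB-2013 element."""
    sources_or_owners: list[str]         # sources/owners of the data
    purchased_or_licensed: bool          # whether the data set was purchased/licensed
    contains_synthetic_data: bool        # whether synthetic data was used for training
    cleaned_processed_or_modified: bool  # whether the data was cleaned, processed, or modified
    relation_to_system_purpose: str      # how the data set relates to the GenAI system's purpose
    first_used: str                      # date first used in development (e.g., "2023-01-15")
    datapoint_count_estimate: int        # number of data points (estimate OK if dynamic)
    collection_period: str               # time period of collection
    collection_ongoing: bool             # flag ongoing collection
    ip_protections: list[str]            # IP protections (e.g., ["copyright"])
    contains_personal_info: bool         # personal information per Civ. Code 1798.140(v)
    contains_aggregate_info: bool        # aggregate consumer info per 1798.140(b)
```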
What are the corresponding ISO 42001 controls?
A.4.3 - data resources
This requires noting data:
provenance
intended use
preparation techniques
A.7.3 - acquisition of data
For this control, you must document:
data rights
quantity of data
characteristics of data
A.7.6 - data preparation
This one focuses on techniques such as data:
cleaning
normalization
labeling and encoding
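To make the overlap concrete, here is a rough crosswalk from the bill’s disclosure elements to these Annex A controls, expressed as a Python dictionary. The mapping reflects my reading of the two documents, not official guidance from ISO or the California legislature:

```python
# Rough crosswalk from AB-2013 disclosure elements to ISO 42001 Annex A controls.
# This mapping is my own interpretation, not official guidance.
AB2013_TO_ISO42001 = {
    "sources/owners of data":          "A.4.3 (provenance)",
    "purchased/licensed data sets":    "A.7.3 (data rights)",
    "synthetic data used":             "A.4.3 (provenance) / A.7.3 (characteristics)",
    "cleaned/processed/modified":      "A.7.6 (cleaning, normalization, labeling)",
    "relation to system purpose":      "A.4.3 (intended use)",
    "dates first used":                "A.7.3 (characteristics)",
    "number of data points":           "A.7.3 (quantity)",
    "collection time period":          "A.7.3 (characteristics)",
    "IP protections":                  "A.7.3 (data rights)",
    "personal/aggregated information": "A.7.3 (characteristics)",
}
```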
AB-2013 doesn’t call out ISO 42001 by name, so how will its Annex A controls help me comply?
A well-functioning AI Management System (AIMS) will address all of these requirements and more.
A key challenge for ISO 42001-certified companies will be ensuring the publicly-posted data required by the bill stays up to date. This is not a trivial concern, considering that the one outstanding Securities and Exchange Commission (SEC) charge against SolarWinds and its Chief Information Security Officer (CISO) stems from alleged inaccuracies in the company’s publicly-posted statement about its security.
To address this challenge, I would recommend:
Automating updates whenever something changes internally.
Exposing as much internal data as you can while staying within your risk appetite.
The administrative cost of getting multiple departments to sign off on public statements can be steep. So agreeing up front on the exact data to be published automatically will likely save time and prevent inaccuracies in the long run.
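As a concrete illustration of the first recommendation, a scheduled job could regenerate the public disclosure from an internal data inventory and publish only fields everyone has pre-approved. This is a hypothetical sketch; the file paths, inventory format, and field names (matching the DatasetDisclosure sketch above) are all assumptions:

```python
import json
from pathlib import Path

# Hypothetical paths: an internal data inventory maintained under your AIMS,
# and the disclosure file served from your public site.
INVENTORY = Path("aims/data_inventory.json")
PUBLIC_DISCLOSURE = Path("public/ab2013_disclosure.json")

# Fields your departments agreed in advance are safe to publish.
APPROVED_FIELDS = {
    "sources_or_owners", "purchased_or_licensed", "contains_synthetic_data",
    "cleaned_processed_or_modified", "relation_to_system_purpose",
    "first_used", "datapoint_count_estimate", "collection_period",
    "collection_ongoing", "ip_protections",
    "contains_personal_info", "contains_aggregate_info",
}

def regenerate_disclosure() -> bool:
    """Rebuild the public disclosure from the inventory; return True if it changed."""
    inventory = json.loads(INVENTORY.read_text())
    disclosure = [
        {k: v for k, v in dataset.items() if k in APPROVED_FIELDS}
        for dataset in inventory["datasets"]
    ]
    new_text = json.dumps(disclosure, indent=2, sort_keys=True)
    changed = not PUBLIC_DISCLOSURE.exists() or PUBLIC_DISCLOSURE.read_text() != new_text
    if changed:
        PUBLIC_DISCLOSURE.write_text(new_text)  # publish via your normal deploy pipeline
    return changed

if __name__ == "__main__":
    if regenerate_disclosure():
        print("Public AB-2013 disclosure updated to match internal inventory.")
```

Because the public artifact is generated from the same inventory your AIMS already maintains, the posting can’t silently drift from internal reality, and the sign-off debate happens once (over the approved field list) rather than on every change.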
Need help understanding your AI-related compliance obligations?
StackAware helps AI-powered companies in:
Financial services
Healthcare
B2B SaaS
measure and manage their risk related to:
Cybersecurity
Compliance
Privacy
Through our AIMS Accelerator program, we get companies ISO 42001-ready in 90 days, building the infrastructure to comply with laws like AB-2013 and to adapt to future regulatory changes.
Ready to learn more?
[1] This exception applies to any “system or service whose sole purpose is to help ensure security and integrity.” The bill refers to subdivision (ac) of Section 1798.140 of the California Civil Code, which defines “security and integrity” as the ability of:
(1) Networks or information systems to detect security incidents that compromise the availability, authenticity, integrity, and confidentiality of stored or transmitted personal information.
(2) Businesses to detect security incidents, resist malicious, deceptive, fraudulent, or illegal actions, and to help prosecute those responsible for those actions.
(3) Businesses to ensure the physical safety of natural persons.
AB-2013 also substitutes the term “developer” for “business” here, highlighting that these requirements apply to open-source developers as well as for-profit companies.
On the relevant LinkedIn thread, someone asked whether this exception would cover threat intelligence or Security Information and Event Management (SIEM) tools leveraging generative AI models.
My answer is: yes, these systems would be covered by the exception.
[2] Another question I got on the LinkedIn thread was why aircraft systems were excepted from this requirement, given their safety criticality.
I have no idea why. This exception doesn’t make sense to me.
[3] AB-2013 uses the definition of “personal information” laid out in subdivision (v) of Section 1798.140 of the California Civil Code, which means:
information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Personal information includes, but is not limited to, the following if it identifies, relates to, describes, is reasonably capable of being associated with, or could be reasonably linked, directly or indirectly, with a particular consumer or household:
(A) Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, social security number, driver's license number, passport number, or other similar identifiers.
(B) Any personal information described in subdivision (e) of Section 1798.80.
(C) Characteristics of protected classifications under California or federal law.
(D) Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.
(E) Biometric information.
(F) Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer's interaction with an internet website, application, or advertisement.
(G) Geolocation data.
(H) Audio, electronic, visual, thermal, olfactory, or similar information.
(I) Professional or employment-related information.
(J) Education information, defined as information that is not publicly available personally identifiable information as defined in the Family Educational Rights and Privacy Act (20 U.S.C. Sec. 1232g; 34 C.F.R. Part 99).
(K) Inferences drawn from any of the information identified in this subdivision to create a profile about a consumer reflecting the consumer's preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.
[4] The bill borrows the definition of “aggregate consumer information” from subdivision (b) of Section 1798.140 of the California Civil Code:
information that relates to a group or category of consumers, from which individual consumer identities have been removed, that is not linked or reasonably linkable to any consumer or household, including via a device. “Aggregate consumer information” does not mean one or more individual consumer records that have been deidentified.