ISO 42001 vs. HITRUST AI security certification: data management requirements
What two emerging AI standards demand for data governance.
AI doesn’t exist without data.
That is why tracking, classifying, and organizing it is so key for AI governance and security.
Two relatively new standards help organizations tackle this problem:
HITRUST’s AI Security Assessment with Certification
ISO/IEC 42001:2023
I compared the two standards in a previous post. This one goes deep on their data management requirements specifically.
And if you are wondering why I am qualified to write on this topic, StackAware is itself ISO 42001 certified and recently became a HITRUST Readiness Licensee.
What HITRUST requires:
The standard’s Baseline Unique IDs (BUIDs) 07.07aAISecOrganizational.4-5 cover data sources used:
to train, fine-tune, test, and validate AI models
in retrieval-augmented generation (RAG)
For these data sources, organizations must:
Maintain a catalog of trusted data sources
Inventory data used, including at least:
Provenance
Sensitivity
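To make this concrete, here is a minimal Python sketch of what such a catalog and inventory could look like. It is an illustration only: HITRUST does not prescribe a format, and every name here (TRUSTED_SOURCES, DatasetRecord, inventory_gaps, the sample datasets and sensitivity labels) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical catalog of trusted data sources (illustrative names only).
TRUSTED_SOURCES = {"internal-crm", "licensed-clinical-notes"}


@dataclass
class DatasetRecord:
    name: str
    source: str       # provenance: where the data came from
    sensitivity: str  # assumed labels, e.g. "public", "confidential"
    used_for: str     # e.g. "training", "fine-tuning", "testing", "rag"


def inventory_gaps(records, trusted_sources):
    """Return (dataset name, problem) pairs for incomplete or untrusted entries."""
    gaps = []
    for r in records:
        if not r.source:
            gaps.append((r.name, "missing provenance"))
        elif r.source not in trusted_sources:
            gaps.append((r.name, "source not in trusted catalog"))
        if not r.sensitivity:
            gaps.append((r.name, "missing sensitivity"))
    return gaps


inventory = [
    DatasetRecord("support-tickets-2024", "internal-crm", "confidential", "fine-tuning"),
    DatasetRecord("scraped-forum-posts", "unknown-web-scrape", "", "rag"),
]

for name, problem in inventory_gaps(inventory, TRUSTED_SOURCES):
    print(f"{name}: {problem}")
```

Running a check like this before each training or indexing job is one way to keep the inventory from drifting out of date, though a spreadsheet or data catalog tool could serve the same purpose.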
What ISO 42001 requires:
Under ISO 42001, organizations don’t strictly need to do any of the above. But these (optional) Annex A controls require tracking:
A.4.3: Data resources
This covers “data resources utilized for the AI system,” including information about their:
Retention
Intended use
Update/modification
Quality (duplicating A.7.4, in my opinion)
Provenance (also duplicative, of A.7.3 and A.7.5)
I’ll note the control doesn’t specifically say “training,” so this can include data the AI system processes, not just data used to train it.
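As a rough illustration, a record covering these A.4.3 attributes might look like the Python sketch below. The DataResource class, its field names, and the 365-day retention period are all assumptions made for the example; the standard does not specify a schema or a retention length.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataResource:
    name: str
    intended_use: str
    provenance: str
    quality_notes: str
    last_modified: date
    acquired_on: date
    retention_days: int  # hypothetical policy value, not from the standard

    def retention_deadline(self) -> date:
        """Date after which the data should no longer be kept."""
        return self.acquired_on + timedelta(days=self.retention_days)

    def past_retention(self, today: date) -> bool:
        """True if the resource has outlived its retention period."""
        return today > self.retention_deadline()


resource = DataResource(
    name="call-transcripts",
    intended_use="RAG context for a support assistant",
    provenance="exported from internal telephony system",
    quality_notes="deduplicated; speaker labels unverified",
    last_modified=date(2024, 6, 1),
    acquired_on=date(2024, 1, 15),
    retention_days=365,
)

print(resource.retention_deadline())
print(resource.past_retention(date(2025, 3, 1)))
```

Keeping the retention deadline next to intended use and provenance means one record can answer most of what A.4.3 asks about a given data resource.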
A.7.2: Data for development/enhancement of AI systems
This requirement focuses on data’s:
Privacy and security implications (duplicates A.7.3)
Potential security and safety threats
Accuracy/integrity (duplicates A.7.4)
Transparency and explainability
Representativeness
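Representativeness is the easiest of these to check in code. The sketch below compares each group's share of a dataset against an expected share and flags large deviations. The function name, the region groups, the expected shares, and the 10% tolerance are all hypothetical; ISO 42001 does not define how to measure representativeness.

```python
from collections import Counter


def representation_gaps(samples, expected_shares, tolerance=0.10):
    """Return groups whose observed share deviates from the expected
    share by more than `tolerance`, as {group: (observed, expected)}."""
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, expected in expected_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps


# Illustrative data: one group label per training example.
regions = ["north"] * 7 + ["south"] * 2 + ["west"] * 1
expected = {"north": 0.45, "south": 0.40, "west": 0.15}

for group, (observed, target) in representation_gaps(regions, expected).items():
    print(f"{group}: observed {observed:.0%}, expected {target:.0%}")
```

The hard part in practice is agreeing on the expected shares, which is a governance decision rather than a coding one.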
A.7.3: Acquisition of data
A broad control you can summarize as “data governance.” It requires noting:
Sources
Categories
Quantities
Demographics/biases
Data rights/ownership
Privacy and security requirements
Check out StackAware’s data classification and tagging practices for examples of how we do it.
A.7.4: Quality of data
ISO/IEC 25024:2015 defines data quality as the degree to which the data’s characteristics satisfy stated and implied needs when used under specified conditions.
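Turning “stated needs” into something measurable usually means picking a few metrics and thresholds. Here is a minimal Python sketch of two common ones, completeness and duplicate rate. The function names, sample rows, and choice of metrics are assumptions for illustration, not something ISO 42001 prescribes.

```python
def completeness(rows, required_fields):
    """Fraction of required field values that are present (not None or empty)."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 1.0
    present = sum(
        1
        for row in rows
        for field_name in required_fields
        if row.get(field_name) not in (None, "")
    )
    return present / total


def duplicate_rate(rows, key_fields):
    """Fraction of rows whose key repeats the key of an earlier row."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(f) for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)


# Illustrative labeled examples for a support-ticket classifier.
rows = [
    {"id": 1, "text": "reset my password", "label": "account"},
    {"id": 2, "text": "refund request", "label": ""},
    {"id": 2, "text": "refund request", "label": "billing"},
    {"id": 3, "text": None, "label": "billing"},
]

print(completeness(rows, ["text", "label"]))
print(duplicate_rate(rows, ["id"]))
```

A documented threshold for each metric (for example, a minimum completeness before a dataset can be used for training) is what turns these numbers into a “stated need.”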
A.7.5: Data provenance
This is information about data’s:
update
creation
validation
abstraction
transcription
transfer of control
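One way to capture these events is an append-only log per dataset. The Python sketch below is a hypothetical structure, not a format from the standard: the class names, the actor field, and the convention that a transfer's actor is the party receiving control are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ProvenanceEvent(Enum):
    CREATION = "creation"
    UPDATE = "update"
    TRANSCRIPTION = "transcription"
    ABSTRACTION = "abstraction"
    VALIDATION = "validation"
    TRANSFER_OF_CONTROL = "transfer of control"


@dataclass
class ProvenanceEntry:
    event: ProvenanceEvent
    actor: str  # assumption: for transfers, the party receiving control
    note: str
    timestamp: datetime


@dataclass
class ProvenanceLog:
    dataset: str
    entries: list = field(default_factory=list)

    def record(self, event, actor, note, timestamp=None):
        """Append an event, defaulting the timestamp to now (UTC)."""
        ts = timestamp or datetime.now(timezone.utc)
        self.entries.append(ProvenanceEntry(event, actor, note, ts))

    def current_controller(self):
        """Actor from the most recent creation or transfer-of-control event."""
        controlling = (ProvenanceEvent.CREATION, ProvenanceEvent.TRANSFER_OF_CONTROL)
        for entry in reversed(self.entries):
            if entry.event in controlling:
                return entry.actor
        return None


log = ProvenanceLog("claims-2023")
log.record(ProvenanceEvent.CREATION, "claims-team", "exported from claims system")
log.record(ProvenanceEvent.VALIDATION, "data-quality-team", "schema checks passed")
log.record(ProvenanceEvent.TRANSFER_OF_CONTROL, "ml-platform-team", "handed over for training")

print(log.current_controller())
```

Because entries are only ever appended, the log doubles as an audit trail for who touched the data and when.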
A.7.6: Data preparation
This control requires documenting granular steps in the model training process like:
Encoding
Data cleaning
Normalization
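A lightweight way to document these steps is to have the preparation code log itself. The Python sketch below runs cleaning, encoding, and normalization in order and records what each step did. The field names, plan codes, and run_with_log helper are hypothetical, and a real pipeline would likely use a dataframe library instead of plain dictionaries.

```python
def clean(rows):
    """Drop rows with a missing age or plan."""
    return [r for r in rows if r["age"] is not None and r["plan"]]


def encode(rows):
    """Map the categorical 'plan' field to an integer code."""
    codes = {"free": 0, "pro": 1}
    return [{**r, "plan": codes[r["plan"]]} for r in rows]


def normalize(rows):
    """Min-max scale 'age' to the range [0, 1]."""
    if not rows:
        return rows
    ages = [r["age"] for r in rows]
    low, high = min(ages), max(ages)
    span = (high - low) or 1  # avoid dividing by zero when all ages match
    return [{**r, "age": (r["age"] - low) / span} for r in rows]


def run_with_log(rows, steps):
    """Apply each step in order and record its name, docstring, and row counts."""
    log = []
    for step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append({
            "step": step.__name__,
            "description": step.__doc__,
            "rows_in": rows_in,
            "rows_out": len(rows),
        })
    return rows, log


raw = [
    {"age": 20, "plan": "free"},
    {"age": None, "plan": "pro"},
    {"age": 40, "plan": "pro"},
    {"age": 30, "plan": "free"},
]

prepared, prep_log = run_with_log(raw, [clean, encode, normalize])
for entry in prep_log:
    print(entry)
```

Storing the resulting log alongside the trained model gives an auditor a record of exactly which transformations the training data went through.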
The verdict? ISO 42001's Annex A controls place much heavier demands on data management.
This makes sense because 42001 is an AI governance standard, while HITRUST’s certification is a security-focused one. There is understandably a lot of overlap, though.
Are you considering ISO 42001 or HITRUST certification (or both)?
StackAware helps AI-powered companies achieve these certifications to build trust with customers, regulators, and other stakeholders.