Data classification mastery: 5 simple categories for handling sensitive information
Start AI governance the right way with an actionable data governance system.
AI makes companies even hungrier for data.
This makes cybersecurity and privacy even more important.
A key component of any effective AI governance program is a data classification policy and a set of procedures to go with it. If you can’t identify what your data is and how sensitive it is, you will have a tough time protecting it.
Most organizations use ascending levels of data sensitivity, but shouldn’t
Unfortunately, most of the systems I have seen leave something to be desired. For example, I often see ascending levels of sensitivity like:
Each level is increasingly valuable to the organization, so many companies create this graduated series of data classifications. I also see other guidance implicitly create additional categories by referring to “sensitive data,” even though that term is never defined in the data classification policy.
Unfortunately, having this ascending (and somewhat unclear) structure is usually unnecessary and counterproductive.
That is because organizations rarely create separate systems or handling processes for these different classifications.
How StackAware labels data
We use the following top-level classifications:
These differ not in the value of the data they describe but in the handling procedures for the information. Each term can also be sub-divided into more granular categories. Finally, “Restricted” is a catch-all term for everything that is not Public.
Self-explanatory. This is anything that can be posted on the open internet without restriction. At StackAware, we take the additional step of affirmatively trying to publish anything classified as “Public.” That’s because, if there is no risk in the information getting out, it might as well serve as marketing collateral.
Building in public is part of our competitive advantage.
This is information belonging to us that is not currently meant to be public, but which can be disclosed unilaterally with the approval of the data owner, without further coordination outside of our company.
Below are some examples. If you were especially concerned about compartmenting data, you could create nested sub-categories using some or all of the below to restrict dissemination even further:
This is information belonging to another organization not meant to be public, which our company is bound by confidentiality requirements to protect and cannot release without external coordination.
While there should be an internally designated data owner, that person cannot release the information in question without permission from the relevant external party.
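The difference in release rules between the classifications described so far can be sketched in a few lines of Python. This is only an illustration, not StackAware's actual tooling: the label names `CONFIDENTIAL_INTERNAL` and `CONFIDENTIAL_EXTERNAL`, the `may_release` function, and its parameters are all hypothetical. It also assumes that releasing externally owned information still requires sign-off from the internal data owner in addition to the external party.

```python
from enum import Enum


class Classification(Enum):
    # Hypothetical label names used for illustration only.
    PUBLIC = "public"
    CONFIDENTIAL_INTERNAL = "confidential-internal"  # belongs to us
    CONFIDENTIAL_EXTERNAL = "confidential-external"  # belongs to another organization


def may_release(label, owner_approved, external_party_approved=False):
    """Return True if information carrying this label may be disclosed."""
    if label is Classification.PUBLIC:
        # Public information can be posted without restriction.
        return True
    if label is Classification.CONFIDENTIAL_INTERNAL:
        # The data owner can approve disclosure unilaterally.
        return owner_approved
    if label is Classification.CONFIDENTIAL_EXTERNAL:
        # The owner alone is not enough: the external party must agree.
        # Assumption: internal owner approval is still required as well.
        return owner_approved and external_party_approved
    raise ValueError(f"Unknown classification: {label!r}")
```

Note that nothing here ranks one label as "higher" than another. The labels differ only in who has to approve a release, which is the point of classifying by handling procedure rather than by value.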
I am not an attorney and this is not legal advice.
There are a variety of regulations governing the use of data which can identify natural persons (i.e. human beings), such as:
Personal information (PI), defined by the California Consumer Privacy Act (CCPA).
Personally identifiable information (PII), defined by various U.S. state laws and federal regulations.
Personal data, defined by the European Union (EU) General Data Protection Regulation (GDPR).
Of these, the GDPR’s definition is the most restrictive and expansive. Thus, we use the blanket term Personal Data for everything that falls under this rubric. Anecdotally, I know that many other organizations simply build their data privacy programs around the GDPR because it is the most stringent standard, which spares them from tracking compliance with a variety of different rules.
Different jurisdictions, however, have specialized requirements for handling different types of data, so it might make sense to create nested categories under the general category of Personal Data. Additionally, if you are a “covered entity” under the U.S. Health Insurance Portability and Accountability Act (HIPAA), you might consider creating a sub-category (or an entirely separate category) for protected health information (PHI). PHI has its own handling requirements mandated by HIPAA.
This is simply a way of collectively describing the aforementioned three types of Confidential information. For policy and procedure purposes, it can be helpful to have a single unifying term to describe all categories of data that cannot be processed in a certain way (e.g. using uncertified systems per StackAware’s AI security policy).
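To show how a single umbrella term simplifies policy checks, here is a minimal Python sketch. It is an illustration under stated assumptions rather than a real implementation: the label names, the `is_restricted` and `may_process` helpers, and the `system_is_certified` flag are hypothetical stand-ins, and what counts as a "certified" system would be defined by your own AI security policy.

```python
from enum import Enum


class Classification(Enum):
    # Hypothetical label names used for illustration only.
    PUBLIC = "public"
    CONFIDENTIAL_INTERNAL = "confidential-internal"
    CONFIDENTIAL_EXTERNAL = "confidential-external"
    PERSONAL_DATA = "personal-data"


# "Restricted" is not a separate label: it is everything that is not Public.
RESTRICTED = frozenset(Classification) - {Classification.PUBLIC}


def is_restricted(label):
    """Return True if the label falls under the Restricted umbrella."""
    return label in RESTRICTED


def may_process(label, system_is_certified):
    """Return True if data with this label may be processed on the system.

    Assumption: the only rule modeled here is that Restricted data
    must stay off uncertified systems.
    """
    return system_is_certified or not is_restricted(label)
```

Because the policy check refers only to `RESTRICTED`, adding a nested sub-category later (PHI, for example) means adding one label to the enum, with no change to the rule itself.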
Don’t make things more complex than they need to be
StackAware’s system is roughly the same level of complexity as most of the data classification systems that I have seen in my career, but far more actionable.
The key is to make things only as complicated as they need to be, but no more.
Otherwise, people just give up trying to follow the system at all.
The good news is, StackAware can tailor your data classification (and entire AI governance program) to your specific needs.
Interested in learning more?
StackAware is not a publicly traded company, so it has more latitude to disclose information than it would if it were. For public companies, where such disclosures are heavily regulated by the U.S. Securities and Exchange Commission (SEC), it might make sense to create a sub-category specifically for material non-public information (MNPI). Again, this is not legal advice. Consult competent counsel.
Secrets are anything that is not, by itself, sensitive, but which directly grants access to sensitive data. Thus, they require protection at the same level as the data they guard. Examples include:
Application program interface (API) keys
Physical security codes