DeepSeek security and privacy analysis
Bottom line: use extreme caution with the hosted versions.
Even if you aren’t in AI, it’s hard not to hear about DeepSeek.
This Chinese startup released its V3 model in December 2024. Based on the model’s:
Capabilities
Chinese ownership
(Allegedly, according to the company) final-run training costs of $5.576 million
it looked like the American lead in AI was quickly disappearing. The firestorm really exploded when the company released its R1 model, which is even more capable.
Causing even more hype was that DeepSeek open-sourced both models, while also offering a Software-as-a-Service (SaaS) application running them.
In this post I’ll look at five security and privacy factors to consider with DeepSeek, including both the hosted and open source versions.
1. Training
According to its Terms of Use:
[U]nder the premise of secure encryption technology processing, strict de-identification rendering, and irreversibility to identify specific individuals, we may, to a minimal extent, use Inputs and Outputs to provide, maintain, operate, develop or improve the Services or the underlying technologies supporting the Services.
There seems to be some translation issues here, but it’s clear that DeepSeek asserts the right to train on inputs to the SaaS application. The application programming interface (API) doesn’t appear to have different terms of use, so assume the company is training on prompts you send to it.
While the company claims to conduct de-identification (and even anonymization based on the reference to ‘irreversibility’), this does not protect against unintended training on your intellectual property.
I couldn’t find any explicit opt-out, but there was this sentence, which may represent such an option:
If you refuse to allow us to process the data in the manner described above, you may provide feedback to us through the methods outlined in Section 10.
I sent the below email to DeepSeek on February 3, 2025:
Good afternoon - please opt me out of any model training on my prompts for both the DeepSeek SaaS application and API.
I’ll update this post with any response I get.
This concern doesn’t apply if hosting the open source model yourself, assuming there are no functions calling back to DeepSeek. The good news is that Amazon (via Bedrock) is already hosting this model, implying they have done their diligence and no such backdoors exist.
In their announcement that they were hosting it, Microsoft also wrote:
DeepSeek R1 has undergone rigorous red teaming and safety evaluations, including automated assessments of model behavior and extensive security reviews to mitigate potential risks.
2. Retention
According to the DeepSeek Privacy Policy:
We retain information for as long as necessary to provide our Services and for the other purposes set out in this Privacy Policy. We also retain information when necessary to comply with contractual and legal obligations, when we have a legitimate business interest to do so (such as improving and developing our Services, and enhancing their safety, security and stability), and for the exercise or defense of legal claims.
This is expansive. I advise treating the retention period as indefinite. Neither the SaaS application nor the API have any apparent way to limit this.
But the company’s data retention period is not a concern if hosting the model yourself or via an Platform-as-a-Service (PaaS) provider like Amazon Web Services or Microsoft Azure.
If you are using DeepSeek-hosted services, though, there is even greater risk to data confidentiality because of the risk of:
3. Chinese government access
According to the DeepSeek Privacy Policy:
We store the information we collect in secure servers located in the People's Republic of China.
Chinese law (like in the U.S. and most other countries) requires companies to provide the government access to such stored data. Unlike the U.S., however, there are basically no checks on Chinese government surveillance requests or capabilities.
Furthermore, China has been the #1 global thief of intellectual property for decades:
The groundbreaking 2013 report from Mandiant “APT1” revealed the Chinese Government’s ability and willingness to steal technology through digital means.
The U.S. Federal Bureau of Investigation (FBI) assessed in 2019 that China’s “annual cost to the U.S. economy [in terms] of counterfeit goods, pirated software, and theft of trade secrets is between $225 billion and $600 billion.”
As recently as March 2024, the U.S. Department of Justice indicted seven alleged members of the Chinese government for economic and other types of cyber-enabled espionage.
So if you have any sensitive data, I wouldn’t give it to DeepSeek.
Like with the previous two concerns, though, this doesn’t apply if you are self-hosting or using a PaaS provider.
4. Data security
According to its Privacy Policy, DeepSeek maintains:
[C]ommercially reasonable technical, administrative, and physical security measures that are designed to protect your information from unauthorized access, theft, disclosure, modification, or loss.
But at the end of January 2025, security firm Wiz disclosed a huge misconfiguration in DeepSeek’s infrastructure:
[W]e found a publicly accessible ClickHouse database linked to DeepSeek, completely open and unauthenticated, exposing sensitive data…
…This database contained a significant volume of chat history, backend data and sensitive information, including log streams, API Secrets, and operational details.
More critically, the exposure allowed for full database control and potential privilege escalation within the DeepSeek environment, without any authentication or defense mechanism to the outside world.
Wiz was able to extract full plaintext prompts as well.
While OpenAI and others have had some issues with cross-tenant data leakage, this exposure is at least 10x as bad. So even if the Chinese government isn’t funneling secrets directly from DeepSeek, random people on the internet might be.
The firm NowSecure also found vulnerabilities in the DeepSeek iOS app including:
Unencrypted data transmission
Weak and hardcoded encryption keys
Insecure data storage
Extensive data collection and fingerprinting
Again, these concerns don’t apply to self- or PaaS-hosted versions of the open source models.
5. Output integrity
According to an analysis by WIRED, DeepSeek refuses to answer questions about:
Taiwan
Xi Jinping
The Great Firewall
This censorship occurs at both the application (i.e. only for the SaaS application and API) as well as the model levels.
When WIRED hosted the open-source R1 model itself and asked similar questions, it was able to extract this train-of-thought note:
The user might be looking for a balanced list, but I need to ensure that the response underscores the leadership of the [Chinese Communist Party] CPC and China's contributions. Avoid mentioning events that could be sensitive, like the Cultural Revolution, unless necessary. Focus on achievements and positive developments under the CPC.
So expect inaccurate responses or refusals in response to prompts on topics that might be sensitive to the Chinese government. While the application-level censorship does not apply to self- or PaaS-hosting, the model itself is also tainted to some degree. This raises the question of whether there is less obvious data poisoning at work. This could cause the model to behave unexpectedly or even maliciously, so continuous monitoring is warranted.
Bottom line
Treat DeepSeek-hosted services with extreme caution. I would recommend only using them for experimentation purposes. Businesses should assume everything they provide to the SaaS app or API is essentially public information.
For the open source models, Amazon and Microsoft seem to have given their stamp of approval, and their AI red teams have far more resources than I do. Thus, it seems reasonable to leverage PaaS-hosted versions of them. With that said, be aware of the risk of camouflaged data poisoning, as well as the obvious censorship built into these models.
Need help navigating the incredibly fast-moving world of AI security and governance?
StackAware lets AI-powered companies manage challenges related to:
Cybersecurity
Compliance
Privacy
So if you are ready to understand your risk and build a program to address it: