Can Your Cloud Provider Train AI on Your Files?

Q: Can my cloud provider train AI on my files?

If the provider holds your encryption keys, it can read your files, and anything it can read it can in principle feed to a model. Most large providers say they do not train their models on your stored files today, but that is a policy they can revise. A zero-knowledge provider cannot, because it only ever holds ciphertext. The protective question is not "do they?" but "could they, if they decided to?" This reflects each provider's public terms as of 2026-06-24.

If you have worried that the files you upload to a cloud service might end up training someone's AI, the honest answer is more unsettling than a simple yes or no.

Most of the big providers will tell you, truthfully, that they do not train AI models on the files you store. But almost all of them can. They hold the keys to your data, they can read it, and "we do not" is a policy they wrote and can rewrite. The only thing standing between your files and an AI model is a promise. And as the last two years have shown, those promises change, quietly, and often right after a product launch or an acquisition.

This page is about the gap between what a provider says it does today and what its architecture allows it to do tomorrow. Every claim below is dated and linked to the provider's own current terms, because the entire point is that those terms keep moving.

The short version

Cloud storage falls into three tiers. Only the third removes the company's ability to read your files by default; the second does so only if you opt in.

Provider	Can they read your files?	Their stance on AI and your content	The protection is
Google Drive	Yes. Google holds the keys.	Says Workspace content is not used to train its models without permission. Consumer Gemini activity can be used to train its models when the Keep Activity setting is on.	A policy promise
Microsoft OneDrive	Yes. Microsoft holds the keys.	Says Microsoft 365 tenant content is not used to train its foundation models. Some consumer Copilot, Bing, and MSN data is used for AI training unless you opt out.	A policy promise
Dropbox	Yes. Dropbox holds the keys.	Pledges not to build generative AI on your content without consent, but does train its own non-generative models on your documents and metadata to power features.	A policy promise
pCloud	Yes, by default.	Client-side encryption is a paid add-on limited to a Crypto folder. Everything outside it, pCloud can read. The software is closed source.	A promise. Architecture only inside the paid folder.
Apple iCloud	Only if you do not turn on Advanced Data Protection, which is off by default.	Processes on device and via Private Cloud Compute. Says it does not use your personal data to train its foundation models.	Architecture if you enable ADP. Otherwise a promise.
MEGA	By design, no. See the 2022 caveat below.	Zero-knowledge by default, so it says it cannot scan or train on your files.	Architecture, with a documented and patched caveat.
Proton Drive	No.	Zero-knowledge by default. States it cannot scan files for ads or AI training.	Architecture, by default.
ShieldFive	No. Zero-knowledge by design.	Cannot train AI on your files, because it cannot read them. Not by policy, by math.	Architecture, by default.

The real question

The debate is usually framed as "does this company train AI on my files?" That is the wrong question, because the answer is almost always "not in the way you fear, not right now," and "not right now" is worthless if the company can change its mind.

The question that actually protects you is: could they, if they decided to?

If a provider holds your encryption keys, it can read your files. If it can read them, it can feed them to a model, today's policy notwithstanding.
A new product, an acquisition, or a quiet terms-of-service update can flip "we do not" to "we do," often with nothing more than an email you will skim and accept.
Zero-knowledge encryption removes the company's ability to read your files at all. There is no promise to break, because the capability was never there.

That is the line this comparison draws: promise versus architecture.
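The "could they?" test can be written down as a small decision rule. The sketch below is illustrative only: the `Provider` fields and the `protection` function are hypothetical names, the three example rows are simplified from the table above, and partial opt-ins such as a single encrypted folder are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    holds_keys_by_default: bool      # can the provider decrypt files out of the box?
    user_held_keys_available: bool   # is there any mode where only the user holds keys?


def protection(p: Provider) -> str:
    """Answer "could they, if they decided to?" rather than "do they?"."""
    if not p.holds_keys_by_default:
        return "architecture, by default"
    if p.user_held_keys_available:
        return "architecture if you opt in, otherwise a promise"
    return "a policy promise"


# Simplified example rows (assumptions drawn from the table above).
providers = [
    Provider("Dropbox", holds_keys_by_default=True, user_held_keys_available=False),
    Provider("Apple iCloud", holds_keys_by_default=True, user_held_keys_available=True),
    Provider("Proton Drive", holds_keys_by_default=False, user_held_keys_available=True),
]

for p in providers:
    print(f"{p.name}: {protection(p)}")
```

Nothing in the rule asks what the provider's current policy says. Only who holds the keys changes the answer.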

The receipts

Tier 1. They can read your files, so you are trusting a promise

Google Drive. Google states clearly that it does not use your Workspace content to train Gemini or other models outside Workspace "without permission" (Google Workspace). But that commitment is scoped to organizational Workspace use, and Google holds your keys: standard Drive is encrypted in transit and at rest with Google-managed keys, not end-to-end, so Google can access file contents. The only way to make a file unreadable to Google is Workspace Client-side Encryption, which is opt-in, administrator-enabled, and unavailable to consumer accounts (Google Drive Help). On the consumer side, when your Keep Activity setting is on (formerly Gemini Apps Activity), Google says it uses your activity "to provide, develop, and improve its services (including training generative AI models)," some chats are read by human reviewers and retained for up to three years even if you delete your activity, and Google warns: "Please do not enter confidential information that you would not want a reviewer to see" (Gemini Apps Privacy Hub). Verdict: promise.

Microsoft OneDrive. Microsoft states that for commercial Microsoft 365 customers, "prompts, responses, and data accessed through Microsoft Graph are not used to train foundation LLMs," and Graph covers tenant content including OneDrive and SharePoint files, email, calendar, and chats (Microsoft 365 Copilot privacy). On the consumer side it is the other way around: Microsoft says it "uses data from Bing, MSN, Copilot, and interactions with ads on Microsoft for AI training," which is on by default for personal accounts unless you turn off the training toggles (Privacy FAQ for Microsoft Copilot). OneDrive is encrypted at rest with per-file AES 256-bit keys, but Microsoft holds those keys; there is no user-only end-to-end mode, and Copilot is explicitly built to reason over your files, email, and chats (Data encryption in OneDrive and SharePoint). Verdict: promise.

Dropbox. After a December 2023 backlash over a third-party AI setting, Dropbox said third-party AI is used only when you actively engage an AI feature, that nothing passively sends your data to a third party, and that files passed to its partner OpenAI are deleted within 30 days and not used to train OpenAI's models (The Register, Dec 2023). That is one track. The other: Dropbox trains its own machine-learning models on your content. Its Privacy FAQ states that "these models may be trained on your documents and metadata, and power features within Dropbox such as improved search relevance, auto-sorting and organization features, and document summaries" (Dropbox Privacy FAQ). Its AI Principles pledge not to "build generative AI models using customer content without consent," a commitment scoped to generative AI that does not cover the non-generative training above (Dropbox AI Principles). Dropbox encrypts your files but holds the keys and does not offer user-held end-to-end encryption (How Dropbox keeps your files secure). Verdict: promise, and it already trains its own models on your files for features.

pCloud. pCloud markets itself as privacy-friendly, but its end-to-end encryption, branded pCloud Encryption or Crypto, is a paid, opt-in add-on that only applies to files you move into a special Crypto folder (pCloud Encryption). Everything in your regular storage is encrypted only in transit and at rest with keys pCloud holds, which means pCloud can technically read it (pCloud Security). The production software is also closed source, so the encryption cannot be fully independently verified, although pCloud did publish the source of its Crypto mobile client for a 2015 to 2016 "Crypto Challenge" contest. Verdict: promise by default. Architecture only inside a paid folder.

Tier 2. Architecture, but only if you opt in

Apple iCloud. Apple is the strongest of the mainstream players. With Advanced Data Protection turned on, Apple says 25 iCloud data categories are end-to-end encrypted, adding iCloud Backup, iCloud Drive, Photos, Notes, and more, and that "no one else can access your end-to-end encrypted data, not even Apple" (iCloud data security overview). The catch is that ADP is opt-in and off by default. Without it, 15 categories plus iMessage and FaceTime are already end-to-end encrypted, but for the rest Apple holds the keys and can decrypt your data, and iCloud Mail, Contacts, and Calendars are never end-to-end encrypted even with ADP. The feature has drawn government pressure: the UK ordered access under the Investigatory Powers Act, Apple withdrew ADP for UK users in February 2025, and as of June 2026 Apple's support pages still list ADP as unavailable to UK users (Apple Support). Separately, Apple says Apple Intelligence runs on device or via Private Cloud Compute, where "your data is never made accessible to Apple," and that "we do not use our users' private personal data or their interactions when training our foundation models" (Apple privacy features). Verdict: architecture if you enable ADP. Otherwise a promise.

Tier 3. Zero-knowledge by default, which is architecture

MEGA. MEGA encrypts files on your device before upload, using keys derived from your password that MEGA says never leave your device, and on that basis it states it cannot decrypt or view your files, which means it cannot scan them or train AI on them. Two caveats keep this honest. First, MEGA's client apps are source-available rather than fully open source: the code is on GitHub for review and non-commercial use under MEGA's own Code Review Licence, not an OSI-approved license (MEGA source code). Second, in 2022 cryptographers at ETH Zurich documented flaws (the "MEGA: Malleable Encryption Goes Awry" research, later presented at IEEE S&P 2023) that, in theory, would let an entity controlling MEGA's own server infrastructure recover a user's private key after a number of logins and decrypt their files. MEGA patched the disclosed attacks within the same month, June 2022, and said no accounts were known to be compromised (MEGA security update); the researchers acknowledged the fix blocked their proof of concept but called it a narrower countermeasure than the redesign they recommended (The Register). MEGA remains far more private than the mainstream clouds, but the episode is a reminder that a zero-knowledge claim is only as strong as the cryptographic design behind it, which is why independent verification matters. Verdict: architecture, with a documented and patched caveat.
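To make "keys derived from your password that never leave your device" concrete, here is a minimal sketch of the general pattern using only Python's standard library. It is not MEGA's actual scheme: the iteration count, the salt handling, and the split into `encryption_key` and `login_proof` are assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor for this sketch


def derive_client_keys(password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Stretch the password on the device and split the result in two.

    Returns (encryption_key, login_proof). Only login_proof would be sent to
    the server; encryption_key stays on the device.
    """
    material = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, ITERATIONS, dklen=64
    )
    encryption_key = material[:32]
    # Hash the second half so the server never sees raw derived key material.
    login_proof = hashlib.sha256(material[32:]).digest()
    return encryption_key, login_proof


salt = secrets.token_bytes(16)  # can be stored server-side; it is not secret
enc_key, proof = derive_client_keys("correct horse battery staple", salt)

# The same password and salt reproduce the same keys on any device.
enc_key_again, proof_again = derive_client_keys("correct horse battery staple", salt)
print(hmac.compare_digest(enc_key, enc_key_again))
```

Even in this pattern, anyone holding `login_proof` and the salt can try password guesses offline, so a weak password weakens the whole scheme. That is one more reason the design behind a zero-knowledge claim matters.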

Proton Drive. Proton Drive is end-to-end, zero-access encrypted by default on every plan, including the free tier, and encrypts file names and metadata as well as contents; its clients are open source and it has been independently audited by Securitum, with the web app audited in 2021 and the mobile apps in 2022 (Proton Drive security). Proton states that it "never scans or uses the contents of your files for AI or any other purpose" (Proton Drive for business), and its own AI features do not train on your content: Proton Docs says your work "will not ever be used to train AI models" (Proton Docs). Proton is Swiss-based and reports more than 100 million accounts. Verdict: architecture, by default.

The pattern: privacy promises have a short half-life

Here is why "we do not train on your data" should never be enough on its own: the track record of that promise is mixed, and the terms keep moving in both directions.

Adobe faced a revolt from creative professionals in 2024 over terms that appeared to claim rights to their work, and clarified that customer content "will never be used to train any generative AI tool," with an exception for content submitted to Adobe Stock (Adobe). That is the direction users want.
Meta, Google, and Snap moved the other way, opening up training on users' public content and their interactions with AI assistants. Meta resumed training its AI on adults' public posts in the EU from May 2025, with an objection form (Meta). Google's privacy policy says it uses "publicly available information" and "your interactions with AI models and technologies like Gemini Apps" to train its models (Google). Snap says it may use content "you have posted publicly on Snapchat" to train its generative AI, with an opt-out in settings (Snap). None of these claim to train on your private messages, and each offers an opt-out, but the default moved.
X began using public posts to train Grok after its November 2023 launch, was acquired by xAI in March 2025, and under its Terms of Service effective January 2026 expanded the definition of "Content" to include AI prompts and outputs, with users granting a broad license that includes training its models (X Terms of Service). There is a settings opt-out, private Grok chats are excluded, and EU and EEA public posts were carved out after a 2024 regulator intervention. This is the acquisition risk in one story.
Zoom (2023) and Mozilla/Firefox (2025) both triggered backlashes over terms that appeared to broaden data use for AI. Zoom walked it back within days, eventually stating unconditionally that it "does not use any customer audio, video, chat, screen sharing, attachments, or other communications like customer content to train Zoom's or its third-party artificial intelligence models" (Zoom). Mozilla rewrote its license after the reaction (Mozilla).
The U.S. Federal Trade Commission has warned that quietly rewriting your terms can itself be a problem. In a February 2024 post titled "AI (and other) Companies: Quietly Changing Your Terms of Service Could Be Unfair or Deceptive," the FTC said it "may be unfair or deceptive for a company to adopt more permissive data practices ... and to only inform consumers of this change through a surreptitious, retroactive amendment to its terms of service or privacy policy."

Most of these started as "we would never." A promise is only as durable as the next business decision.

Where ShieldFive fits

ShieldFive belongs to the third tier: zero-knowledge by design. Your files are encrypted in your browser, with keys only you hold, before they ever reach us. We can only ever hold ciphertext, which means we cannot train AI on your files, cannot scan them, cannot hand readable copies to a third party, and cannot be compelled to produce anything but ciphertext, whether by a court or a future acquirer. There is no setting to get wrong and no promise to break.

We are not the only service that can say that. Proton Drive is a genuine zero-knowledge peer, and so is MEGA with the caveat above, and we would rather tell you that than pretend otherwise. The no-AI, no-scanning property is shared by zero-knowledge storage generally; it is not unique to us. What we focus on is a few specific things:

Post-quantum encryption by default. Every new upload is encrypted with a hybrid suite, ML-KEM-1024 (FIPS 203, NIST security category 5, the highest parameter set the standard defines) combined with a classical cipher, so a break in either component alone does not expose the file. This is our actual differentiator, not the no-scanning property.
An open crypto core you can check. The encryption library, @shieldfive/crypto, is published under Apache-2.0, and its specification and machine-readable test vectors are public in the source repository, so an independent implementation can reproduce our outputs byte for byte. The library is currently beta and has completed an internal review, not yet an external third-party audit; we would rather say that plainly than imply a certification we have not earned.
EU data residency. Encrypted file data is stored in the EU, under EU and GDPR jurisdiction.
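The "break in either component alone" property of a hybrid suite comes from how the two secrets are combined. The sketch below shows one common approach: both shared secrets are fed through HKDF (RFC 5869), so the file key depends on each of them. It illustrates the general technique and is not ShieldFive's published construction. The function names, the `info` label, and the input ordering are hypothetical, and random bytes stand in for the real post-quantum and classical outputs.

```python
import hashlib
import hmac
import secrets


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine_hybrid(pq_secret: bytes, classical_secret: bytes, context: bytes) -> bytes:
    """Derive one file key that depends on both shared secrets."""
    # Length-prefix each input so the boundary between them is unambiguous.
    ikm = (
        len(pq_secret).to_bytes(2, "big") + pq_secret
        + len(classical_secret).to_bytes(2, "big") + classical_secret
    )
    return hkdf_sha256(ikm, salt=b"\x00" * 32, info=b"hybrid-file-key|" + context)


# Stand-ins for the outputs of ML-KEM-1024 and a classical key agreement.
pq = secrets.token_bytes(32)
classical = secrets.token_bytes(32)
file_key = combine_hybrid(pq, classical, b"file-0001")
print(len(file_key))
```

An attacker who learns only one of the two inputs still faces a key derived from the unknown other input, which is the point of running both algorithms side by side.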

We do not ask you to trust that we will not. We removed our ability to.

Not even we can read your files.

The way to evaluate this is in practice. ShieldFive's free tier is 20 GB with no card required, enough to test the whole workflow (upload, share with expiry, revoke) with non-critical files before moving anything that matters.

Frequently asked questions

Can my cloud provider train AI on my files?

If the provider holds your encryption keys, it can read your files, and anything it can read it can in principle feed to a model. Most large providers say they do not train their models on your stored files today, but that is a policy they can revise. A zero-knowledge provider cannot, because it only ever holds ciphertext. The protective question is not "do they?" but "could they, if they decided to?"

Which cloud storage services cannot train AI on my files?

Genuinely zero-knowledge services cannot, because the encryption keys never leave your device and the server only holds ciphertext. Proton Drive and ShieldFive are zero-knowledge by default, and MEGA is too, with a documented 2022 cryptographic caveat that MEGA has since patched. Apple iCloud reaches that bar only if you turn on Advanced Data Protection, which is off by default. Google Drive, OneDrive, Dropbox, and standard pCloud storage all hold your keys.

Does Google Drive or Microsoft OneDrive use my files to train AI?

Both say organizational content is not used to train their foundation models: Google for Workspace, Microsoft for Microsoft 365 tenant data. But both hold your keys and can read your files, and both train on some consumer-surface data: Google through consumer Gemini when Keep Activity is on, and Microsoft through Bing, MSN, and consumer Copilot unless you opt out. The protection is a policy, not the architecture.

Does Dropbox train AI on my files?

Dropbox pledges not to build generative AI on your content without consent, and says files sent to its third-party AI partner are not used to train that partner's models. Separately, its own privacy FAQ states that it trains its own non-generative machine-learning models on your documents and metadata to power search, organization, and summaries. Dropbox holds your keys, so the protection is a policy promise.

Does zero-knowledge encryption stop AI training?

Yes, at the level of architecture. In a zero-knowledge system, files are encrypted on your device before upload and the keys never reach the provider, so the server stores only ciphertext. There is nothing readable to scan or train on, and a legal order yields only ciphertext. That is a structural guarantee rather than a policy that can be quietly changed.
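The data flow described above can be modelled in a few lines. This is a sketch of the architecture, not any provider's implementation. The `Client` and `Server` classes are hypothetical, and to stay within Python's standard library the encryption step is a one-time pad (XOR with a random key as long as the file), standing in for the authenticated cipher such as AES-256-GCM that a real service would use.

```python
import secrets


class Server:
    """Models the provider: it stores whatever bytes it is given."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, file_id: str, blob: bytes) -> None:
        self.blobs[file_id] = blob

    def get(self, file_id: str) -> bytes:
        return self.blobs[file_id]


def xor_bytes(data: bytes, key: bytes) -> bytes:
    # One-time pad: the key must be random, as long as the data, and never reused.
    if len(key) != len(data):
        raise ValueError("one-time key must match the data length")
    return bytes(d ^ k for d, k in zip(data, key))


class Client:
    """Models the user's device: the only place keys and plaintext exist."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.keys: dict[str, bytes] = {}  # never uploaded

    def upload(self, file_id: str, plaintext: bytes) -> None:
        key = secrets.token_bytes(len(plaintext))
        self.keys[file_id] = key
        self.server.put(file_id, xor_bytes(plaintext, key))  # ciphertext only

    def download(self, file_id: str) -> bytes:
        return xor_bytes(self.server.get(file_id), self.keys[file_id])


server = Server()
client = Client(server)
client.upload("notes.txt", b"quarterly plan: do not share")

stored = server.get("notes.txt")       # all the provider, or a legal order, can obtain
restored = client.download("notes.txt")
print(restored)
```

The `Server` class has no code path that touches a key, so no policy change on the provider's side can turn `stored` back into the original text.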

Does ShieldFive train AI on my files?

No, and it is not a matter of policy. Files are encrypted in your browser before upload, ShieldFive only ever holds ciphertext, and the keys never reach our servers, so there is nothing for us to read, scan, or train on. The crypto core is open source and its test vectors are public so you can verify the design, encrypted data is stored in the EU, and post-quantum encryption is the default for every new upload.
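Checking an implementation against public test vectors generally follows the shape below. This is a generic sketch, not the format used by @shieldfive/crypto. The JSON layout and the `failing_vectors` helper are hypothetical, and SHA-256 stands in for the function under test because its reference outputs are widely published.

```python
import hashlib
import json
from typing import Callable

# Hypothetical vector file: each entry pairs a hex input with the expected hex output.
VECTORS_JSON = """
[
  {"name": "empty", "input_hex": "",
   "expected_hex": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},
  {"name": "abc", "input_hex": "616263",
   "expected_hex": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"}
]
"""


def failing_vectors(implementation: Callable[[bytes], bytes], vectors_json: str) -> list[str]:
    """Return the names of vectors whose output differs by even one byte."""
    failures = []
    for vector in json.loads(vectors_json):
        actual = implementation(bytes.fromhex(vector["input_hex"]))
        if actual != bytes.fromhex(vector["expected_hex"]):
            failures.append(vector["name"])
    return failures


def sha256(data: bytes) -> bytes:
    """The implementation under test in this sketch."""
    return hashlib.sha256(data).digest()


print(failing_vectors(sha256, VECTORS_JSON))
```

An empty list means the implementation reproduced every expected output exactly. Any name in the list identifies a case where the two implementations disagree.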

This comparison reflects each provider's public terms and documentation as of 2026-06-24 and will be updated as the terms change. Cloud terms of service change constantly, which is the entire point of this page. Before relying on any single line here, check the provider's current terms directly via the links. If you spot something out of date, tell us and we will fix it.