Stanford Study: Major AI Platforms Train on Your Chats by Default

200 million people log into ChatGPT every week. Most of them have no idea their conversations are being used to train the next version of the model.

A Stanford University study that examined the privacy policies of six major AI platforms found a consistent pattern: these tools use your chat input for training by default, and half of them store conversations on company servers indefinitely.

The study looked at platforms including OpenAI's ChatGPT, Google's Gemini, Microsoft's Copilot, and Amazon's Nova. Its findings, recently analyzed in a Nextcloud blog post, come down to this: unless you've manually changed your settings, your AI conversations are probably feeding a training pipeline.

Human Reviewers Are Reading Your Chats

The Stanford researchers found that human employees at these companies read chat transcripts as part of the model improvement process. Opt-out mechanisms exist but are buried in settings pages most users never visit. Documentation about whether personal information gets stripped from training data ranges from vague to nonexistent.

This matters because people stopped treating AI chatbots as search engines a long time ago. They paste in client proposals, financial data, medical symptoms, and proprietary code. That data doesn't stay between you and the chatbot: it can feed into the training pipeline and influence responses for millions of other users.

The real-world consequences are already visible. Microsoft's Copilot once hallucinated that a journalist was a convicted child molester. An AI recruiting tool automatically rejected job applicants based on age, resulting in a $365,000 discrimination settlement. A Brookings Institution study found that AI credit scoring systems reproduce racial disparities baked into historical data. These aren't theoretical concerns. They're what happens when systems train on massive, poorly controlled data pools.

How to Opt Out

Every major platform offers some form of training data opt-out, but the defaults work against you:

  • ChatGPT: Settings > Data Controls > turn off "Improve the model for everyone"
  • Claude: Paid plans don't use conversations for training by default. Free-tier usage may be used, with opt-out available.
  • Gemini: Pause Gemini Apps Activity in your Google account settings
  • API access: Data sent through APIs is typically excluded from training across all major platforms, unlike data entered through the consumer chat interfaces

Nextcloud, which published the analysis, has an obvious commercial interest here: it sells self-hosted alternatives to cloud-based AI. But the Stanford study it cites is independent research, and its core finding is hard to argue with: the default setting at most AI companies is to use your data, and most users never change the default.

The simplest protection is also the most obvious: treat every AI chat like a public conversation. If you wouldn't want it in a training dataset, don't paste it into a chatbot.
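
For text you do need to paste, one partial safeguard is to scrub obvious identifiers first. The Python sketch below is only an illustration of that idea, not a tool mentioned in the study or the Nextcloud post: the `redact` function and the three patterns (email addresses, US-style Social Security numbers, and 13 to 16 digit card numbers) are assumptions chosen for the example, and simple patterns like these will miss names, addresses, and most other personal details.

```python
import re

# Hypothetical patterns for illustration only; real PII detection
# needs far more than a handful of regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com, SSN 123-45-6789."
    print(redact(sample))
```

Running a draft through a filter like this before pasting it removes only the identifiers the patterns happen to catch, so it supplements the rule above rather than replacing it.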