Therapy session transcripts are being used to train AI models. This is happening through a chain of data licensing arrangements that patients didn't meaningfully consent to, and the mechanics of how it works are worth understanding if you use any AI-powered mental health tool.
How Session Data Reaches Training Sets
Mental health apps, telehealth platforms, and AI-powered therapy tools collect session transcripts as a core part of their product. The terms of service users click past often include language permitting the company to use "anonymized" or "de-identified" data to improve products and services. That same data can be sublicensed to third parties. "Improve products and services" is broad enough to cover training machine learning models - the neural networks that power AI chatbots and assistants.
The anonymization step is less protective than it sounds. Therapy transcripts are often highly identifying. They contain specific details about a person's relationships, workplace, medical history, and location. Research has shown that re-identifying supposedly anonymized health data is feasible with relatively little external information. A transcript that doesn't contain your name can still describe you precisely enough to identify you.
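To make the re-identification point concrete, here is a minimal Python sketch. It assumes a tiny, hand-made set of records (the `RECORDS` list, its field names, and the `unique_fraction` helper are all hypothetical and not drawn from any real dataset or study). It counts how many nameless records become one of a kind as more incidental details are combined.

```python
from collections import Counter

# Hypothetical toy records: no names, only details a transcript might mention.
RECORDS = [
    {"city": "Austin", "job": "nurse", "age_band": "30-39", "family": "two kids"},
    {"city": "Austin", "job": "nurse", "age_band": "30-39", "family": "no kids"},
    {"city": "Austin", "job": "teacher", "age_band": "30-39", "family": "two kids"},
    {"city": "Denver", "job": "nurse", "age_band": "40-49", "family": "two kids"},
    {"city": "Denver", "job": "teacher", "age_band": "30-39", "family": "no kids"},
    {"city": "Austin", "job": "teacher", "age_band": "40-49", "family": "two kids"},
]


def unique_fraction(records, fields):
    """Return the share of records that are the only one with their
    particular combination of values for the given fields."""
    keys = [tuple(record[f] for f in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


if __name__ == "__main__":
    all_fields = ["city", "job", "age_band", "family"]
    # Add one detail at a time and watch the unique share grow.
    for n in range(1, len(all_fields) + 1):
        subset = all_fields[:n]
        print(subset, round(unique_fraction(RECORDS, subset), 2))
```

In this toy data no record is unique by city alone, but every record is unique once all four details are combined. Real transcripts contain far more than four such details, which is why removing a name does little on its own.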
The scale matters. Mental health and telehealth platforms have collectively processed millions of sessions over the past five years. If even a fraction of that data flows into AI training pipelines, an enormous body of intimate human experience is being fed to models.
What U.S. Law Actually Protects
HIPAA, the main U.S. health privacy law, applies to covered entities (health plans, healthcare clearinghouses, and most healthcare providers) and their business associates. It also carves out "de-identified" data: once data meets either the Safe Harbor standard, which removes 18 listed types of identifiers, or an expert determination that the re-identification risk is very small, it falls outside HIPAA's protections and can be shared freely. Mental health app companies that are neither covered entities nor business associates may fall outside HIPAA entirely.
The result: a significant amount of mental health data is legally available for AI training, even if patients would object if they knew about it. The FTC has moved against some companies for deceptive data practices - BetterHelp paid $7.8 million in 2023 to settle charges that it shared user data with Facebook and Snapchat for ad targeting - but the specific question of using session data for model training remains largely unaddressed.
The EU's General Data Protection Regulation (GDPR) imposes stricter requirements on sensitive personal data, including health data, generally requiring explicit consent or another narrow legal basis rather than buried opt-outs. But most AI development and most mental health app usage are concentrated in the U.S., where the framework is weaker.
The Practical Situation for Users
Reading a 40-page terms of service document before every mental health app download isn't realistic. And even if a user does read it, refusing to consent typically means not using the service.
The realistic options are limited. Check the privacy policy specifically for clauses about using data to train AI or third-party model development. Prefer services that explicitly commit to not using session data for training. Be more guarded about detail with any AI-assisted mental health tool than you would be with a licensed therapist operating under professional confidentiality rules.
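The policy check described above can be partly automated. The following Python sketch assumes you have pasted the policy text into a string. The phrase list in `PATTERNS` and the `flag_sentences` helper are hypothetical and far from exhaustive, so a clean result means only that nothing obvious was found, not that the policy is safe.

```python
import re

# Hypothetical phrase list. Real policies word these clauses in many ways.
PATTERNS = [
    r"\btrain(?:ing)?\b.{0,60}\b(?:models?|algorithms?|ai)\b",
    r"machine learning",
    r"artificial intelligence",
    r"de-?identified",
    r"anonymi[sz]ed",
    r"improve (?:our )?(?:products|services)",
    r"third[- ]part(?:y|ies)",
    r"sublicens",
]


def flag_sentences(policy_text):
    """Return the sentences that match at least one pattern, in order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    return [
        sentence
        for sentence in sentences
        if any(re.search(p, sentence, flags=re.IGNORECASE) for p in PATTERNS)
    ]


if __name__ == "__main__":
    sample = (
        "We value your privacy. "
        "We may use de-identified data to improve our products and services. "
        "Contact us with questions."
    )
    for sentence in flag_sentences(sample):
        print("REVIEW:", sentence)
```

Flagged sentences still need to be read in context. The script only narrows a long document down to the passages most likely to matter.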
For the AI industry broadly, this is a test of whether consent infrastructure catches up with data appetite. The companies building large language models need vast amounts of human-generated text. Therapy sessions are a uniquely detailed dataset about human psychology and language. The incentive to use that data exists, and the legal barriers are lower than most users assume.