Three months ago, Mercor hit a $10 billion valuation. Now the AI recruiting startup is confirming a major data breach that exposed what hackers claim is four terabytes of internal data, including source code, database records, and datasets used by clients like OpenAI, Anthropic, and Meta.
The breach didn't come through Mercor's own systems directly. It came through LiteLLM, an open-source library widely used by AI developers to route API calls between different language models. A hacking group called TeamPCP planted malicious code in LiteLLM designed to harvest credentials and propagate across any company using the library. The compromised code was discovered and removed within hours, but for Mercor, the damage was already done.
What Got Exposed
Mercor's business is recruiting human experts who generate training data for major AI companies. That means the breach potentially touches some of the most sensitive information in the AI industry.
According to reports, the compromised data includes datasets used by Mercor's customers for AI training, information about those customers' projects (projects that companies like OpenAI and Meta typically keep under strict confidentiality), internal Slack communications, ticketing system data, and videos of conversations between Mercor's AI systems and human contractors.
A separate hacking group, Lapsus$, claimed responsibility for the Mercor breach and said it obtained approximately four terabytes of data. Mercor spokesperson Heidi Hagberg responded that the company was "one of thousands of companies" affected by the LiteLLM vulnerability and "moved promptly" to contain the incident. A third-party forensics investigation is underway.
The Supply Chain Problem
This is a supply chain attack, which means the malicious code wasn't in Mercor's own codebase. It was hidden inside a dependency (a third-party software package) that Mercor's developers pulled into their stack. Supply chain attacks are particularly dangerous because even companies with strong internal security practices can be compromised through the tools they rely on.
LiteLLM is popular specifically because it simplifies working with multiple AI providers. That same popularity makes it a high-value target. One compromised package update can reach thousands of companies simultaneously.
For AI companies specifically, this type of attack is a worst-case scenario. Training data, model configurations, and client project details are the crown jewels of the industry. Mercor sits at an intersection where all of that data flows through a single point.
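One reason a single compromised update spreads so fast is that many projects accept any newer release of a dependency automatically. The sketch below is a minimal illustration in Python, not a description of Mercor's or LiteLLM's tooling: it scans requirements-style text and flags entries that are not pinned to an exact version. The function name, the sample package list, and the version numbers are all hypothetical.

```python
import re

# Matches "name==version", optionally with extras such as name[extra]==1.2.3
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(\[[^\]]+\])?==[^=\s;]+")

def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged

# Hypothetical requirements file; the versions are illustrative only.
sample = """
litellm>=1.0        # floating: any newer release is accepted
requests==2.31.0    # exact pin
example-pkg
"""

print(unpinned_requirements(sample))
```

An exact pin does not make a package safe. It only means a new release is adopted when someone chooses to upgrade, rather than the moment it is published.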
Bigger Questions for the AI Supply Chain
The "one of thousands" framing from Mercor is technically accurate but undersells the story. Yes, the LiteLLM compromise affected many companies. But Mercor's specific position as a middleman between human data workers and the biggest AI labs means its breach has outsized implications. If training datasets were exposed, the downstream effects could touch models that millions of people use daily.
This also raises questions about how AI companies vet their open-source dependencies. LiteLLM has over 20,000 stars on GitHub and is treated as trusted infrastructure. The malicious code was caught quickly, but "quickly" still meant some companies were compromised. There's no easy fix here: modern software development depends on open-source libraries, and auditing every update to every dependency is practically impossible at scale.
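Short of auditing every update, one partial safeguard is to record a cryptographic hash of each dependency release that has been reviewed and refuse to install anything that does not match. The sketch below shows the idea with Python's standard library only. The helper names and the byte strings standing in for package files are hypothetical, and nothing here reflects how Mercor or LiteLLM actually handle releases.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """True only if the artifact's digest equals the recorded one."""
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())

# Stand-ins for a package file: the release that was reviewed, and a
# later file served under the same name with extra code added.
reviewed = b"contents of the reviewed release"
pinned_hash = sha256_hex(reviewed)  # recorded at review time
tampered = reviewed + b" plus injected code"

print(verify_artifact(reviewed, pinned_hash))  # True
print(verify_artifact(tampered, pinned_hash))  # False
```

pip offers a comparable check through its `--require-hashes` install mode. The limitation is the same as with any pin: a hash only proves the file is the one that was recorded, so it offers no protection if the malicious release is the one that got pinned.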
What companies can do is limit blast radius: segment networks, encrypt data at rest, and avoid centralizing sensitive data from multiple clients in systems that share common dependencies. Whether Mercor did any of that is exactly what the forensics investigation should answer.