
GPT-5.4 System Card: First OpenAI Model With Cyber Mitigations Built In

What Happened

Alongside the GPT-5.4 launch on March 5, 2026, OpenAI published the GPT-5.4 Thinking System Card detailing safety evaluations and mitigations for its new reasoning model.

The headline from the card: GPT-5.4 Thinking is OpenAI's first general-purpose model with implemented mitigations for high cybersecurity capability. Earlier system cards flagged cyber capabilities as a risk area, but GPT-5.4 is the first general-purpose model where OpenAI actively built in safeguards. The approach extends what the company developed for GPT-5.3 Codex (its coding-focused model from February 2026) and now applies to both the ChatGPT and API versions.

On standard safety evaluations, GPT-5.4 Thinking performs roughly on par with GPT-5.2 Thinking, with two notable exceptions. The model shows statistically significant improvements on evals for illicit non-violent activity and self-harm content. Across all dynamic mental health evaluations, GPT-5.4 Thinking outperforms every previous model in the series.
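To make "statistically significant" concrete, here is a minimal sketch of one common way to compare pass rates between two models on the same eval: a two-proportion z-test. This is an illustration only. The article does not say which test OpenAI used, the function name is our own, and the counts in the usage example are hypothetical, not figures from the system card.

```python
import math


def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Compare two pass rates; return (z, two-sided p-value).

    pass_a / n_a: passes and total prompts for the baseline model.
    pass_b / n_b: passes and total prompts for the new model.
    A positive z means the new model has the higher pass rate.
    """
    rate_a = pass_a / n_a
    rate_b = pass_b / n_b
    # Pooled pass rate under the null hypothesis that both models are equal.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the system card.
    z, p = two_proportion_z_test(pass_a=900, n_a=1000, pass_b=940, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the gap between the two pass rates is unlikely to be sampling noise. With small eval sets, even a gap of several points can fail this bar, which is why "roughly on par" and "statistically significant improvement" are separate claims.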

The system card also addresses chain-of-thought controllability, testing GPT-5.4 Thinking with the CoT-Control evaluation suite released the same day in a separate paper. Results: GPT-5.4 Thinking's CoT controllability is low, consistent with GPT-5 Thinking and GPT-5.2 Thinking. The model can't effectively manipulate its own reasoning traces, which OpenAI frames as a positive safety property.

Why It Matters

The cybersecurity mitigation piece is significant for anyone building with GPT-5.4 through the API. As models get better at coding and computer use (GPT-5.4 scores 75% on OSWorld, above human performance), their potential for misuse in cyber operations grows with them. That OpenAI responded by building active mitigations, rather than just flagging the risk in an assessment, shows the safety approach is maturing.

The mental health evaluation improvements matter for consumer-facing applications. ChatGPT handles millions of conversations daily, and some users discuss sensitive personal topics. Better performance on dynamic mental health evals means the model handles these conversations more carefully, avoiding harmful responses while still being useful.

For developers integrating GPT-5.4 into products, the system card is your reference for what's been tested and what guardrails exist. If your application touches sensitive domains like healthcare, legal, or financial advice, the system card tells you where OpenAI has and hasn't validated the model's behavior.

Our Take

System cards are one of those things that nobody reads but everybody should. OpenAI has been publishing them since GPT-4, and they've gotten more substantive with each generation.

The move from "we assessed cybersecurity risk" to "we built mitigations for cybersecurity risk" in GPT-5.4 reflects a broader shift in the industry. When your model can operate a computer autonomously and scores 80.8% on SWE-Bench, pretending it doesn't have offensive cyber capabilities isn't credible. Building guardrails is the responsible path, and OpenAI is being transparent about taking it.

The mental health improvements are worth noting because they're the kind of safety work that doesn't generate headlines but directly affects real users. Someone using ChatGPT at 2 AM during a difficult moment benefits from a model that handles those conversations better, even if it never makes a benchmark chart.

What's missing from the card is a detailed evaluation of computer use safety: how the model responds when asked to perform harmful actions through its new desktop navigation capabilities. Given that computer use is the marquee feature of GPT-5.4, the safety evaluation of that specific capability deserves deeper treatment than this initial card gives it. Expect follow-up publications as the feature rolls out more broadly.