OpenAI says its next generation of AI models could reach “High” levels of cybersecurity capability, raising the risk they could help attackers as well as defenders, according to a security blog post.

The company bases its assessment on rapid gains on capture-the-flag benchmarks, where performance jumped from 27% with GPT-5 to 76% with the later GPT-5.1-Codex-Max model within a few months.

Under its Preparedness Framework, “High” cyber capability means a model could either develop working zero-day remote exploits against well-defended systems or meaningfully assist complex, stealthy intrusions into enterprise or industrial networks. OpenAI says it is planning and testing future systems as if each new model could reach that level, and is building safeguards before release.

To manage the risk, OpenAI outlines a layered “defense-in-depth” approach. That includes access controls, hardened infrastructure, egress controls and monitoring, plus detection systems and red-teaming designed to spot and block malicious cyber use.
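To make the egress-control idea concrete, the minimal Python sketch below shows what an allowlist-plus-logging check on an agent's outbound requests might look like. The hostnames, logger, and policy here are illustrative assumptions only, not a description of OpenAI's actual infrastructure.

```python
# Illustrative only: a toy egress-control check for an agent sandbox.
# The allowlist and policy are hypothetical, not OpenAI's actual controls.
import logging
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("egress-monitor")

# Hypothetical set of hosts the sandboxed agent is allowed to reach.
ALLOWED_HOSTS = {"api.internal.example", "packages.internal.example"}

def egress_allowed(url: str) -> bool:
    """Return True only if the request targets an approved host; log everything else."""
    host = urlparse(url).hostname or ""
    if host in ALLOWED_HOSTS:
        log.info("egress permitted: %s", host)
        return True
    # Blocked attempts feed the monitoring/detection layer described above.
    log.warning("egress blocked: %s", host)
    return False

# Example: an agent's outbound requests are checked before leaving the sandbox.
egress_allowed("https://api.internal.example/v1/lookup")    # permitted
egress_allowed("https://attacker-controlled.example/exfil")  # blocked and logged
```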

At the same time, the company says it is training models to support defenders with code auditing, vulnerability discovery and patching workflows, aiming to give security teams “significant advantages” in day-to-day operations.
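As a rough illustration of how a defender might fold a model into such a code-auditing workflow, the sketch below sends a snippet to the OpenAI chat completions API and asks for flagged vulnerabilities and suggested patches. The model name, prompt, and review criteria are placeholders, not a description of OpenAI's own tooling.

```python
# Sketch of a defender-side code-audit step using the OpenAI Python SDK.
# The model name and prompt are placeholders; adapt them to the model and policy you actually use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def audit_snippet(source: str) -> str:
    """Ask a model to flag likely vulnerabilities in a code snippet and suggest fixes."""
    response = client.chat.completions.create(
        model="gpt-5.1-codex-max",  # placeholder; substitute an available model
        messages=[
            {"role": "system", "content": "You are a code auditor. List likely vulnerabilities "
                                          "with line references and propose minimal patches."},
            {"role": "user", "content": source},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    snippet = 'query = "SELECT * FROM users WHERE name = \'" + user_input + "\'"'
    print(audit_snippet(snippet))
```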

OpenAI will also launch a trusted access program that offers tiered access to enhanced capabilities for vetted customers working on cyber defense, and has put its Aardvark “agentic security researcher” into private beta to scan codebases and propose patches. In parallel, it is creating a Frontier Risk Council of experienced cyber defenders to advise on where to draw the line between useful capability and misuse.

The implications cut two ways: frontier models are becoming powerful enough to matter in real intrusion campaigns, while vendors are starting to expose explicit governance structures and access tiers that enterprises can fold into their threat models and control frameworks.