OpenAI has introduced Codex Security, a research-preview application security agent built to find complex vulnerabilities, validate them and propose fixes inside real repositories.
In its launch post, the company said the tool was rolling out in Codex web to ChatGPT Pro, Enterprise, Business and Edu users, positioning it as a way to reduce noise and triage burden for security teams.
In a follow-up post, OpenAI said Codex Security is not designed to begin with a static application security testing (SAST) report.
Instead, the company said the system starts with repository architecture, trust boundaries and intended behavior, then validates whether apparent defenses actually hold before surfacing issues to humans.
Where GitHub and Anthropic are moving at the same time
GitHub said on March 5 that Copilot code review usage had grown 10x since launch and now accounts for more than one in five code reviews on GitHub.
In a separate post the same day, the company said the feature had moved to an agentic tool-calling architecture that gathers broader repository context to improve comment quality and reduce noise.
Anthropic is also moving into the same workflow. Its Claude Code documentation says automated security reviews can run on demand or through GitHub Actions on pull requests, with findings surfaced for review before merge. Anthropic says those reviews are meant to complement, not replace, existing security practices and manual review.
Why all three vendors are moving toward review at the same time
Taken together, the releases show vendors putting more emphasis on review and validation as AI-generated code volumes rise. OpenAI said agents are accelerating software development and making security review a more critical bottleneck.
GitHub has made a similar case for pull request review more broadly, framing code review as a growing layer of day-to-day engineering work as AI increases the volume of changes teams must assess.
The developer caution data the market is responding to
The push into review is landing against a backdrop of continued developer caution. Google Cloud’s 2024 DORA report said 39% of respondents had little or no trust in AI-generated code and found that greater AI adoption was associated with lower delivery stability.
Google Cloud’s 2025 DORA announcement said AI adoption showed a positive relationship with throughput and product performance, while continuing to show a negative relationship with delivery stability.
Stack Overflow’s 2025 developer survey found more respondents distrusted the accuracy of AI tools than trusted it, at 46% versus 33%. Veracode’s 2025 GenAI code security report said AI-generated code introduced risky security flaws in 45% of tests.
Where OpenAI is drawing its competitive line
That context helps explain why OpenAI used its post to narrow the claim. The company said SAST tools remain important, but argued that many serious vulnerabilities are not just source-to-sink problems.
Its examples focused instead on failures of constraints, sequencing and interpretation, where code appears to validate or sanitize input but does not preserve the security property the system depends on.
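OpenAI did not publish its examples as code, but the failure mode it describes can be sketched with a generic, hypothetical path-traversal filter (not drawn from OpenAI's post): the sanitizer looks correct at the source-to-sink level, yet its output violates the property it was meant to enforce.

```python
import os

# Hypothetical upload directory, for illustration only.
BASE_DIR = "/srv/app/uploads"

def naive_sanitize(user_path: str) -> str:
    """Appears to strip traversal sequences, but does not preserve the
    security property: str.replace removes '../' in a single pass, so an
    input like '....//' collapses into a fresh '../' after stripping."""
    return user_path.replace("../", "")

def escapes_base(user_path: str) -> bool:
    """Resolve the path against BASE_DIR and check whether it escapes it."""
    resolved = os.path.realpath(os.path.join(BASE_DIR, user_path))
    return not resolved.startswith(BASE_DIR + os.sep)

payload = "....//etc/passwd"
cleaned = naive_sanitize(payload)
print(cleaned)                # -> ../etc/passwd  (traversal created, not removed)
print(escapes_base(cleaned))  # -> True: resolves to /srv/app/etc/passwd
```

A line-by-line scanner sees input that is "sanitized" before use; the defect is in the interpretation of what the sanitizer guarantees, which is the kind of constraint-level reasoning all three vendors say their agents are aimed at.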
The narrower claim gives OpenAI a more specific position in a market where vendors are increasingly tying AI coding tools to review, validation and security workflows before code is merged.