It is undeniable now that autonomous systems are capable of identifying and exploiting software vulnerabilities at record speed and scale.
Anthropic’s Mythos has raised concern amid warnings that it will find cyber vulnerabilities at a pace humans cannot match, but according to Forescout Technologies, currently accessible AI models are already doing so.
New research from Forescout, which repeated and expanded benchmark testing it conducted a year ago, found that at the time, 55% of AI models failed basic vulnerability research tasks and 93% failed at exploit development.
Today, by contrast, every model tested completed the vulnerability research tasks, and half generated working exploits autonomously, without complex prompting.
The most capable models in Forescout’s testing, Claude Opus 4.6 and Kimi K2.5, were able to identify and exploit vulnerabilities with simple prompts, which the researchers say puts such attacks within reach of inexperienced attackers.
OpenAI’s latest model, GPT 5.3-codex, meanwhile, refused to perform exploitation tasks entirely, reflecting what Forescout described as a tightening of the company’s alignment policies.
A year of rapid gains
Speaking at a cybersecurity roundtable, Rik Ferguson, vice president of security intelligence at Forescout, warned that the industry’s core assumptions are being challenged. “Everything we have ever built in cybersecurity is predicated on the attacker being human,” he said. “That will no longer be the case.”
He said tools that once required specialist expertise can now be replicated cheaply using widely available AI models, lowering the barrier for attackers.
In at least one case, an AI model identified a vulnerability in a piece of code that Forescout’s own human researchers had previously examined and missed.
Using an agentic research framework, Forescout’s team discovered four new zero-day vulnerabilities in OpenNDS, open-source software widely used to implement captive portals in routers and network gateways. The vulnerabilities, which enable remote code execution or denial of service, have been disclosed to the project’s maintainers and are awaiting CVE assignments.
“It is not only equal to human capability, but it’s also demonstrably exceeding it,” Ferguson said. The cost of running these tests was less than 70 cents using the Chinese AI model DeepSeek, he added, highlighting how low the barrier to entry has become.
Zero-days in the wild
The scale of AI-driven vulnerability discovery now goes much further than Forescout’s lab.
Anthropic’s Project Glasswing, which uses a non-public frontier model, has already identified thousands of zero-day vulnerabilities across major operating systems and browsers, including one flaw that reportedly went undetected in OpenBSD for 27 years.
Ansgar Dodt, vice president of product management at Thales Group, said the shift had long been anticipated but is now accelerating rapidly in the wake of Mythos.
“We’ve been warning about this shift for a long time — AI is dramatically lowering the barrier to discovering and exploiting software weaknesses and accelerating it to a scale humans simply can’t match,” he said.
“The implication is clear: organizations now have to assume their software and applications will be continuously analyzed, deconstructed and stress-tested by adversarial AI.”
Dodt added that companies must rethink how they protect software, moving beyond patching to designing applications that are harder to analyze and exploit from the outset.
“The goal is to deny adversaries, and increasingly their AI tools, the visibility they rely on,” he said.
How attackers are adapting
Forescout’s research also found a shift in how cybercriminals are approaching AI.
Underground AI models, widely advertised in hacker forums a year ago, have largely been abandoned in favor of commercial models and locally run open-source deployments, the report found.
More experienced threat actors are now actively coaching newcomers on using AI for phishing, infostealer delivery and penetration testing. Claude has emerged as a preferred tool among hackers since mid-2025, Forescout found, while newer ChatGPT models appear to have lost traction due to stricter alignment policies.
Nik Kairinos, chief executive of RAIDS AI, said the significance lies less in the capability itself than in its implications.
“What makes Mythos significant is not only the capability, but what Anthropic chose to do with it,” he said. “Restricting release … is the right call, but it only buys time.”
He added that the broader risk is already unfolding. “We are no longer debating whether frontier AI creates systemic risk. We are watching institutions scramble to catch up to capabilities that are already in the wild.”
The defender’s response
As AI continues to evolve, experts say the focus must shift from awareness to action. Forescout recommends that organizations compress patching timelines, prioritize real-time asset visibility across IT, operational technology and medical devices, and begin using the same agentic frameworks now available to attackers to find vulnerabilities in their own environments before adversaries do.
“You cannot prevent every zero-day from being found, by AI or otherwise,” Kairinos said. “What you can do is monitor every AI system in your estate for anomalous behavior, in real time, with a continuous evidence trail.”