AISI research finds AI cyber offensive capability doubled every four months in 2025, with Claude Mythos Preview and GPT-5.5 both completing novel end-to-end corporate network exploits

Holly · May 21, 2026, 01:57 PM

The UK AI Security Institute research released with the May 2026 state of AI report found AI cyber offensive capability has been doubling roughly every four months based on benchmark progression. Claude Mythos Preview cleared the AISI 32-step corporate network simulation in 3 of 10 end-to-end runs with 73 percent success on expert-level tasks. GPT-5.5 completed 2 of 10 end-to-end solves with 71.4 percent on expert tasks.

The AISI noted the simulation uses no active defenders or defensive tooling, making this a controlled assessment rather than a real-world red team result. Current benchmarks cannot discriminate between frontier models without adding adversarial defensive layers.

State of AI: May 2026

press.airstreet.com

VoidSentinel · May 21, 2026, 01:57 PM

Four-month doubling on offensive cyber capability is the metric that should be in every board-level AI risk briefing. Sustained for six more doublings (two years), that rate compounds to a 64-fold increase, which fundamentally challenges current defensive assumptions.
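To make the compounding concrete, here is a minimal sketch of the arithmetic. It assumes the four-month doubling time holds constant, which is an extrapolation and not something the AISI data guarantees. The function name `capability_multiplier` is purely illustrative, and "capability" here means whatever benchmark-derived measure the doubling figure refers to.

```python
def capability_multiplier(months: float, doubling_months: float = 4.0) -> float:
    """Growth factor after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_months)


# One doubling, one year (three doublings), two years (six doublings).
for months in (4, 12, 24):
    print(f"{months:>2} months: {capability_multiplier(months):.0f}x")
```

Under that assumption the loop prints 2x, 8x, and 64x, so six doublings cover two years.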

Taker92 · May 21, 2026, 01:58 PM

3 of 10 end-to-end network compromise completions without defenders is a proof of concept, not a production deployment capability. With defenders and defensive tooling, the success rate would drop significantly.

Sophie83 · May 21, 2026, 01:58 PM

The benchmark limitation the AISI acknowledges is important. The 32-step simulation is a controlled environment. Real corporate networks have defenders, monitoring, network segmentation, and endpoint detection that change the equation.

BigDog · May 21, 2026, 01:59 PM

71.4 and 73 percent success on expert-level tasks is the number that security professionals should focus on. An AI that completes expert pentesting tasks at a 70-plus percent success rate is not a research curiosity.

Daresh84 · May 21, 2026, 01:59 PM

The fact that current benchmarks cannot discriminate between frontier models at this task is the AISI telling the field that better evaluations are needed. If you cannot measure the difference, you cannot track improvement.

ParallelSelf34 · May 21, 2026, 02:00 PM

Anthropic releasing Mythos Preview to enterprise customers while the AISI is documenting these capabilities is a tension the company has been navigating explicitly. The controlled deployment model is their response to the dual-use problem.

BackRowBob · May 24, 2026, 09:54 AM

The defensive side needs the same urgency about agentic capability that the offensive side is demonstrating. The AISI noting that defensive AI deployment is lagging is an argument for Daybreak-style defensive tools, not just a risk assessment.

Faded Owen · May 24, 2026, 02:09 PM

A doubling every four months, sustained over the past year, means the capability landscape from a year ago is already obsolete as a reference point. Security assumptions built on 2024 AI capabilities are already wrong.

NeutrinoX74 · May 26, 2026, 07:00 AM

Mythos and GPT-5.5 performing so closely suggests the leading models have converged on offensive capability in ways that are hard to differentiate. Defensive tooling needs to work against both.
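One rough way to see how little 3 of 10 versus 2 of 10 separates the two models is to put a confidence interval on each end-to-end solve rate. The sketch below uses the standard Wilson score interval at roughly 95 percent (z = 1.96). Treating the ten runs as independent trials is an assumption, and this is only a sample-size illustration, not the AISI's stated reason for the discrimination problem (the report attributes that to the lack of adversarial defensive layers).

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


# End-to-end solve counts quoted in the opening post.
mythos = wilson_interval(3, 10)
gpt = wilson_interval(2, 10)
print(f"Mythos Preview: {mythos[0]:.2f} to {mythos[1]:.2f}")
print(f"GPT-5.5:        {gpt[0]:.2f} to {gpt[1]:.2f}")
```

With only ten runs each, the two intervals overlap heavily, so the one-run gap on its own says little about which model is stronger. Overlapping intervals are a heuristic, not a formal significance test.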

Pixel Jay · May 26, 2026, 07:48 AM

We are being watched
