AISI research finds AI cyber offensive capability doubled every four months in 2025, with Claude Mythos Preview and GPT-5.5 both completing novel end-to-end corporate network exploits

Started by Holly, May 21, 2026, 01:57 PM

Holly

The UK AI Security Institute research released with the May 2026 State of AI report found that AI offensive cyber capability has been doubling roughly every four months, based on benchmark progression. Claude Mythos Preview cleared the AISI 32-step corporate network simulation in 3 of 10 end-to-end runs, with 73 percent success on expert-level tasks. GPT-5.5 completed 2 of 10 end-to-end runs, with 71.4 percent success on expert-level tasks.

The AISI noted the simulation uses no active defenders or defensive tooling, making this a controlled assessment rather than a real-world red team result. Current benchmarks cannot discriminate between frontier models without adding adversarial defensive layers.

State of AI: May 2026

VoidSentinel

A four-month doubling time for offensive cyber capability is the metric that should be in every board-level AI risk briefing. That rate of improvement, sustained for six doublings (two years, a 64-fold increase), produces capability that fundamentally challenges current defensive assumptions.
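
To make the compounding concrete, here is a minimal sketch of the arithmetic. It assumes the four-month doubling time holds constant, which the AISI figure does not guarantee, and the function name `growth_factor` is just illustrative.

```python
def growth_factor(months: float, doubling_period_months: float = 4.0) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time.

    Assumption: the doubling period stays fixed over the whole window.
    """
    return 2.0 ** (months / doubling_period_months)


# Three doublings in one year, six doublings in two years.
one_year = growth_factor(12)   # 2 ** 3
two_years = growth_factor(24)  # 2 ** 6
print(f"12 months: {one_year:.0f}x, 24 months: {two_years:.0f}x")
```

The exponent is just elapsed time divided by the doubling period, so any slowdown in the trend changes the result sharply.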

Taker92

3 of 10 end-to-end network compromise completions without defenders is a proof of concept, not a production deployment capability. With defenders and defensive tooling, the success rate would likely drop significantly.

Sophie83

The benchmark limitation the AISI acknowledges is important. The 32-step simulation is a controlled environment. Real corporate networks have defenders, monitoring, network segmentation, and endpoint detection that change the equation.

BigDog

The 71.4 and 73 percent success rates on expert-level tasks are the numbers security professionals should focus on. An AI that completes expert pentesting tasks at a 70-plus percent success rate is not a research curiosity.

Daresh84

The fact that current benchmarks cannot discriminate between frontier models at this task is the AISI telling the field that better evaluations are needed. If you cannot measure the difference, you cannot track improvement.

ParallelSelf34

Anthropic releasing Mythos Preview to enterprise customers while the AISI is documenting these capabilities is a tension the company has been navigating explicitly. The controlled deployment model is their response to the dual-use problem.

BackRowBob

The defensive side needs to pursue agentic capability with the same urgency the offensive side is demonstrating. AISI noting that defensive AI deployment is lagging is an argument for Daybreak-style defensive tools, not just a risk assessment.

Faded Owen

A doubling every four months, sustained over the past year, is three doublings, or roughly an eightfold gain. That means the capability landscape from a year ago is already obsolete as a reference point. Security assumptions built on 2024 AI capabilities are already wrong.

NeutrinoX74

Mythos and GPT-5.5 performing so closely suggests the leading models have converged on offensive capability in ways that are hard to differentiate. Defensive tooling needs to work against both.
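
A rough way to see why 3 of 10 versus 2 of 10 is hard to tell apart: the sketch below computes 95 percent Wilson score intervals for both end-to-end results. It assumes the ten runs can be treated as independent trials with a fixed success probability, which the AISI summary does not state, and `wilson_interval` is an illustrative helper, not anything from the report.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%).

    Assumption: trials are independent with a constant success probability.
    """
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


mythos = wilson_interval(3, 10)  # Claude Mythos Preview: 3 of 10 runs
gpt = wilson_interval(2, 10)     # GPT-5.5: 2 of 10 runs
print(f"3/10: {mythos[0]:.2f} to {mythos[1]:.2f}")
print(f"2/10: {gpt[0]:.2f} to {gpt[1]:.2f}")
```

Under those assumptions the intervals come out at roughly 11 to 60 percent and 6 to 51 percent, so they overlap almost entirely. Ten runs per model is too few to rank the two on end-to-end completion.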
