How Fast Is Each New AI Model Improving and Is the Pace Accelerating?

Started by Always_David72, Yesterday at 06:01 PM

Always_David72

The pace of AI improvement since 2020 has been one of the most discussed and most misunderstood topics in technology. The short answer is that capability has improved dramatically, the pace has been accelerating rather than slowing, and the benchmarks used to measure progress have repeatedly been saturated and replaced with harder ones as models exceeded expectations.

To give concrete numbers: researchers at METR measured the leap in agentic task capability from GPT-3 to GPT-4 as a move from a roughly 5-minute task horizon at 50 percent reliability to a roughly 30-minute horizon. GPT-5 extended that to around 2 hours and 17 minutes, about 4.6 times the 30-minute figure, or roughly a 360 percent increase. GPT-5.5, released in April 2026 just seven weeks after GPT-5.4, showed what senior engineers described as noticeably stronger reasoning and autonomy than its immediate predecessor. On the AIME advanced mathematics benchmark, GPT-4.5 demonstrated a 294 percent improvement over GPT-4o. On complex coding benchmarks, Fable 5 achieved 95 percent on SWE-bench Verified in June 2026, a benchmark on which the best models scored around 12 percent in early 2023.
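For anyone who wants to check the arithmetic, here is a minimal Python sketch that converts the task horizons quoted above into a multiple and a percent increase. The helper name `growth` is made up for illustration, and the inputs are just the round figures from this post, not METR's raw data. The point is that something N times as large is an (N - 1) x 100 percent increase, so the two ways of quoting a jump are easy to mix up.

```python
def growth(old_minutes: float, new_minutes: float) -> tuple[float, float]:
    """Return (multiple, percent_increase) when going from old to new."""
    multiple = new_minutes / old_minutes
    percent_increase = (multiple - 1.0) * 100.0
    return multiple, percent_increase


# Task horizons at 50 percent reliability, as quoted in this post (assumed round figures).
steps = [
    ("5 min -> 30 min", 5, 30),
    ("30 min -> 2 h 17 min", 30, 2 * 60 + 17),
]

for label, old, new in steps:
    multiple, pct = growth(old, new)
    print(f"{label}: {multiple:.2f}x, +{pct:.0f}%")
```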

The UK government's AI Safety Institute estimated in February 2026 that the length of cyber tasks AI models could complete had been doubling every 4.7 months since late 2024, itself an acceleration from the 8-month doubling time it estimated in November 2025. Fable 5 and GPT-5.5 have since exceeded even that trend. The honest uncertainty is whether this acceleration is sustainable or whether we are approaching diminishing returns. Compute scaling is expected to slow, training data availability has limits, and the architectural improvements that drove recent gains may not compound indefinitely. Most serious researchers believe significant further improvement will happen but disagree about whether the current pace will hold or moderate.
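To make the doubling times easier to compare, here is a rough Python sketch that turns a doubling time in months into an implied growth multiple per year, assuming clean exponential growth (which real progress may not follow). The function names `annual_multiple` and `months_to_reach` are hypothetical helpers, and the only inputs are the 8-month and 4.7-month figures quoted above.

```python
import math


def annual_multiple(doubling_months: float) -> float:
    """Growth multiple over 12 months, assuming a constant doubling time."""
    return 2.0 ** (12.0 / doubling_months)


def months_to_reach(multiple: float, doubling_months: float) -> float:
    """Months needed to grow by `multiple`, assuming a constant doubling time."""
    return doubling_months * math.log2(multiple)


for months in (8.0, 4.7):
    print(
        f"doubling every {months} months: "
        f"{annual_multiple(months):.2f}x per year, "
        f"{months_to_reach(10, months):.1f} months to reach 10x"
    )
```

Under that assumption, an 8-month doubling time works out to a bit under 3x per year, while 4.7 months is close to 6x per year, so the gap between the two estimates is larger than the raw month counts suggest.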
Still figuring it all out

GlassKnight35

The narrative that AI is improving at a clean, exponential rate feels a bit like those old Moore's Law charts people kept stretching long after reality got messy. Progress is real, but it comes in bursts, plateaus, and the occasional hype bubble popping.

A lot of what looks like acceleration is actually better packaging. Models are more usable, faster, and cheaper, so the perceived leap feels bigger than the raw capability gain. That matters, but it is not the same thing as intelligence doubling every year.

Also worth noting: benchmarks keep getting gamed. When the target changes, it is easy to claim progress. That does not mean nothing is improving, just that the scoreboard is not neutral.

Still, compared to 2020, the baseline has clearly shifted. Even the "average" model today would have felt absurdly capable back then :)
Opinions are my own. Obviously.

Harry64

Feels less like acceleration and more like compounding infrastructure finally paying off. Better chips, better data pipelines, better tooling. The models ride on top of that wave.

People often attribute everything to model architecture, but half the story is engineering discipline catching up. Training runs that used to crash now finish. That alone boosts progress.

That said, diminishing returns are starting to show. Doubling compute does not double capability anymore, which is awkward if your business plan assumes it will.

So yes, things are improving fast, but the slope is wobblier than the headlines suggest :-\

ArmandoCardoso

Hot take: the pace feels faster because expectations lag behind reality. Each new model lands, people underestimate it, then scramble to update their mental model. Repeat cycle.

What changed since 2020 is not just capability but reliability. Earlier systems could do impressive demos but fell apart in real use. Now they fail more gracefully, which is a big deal.

Also, integration into everyday tools amplifies impact. A 10 percent improvement embedded everywhere feels like a 2x leap from the outside.

Acceleration? Maybe. Amplification? Definitely.
// TODO: write better signature

GlassKnight89

Everyone keeps asking if it is exponential, but the better question is exponential in what? Tokens processed? Benchmarks passed? Real-world usefulness? Those diverge pretty quickly.

There is also a survivorship bias problem. We remember the breakthroughs and forget the months of incremental tuning that made them possible.

And let us not pretend marketing has stayed quiet. Every release is "state of the art" until the next one arrives two weeks later ;D

Still, the floor keeps rising, which is arguably more important than the ceiling.

Depot76

Kind of amusing how each generation is declared either "the plateau" or "the takeoff moment" depending on who you ask. Reality is sitting somewhere in between, sipping tea.

The big shift is accessibility. What used to require a research team is now an API call. That makes progress feel explosive even if the underlying gains are incremental.

Also, we are seeing specialization. Not every model is trying to be everything anymore, which improves performance in narrower domains.

Acceleration might be happening, but it is uneven across tasks.