Even frontier LLMs from GPT-5 onward lose up to 33% accuracy when you chat too long

Source: The Decoder · Sun, 1 Mar 2026, 04:11 am UTC

AI Summary

According to The Decoder, even frontier large language models including GPT-5.2 and Claude 4.6 experience significant accuracy degradation during extended conversations, with performance losses of up to 33%. The research indicates this is a persistent limitation affecting the latest generation of AI chatbots, suggesting the problem has not been resolved despite continued model advancement. The findings apply broadly across newer frontier models, meaning the accuracy drop is not isolated to a single vendor or architecture. The Decoder reports that as conversation length increases, the quality of AI-generated responses measurably declines, raising questions about the reliability of these systems in complex, multi-turn use cases.

Why it matters

This finding has direct implications for enterprise AI adoption. It matters most for use cases that depend on sustained, multi-turn interactions, such as customer service automation, coding assistants, and agentic workflows. All of these are major commercial revenue drivers for companies like OpenAI, Anthropic, and Microsoft. A documented accuracy loss of up to 33% in extended sessions could influence enterprise procurement decisions. It also highlights a fundamental technical limitation that remains unresolved even in the most advanced commercially available models. For the broader AI sector, the finding underscores the gap between benchmark performance and real-world reliability. Analysts and institutional investors increasingly scrutinize this tension when evaluating AI platform valuations.

Scoring rationale

This article directly addresses performance limitations of frontier LLMs, including GPT-5 and Claude 4.6. That gives it significant market relevance, because it highlights technical constraints affecting the competitive positioning of OpenAI's and Anthropic's flagship products.

72/100

Impacted tickers

MSFT (NASDAQ), GOOGL (NASDAQ), AMZN (NASDAQ)

This summary was generated by AI from the original article published by The Decoder. AIMarketWire does not provide trading advice. Always refer to the original source for complete reporting.