Anthropic's Claude 4 Models Set New AI Coding Benchmark

Anthropic launched Claude Opus 4 and Claude Sonnet 4, its most powerful AI models to date, on May 22, 2025. These hybrid reasoning models bring improved coding, extended task execution, and advanced memory functions. The release strengthens Anthropic's competitive position against OpenAI and Google, with Claude Opus 4 achieving industry-leading performance on key software engineering benchmarks.

Anthropic unveiled its next-generation AI models, Claude Opus 4 and Claude Sonnet 4, during its 'Code with Claude 2025' developer conference on May 22. These models represent the company's most significant technical advancement yet, particularly in software engineering and autonomous agent capabilities.

Claude Opus 4, positioned as "the world's best coding model," scored 72.5% on the SWE-bench coding benchmark, outperforming OpenAI's GPT-4.1 (54.6%) and Google's Gemini 2.5 Pro. In testing at Rakuten, Opus 4 coded autonomously for nearly seven hours, a dramatic leap beyond the minutes-long attention spans of previous AI models.

Both models feature hybrid reasoning systems that allow for either near-instant responses or extended, step-by-step thinking. They can use multiple tools in parallel, including web search, and when given access to local files, can extract and store key information to build what Anthropic calls "tacit knowledge" over time.

Claude Sonnet 4, which improves upon Claude 3.7 Sonnet, released in February, delivers enhanced problem-solving capabilities and superior instruction following. It is available to all Claude users, including those on the free tier, while Opus 4 is restricted to Pro, Max, Team, and Enterprise plans.

The release comes amid rapid growth for Anthropic, with annualized revenue doubling to $2 billion in Q1 2025 and an eightfold increase in customers spending over $100,000 annually. The company recently secured a $2.5 billion credit line to fuel its AI development efforts.

Despite the technical achievements, Anthropic has implemented strict safety measures for Claude Opus 4, classifying it under its AI Safety Level 3 (ASL-3) protocol after internal testing revealed potential risks. Both models are available through Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI. Pricing is set at $15 per million input tokens and $75 per million output tokens for Opus 4, and $3 and $15 respectively for Sonnet 4.
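
To make the pricing concrete, here is a rough sketch of how the cost of a single request could be estimated from the figures above, reading each pair as the input price followed by the output price per million tokens. The dictionary keys `opus-4` and `sonnet-4` and the function `estimate_cost` are illustrative names chosen for this example, not official model identifiers or part of any Anthropic SDK, and the token counts are made up.

```python
# Per-million-token prices in USD as quoted in the article: (input, output).
# The keys are hypothetical labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 200,000 input tokens and 50,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, input_tokens=200_000, output_tokens=50_000)
        print(f"{model}: ${cost:.2f}")
```

For this made-up workload the estimate comes to $6.75 on Opus 4 and $1.35 on Sonnet 4, which reflects the fivefold price gap between the two models at these list prices.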
