Anthropic has launched Claude Opus 4.5 in its apps, via its API, and on all three major clouds, with upgrades aimed at long-running coding and enterprise agent workflows.

Benchmarks show Opus 4.5 achieving 80.9% on SWE-bench Verified (agentic coding), 59.3% on Terminal-bench 2.0, and strong results on agent/tool and computer-use tests; Anthropic also reports 90.8% on MMLU, 87.0% on GPQA Diamond, and 80.7% on MMMU (vision).

Benchmark tests evaluate AI models on different skills: SWE-bench and Terminal-bench cover coding and operations; MMLU and GPQA measure general and expert reasoning; and MMMU assesses vision-plus-text comprehension. Typically, higher scores indicate more reliable performance on those specific tasks.

Anthropic’s post adds that Opus 4.5 introduces an “effort” parameter to trade off latency/cost vs. capability; at medium effort, it matches Sonnet 4.5’s SWE-bench Verified score using 76% fewer output tokens, and at high effort it exceeds Sonnet 4.5 by 4.3 points while using 48% fewer tokens.
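To make those percentages concrete, here is a back-of-the-envelope sketch of the reported token savings. The one-million-token baseline is hypothetical (chosen for illustration, not from Anthropic); only the 76% and 48% reduction figures come from the announcement:

```python
def relative_output_tokens(baseline_tokens: float, reduction: float) -> float:
    """Output tokens remaining after the reported percentage reduction."""
    return baseline_tokens * (1.0 - reduction)

# Hypothetical baseline: Sonnet 4.5 emits 1,000,000 output tokens on a workload.
baseline = 1_000_000

# Medium effort: matches Sonnet 4.5's SWE-bench Verified score, 76% fewer tokens.
medium = relative_output_tokens(baseline, 0.76)   # ~240,000 tokens

# High effort: exceeds Sonnet 4.5 by 4.3 points, 48% fewer tokens.
high = relative_output_tokens(baseline, 0.48)     # ~520,000 tokens
```

In other words, even the highest-capability setting is reported to use roughly half the output tokens of the smaller model, which is what makes the effort dial an explicit cost/latency lever rather than a pure quality knob.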

Anthropic describes Opus 4.5 as able to compress multi-day software development tasks into hours, with stronger planning and architecture work across languages; the model is targeted at enterprise developers and agentic use cases.

Availability spans major clouds: Google said Opus 4.5 is generally available on Vertex AI as of Nov. 25, while Microsoft said Opus 4.5 is in public preview in Microsoft Foundry, with access in GitHub Copilot paid plans and Copilot Studio.