OpenAI has released GPT-5.4 mini and GPT-5.4 nano, smaller versions of its latest flagship model aimed at coding and high-volume API workloads.
The move widens the GPT-5.4 family, giving developers cheaper options for tasks that do not need the full model while keeping them inside the same model generation.
GPT-5.4 mini is built for coding workflows, computer use and subagents, while GPT-5.4 nano is intended for simpler, faster tasks such as classification, data extraction, ranking and supporting subagent work.
Both models support text and image input, both have 400,000-token context windows, and both are being positioned as production models rather than experimental side releases.
How the three vendors are describing their smaller models
OpenAI claims GPT-5.4 mini is its strongest mini model yet for coding, computer use and subagents.
Anthropic describes Claude Haiku 4.5 as its fastest and most cost-efficient model, while Google describes Gemini 2.5 Flash-Lite as its smallest and most cost-effective model, built for at-scale usage.
Where the pricing lands across the three vendors
The competition is also showing up in pricing. OpenAI’s pricing page lists $0.75 per million input tokens and $4.50 per million output tokens for GPT-5.4 mini, and $0.20 and $1.25 for GPT-5.4 nano.
Anthropic’s pricing page lists Claude Haiku 4.5 at $1 per million input tokens and $2 per million output tokens, with additional prompt-caching and long-context pricing tiers.
Google lists Gemini 2.5 Flash-Lite at $0.10 input and $0.40 output on its paid tier, falling to $0.05 input and $0.20 output in batch mode. On listed standard prices, Google’s Gemini 2.5 Flash-Lite remains cheaper than OpenAI’s GPT-5.4 nano on both input and output pricing.
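To see how those listed rates translate into real spend, the per-workload cost can be sketched as below. The prices are the ones quoted above; the model keys and the example token volumes are hypothetical, and real bills would also depend on caching, batch eligibility and tier discounts not modeled here.

```python
# Sketch: comparing the listed per-million-token prices across vendors.
# Rates are the standard prices quoted in the article; dictionary keys
# and the example workload are illustrative assumptions.

PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
    "claude-haiku-4.5": (1.00, 2.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-2.5-flash-lite-batch": (0.05, 0.20),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed standard rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Hypothetical workload: 10M input tokens, 2M output tokens.
for model in PRICES:
    print(f"{model}: ${cost(model, 10_000_000, 2_000_000):.2f}")
```

At that workload the listed rates give $4.50 for GPT-5.4 nano against $1.80 for Gemini 2.5 Flash-Lite, consistent with the observation that Flash-Lite undercuts nano on both input and output pricing.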
Smaller models as production-grade workhorses, not fallbacks
Vendor positioning suggests these smaller models are no longer being described as basic fallback options. Anthropic says Haiku 4.5 matches Sonnet 4 on coding, computer use and agent tasks and cites a 73.3% score on SWE-bench Verified.
Google’s documentation also positions Flash-Lite for high-volume use and lists higher batch-enqueued token limits than Gemini 2.5 Flash at lower paid tiers.