This site is independently operated and is not affiliated with Google or Alphabet Inc. Verify pricing on Google's official website.
Dec 2023 to March 2026

Gemini API Pricing History

How Google Gemini API costs have evolved since launch. Every model, every price point, and the trends shaping the future of AI pricing.

80% input price drop (1.0 Pro to 2.0 Flash)
73% output price drop (1.0 Pro to 2.0 Flash)
6 major model releases in 15 months

Pricing Timeline

December 2023: Gemini 1.0 Pro
Input/1M: $0.50
Output/1M: $1.50
Context: 32K

Google's first competitive LLM API. Launched alongside Gemini 1.0 Ultra (limited access). Positioned against GPT-3.5 Turbo on price and GPT-4 on capability.

February 2024: Gemini 1.5 Pro
Input/1M: $3.50
Output/1M: $10.50
Context: 1M

A massive leap in context window: 1 million tokens. Premium pricing reflected the flagship positioning, while the companion Gemini 1.5 Flash (below) covered the budget tier.

February 2024: Gemini 1.5 Flash
Input/1M: $0.075
Output/1M: $0.30
Context: 1M

Google's answer to the growing demand for cheap, fast inference. Same 1M context window as 1.5 Pro at a fraction of the cost. Set the template for the Flash model tier.

December 2024: Gemini 2.0 Flash
Input/1M: $0.10
Output/1M: $0.40
Context: 1M

Next-generation Flash model with improved reasoning and multimodal capabilities. Priced slightly above 1.5 Flash, but with significantly better quality. A free tier was included from day one.

March 2025: Gemini 2.5 Pro
Input/1M: $1.25
Output/1M: $10.00
Context: 1M

Google's most capable model. Advanced reasoning, coding, and analysis at a steep discount compared to 1.5 Pro ($3.50/$10.50). Demonstrates the trend: better models at lower prices.

March 2025: Gemini 2.5 Flash
Input/1M: $0.15
Output/1M: $0.60
Context: 1M

Latest Flash model with thinking capabilities. Positioned between 2.0 Flash and 2.5 Pro. Offers strong reasoning at near-Flash pricing.

Key Pricing Trends

Several clear patterns have emerged since Gemini launched in late 2023. Understanding these trends helps forecast future pricing and plan your API budget accordingly.

1. Prices Drop with Each Generation

Each new model generation delivers better performance at a lower price point. Gemini 2.5 Pro costs 64% less on input than 1.5 Pro ($1.25 vs $3.50) while being substantially more capable. This pattern mirrors the broader semiconductor industry, where performance per dollar doubles roughly every two years.
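The drop percentages quoted throughout this article are simple percentage decreases. A minimal Python sketch, using the per-million-token prices from the timeline above:

```python
# Per-million-token prices (input, output) from the timeline above.
prices = {
    "1.0 Pro":   (0.50, 1.50),
    "1.5 Pro":   (3.50, 10.50),
    "1.5 Flash": (0.075, 0.30),
    "2.0 Flash": (0.10, 0.40),
    "2.5 Pro":   (1.25, 10.00),
    "2.5 Flash": (0.15, 0.60),
}

def drop(old, new):
    """Percentage decrease from an old price to a new one, rounded."""
    return round((old - new) / old * 100)

# 1.5 Pro -> 2.5 Pro input: (3.50 - 1.25) / 3.50 = 64%
print(drop(prices["1.5 Pro"][0], prices["2.5 Pro"][0]))    # 64
# 1.0 Pro -> 2.0 Flash: 80% on input, 73% on output
print(drop(prices["1.0 Pro"][0], prices["2.0 Flash"][0]))  # 80
print(drop(prices["1.0 Pro"][1], prices["2.0 Flash"][1]))  # 73
```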

2. The Flash Tier Is the Main Event

Google has invested heavily in the Flash model tier. For most production workloads, Flash models offer the best cost-performance ratio. The gap between Flash and Pro quality has narrowed with each generation, while the price gap remains 8x to 12x. Expect Flash models to handle an increasingly broad set of tasks over time.

3. Context Gets Cheaper, Not Smaller

Rather than shrinking context windows to cut costs, Google has kept the 1M token context standard across all models while reducing per-token prices. This means long-context use cases (document analysis, code review, multi-turn conversations) are becoming increasingly affordable. Context caching (at 25% of input cost) further reduces the cost of reusing large contexts.
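To make the caching math concrete, here is a back-of-envelope sketch for repeatedly querying one large document on Gemini 2.5 Pro ($1.25 per 1M input tokens), with cached context billed at 25% of the input rate as cited above. The workload numbers are hypothetical, and cache storage fees and output tokens are ignored:

```python
# Cost of querying a large document 20 times, with and without caching.
INPUT_RATE = 1.25 / 1_000_000   # dollars per input token (2.5 Pro)
CACHE_FACTOR = 0.25             # cached tokens billed at 25% of input rate

doc_tokens = 500_000            # the large shared context (hypothetical)
queries = 20                    # number of queries against the document
query_tokens = 500              # fresh input tokens per query (hypothetical)

# Without caching, every query resends the full document at full price.
without_cache = queries * (doc_tokens + query_tokens) * INPUT_RATE

# With caching, the document is sent once at full price; each query then
# reads it at the discounted rate and pays full price only for fresh tokens.
with_cache = (doc_tokens * INPUT_RATE
              + queries * doc_tokens * INPUT_RATE * CACHE_FACTOR
              + queries * query_tokens * INPUT_RATE)

print(f"without cache: ${without_cache:.2f}, with cache: ${with_cache:.2f}")
```

Under these assumptions, caching cuts the bill to roughly 30% of the uncached cost, because the dominant term (resending the document) is billed at a quarter of the rate.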

4. Free Tiers Are Getting More Generous

The free tier for Gemini 2.0 Flash (15 requests per minute, 1,500 requests per day) is more generous than the original Gemini 1.0 Pro free tier. Google uses free tiers strategically to drive developer adoption and build ecosystem lock-in. This trend benefits developers and startups, who can build and test extensively before paying anything.

Comparison with OpenAI and Anthropic Trajectories

All three major providers have been cutting prices, but at different rates and with different strategies. Here is how their pricing trajectories compare.

Provider       | Flagship (Input/Output) | Budget (Input/Output) | Price Trend
Google Gemini  | $1.25 / $10.00          | $0.10 / $0.40         | Aggressive cuts
OpenAI         | $2.50 / $10.00          | $0.15 / $0.60         | Moderate cuts
Anthropic      | $3.00 / $15.00          | $0.25 / $1.25         | Moderate cuts

Google has been the most aggressive on pricing, particularly for Flash/budget models. Their $0.10 input price for 2.0 Flash undercuts both GPT-4o mini ($0.15) and Claude 3.5 Haiku ($0.25). On flagship models, Google is cheaper on input ($1.25 vs $2.50 for GPT-4o) but comparable on output. The overall trend across all three providers is clear: prices are falling 30% to 50% per year, and this pace shows no signs of slowing.
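To see what these rates mean for an actual bill, here is a sketch pricing a hypothetical monthly workload (10M input and 2M output tokens) at the budget-tier rates from the table above:

```python
# Budget-tier rates (input, output) per 1M tokens, from the table above.
budget = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o mini":      (0.15, 0.60),
    "Claude 3.5 Haiku": (0.25, 1.25),
}

in_m, out_m = 10, 2  # hypothetical monthly volume, in millions of tokens

costs = {model: in_m * inp + out_m * out
         for model, (inp, out) in budget.items()}

for model, monthly in costs.items():
    print(f"{model}: ${monthly:.2f}/month")
```

At this volume the spread is roughly $1.80 vs $2.70 vs $5.00 per month; it scales linearly, so at 1B tokens/month the same ratios mean hundreds of dollars of difference.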

Future Pricing Predictions

Based on current trends, hardware roadmaps, and competitive dynamics, here is what we expect for Gemini pricing through the rest of 2026 and into 2027.

Flash models will break $0.05 per million input tokens

Google's TPU v5p and upcoming v6 hardware deliver significantly better inference throughput. Combined with model distillation improvements, we expect the next-generation Flash model to price input tokens at $0.05 or lower, making high-volume applications viable even for bootstrapped startups.

Pro models will approach current Flash pricing

As inference efficiency improves, flagship model pricing will converge toward what budget models cost today. We expect a Gemini 3.0 Pro to cost around $0.50/$4.00 per million tokens, a fraction of today's 2.5 Pro rates.

Context caching and batching discounts will increase

Google will likely offer deeper discounts for committed usage, similar to reserved instance pricing in cloud compute. Expect batch processing discounts to grow from 50% to potentially 70%, and caching discounts to extend beyond the current hourly billing model.
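The difference between today's 50% batch discount and a hypothetical 70% one is easy to quantify. A sketch for an illustrative workload of 100M input and 20M output tokens on Gemini 2.0 Flash ($0.10/$0.40 per 1M tokens):

```python
# Batch-discount arithmetic; the 70% figure is a speculative future rate.
def batch_cost(in_m, out_m, in_rate, out_rate, discount=0.0):
    """Cost in dollars for token volumes given in millions of tokens."""
    return (in_m * in_rate + out_m * out_rate) * (1 - discount)

list_price = batch_cost(100, 20, 0.10, 0.40)                 # $18.00, no discount
today = batch_cost(100, 20, 0.10, 0.40, discount=0.50)       # $9.00 at 50% off
future = batch_cost(100, 20, 0.10, 0.40, discount=0.70)      # $5.40 at 70% off
```

Going from 50% to 70% off cuts the discounted bill by a further 40%, which matters most for the high-volume, latency-tolerant workloads batch mode targets.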

Frequently Asked Questions

How much has Gemini API pricing dropped since launch?

Dramatically. Gemini 1.0 Pro launched at $0.50/$1.50 per million tokens (input/output) in December 2023. By December 2024, Gemini 2.0 Flash offered $0.10/$0.40, representing an 80% drop in input cost and 73% drop in output cost. And 2.0 Flash is a more capable model than 1.0 Pro was. The trend of better quality at lower cost continues with each release.

Is Gemini cheaper than GPT-4 now?

Yes, for most workloads. Gemini 2.0 Flash at $0.10/$0.40 per million tokens is substantially cheaper than GPT-4o at $2.50/$10.00. Even Gemini 2.5 Pro at $1.25/$10.00 is cheaper on input than GPT-4o, though output pricing is similar. The gap is largest for high-volume, latency-tolerant workloads where Flash models excel.

Will Gemini API prices continue to drop?

The trend strongly suggests yes. Each new model generation has been cheaper per token while offering better performance. Google has also aggressively priced Flash models to gain market share against OpenAI and Anthropic. Hardware improvements (TPU v5p and beyond), model distillation, and competitive pressure all point toward continued price reductions through 2026 and beyond.

See today's pricing in detail

Compare current Gemini model pricing or calculate your projected costs.