Gemini 2.5 Pro Pricing
Flagship. Google's most capable AI model. Gemini 2.5 Pro delivers state-of-the-art reasoning, code generation, and multi-modal understanding with a 1 million token context window. Here is exactly what it costs and when it is worth the premium over Flash.
Gemini 2.5 Pro API Pricing
| Context tier | Input | Output |
|---|---|---|
| Under 200K context | $1.25 / MTok | $10.00 / MTok |
| Over 200K context | $2.50 / MTok | $10.00 / MTok |
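As a sketch of how the two tiers combine in practice, the quoted rates can be wrapped in a small cost function. This is a simplification: it picks the tier from total input size alone, so check the official billing rules for edge cases.

```python
def pro_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one Gemini 2.5 Pro request (simplified tiering)."""
    # Input rate depends on the context tier; output is $10 / MTok in both.
    input_rate = 1.25 if input_tokens <= 200_000 else 2.50  # $ / MTok
    output_rate = 10.00                                     # $ / MTok
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A small request: 8K in, 2K out.
print(round(pro_request_cost(8_000, 2_000), 4))   # → 0.03
```

The same function returns $1.29 for a 500K-token input with a 4,000-token output, since the over-200K input rate kicks in.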
Context Caching Prices for 2.5 Pro
Context caching lets you store frequently used prompt prefixes and pay a reduced rate on subsequent reads. For workloads that repeatedly send the same system instructions or knowledge base, caching can deliver meaningful savings.
| Operation | Standard Price | Cached Price | Saving |
|---|---|---|---|
| Input read (under 200K) | $1.25 / MTok | $0.9375 / MTok | 25% |
| Input read (over 200K) | $2.50 / MTok | $1.875 / MTok | 25% |
| Cache storage | N/A | $0.3125 / MTok / hr | - |
Storage fee is 25% of the standard input price per hour. Minimum cache size is 32,768 tokens. See our full context caching guide for break-even analysis.
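A quick break-even check follows directly from the table: each cached read saves the gap between the standard and cached input rate, while storage bills per MTok per hour. This sketch assumes the under-200K tier and ignores any one-time cache-write charge, which may apply.

```python
STANDARD_RATE = 1.25    # $ / MTok, standard input read
CACHED_RATE = 0.9375    # $ / MTok, cached input read
STORAGE_RATE = 0.3125   # $ / MTok / hour, cache storage

def caching_net_saving(mtok_cached: float, reads: int, hours: float) -> float:
    """USD saved (or lost, if negative) by caching a prefix of mtok_cached MTok."""
    saving = (STANDARD_RATE - CACHED_RATE) * mtok_cached * reads
    storage = STORAGE_RATE * mtok_cached * hours
    return saving - storage

# A 100K-token prefix cached for one hour breaks even at exactly 1 read/hour,
# because the per-read saving ($0.3125/MTok) equals the hourly storage rate.
print(caching_net_saving(0.1, reads=1, hours=1))   # → 0.0
```

In other words: if your cached prefix is read more than once per hour on average, caching saves money at these rates.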
What Is Gemini 2.5 Pro?
Gemini 2.5 Pro is Google's flagship large language model, designed for tasks that require deep reasoning, complex analysis, and high-quality output. It represents the cutting edge of Google DeepMind's research and is the model of choice when quality matters more than cost.
Advanced reasoning
Multi-step logical reasoning, mathematical problem solving, and complex analysis that lighter models struggle with.
Code generation
Production-quality code across dozens of languages. Understands large codebases, writes tests, and debugs complex issues.
Long document analysis
Process up to 1M tokens (roughly 750,000 words) in a single request. Analyse entire books, codebases, or research paper collections.
Multi-modal understanding
Process text, images, audio, and video natively. Analyse charts, diagrams, screenshots, and video content.
When to use Gemini 2.5 Pro
Use 2.5 Pro when the task genuinely benefits from superior reasoning. Common scenarios include: legal contract analysis where missed nuances have real consequences, research synthesis that requires connecting ideas across multiple sources, code architecture decisions that affect long-term maintainability, and creative writing that needs to maintain consistency across long narratives.
For simpler tasks like classification, extraction, translation, and basic Q&A, Gemini 2.5 Flash delivers comparable results at a fraction of the cost. The key question is always: would a human notice the difference in output quality? If not, use Flash.
5 Real-World Cost Examples
Concrete pricing for common 2.5 Pro use cases. All calculations use the standard pricing tier (under 200K context) unless noted otherwise.
1. Code review of a pull request
Single request. Reviewing a 200-line diff with surrounding context: 8,000 input tokens and 2,000 output tokens. Roughly $0.03 per review ($0.01 input + $0.02 output).
2. Legal contract analysis
100 docs/day. Analysing 30-page contracts (40,000 input tokens) with a 3,000-token risk summary. 100 contracts per day.
$240/month for 3,000 contract analyses
3. Research paper synthesis
Single request. Feeding 5 academic papers (150,000 input tokens total) and asking for a 5,000-token synthesis report. About $0.24 per request ($0.19 input + $0.05 output).
4. Full codebase analysis (long context)
Over 200K tier. Loading an entire 500K-token codebase and asking for an architecture review (4,000-token output). Uses the extended pricing tier: $1.29 per run ($1.25 input + $0.04 output).
5. Content writing pipeline
500 articles/day. Generating 1,500-word articles from a 500-token brief: 500 input tokens and 2,000 output tokens per article, 500 articles daily.
$309.38/month for 15,000 articles. Note: output-heavy workloads like this are where Pro costs add up fast.
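The monthly figures above are simple arithmetic; a small helper (under-200K rates, 30-day month assumed) reproduces the contract and article examples:

```python
INPUT_RATE, OUTPUT_RATE = 1.25, 10.00  # $ / MTok, under-200K tier

def monthly_cost(input_tok: int, output_tok: int, per_day: int, days: int = 30) -> float:
    """USD per month for a fixed-shape request run per_day times daily."""
    per_request = (input_tok * INPUT_RATE + output_tok * OUTPUT_RATE) / 1e6
    return per_request * per_day * days

print(monthly_cost(40_000, 3_000, 100))   # contracts: ≈ $240/month
print(monthly_cost(500, 2_000, 500))      # articles:  ≈ $309.38/month
```

Note how the article pipeline's cost is almost entirely output: $20.00 of each article's $20.63-per-thousand cost comes from the $10/MTok output rate.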
Gemini 2.5 Pro vs 2.5 Flash
The most common question teams face: when is Pro worth the extra cost over Flash? Pro costs 8.3x more on input ($1.25 vs $0.15) and 16.7x more on output ($10.00 vs $0.60). That premium only makes sense when the quality difference is measurable.
| Factor | 2.5 Pro | 2.5 Flash |
|---|---|---|
| Input price / MTok | $1.25 | $0.15 |
| Output price / MTok | $10.00 | $0.60 |
| Context window | 1M tokens | 1M tokens |
| Complex reasoning | Excellent | Good |
| Code generation | Production-grade | Adequate |
| Speed (latency) | Slower | Fast |
| Best for | Quality-critical tasks | Volume workloads |
Practical advice: Start with Flash for every task. Run an A/B test comparing Pro and Flash outputs on a sample of your actual prompts. If evaluators (human or automated) cannot reliably distinguish Pro from Flash outputs, stay on Flash and save 8x or more. Most teams find that 80-90% of their requests work equally well on Flash.
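To put a number on that advice, here is a hypothetical blended-cost model for routing a share of traffic to Flash, using the rates from the table above. It assumes identical token counts per request on either model, which is a simplification.

```python
RATES = {"pro": (1.25, 10.00), "flash": (0.15, 0.60)}  # ($in, $out) per MTok

def blended_cost(input_tok: int, output_tok: int, requests: int,
                 flash_fraction: float) -> float:
    """USD cost when flash_fraction of requests go to Flash, the rest to Pro."""
    def cost(model: str, n: float) -> float:
        rate_in, rate_out = RATES[model]
        return n * (input_tok * rate_in + output_tok * rate_out) / 1e6
    return (cost("flash", requests * flash_fraction)
            + cost("pro", requests * (1 - flash_fraction)))

# 10K requests of 2K in / 1K out: all-Pro vs routing 85% to Flash.
all_pro = blended_cost(2_000, 1_000, 10_000, 0.0)    # ≈ $125.00
mixed = blended_cost(2_000, 1_000, 10_000, 0.85)     # ≈ $26.40
print(all_pro, mixed)
```

At an 85% Flash share (the middle of the 80-90% range above), the bill drops by nearly 80% while Pro still handles the quality-critical tail.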
Gemini 2.5 Pro vs Claude Sonnet 4 vs GPT-4o
How does Gemini 2.5 Pro stack up against the flagship models from Anthropic and OpenAI? Pro is the cheapest on input and competitive on output, with a much larger context window.
| Feature | Gemini 2.5 Pro | Claude Sonnet 4 | GPT-4o |
|---|---|---|---|
| Input / MTok | $1.25 | $3.00 | $2.50 |
| Output / MTok | $10.00 | $15.00 | $10.00 |
| Context window | 1M tokens | 200K tokens | 128K tokens |
| Free tier | 25 req/day | None | Limited |
| Context caching | Yes (25% discount) | Yes (90% discount) | No |
| Coding strength | Strong | Strongest | Strong |
Gemini 2.5 Pro is 58% cheaper than Claude Sonnet 4 on input ($1.25 vs $3.00) and 50% cheaper than GPT-4o on input ($1.25 vs $2.50). On output, Gemini matches GPT-4o at $10.00 and is 33% cheaper than Claude at $15.00. For input-heavy workloads like document analysis and long-context processing, Gemini offers the best value among flagship models. Claude's advantage is deeper prompt caching discounts (90% vs 25%) and a reputation for stronger code generation.