Updated March 2026

Gemini API Rate Limits & Quotas

Rate limits for the main Gemini models across free and paid tiers, plus how to handle 429 errors, request quota increases, and choose between AI Studio and Vertex AI.

Understanding Gemini Rate Limits

Google enforces rate limits on the Gemini API to ensure fair usage and system stability. These limits are measured in three dimensions:

RPM: requests per minute
TPM: tokens per minute
RPD: requests per day

If any of these limits is exceeded, the API returns a 429 Too Many Requests error. The limits differ based on your tier (free or paid) and the model you are using. Higher-capability models have tighter limits because they consume more compute resources per request.
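A request fails if it would break any one of the three limits, so a client can check all three before sending. The sketch below is a minimal client-side budget. It assumes a sliding 60-second window for RPM and TPM and a rolling 24-hour window for RPD. Google's own accounting windows may differ; a daily quota might reset at a fixed time, for example. `RateBudget` is a hypothetical class, not part of any Gemini SDK, and the caller must supply an estimated token count.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class RateBudget:
    """Client-side check of RPM, TPM, and RPD before sending a request.

    Assumption: sliding 60 s windows for RPM/TPM and a rolling 24 h window
    for RPD. The server's real windows may be aligned differently.
    """

    def __init__(self, rpm: int, tpm: int, rpd: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rpm, self.tpm, self.rpd = rpm, tpm, rpd
        self.clock = clock
        self.minute: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens), last 60 s
        self.day: Deque[float] = deque()                 # timestamps, last 24 h

    def _evict(self, now: float) -> None:
        # Drop entries that have aged out of each window.
        while self.minute and now - self.minute[0][0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86_400:
            self.day.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True only if all three limits allow it."""
        now = self.clock()
        self._evict(now)
        used_tokens = sum(t for _, t in self.minute)
        if (len(self.minute) + 1 > self.rpm
                or used_tokens + tokens > self.tpm
                or len(self.day) + 1 > self.rpd):
            return False  # sending now would likely produce a 429
        self.minute.append((now, tokens))
        self.day.append(now)
        return True
```

For Gemini 2.0 Flash on the free tier, for example, you would create `RateBudget(rpm=15, tpm=1_000_000, rpd=1_500)` and send a request only when `try_acquire(estimated_tokens)` returns `True`. Otherwise, wait and try again rather than provoking a 429.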

Free Tier Rate Limits

Available through Google AI Studio with no billing account required. Ideal for development, prototyping, and low-volume applications.

Model	RPM	TPM	RPD
Gemini 2.5 Pro	5	250,000	25
Gemini 2.5 Flash	10	500,000	500
Gemini 2.0 Flash	15	1,000,000	1,500
Gemini 2.0 Flash Lite	30	1,000,000	1,500

Free tier limits are applied per Google Cloud project, not per API key, so creating multiple keys in the same project does not increase your aggregate limit.
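The free-tier table is easier to read once you ask which limit binds first. The short calculation below uses only the numbers in the table. Dividing RPD by RPM gives the minutes of full-speed traffic before the daily cap is reached. Dividing TPM by RPM gives the average request size above which TPM, not RPM, becomes the constraint. The helper names are illustrative.

```python
# Free-tier limits copied from the table above: (RPM, TPM, RPD)
FREE_TIER = {
    "Gemini 2.5 Pro": (5, 250_000, 25),
    "Gemini 2.5 Flash": (10, 500_000, 500),
    "Gemini 2.0 Flash": (15, 1_000_000, 1_500),
    "Gemini 2.0 Flash Lite": (30, 1_000_000, 1_500),
}


def minutes_to_exhaust_daily_quota(model: str) -> float:
    """Minutes of sustained max-RPM traffic before the RPD cap is reached."""
    rpm, _tpm, rpd = FREE_TIER[model]
    return rpd / rpm


def tokens_per_request_breakeven(model: str) -> float:
    """Average request size at which TPM and RPM run out at the same time."""
    rpm, tpm, _rpd = FREE_TIER[model]
    return tpm / rpm


for name in FREE_TIER:
    print(f"{name}: daily cap after {minutes_to_exhaust_daily_quota(name):.0f} min, "
          f"TPM binds above ~{tokens_per_request_breakeven(name):,.0f} tokens/request")
```

For Gemini 2.5 Pro, 25 RPD at 5 RPM runs out after 5 minutes of sustained traffic. For steady workloads, the daily cap is therefore reached long before the day ends.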

Paid Tier Rate Limits

Activated by enabling billing on your Google Cloud project. Paid tier removes daily request limits and significantly increases RPM and TPM.

Model	RPM	TPM	RPD
Gemini 2.5 Pro	1,000	4,000,000	Unlimited
Gemini 2.5 Flash	2,000	4,000,000	Unlimited
Gemini 2.0 Flash	2,000	4,000,000	Unlimited
Gemini 2.0 Flash Lite	4,000	4,000,000	Unlimited

Paid tier limits shown are defaults. They can be increased through quota requests in Google Cloud Console.

How to Increase Your Rate Limits

If the default paid tier limits are not sufficient for your application, there are several paths to get higher quotas.

1. Enable Paid Billing

The simplest way to get higher limits is to upgrade from the free tier. Enabling billing on your Google Cloud project automatically unlocks paid tier limits. Based on the tables above, these raise RPM by roughly 133x to 200x and TPM by 4x to 16x, depending on the model, and they remove the daily request cap. No application or review process is required.
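You can check the size of the upgrade directly against the two tables. This sketch divides each paid-tier default by the matching free-tier value. The numbers are copied from the tables, and `upgrade_multipliers` is a hypothetical helper.

```python
# (RPM, TPM) copied from the free-tier and paid-tier tables above
FREE = {
    "Gemini 2.5 Pro": (5, 250_000),
    "Gemini 2.5 Flash": (10, 500_000),
    "Gemini 2.0 Flash": (15, 1_000_000),
    "Gemini 2.0 Flash Lite": (30, 1_000_000),
}
PAID = {
    "Gemini 2.5 Pro": (1_000, 4_000_000),
    "Gemini 2.5 Flash": (2_000, 4_000_000),
    "Gemini 2.0 Flash": (2_000, 4_000_000),
    "Gemini 2.0 Flash Lite": (4_000, 4_000_000),
}


def upgrade_multipliers(model: str) -> tuple:
    """Return (RPM multiplier, TPM multiplier) from free to paid tier."""
    (f_rpm, f_tpm), (p_rpm, p_tpm) = FREE[model], PAID[model]
    return p_rpm / f_rpm, p_tpm / f_tpm


for model in FREE:
    rpm_x, tpm_x = upgrade_multipliers(model)
    print(f"{model}: RPM x{rpm_x:.0f}, TPM x{tpm_x:.0f}")
```

By this arithmetic, RPM grows much faster than TPM. Token-heavy workloads therefore gain less from upgrading than the RPM figures alone suggest.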

2. Request a Quota Increase

In the Google Cloud Console, navigate to IAM & Admin > Quotas. Find the Generative Language API quotas, select the limit you want to increase, and submit a request. Most requests are processed within 24 to 48 hours. Include your expected usage pattern and business justification for faster approval.

3. Contact Google Cloud Sales

For enterprise workloads requiring more than 10,000 RPM or custom SLAs, engage with Google Cloud sales directly. They can set up custom rate limits, dedicated capacity, and negotiated pricing. This is common for companies spending more than $10,000 per month on the Gemini API.

4. Use Multiple Regions (Vertex AI)

Vertex AI rate limits are per-region. If you deploy your application across multiple regions (e.g., us-central1 and europe-west4), each region has its own independent quota. This effectively multiplies your available capacity without requiring a quota increase.
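A client that spreads traffic across regions can rotate through them and move on when one region returns 429. The following is a minimal round-robin sketch. It assumes, as described above, that each region's quota is independent. `RegionRotator` and the `send(region, payload)` callable are hypothetical stand-ins for a client configured for each region. Data-residency or latency requirements may rule out some regions.

```python
import itertools
from typing import Callable, List, Optional, Tuple

# Hypothetical transport: send(region, payload) -> (HTTP status, result)
SendFn = Callable[[str, object], Tuple[int, object]]


class RegionRotator:
    """Round-robin requests across regions, skipping any that return 429."""

    def __init__(self, regions: List[str], send: SendFn):
        self.regions = list(regions)
        if not self.regions:
            raise ValueError("need at least one region")
        self._cycle = itertools.cycle(self.regions)
        self.send = send

    def call(self, payload: object) -> Tuple[str, int, object]:
        """Try each region at most once, starting with the next in rotation."""
        last: Optional[Tuple[str, int, object]] = None
        for _ in range(len(self.regions)):
            region = next(self._cycle)
            status, result = self.send(region, payload)
            if status != 429:
                return region, status, result
            last = (region, status, result)
        assert last is not None  # guaranteed because regions is non-empty
        return last  # every region was rate limited; caller should back off
```

Usage: `RegionRotator(["us-central1", "europe-west4"], send)` alternates between the two regions and falls through to the other one when a region is throttled.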

Rate Limit Headers and Error Handling

When you hit a rate limit, the API returns a 429 Too Many Requests status code along with headers that tell you when you can retry.

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711929600

Header	Description
Retry-After	Seconds to wait before retrying the request
X-RateLimit-Limit	Your maximum allowed requests per minute
X-RateLimit-Remaining	Requests remaining in the current window
X-RateLimit-Reset	Unix timestamp when the rate limit window resets
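Turning these headers into a wait time takes only a few lines. The sketch assumes the header names and meanings listed in the table above. It matches names case-insensitively because HTTP header names are case-insensitive. `seconds_until_retry` is a hypothetical helper: it prefers `Retry-After` and falls back to `X-RateLimit-Reset` minus the current Unix time.

```python
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: float) -> Optional[float]:
    """Wait time implied by the 429 headers in the table above, or None if absent."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    retry_after = h.get("retry-after")
    if retry_after is not None:
        try:
            return float(retry_after)  # delay in seconds, as in the example response
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    reset = h.get("x-ratelimit-reset")
    if reset is not None:
        return max(0.0, float(reset) - now)  # Unix timestamp -> seconds from now
    return None
```

With the example response above and a current time of 1711929590, the helper returns 30.0 from `Retry-After`. Without that header, it would return 10.0 from the reset timestamp.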

Recommended Retry Strategy

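Implement exponential backoff with jitter. Exponential backoff doubles the wait after each failed attempt. Random jitter spreads those retries out so that many clients (or many workers in your own application) do not hit the API at the same instant, which would only trigger another rate limit. Here is the recommended approach: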

1. On the first 429, wait 1 second plus random jitter (0 to 500 ms) before retrying.
2. Before the second retry, wait 2 seconds plus jitter.
3. Before the third retry, wait 4 seconds plus jitter, and keep doubling the delay (8, then 16 seconds) on later retries.
4. Stop after 5 retries or about 32 seconds of total waiting, whichever comes first.
5. If the Retry-After header is present, always prefer its value over your backoff calculation.
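These five steps translate into a small retry wrapper. This is a minimal sketch, not an SDK feature. `send` is a hypothetical callable that returns `(status, headers, body)`. The 32-second cap is interpreted as a budget on total waiting time. `sleep` and `rng` are injectable so the logic can be tested without real delays.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Values from the steps above; the 32 s cap is treated as a total-wait budget.
BASE_DELAY_S = 1.0     # first retry waits 1 s
MAX_JITTER_S = 0.5     # plus 0-500 ms of random jitter
MAX_RETRIES = 5
MAX_TOTAL_WAIT_S = 32.0

# Hypothetical response shape: (HTTP status, headers, body)
Response = Tuple[int, Mapping[str, str], bytes]


def backoff_delay(retry_number: int, retry_after: Optional[str],
                  rng: random.Random) -> float:
    """Seconds to wait before retry `retry_number` (1-based)."""
    if retry_after is not None:
        try:
            # Step 5: prefer the server's Retry-After, but never wait under 1 s.
            return max(float(retry_after), BASE_DELAY_S)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to computed backoff
    # Steps 1-3: 1 s, 2 s, 4 s, 8 s, ... each plus jitter
    return BASE_DELAY_S * 2 ** (retry_number - 1) + rng.uniform(0, MAX_JITTER_S)


def call_with_backoff(send: Callable[[], Response],
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> Response:
    """Call `send`, retrying on HTTP 429 with exponential backoff and jitter."""
    rng = rng or random.Random()
    total_wait = 0.0
    retries = 0
    while True:
        status, headers, body = send()
        if status != 429 or retries == MAX_RETRIES:
            return status, headers, body
        lowered = {k.lower(): v for k, v in headers.items()}
        delay = backoff_delay(retries + 1, lowered.get("retry-after"), rng)
        if total_wait + delay > MAX_TOTAL_WAIT_S:
            return status, headers, body  # step 4: retry budget exhausted
        sleep(delay)
        total_wait += delay
        retries += 1
```

Injecting `sleep` keeps the wrapper synchronous and easy to test. An asyncio application would swap in `asyncio.sleep` and make the wrapper a coroutine.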

Important: Never retry in a tight loop without delay. Doing so can extend your rate limit window and make the problem worse. Always wait at least 1 second between retries.

AI Studio vs Vertex AI Rate Limits

Google provides two ways to access the Gemini API: Google AI Studio and Vertex AI. They have different rate limit structures, pricing models, and scaling options.

Feature	Google AI Studio	Vertex AI
Free tier	Yes	Limited trial credits
Rate limit scope	Per project	Per project, per region
Quota increase	Upgrade to paid tier	Custom via Cloud Console
Multi-region scaling	No	Yes
SLA available	No	Yes (99.9%)
Best for	Prototyping, small apps	Production, enterprise

For most startups and small teams, Google AI Studio is the right starting point. Its free tier is generous, and by default the paid tier supports 2,000 RPM on Gemini 2.5 Flash and 2.0 Flash and 4,000 RPM on 2.0 Flash Lite. Migrate to Vertex AI when you need SLAs, fine-grained access control (IAM), private networking, or multi-region deployments. Read our full comparison at AI Studio vs Vertex AI.

Frequently Asked Questions

What are the Gemini API rate limits for the free tier?

Free tier rate limits vary by model. Gemini 2.0 Flash allows 15 requests per minute (RPM), 1 million tokens per minute (TPM), and 1,500 requests per day (RPD). Gemini 2.5 Flash allows 10 RPM and 500 RPD. Gemini 2.5 Pro is the most restricted at 5 RPM and 25 RPD. All free tier usage is through Google AI Studio.

How do I increase my Gemini API rate limits?

For Google AI Studio, upgrade to paid billing to get higher limits automatically. For Vertex AI, request a quota increase through the Google Cloud Console under IAM & Admin > Quotas. For very high-volume needs (over 10,000 RPM), contact Google Cloud sales for a custom agreement.

What happens when I hit the Gemini API rate limit?

The API returns HTTP 429 (Too Many Requests) with a Retry-After header indicating how long to wait. Best practice is to implement exponential backoff: wait 1 second on first retry, then 2, 4, 8 seconds, with jitter. Never retry immediately in a tight loop, as this can extend your rate limit window.

Are Vertex AI rate limits different from AI Studio?

Yes. Vertex AI generally offers higher rate limits than Google AI Studio, especially for enterprise customers. Vertex AI limits are configured per project and per region, and can be increased through quota requests. AI Studio limits are fixed by tier (free or paid) and applied per project. Vertex AI also supports multi-region deployments, which effectively multiply capacity.
