This site is independently operated and is not affiliated with Google or Alphabet Inc. Verify pricing on Google's official website.
Updated March 2026

Gemini API Rate Limits & Quotas

Every rate limit for every Gemini model across free and paid tiers, plus how to handle 429 errors, request quota increases, and choose between AI Studio and Vertex AI.

Understanding Gemini Rate Limits

Google enforces rate limits on the Gemini API to ensure fair usage and system stability. These limits are measured in three dimensions:

RPM: Requests per minute
TPM: Tokens per minute
RPD: Requests per day

If any of these limits is exceeded, the API returns a 429 Too Many Requests error. The limits differ based on your tier (free or paid) and the model you are using. Higher-capability models have tighter limits because they consume more compute resources per request.
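To stay under all three limits before the API rejects a request, a client can track its own usage. The sketch below is one way to do that, assuming rolling 60-second and 24-hour windows; how Google actually windows and resets these counters is not stated here, so treat the window logic as an assumption. `SlidingWindowLimiter` and `try_acquire` are illustrative names, not part of any Google SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side tracker for RPM, TPM, and RPD (rolling windows are an assumption)."""

    def __init__(self, rpm, tpm, rpd, clock=time.monotonic):
        self.rpm, self.tpm, self.rpd = rpm, tpm, rpd
        self.clock = clock
        self.minute = deque()  # (timestamp, tokens) for requests in the last 60 s
        self.day = deque()     # timestamps for requests in the last 24 h

    def _expire(self, now):
        # Drop entries that have aged out of each window.
        while self.minute and now - self.minute[0][0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86_400:
            self.day.popleft()

    def try_acquire(self, tokens):
        """Record the request and return True only if it fits all three limits."""
        now = self.clock()
        self._expire(now)
        tokens_used = sum(t for _, t in self.minute)
        if len(self.minute) >= self.rpm:
            return False  # would exceed requests per minute
        if tokens_used + tokens > self.tpm:
            return False  # would exceed tokens per minute
        if len(self.day) >= self.rpd:
            return False  # would exceed requests per day
        self.minute.append((now, tokens))
        self.day.append(now)
        return True
```

A caller would check `try_acquire(estimated_tokens)` before each request and delay or queue the request when it returns False.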

Free Tier Rate Limits

Available through Google AI Studio with no billing account required. Ideal for development, prototyping, and low-volume applications.

Model | RPM | TPM | RPD
Gemini 2.5 Pro | 5 | 250,000 | 25
Gemini 2.5 Flash | 10 | 500,000 | 500
Gemini 2.0 Flash | 15 | 1,000,000 | 1,500
Gemini 2.0 Flash Lite | 30 | 1,000,000 | 1,500

Creating multiple API keys does not increase your aggregate free tier limit, because keys in the same project share one quota.
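On the free tier the daily cap is often the limit that matters most. As a rough illustration using the RPM and RPD figures from the table above, the following sketch works out how quickly a client sending at the full per-minute rate would use up its daily requests, and what spacing would spread them over 24 hours. The function names are illustrative only.

```python
# Free tier figures copied from the table above.
FREE_TIER = {
    "Gemini 2.5 Pro": {"rpm": 5, "rpd": 25},
    "Gemini 2.5 Flash": {"rpm": 10, "rpd": 500},
    "Gemini 2.0 Flash": {"rpm": 15, "rpd": 1_500},
    "Gemini 2.0 Flash Lite": {"rpm": 30, "rpd": 1_500},
}


def minutes_to_exhaust_daily_quota(rpm, rpd):
    """Minutes of sending at the full per-minute rate before the daily cap is hit."""
    return rpd / rpm


def seconds_between_requests(rpd):
    """Spacing that spreads the daily quota evenly over 24 hours."""
    return 86_400 / rpd


for model, limits in FREE_TIER.items():
    burst = minutes_to_exhaust_daily_quota(limits["rpm"], limits["rpd"])
    spacing = seconds_between_requests(limits["rpd"])
    print(f"{model}: daily cap reached after {burst:.0f} min at full RPM; "
          f"one request every {spacing:.0f} s lasts all day")
```

For Gemini 2.5 Pro, for example, 25 requests per day at 5 requests per minute is only 5 minutes of sustained traffic.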

Paid Tier Rate Limits

Activated by enabling billing on your Google Cloud project. Paid tier removes daily request limits and significantly increases RPM and TPM.

Model | RPM | TPM | RPD
Gemini 2.5 Pro | 1,000 | 4,000,000 | Unlimited
Gemini 2.5 Flash | 2,000 | 4,000,000 | Unlimited
Gemini 2.0 Flash | 2,000 | 4,000,000 | Unlimited
Gemini 2.0 Flash Lite | 4,000 | 4,000,000 | Unlimited

Paid tier limits shown are defaults. They can be increased through quota requests in Google Cloud Console.
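Because RPM and TPM apply at the same time, the request rate you can actually sustain depends on how many tokens each request uses. The sketch below estimates that rate from the default paid tier figures above; it assumes every request uses the same number of tokens, and exactly which tokens count toward TPM is not specified here. `effective_rpm` is an illustrative helper.

```python
def effective_rpm(rpm, tpm, avg_tokens_per_request):
    """Sustainable requests per minute when both RPM and TPM apply."""
    return min(rpm, tpm // avg_tokens_per_request)


# Gemini 2.5 Flash paid tier defaults: 2,000 RPM and 4,000,000 TPM.
print(effective_rpm(2_000, 4_000_000, 500))     # small requests: RPM is the binding limit
print(effective_rpm(2_000, 4_000_000, 10_000))  # large requests: TPM is the binding limit
```

With these defaults the crossover is at 4,000,000 / 2,000 = 2,000 tokens per request: below that average RPM binds first, above it TPM does.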

How to Increase Your Rate Limits

If the default paid tier limits are not sufficient for your application, there are several paths to get higher quotas.

1. Enable Paid Billing

The simplest way to get higher limits is to upgrade from the free tier. Enabling billing on your Google Cloud project automatically unlocks paid tier limits. Based on the tables above, RPM rises by roughly 133x to 200x and TPM by 4x to 16x depending on the model, and the daily request cap is removed. No application or review process is required.

2. Request a Quota Increase

In the Google Cloud Console, navigate to IAM & Admin > Quotas. Find the Generative Language API quotas, select the limit you want to increase, and submit a request. Most requests are processed within 24 to 48 hours. Include your expected usage pattern and business justification for faster approval.

3. Contact Google Cloud Sales

For enterprise workloads requiring more than 10,000 RPM or custom SLAs, engage with Google Cloud sales directly. They can set up custom rate limits, dedicated capacity, and negotiated pricing. This is common for companies spending more than $10,000 per month on the Gemini API.

4. Use Multiple Regions (Vertex AI)

Vertex AI rate limits are per-region. If you deploy your application across multiple regions (e.g., us-central1 and europe-west4), each region has its own independent quota. This effectively multiplies your available capacity without requiring a quota increase.
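One way to use per-region quotas is to rotate requests across regions and move on when a region reports a rate limit. The following is a minimal sketch of that idea, not a Vertex AI client: `send` is a hypothetical caller-supplied function that performs the request against a given region, and `RateLimited` is a hypothetical exception it raises on a 429.

```python
import itertools


class RateLimited(Exception):
    """Hypothetical exception raised by the caller-supplied send function on HTTP 429."""


def make_region_caller(regions, send):
    """Return a function that tries each region at most once per call, round-robin."""
    cycle = itertools.cycle(regions)

    def call(payload):
        last_error = None
        for _ in range(len(regions)):
            region = next(cycle)
            try:
                return send(region, payload)
            except RateLimited as exc:
                last_error = exc  # this region is throttled; try the next one
        raise last_error  # every region was rate limited

    return call
```

Rotation spreads load but does not replace backoff: if every region is throttled, the caller still needs to wait before trying again.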

Rate Limit Headers and Error Handling

When you hit a rate limit, the API returns a 429 Too Many Requests status code along with headers that tell you when you can retry.

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711929600

Header | Description
Retry-After | Seconds to wait before retrying the request
X-RateLimit-Limit | Your maximum allowed requests per minute
X-RateLimit-Remaining | Requests remaining in the current window
X-RateLimit-Reset | Unix timestamp when the rate limit window resets
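A client can turn these headers into a wait time. The sketch below assumes the headers arrive as a plain dictionary of strings and that the values are formatted as described above. `seconds_until_retry` is an illustrative helper, and it returns None when neither header is usable so the caller can fall back to its own backoff.

```python
import time


def seconds_until_retry(headers, now=None):
    """Return the wait in seconds from Retry-After or X-RateLimit-Reset, else None."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP also allows a date here; not handled in this sketch

    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        try:
            now = time.time() if now is None else now
            return max(0.0, float(reset) - now)
        except ValueError:
            pass

    return None
```

With the example response above, `seconds_until_retry({"Retry-After": "30"})` gives a 30-second wait.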

Recommended Retry Strategy

Implement exponential backoff with jitter. This prevents all your retries from hitting the API at the exact same time, which would just trigger another rate limit. Here is the recommended approach:

  1. On the first 429, wait 1 second plus random jitter (0 to 500 ms)
  2. On the second retry, wait 2 seconds plus jitter
  3. On the third retry, wait 4 seconds plus jitter
  4. Cap at 5 retries or 32 seconds, whichever comes first
  5. If the Retry-After header is present, always prefer its value over your backoff calculation

Important: Never retry in a tight loop without delay. Doing so can extend your rate limit window and make the problem worse. Always wait at least 1 second between retries.
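The retry steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not an official client: `request` is a hypothetical callable returning a `(status, headers, body)` tuple, Retry-After is assumed to be a number of seconds under that exact key, and the 32-second cap is read here as a limit on total time spent waiting.

```python
import random
import time


def call_with_backoff(request, max_retries=5, max_total_wait=32.0,
                      sleep=time.sleep, rng=random.random):
    """Retry on HTTP 429 with exponential backoff plus jitter."""
    waited = 0.0
    for attempt in range(max_retries + 1):  # one initial try plus up to max_retries retries
        status, headers, body = request()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)  # prefer the server's hint over our own calculation
        else:
            delay = 2 ** attempt + rng() * 0.5  # 1 s, 2 s, 4 s, ... plus 0 to 500 ms of jitter
        delay = max(1.0, delay)  # never wait less than 1 second between retries
        if waited + delay > max_total_wait:
            break  # the total wait budget would be exceeded
        sleep(delay)
        waited += delay
    raise RuntimeError("rate limited: retries exhausted")
```

The `sleep` and `rng` parameters exist so the timing can be replaced in tests; in normal use the defaults apply.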

AI Studio vs Vertex AI Rate Limits

Google provides two ways to access the Gemini API: Google AI Studio and Vertex AI. They have different rate limit structures, pricing models, and scaling options.

Feature | Google AI Studio | Vertex AI
Free tier | Yes | Limited trial credits
Rate limit scope | Per API key | Per project, per region
Quota increase | Upgrade to paid tier | Custom via Cloud Console
Multi-region scaling | No | Yes
SLA available | No | Yes (99.9%)
Best for | Prototyping, small apps | Production, enterprise

For most startups and small teams, Google AI Studio is the right starting point. Its free tier is generous and the paid tier supports up to 2,000 RPM on Flash models. Migrate to Vertex AI when you need SLAs, fine-grained access control (IAM), private networking, or multi-region deployments. Read our full comparison at AI Studio vs Vertex AI.

Frequently Asked Questions

What are the Gemini API rate limits for the free tier?

Free tier rate limits vary by model. Gemini 2.0 Flash allows 15 requests per minute (RPM), 1 million tokens per minute (TPM), and 1,500 requests per day (RPD). Gemini 2.5 Flash allows 10 RPM and 500 RPD. Gemini 2.5 Pro is the most restricted at 5 RPM and 25 RPD. All free tier usage is through Google AI Studio.

How do I increase my Gemini API rate limits?

For Google AI Studio, upgrade to paid billing to get higher limits automatically. For Vertex AI, request a quota increase through the Google Cloud Console under IAM & Admin > Quotas. For very high-volume needs (over 10,000 RPM), contact Google Cloud sales for a custom agreement.

What happens when I hit the Gemini API rate limit?

The API returns HTTP 429 (Too Many Requests) with a Retry-After header indicating how long to wait. Best practice is to implement exponential backoff: wait 1 second on first retry, then 2, 4, 8 seconds, with jitter. Never retry immediately in a tight loop, as this can extend your rate limit window.

Are Vertex AI rate limits different from AI Studio?

Yes. Vertex AI generally offers higher rate limits than Google AI Studio, especially for enterprise customers. Vertex AI limits are configured per-project and per-region, and can be increased through quota requests. AI Studio limits are fixed per API key tier (free or paid). Vertex AI also supports multi-region deployments for effectively multiplied capacity.
