Gemini API Rate Limits & Quotas
Rate limits for current Gemini models across the free and paid tiers, plus how to handle 429 errors, request quota increases, and choose between AI Studio and Vertex AI.
Understanding Gemini Rate Limits
Google enforces rate limits on the Gemini API to ensure fair usage and system stability. These limits are measured in three dimensions:
- RPM: requests per minute
- TPM: tokens per minute
- RPD: requests per day
If any of these limits is exceeded, the API returns a 429 Too Many Requests error. The limits differ based on your tier (free or paid) and the model you are using. Higher-capability models have tighter limits because they consume more compute resources per request.
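To make the three dimensions concrete, here is a minimal client-side sketch that estimates which limit a request would hit before it is sent. `RateLimitTracker` is a hypothetical helper, not part of any Google SDK, and it assumes rolling 60-second and 24-hour windows; the server's actual window boundaries may differ.

```python
import time
from collections import deque


class RateLimitTracker:
    """Client-side estimate of RPM, TPM, and RPD usage (hypothetical helper)."""

    def __init__(self, rpm, tpm, rpd):
        self.rpm, self.tpm, self.rpd = rpm, tpm, rpd
        self.events = deque()  # (timestamp, tokens) for roughly the last 24 hours

    def _prune(self, now):
        # Events older than one day no longer count toward any limit.
        while self.events and now - self.events[0][0] >= 86_400:
            self.events.popleft()

    def would_exceed(self, tokens, now=None):
        """Return the first limit this request would break, or None."""
        now = time.time() if now is None else now
        self._prune(now)
        last_minute = [(t, n) for t, n in self.events if now - t < 60]
        if len(last_minute) + 1 > self.rpm:
            return "RPM"
        if sum(n for _, n in last_minute) + tokens > self.tpm:
            return "TPM"
        if len(self.events) + 1 > self.rpd:
            return "RPD"
        return None

    def record(self, tokens, now=None):
        """Register a request that was actually sent."""
        now = time.time() if now is None else now
        self.events.append((now, tokens))
```

Call `would_exceed` before each request and `record` after it. Because this is only a local estimate, the application still needs to handle a 429 from the server.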
Free Tier Rate Limits
Available through Google AI Studio with no billing account required. Ideal for development, prototyping, and low-volume applications.
| Model | RPM | TPM | RPD |
|---|---|---|---|
| Gemini 2.5 Pro | 5 | 250,000 | 25 |
| Gemini 2.5 Flash | 10 | 500,000 | 500 |
| Gemini 2.0 Flash | 15 | 1,000,000 | 1,500 |
| Gemini 2.0 Flash Lite | 30 | 1,000,000 | 1,500 |
Free tier limits are shown per API key, but keys created in the same project share one quota, so creating multiple keys does not increase your aggregate limit.
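As a worked example of what these numbers mean in practice, the sketch below converts each free tier row into a steady request spacing and the time it takes to use up the daily allowance at full RPM. The dictionary copies the table above; the `pacing` helper is illustrative only.

```python
# Free tier limits copied from the table above.
FREE_TIER = {
    "Gemini 2.5 Pro": {"rpm": 5, "tpm": 250_000, "rpd": 25},
    "Gemini 2.5 Flash": {"rpm": 10, "tpm": 500_000, "rpd": 500},
    "Gemini 2.0 Flash": {"rpm": 15, "tpm": 1_000_000, "rpd": 1_500},
    "Gemini 2.0 Flash Lite": {"rpm": 30, "tpm": 1_000_000, "rpd": 1_500},
}


def pacing(limits):
    """Return (seconds between requests, minutes until the daily cap at full RPM)."""
    seconds_between_requests = 60 / limits["rpm"]
    minutes_until_daily_cap = limits["rpd"] / limits["rpm"]
    return seconds_between_requests, minutes_until_daily_cap


for model, limits in FREE_TIER.items():
    gap, minutes = pacing(limits)
    print(f"{model}: one request every {gap:g}s; "
          f"daily cap reached after {minutes:g} min at full RPM")
```

For Gemini 2.5 Pro this works out to one request every 12 seconds, with the 25-request daily allowance gone after 5 minutes of sustained use, so RPD rather than RPM is usually the binding limit on that model.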
Paid Tier Rate Limits
Activated by enabling billing on your Google Cloud project. Paid tier removes daily request limits and significantly increases RPM and TPM.
| Model | RPM | TPM | RPD |
|---|---|---|---|
| Gemini 2.5 Pro | 1,000 | 4,000,000 | Unlimited |
| Gemini 2.5 Flash | 2,000 | 4,000,000 | Unlimited |
| Gemini 2.0 Flash | 2,000 | 4,000,000 | Unlimited |
| Gemini 2.0 Flash Lite | 4,000 | 4,000,000 | Unlimited |
Paid tier limits shown are defaults. They can be increased through quota requests in Google Cloud Console.
How to Increase Your Rate Limits
If the default paid tier limits are not sufficient for your application, there are several paths to get higher quotas.
1. Enable Paid Billing
The simplest way to get higher limits is to upgrade from the free tier. Enabling billing on your Google Cloud project automatically unlocks paid tier limits. Based on the tables above, the increase ranges from 4x (TPM on the 2.0 Flash models) to 200x (RPM on the 2.5 models), and daily request caps are removed. No application or review process is required.
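The size of the jump can be checked against the two tables above. This short sketch divides the paid limits by the free ones; the figures are copied from those tables and `multipliers` is an illustrative helper.

```python
# (RPM, TPM) pairs copied from the free and paid tier tables above.
FREE = {
    "Gemini 2.5 Pro": (5, 250_000),
    "Gemini 2.5 Flash": (10, 500_000),
    "Gemini 2.0 Flash": (15, 1_000_000),
    "Gemini 2.0 Flash Lite": (30, 1_000_000),
}
PAID = {
    "Gemini 2.5 Pro": (1_000, 4_000_000),
    "Gemini 2.5 Flash": (2_000, 4_000_000),
    "Gemini 2.0 Flash": (2_000, 4_000_000),
    "Gemini 2.0 Flash Lite": (4_000, 4_000_000),
}


def multipliers(model):
    """Return (RPM multiplier, TPM multiplier) of paid over free, rounded to 0.1."""
    free_rpm, free_tpm = FREE[model]
    paid_rpm, paid_tpm = PAID[model]
    return round(paid_rpm / free_rpm, 1), round(paid_tpm / free_tpm, 1)


for model in FREE:
    rpm_x, tpm_x = multipliers(model)
    print(f"{model}: RPM x{rpm_x:g}, TPM x{tpm_x:g}")
```

The RPM gain is much larger than the TPM gain, so workloads made of a few very large prompts benefit less from the upgrade than workloads made of many small requests.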
2. Request a Quota Increase
In the Google Cloud Console, navigate to IAM & Admin > Quotas. Find the Generative Language API quotas, select the limit you want to increase, and submit a request. Most requests are processed within 24 to 48 hours. Include your expected usage pattern and business justification for faster approval.
3. Contact Google Cloud Sales
For enterprise workloads requiring more than 10,000 RPM or custom SLAs, engage with Google Cloud sales directly. They can set up custom rate limits, dedicated capacity, and negotiated pricing. This is common for companies spending more than $10,000 per month on the Gemini API.
4. Use Multiple Regions (Vertex AI)
Vertex AI rate limits are per-region. If you deploy your application across multiple regions (e.g., us-central1 and europe-west4), each region has its own independent quota. This effectively multiplies your available capacity without requiring a quota increase.
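One way to use per-region quotas is to fail over to another region when the current one is rate limited. The sketch below shows only the control flow: `send` is a hypothetical callable you would supply that issues the request against a given region and returns a `(status, body)` pair, and the region names are the ones mentioned above.

```python
REGIONS = ["us-central1", "europe-west4"]


def call_with_region_failover(send, regions=REGIONS):
    """Try each region in order, moving on whenever one answers 429.

    `send(region)` is a hypothetical callable returning (status, body).
    Returns (region, status, body) from the first region that is not rate
    limited, or from the last region tried if all of them returned 429.
    """
    if not regions:
        raise ValueError("at least one region is required")
    result = None
    for region in regions:
        status, body = send(region)
        result = (region, status, body)
        if status != 429:
            break
    return result
```

Failover adds latency for users far from the fallback region, and data residency requirements may rule out some regions entirely, so the region list should be chosen deliberately.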
Rate Limit Headers and Error Handling
When you hit a rate limit, the API returns a 429 Too Many Requests status code along with headers that tell you when you can retry.

    HTTP/1.1 429 Too Many Requests
    Content-Type: application/json
    Retry-After: 30
    X-RateLimit-Limit: 1000
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1711929600

| Header | Description |
|---|---|
| Retry-After | Seconds to wait before retrying the request |
| X-RateLimit-Limit | Your maximum allowed requests per minute |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit window resets |
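A small helper can turn these headers into a wait time. The sketch below assumes the headers arrive as a plain dictionary and that `Retry-After` carries a number of seconds, as in the example response; the HTTP-date form of `Retry-After` is not handled, and `seconds_until_retry` is a hypothetical name.

```python
import time


def seconds_until_retry(headers, now=None):
    """Return how long to wait based on rate limit headers, or None if unknown."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}

    retry_after = normalized.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a plain number of seconds; fall through to the reset time.

    reset = normalized.get("x-ratelimit-reset")
    if reset is not None:
        try:
            return max(0.0, float(reset) - now)
        except ValueError:
            pass

    return None
```

Returning `None` when neither header is usable lets the caller fall back to its own backoff schedule.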
Recommended Retry Strategy
Implement exponential backoff with jitter. This prevents all your retries from hitting the API at the exact same time, which would just trigger another rate limit. Here is the recommended approach:
1. On the first 429, wait 1 second plus random jitter (0 to 500 ms).
2. On the second retry, wait 2 seconds plus jitter.
3. On the third retry, wait 4 seconds plus jitter.
4. Cap at 5 retries or 32 seconds, whichever comes first.
5. If the Retry-After header is present, always prefer its value over your backoff calculation.
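The steps above can be sketched as follows. This is a minimal illustration, not an official client: `send` is a hypothetical callable that performs one request and returns `(status, retry_after, body)`, where `retry_after` is the Retry-After value in seconds or `None`.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=32.0, max_jitter=0.5):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The server's hint takes priority over the computed schedule.
        return float(retry_after)
    jitter = random.uniform(0, max_jitter)
    return min(cap, base * 2 ** attempt + jitter)


def call_with_retries(send, max_retries=5, sleep=time.sleep):
    """Call `send()` until it stops returning 429 or retries run out.

    `send` is a hypothetical callable returning (status, retry_after, body).
    """
    for attempt in range(max_retries + 1):
        status, retry_after, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting. With the defaults, the waits are roughly 1, 2, 4, 8, and 16 seconds, so the 5-retry limit is reached before the 32-second cap.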
AI Studio vs Vertex AI Rate Limits
Google provides two ways to access the Gemini API: Google AI Studio and Vertex AI. They have different rate limit structures, pricing models, and scaling options.
| Feature | Google AI Studio | Vertex AI |
|---|---|---|
| Free tier | Yes | Limited trial credits |
| Rate limit scope | Per API key | Per project, per region |
| Quota increase | Upgrade to paid tier | Custom via Cloud Console |
| Multi-region scaling | No | Yes |
| SLA available | No | Yes (99.9%) |
| Best for | Prototyping, small apps | Production, enterprise |
For most startups and small teams, Google AI Studio is the right starting point. Its free tier is generous, and the paid tier supports 2,000 RPM on Flash models by default (4,000 on Flash Lite). Migrate to Vertex AI when you need SLAs, fine-grained access control (IAM), private networking, or multi-region deployments. Read our full comparison at AI Studio vs Vertex AI.
Frequently Asked Questions
What are the Gemini API rate limits for the free tier?
Free tier rate limits vary by model. Gemini 2.0 Flash allows 15 requests per minute (RPM), 1 million tokens per minute (TPM), and 1,500 requests per day (RPD). Gemini 2.5 Flash allows 10 RPM and 500 RPD. Gemini 2.5 Pro is the most restricted at 5 RPM and 25 RPD. All free tier usage is through Google AI Studio.
How do I increase my Gemini API rate limits?
For Google AI Studio, upgrade to paid billing to get higher limits automatically. For Vertex AI, request a quota increase through the Google Cloud Console under IAM & Admin > Quotas. For very high-volume needs (over 10,000 RPM), contact Google Cloud sales for a custom agreement.
What happens when I hit the Gemini API rate limit?
The API returns HTTP 429 (Too Many Requests) with a Retry-After header indicating how long to wait. Best practice is to implement exponential backoff: wait 1 second on first retry, then 2, 4, 8 seconds, with jitter. Never retry immediately in a tight loop, as this can extend your rate limit window.
Are Vertex AI rate limits different from AI Studio?
Yes. Vertex AI generally offers higher rate limits than Google AI Studio, especially for enterprise customers. Vertex AI limits are configured per-project and per-region, and can be increased through quota requests. AI Studio limits are fixed per API key tier (free or paid). Vertex AI also supports multi-region deployments for effectively multiplied capacity.