
Gemini API Token Pricing: Current March 2026 Cost Guide

16 min read · API Pricing

Gemini API token pricing in March 2026 runs from $0.10 per 1M input tokens on Gemini 2.5 Flash-Lite to $2.00 on Gemini 3.1 Pro Preview for prompts up to 200K tokens. This guide shows the current rates, the batch discounts, and the billing rules that change your real spend.

Gemini API token pricing in March 2026: the short answer

In March 2026, Gemini API input pricing runs from $0.10 per 1M tokens on Gemini 2.5 Flash-Lite to $2.00 per 1M tokens on Gemini 3.1 Pro Preview for prompts up to 200K tokens. On the output side, the current range is $0.40 to $12.00 per 1M tokens for the main text-model lanes, with higher long-context pricing on Pro-class models and additional charges for features such as grounding, context caching, and audio input on some models. If you want the short answer: Gemini 2.5 Flash-Lite is still the cheapest stable option, Gemini 3.1 Flash-Lite Preview is the cheapest Gemini 3 option, and Gemini 3.1 Pro Preview is the current premium text lane after Google shut down Gemini 3 Pro Preview on March 9, 2026, according to the models page.

Most of the current SERP already gives you a price table somewhere, but that is not the hard part. The hard part is knowing which table still reflects Google's live model lineup, which price belongs to the Gemini Developer API instead of Vertex or AI Studio, and which extra billing rules will move your real spend away from the neat top-row number. That is why this page stays narrow: current token rates first, then the billing mechanics that actually change the decision.

Key Takeaways

  • Cheapest stable text model: Gemini 2.5 Flash-Lite at $0.10 input and $0.40 output per 1M tokens.
  • Cheapest Gemini 3 text model: Gemini 3.1 Flash-Lite Preview at $0.25 input for text, image, and video, plus $1.50 output.
  • Current premium lane: Gemini 3.1 Pro Preview at $2.00 input and $12.00 output up to 200K tokens, then $4.00 and $18.00 above 200K.
  • Best default for many production apps: Gemini 2.5 Flash remains the safest balanced lane at $0.30 input and $2.50 output if you want a stable model with better reasoning than Flash-Lite.
  • Fastest way to cut costs: Batch pricing roughly halves the standard token rates on the current pricing page.
  • Most common billing mistake: developers compare headline token prices but forget cache charges, cache storage, audio premiums, grounding charges, or the 200K context threshold on Pro-class models.

Gemini API token pricing table for March 2026

Pricing matrix grouping Gemini API models into budget, balanced, and premium lanes with current March 2026 input and output token prices.

The official Gemini Developer API pricing page is the source of truth, but it is not the fastest page to interpret if you just want the current token rates side by side. This table focuses on the live text-model lanes most developers actually compare today.

All prices are USD per 1M tokens.

| Model | Standard input | Standard output | Batch input | Batch output | Notes |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 up to 200K, $4.00 above 200K | $12.00 up to 200K, $18.00 above 200K | $1.00 up to 200K, $2.00 above 200K | $6.00 up to 200K, $9.00 above 200K | Paid only, current top-end text lane |
| Gemini 3 Flash Preview | $0.50 text, image, or video; $1.00 audio | $3.00 | $0.25 text, image, or video; $0.50 audio | $1.50 | Fast Gemini 3 lane, free tier available |
| Gemini 3.1 Flash-Lite Preview | $0.25 text, image, or video; $0.50 audio | $1.50 | $0.125 text, image, or video; $0.25 audio | $0.75 | Cheapest Gemini 3 text lane |
| Gemini 2.5 Pro | $1.25 up to 200K, $2.50 above 200K | $10.00 up to 200K, $15.00 above 200K | $0.625 up to 200K, $1.25 above 200K | $5.00 up to 200K, $7.50 above 200K | Strong lower-cost alternative to 3.1 Pro |
| Gemini 2.5 Flash | $0.30 text, image, or video; $1.00 audio | $2.50 | $0.15 text, image, or video; $0.50 audio | $1.25 | Best balanced stable lane for many apps |
| Gemini 2.5 Flash-Lite | $0.10 text, image, or video; $0.30 audio | $0.40 | $0.05 text, image, or video; $0.15 audio | $0.20 | Cheapest stable lane overall |

Two details matter immediately.

First, Google's current lineup mixes stable 2.5 models with preview Gemini 3 models, so "latest" and "cheapest" are not the same thing. If your main goal is the lowest stable text cost, 2.5 Flash-Lite still beats the Gemini 3 preview options. If your goal is to stay on the Gemini 3 surface, then 3.1 Flash-Lite Preview is the real budget lane.

Second, several older pages still cite Gemini 3 Pro Preview. That is now stale. Google's live models page explicitly warns that Gemini 3 Pro Preview was deprecated and shut down on March 9, 2026, and points users to Gemini 3.1 Pro Preview instead. If you see an older comparison that treats Gemini 3 Pro Preview as active, assume the rest of its pricing page may also be stale.

Which Gemini model should you budget for?

Decision board mapping common Gemini API workload types to the most sensible budget, balanced, reasoning, and premium model choices.

The answer depends less on "which model is newest" and more on how much reasoning quality you really need per request.

If you care mostly about low cost, Gemini 2.5 Flash-Lite is still the cleanest answer. At $0.10 input and $0.40 output per 1M tokens, it remains the cheapest stable text lane in Google's API lineup. That makes it a strong fit for classification, extraction, translation, routing, lightweight chat, or any pipeline where you value throughput more than maximum reasoning depth.

If you want the safer middle ground, Gemini 2.5 Flash is still the practical default for many production teams. It costs more than Flash-Lite, but not dramatically more in the kinds of workloads that matter to startups and internal tools. At $0.30 input and $2.50 output per 1M tokens, it is much cheaper than the Pro lanes and usually good enough for customer support bots, internal copilots, document Q&A, or lightweight agent workflows. If you are not sure where to start, this is still the lane I would budget first.

If you specifically want to stay on the Gemini 3 family without paying Pro pricing, Gemini 3.1 Flash-Lite Preview is the budget path. It is not as cheap as 2.5 Flash-Lite, but it gives you the current Gemini 3 track at $0.25 input and $1.50 output. That matters if your organization prefers to stay on the newest family and can tolerate preview-model rate-limit and change risk.

If your workload is genuinely reasoning-heavy, the real choice is between Gemini 2.5 Pro and Gemini 3.1 Pro Preview. Gemini 2.5 Pro is materially cheaper at $1.25 and $10.00 up to 200K tokens, while Gemini 3.1 Pro Preview is the premium lane at $2.00 and $12.00. The price delta is not small enough to ignore. For code generation, long-form synthesis, or agent planning, you should assume 3.1 Pro is the premium decision rather than the obvious default.

That is the part most generic pricing guides do not state clearly: the Gemini lineup is not one clean staircase where the newest model is automatically the best buy. In March 2026, the sensible budgeting question is:

  • use 2.5 Flash-Lite if cost is everything
  • use 2.5 Flash if you want the safest stable default
  • use 3.1 Flash-Lite Preview if you want the cheapest Gemini 3 lane
  • use 2.5 Pro if you need strong reasoning without paying the maximum Gemini 3 premium
  • use 3.1 Pro Preview only when the extra reasoning quality is worth the higher token rate
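The five budgeting rules above can be sketched as a simple routing function. This is an illustrative sketch only: the model ID strings are assumed names based on the lanes discussed in this article, not confirmed API identifiers, and the selection logic is this article's heuristic, not an official Google recommendation.

```python
# Sketch: map a workload profile to the budgeting lanes discussed above.
# Model ID strings are hypothetical labels for the article's lanes.

def pick_gemini_lane(needs_reasoning: bool,
                     must_be_gemini_3: bool,
                     cost_is_everything: bool) -> str:
    if needs_reasoning:
        # Pro lanes: prefer 2.5 Pro unless you must stay on Gemini 3.
        return "gemini-3.1-pro-preview" if must_be_gemini_3 else "gemini-2.5-pro"
    if must_be_gemini_3:
        # Cheapest Gemini 3 lane.
        return "gemini-3.1-flash-lite-preview"
    if cost_is_everything:
        # Cheapest stable lane overall.
        return "gemini-2.5-flash-lite"
    # Safest balanced stable default.
    return "gemini-2.5-flash"
```

In practice the interesting calls are the ties: a reasoning-heavy workload that must stay on Gemini 3 is the only path that lands on the premium 3.1 Pro Preview rate.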

If your project is still in the testing stage, also remember that the billing FAQ and pricing page both make a distinction between free access and paid usage. Some models still offer free-tier access, but that does not mean every Gemini API surface is free, and it does not mean your later production bill will resemble your AI Studio experiments.

What your Gemini bill actually includes

Layered billing graphic showing the main factors that change a Gemini API bill beyond basic input and output token rates.
Layered billing graphic showing the main factors that change a Gemini API bill beyond basic input and output token rates.

This is where many "Gemini pricing" pages stop too early. Google's billing page says Gemini API billing is based on input token count, output token count, cached token count, and cached token storage duration. In other words, you are not only paying for the text you typed and the text the model returned.

You also need a usable mental model for token counting. Google's token guide says one Gemini token is roughly 4 characters, and 100 tokens is roughly 60 to 80 English words. That is not precise enough for billing, but it is good enough to stop making bad back-of-the-envelope estimates. A short 300-word prompt is not expensive. A retrieval-heavy prompt that repeatedly attaches long system instructions, tool traces, or large document chunks is where your bill starts to move.
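The 4-characters-per-token rule of thumb is easy to turn into a back-of-the-envelope estimator. This is a rough sketch for budgeting only, not billing-accurate: real tokenization varies by language and content, and the function names here are illustrative.

```python
# Rough budgeting sketch using the ~4 characters per token heuristic
# from Google's token guide. Not billing-accurate.

def estimate_tokens(text: str) -> int:
    """Approximate token count: roughly 4 characters per Gemini token."""
    return max(1, len(text) // 4)

def estimate_input_cost_usd(text: str, price_per_million: float) -> float:
    """Back-of-the-envelope input cost at a given per-1M-token rate."""
    return estimate_tokens(text) / 1_000_000 * price_per_million
```

Run a 400-character prompt through this at the $0.10 Flash-Lite rate and you get about a hundredth of a cent, which is exactly why short prompts are never the problem; repeated long attachments are.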

The next surprise is that not every token is priced the same way. On several models, audio input costs more than text input. On Pro-class models, requests above 200K prompt tokens move into a higher price tier. And if you use context caching, you may save on repeated input processing while also paying for cached tokens and storage time. That is why "Gemini costs $X per million tokens" is only the headline, not the whole budget.

Here is the short billing-modifier table most searchers actually need:

| Billing modifier | What changes | Why it matters |
|---|---|---|
| Pro-class prompts above 200K tokens | Gemini 3.1 Pro Preview jumps from $2.00 to $4.00 input and from $12.00 to $18.00 output; Gemini 2.5 Pro jumps from $1.25 to $2.50 input and from $10.00 to $15.00 output | Long-context requests can cost much more than the headline row suggests |
| Audio input | Flash and Flash-Lite models charge higher audio-input rates than text, image, or video input | Voice and multimodal apps are often under-budgeted if you ignore this |
| Batch mode | Batch pricing cuts standard rates roughly in half on the main text-model lanes | This is the easiest savings lever for async workloads |
| Context caching | Google charges for cached token count plus storage duration | Caching can reduce repeated compute, but it is not "free memory" |
| Grounding | Search or Maps grounding can add separate per-query charges | Retrieval quality may improve, but your bill is no longer only token-based |
| Failed requests | 400 and 500 failures are not billed, but still count against quota | Error storms hurt throughput even when they do not increase spend |
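The 200K threshold is the modifier that most often surprises people, so it is worth seeing as arithmetic. The sketch below assumes, as this article reads the pricing page, that the whole request is billed at the higher tier once the prompt exceeds 200K tokens; the dictionary of rates simply restates the Pro-lane numbers above, and the model keys are illustrative labels.

```python
# Sketch of the 200K long-context threshold on the Pro lanes.
# Rates are USD per 1M tokens, restated from the table above.
# Assumption: the whole request reprices once the prompt exceeds 200K.

PRO_RATES = {
    # model: (input <=200K, input >200K, output <=200K, output >200K)
    "gemini-3.1-pro-preview": (2.00, 4.00, 12.00, 18.00),
    "gemini-2.5-pro": (1.25, 2.50, 10.00, 15.00),
}

def pro_request_cost(model: str, prompt_tokens: int, output_tokens: int) -> float:
    in_lo, in_hi, out_lo, out_hi = PRO_RATES[model]
    long_context = prompt_tokens > 200_000
    in_rate = in_hi if long_context else in_lo
    out_rate = out_hi if long_context else out_lo
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

A 300K-token prompt on 3.1 Pro Preview does not just add 50% more input tokens than a 200K one; it also doubles the input rate and lifts the output rate, which is why long-context Pro requests drift so far from the headline row.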

Grounding is also not priced uniformly across the current Gemini lineup. On Google's live March 22, 2026 pricing page, Gemini 3.1 Pro Preview, Gemini 3.1 Flash-Lite Preview, and Gemini 3 Flash Preview show $14 per 1,000 Google Search or Maps queries after their free allowance. Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash-Lite show a higher pattern of $35 per 1,000 grounded Google Search prompts and $25 per 1,000 grounded Maps prompts after their free allowance. If grounding is central to your app, check the exact model row instead of assuming every Gemini model uses the same retrieval surcharge.
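The per-1,000-query surcharge is simple math, but the cross-model difference is large enough to check. A minimal sketch, using the rates quoted above and a `free_queries` parameter standing in for whatever free allowance applies to your model (check the pricing page for the current allowance; the function name is illustrative):

```python
# Grounding surcharge sketch. rate_per_1000 is USD per 1,000 grounded
# queries after the free allowance, as quoted in the paragraph above.

def grounding_cost(queries: int, rate_per_1000: float,
                   free_queries: int = 0) -> float:
    billable = max(0, queries - free_queries)
    return billable / 1000 * rate_per_1000

# 10,000 grounded Search queries with no remaining free allowance:
# at the $14 Gemini 3 lane rate that is $140; at the $35 Gemini 2.5
# Search rate it is $350 for the same retrieval volume.
```

A 2.5x gap on the same query volume is exactly the kind of detail that makes "check the exact model row" practical advice rather than pedantry.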

Two of those rows deserve extra emphasis.

Batch mode is the cleanest price lever for offline or asynchronous jobs. If you are generating reports, evaluating data, rewriting content, or running nightly backfills, batch pricing usually deserves to be your default estimate before you even compare vendors. Cutting the token rate in half often matters more than shaving a few cents off the standard row.

Context caching is the most misunderstood Gemini pricing feature. Teams sometimes talk about it as if it simply makes repeated prompts cheap. That is directionally true, but incomplete. Google charges for cached tokens and storage duration, so you should treat caching like an optimization feature, not free persistence. If your app reuses large prompt prefixes or shared context, caching can still be a major win. If it does not, forcing caching into the design will not magically lower your bill. If you want the deeper implementation side, the existing guide on Gemini API context caching and cost reduction is the better follow-up read.

When Gemini pricing changes faster than expected

Three patterns create the biggest gap between the rate people remember and the amount they actually pay.

The first is the 200K prompt threshold on the Pro lanes. Developers often remember the base number, then forget that large prompts move into a higher bracket. The practical consequence is simple: if you are attaching large RAG contexts, codebase chunks, or long multi-turn histories, you should not assume the cheaper Pro row applies. This is also why some long-context use cases that sound "Pro-shaped" still end up making more financial sense on Flash plus better retrieval discipline.

The second is free-tier confusion. Gemini's pricing surface makes it easy to blur Google AI Studio experimentation, model availability, and paid API billing into one story. But the billing FAQ is explicit: AI Studio usage remains free unless you link a paid API key for access to paid features, and free-tier behavior differs by model. That means "I tried it for free in AI Studio" is not a valid cost estimate for production.

The third is grounding and quota confusion. Token pricing is only one layer of Gemini cost planning. Google's rate-limits page says limits are applied per project, not per API key, and that rate limits depend on the model and usage tier. Once you enable billing and move into paid tiers, pricing and quota decisions start interacting. For example, if you batch async jobs to cut token cost, you may still need to plan around batch queue limits. If you hit 429s regularly, the operational question shifts from "what is the cheapest per-token rate?" to "which lane gives me the throughput I actually need?" The follow-up guide on fixing Gemini API 429 Resource Exhausted errors is more useful than another pricing table at that point.

This is also why I would not over-index on tiny rate differences between similar models unless the workload is huge. The bigger budget swings usually come from model choice, long-context behavior, batch usage, or preventable prompt bloat.

Gemini Developer API vs Vertex AI vs AI Studio pricing confusion

This keyword brings in a lot of mismatched pages because search results regularly collapse three different surfaces into one conversation.

The first surface is the Gemini Developer API pricing page, which is what this article is about. That is the cleanest answer for most developers who are using Gemini directly through Google's public developer stack. The second surface is Vertex AI, which exposes the Gemini models inside Google Cloud's broader enterprise platform. The third surface is Google AI Studio, which is an experimentation interface and not the same thing as a permanent production billing model.

The problem is that many ranking pages combine those surfaces with consumer Gemini subscription plans, Workspace add-ons, or even Google developer-seat products. That makes the page feel comprehensive, but it makes the exact token-pricing question harder to answer.

The practical rule is:

  • use the Gemini Developer API pricing page when you are pricing direct Gemini API calls
  • use the Vertex AI pricing page when your workload is actually being routed through Vertex and enterprise billing matters
  • use AI Studio only as an experimentation surface, not as your final production pricing model

In March 2026, the interesting detail is that Vertex AI broadly mirrors the same Gemini token price patterns, but it also exposes enterprise-oriented lanes such as priority or flex and batch pricing more explicitly. If you are reading a third-party page and it does not tell you which surface its price table belongs to, treat that page as suspicious until you confirm the official source.

This is also the cleanest place to say what page-one results often leave implicit: a Gemini pricing article should not spend half the page on app subscriptions if the query is about token pricing. That is one of the main reasons a narrower page can beat the existing SERP average here.

Example monthly cost math for common workloads

The easiest way to make token pricing useful is to turn it into a few realistic workload estimates. These are not official quotes. They are simple planning examples using the current March 2026 standard rates.

Example 1: small customer-support bot on Gemini 2.5 Flash

Assume you process 30M input tokens and 10M output tokens per month. At Gemini 2.5 Flash standard pricing, that is:

  • input: 30 x $0.30 = $9.00
  • output: 10 x $2.50 = $25.00
  • estimated monthly total: $34.00

That is why 2.5 Flash is still such a strong default. It is cheap enough for production experimentation without forcing you all the way down to Flash-Lite.

Example 2: high-volume routing or extraction service on Gemini 2.5 Flash-Lite

Assume 200M input tokens and 40M output tokens per month. At 2.5 Flash-Lite standard pricing:

  • input: 200 x $0.10 = $20.00
  • output: 40 x $0.40 = $16.00
  • estimated monthly total: $36.00

That is an important reminder that the cheapest output lane can matter more than the cheapest input lane if your app produces a lot of text.

Example 3: premium coding or synthesis workload on Gemini 3.1 Pro Preview

Assume 20M input tokens and 4M output tokens per month, with prompts staying under 200K each:

  • input: 20 x $2.00 = $40.00
  • output: 4 x $12.00 = $48.00
  • estimated monthly total: $88.00

That is not absurdly expensive, but it is still meaningfully higher than using Gemini 2.5 Pro for the same token volume:

  • 2.5 Pro input: 20 x $1.25 = $25.00
  • 2.5 Pro output: 4 x $10.00 = $40.00
  • estimated monthly total: $65.00

In other words, the premium for 3.1 Pro is real even before you cross the 200K threshold.

Example 4: async backfill job with batch mode

Take the first example again, but run it through batch mode instead of standard mode on 2.5 Flash:

  • input: 30 x $0.15 = $4.50
  • output: 10 x $1.25 = $12.50
  • estimated monthly total: $17.00

That is why batch pricing is the first optimization lever worth checking before you redesign your whole stack or migrate vendors.
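All four worked examples reduce to the same formula, so here is a sketch that reproduces them. The rate table restates the standard prices quoted above; the model keys and the batch-halving shortcut are this article's simplification (real batch rates should be read off the pricing page per model), and all figures are USD with volumes in millions of tokens per month.

```python
# Reproduces the four worked examples above.
# Rates: USD per 1M tokens (standard input, standard output).
# Model keys are illustrative labels, not confirmed API identifiers.

STANDARD_RATES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-3.1-pro-preview": (2.00, 12.00),   # prompts under 200K
    "gemini-2.5-pro": (1.25, 10.00),           # prompts under 200K
}

def monthly_cost(model: str, input_millions: float, output_millions: float,
                 batch: bool = False) -> float:
    in_rate, out_rate = STANDARD_RATES[model]
    if batch:
        # Simplification: batch mode roughly halves the standard rates.
        in_rate, out_rate = in_rate / 2, out_rate / 2
    return input_millions * in_rate + output_millions * out_rate
```

Plugging in the four scenarios gives roughly $34 for the Flash support bot, $36 for the Flash-Lite extraction service, $88 versus $65 for 3.1 Pro Preview against 2.5 Pro, and about $17 for the batched Flash backfill, matching the hand math above.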

If you are still in the free-testing stage, the better companion read is the Gemini API free quota 2026 guide. If you already know your workload shape and are only comparing the Gemini 3 fast lane, the narrower Gemini 3 Flash API price guide may be enough.

The bottom line is simple. The current Gemini API token-pricing question is not really "what is the price?" It is "which Gemini lane matches my workload, and which billing modifiers will change the real number?" Once you answer that, the rest of the Gemini pricing surface becomes much easier to navigate.
