---
title: Model costs
description: Public, validated pricing and capability data for AI model providers (LLMs, STT, TTS).
license: CC-BY-4.0
version: 2
verified: 2026-05-26
generated_at: 2026-06-03T14:44:15.590Z
source: https://github.com/hail-hq/hail/tree/main/costs
---

# Model costs

Public, validated pricing and capability data for AI model providers: large language models, speech-to-text, and text-to-speech. Schema-validated and released under CC-BY-4.0 for free reuse.

## At a glance

- **Models:** 90 (43 LLM · 23 STT · 24 TTS)
- **Providers:** 25
- **LLM output range:** $0.28 – $60.00 / Mtok
- **STT range:** $0.000667 – $0.0167 / min
- **TTS range:** $15.00 – $160.00 / 1M chars
- **Verified:** 2026-05-26

## For agents

Fetch the raw JSON for programmatic use — this is the source of truth, schema-validated on every PR:

- LLMs: <https://raw.githubusercontent.com/hail-hq/hail/main/costs/llm.json>
- STT: <https://raw.githubusercontent.com/hail-hq/hail/main/costs/stt.json>
- TTS: <https://raw.githubusercontent.com/hail-hq/hail/main/costs/tts.json>

JSON Schemas: <https://github.com/hail-hq/hail/tree/main/costs/schema>
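As a minimal sketch of programmatic use, the snippet below downloads one of the raw files and looks records up by ID using only the Python standard library. The field names `model_id` and `replaced_by_model_id` appear in the notes below, but the top-level layout of the files (a bare array, or an object with a `models` key) is an assumption here; check it against the JSON Schemas before relying on it. `fetch_json`, `iter_models`, `find_model`, and `latest_successor` are hypothetical helper names, not part of the dataset.

```python
import json
import urllib.request
from typing import Any, Optional

LLM_JSON_URL = "https://raw.githubusercontent.com/hail-hq/hail/main/costs/llm.json"


def fetch_json(url: str = LLM_JSON_URL, timeout: float = 10.0) -> Any:
    """Download and parse one of the raw JSON files."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def iter_models(doc: Any) -> list:
    """Return the list of model records.

    ASSUMPTION: the file is either a bare JSON array of records or an object
    holding them under a "models" key; verify against the JSON Schema.
    """
    if isinstance(doc, list):
        return doc
    if isinstance(doc, dict) and isinstance(doc.get("models"), list):
        return doc["models"]
    raise ValueError("unrecognised top-level layout; see the JSON Schema")


def find_model(doc: Any, model_id: str) -> Optional[dict]:
    """Look a record up by its (assumed) "model_id" field."""
    for record in iter_models(doc):
        if record.get("model_id") == model_id:
            return record
    return None


def latest_successor(doc: Any, model_id: str) -> str:
    """Follow (assumed) "replaced_by_model_id" links to the newest in-file model."""
    seen = {model_id}
    current = model_id
    while True:
        record = find_model(doc, current)
        nxt = record.get("replaced_by_model_id") if record else None
        if not nxt:
            return current
        if nxt in seen:
            raise ValueError(f"replacement cycle at {nxt!r}")
        seen.add(nxt)
        current = nxt
```

If the assumed layout holds, `find_model(fetch_json(), "gpt-5")` would return the GPT-5 record, and `latest_successor(doc, "o1")` would follow the `o1` to `o3` replacement described in the notes. The cycle check guards against a malformed chain rather than looping forever.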

## Large language models

| Provider | Model | Output $/MTok | Input $/MTok | Cached input $/MTok | Context (tokens) | Max output (tokens) | Tool use | Modalities | Verified | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anthropic | Claude Opus 4.7 (`claude-opus-4-7`) | $25.00 | $5.00 | $0.5000 | 1,000,000 | 128,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://www.anthropic.com/pricing) |
| Anthropic | Claude Sonnet 4.6 (`claude-sonnet-4-6`) | $15.00 | $3.00 | $0.3000 | 1,000,000 | 64,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://www.anthropic.com/pricing) |
| OpenAI | GPT-5 (`gpt-5`) | $10.00 | $1.25 | $0.1250 | 400,000 | 128,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/gpt-5) |
| Google | Gemini 2.5 Pro (`gemini-2.5-pro`) | $10.00 | $1.25 | $0.1250 | 1,048,576 | 65,536 | yes | in: text,image,audio,video / out: text | 2026-05-26 | [link](https://ai.google.dev/pricing) |
| DeepSeek | DeepSeek V4 Flash (`deepseek-v4-flash`) | $0.28 | $0.14 | $0.0028 | 1,048,576 | 384,000 | yes | in: text / out: text | 2026-05-18 | [link](https://api-docs.deepseek.com/quick_start/pricing) |
| DeepSeek | DeepSeek V4 Pro (`deepseek-v4-pro`) | $3.48 | $1.74 | $0.0145 | 1,048,576 | 384,000 | yes | in: text / out: text | 2026-05-26 | [link](https://api-docs.deepseek.com/quick_start/pricing) |
| DeepSeek | DeepSeek V3 (`deepseek-chat`) | $0.28 | $0.14 | $0.0028 | 1,048,576 | 384,000 | yes | in: text / out: text | 2026-05-18 | [link](https://api-docs.deepseek.com/quick_start/pricing) |
| DeepSeek | DeepSeek R1 (`deepseek-reasoner`) | $0.28 | $0.14 | $0.0028 | 1,048,576 | 384,000 | yes | in: text / out: text | 2026-05-18 | [link](https://api-docs.deepseek.com/quick_start/pricing) |
| Anthropic | Claude Haiku 4.5 (`claude-haiku-4-5-20251001`) | $5.00 | $1.00 | $0.1000 | 200,000 | 64,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://platform.claude.com/docs/en/about-claude/pricing) |
| Anthropic | Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`) | $15.00 | $3.00 | $0.3000 | 200,000 | 64,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://platform.claude.com/docs/en/about-claude/pricing) |
| Anthropic | Claude Opus 4.5 (`claude-opus-4-5-20251101`) | $25.00 | $5.00 | $0.5000 | 200,000 | 64,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://platform.claude.com/docs/en/about-claude/pricing) |
| Anthropic | Claude Sonnet 3.7 (`claude-3-7-sonnet-20250219`) | $15.00 | $3.00 | $0.3000 | 200,000 | 64,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://platform.claude.com/docs/en/about-claude/model-deprecations) |
| Anthropic | Claude Haiku 3.5 (`claude-3-5-haiku-20241022`) | $4.00 | $0.80 | $0.0800 | 200,000 | 8,192 | yes | in: text,image / out: text | 2026-05-17 | [link](https://platform.claude.com/docs/en/about-claude/pricing) |
| OpenAI | GPT-5 mini (`gpt-5-mini`) | $2.00 | $0.25 | $0.0250 | 400,000 | 128,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/gpt-5-mini) |
| OpenAI | GPT-5 nano (`gpt-5-nano`) | $0.40 | $0.05 | $0.0050 | 400,000 | 128,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/gpt-5-nano) |
| OpenAI | GPT-4.1 (`gpt-4.1`) | $8.00 | $2.00 | $0.5000 | 1,047,576 | 32,768 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/gpt-4.1) |
| OpenAI | GPT-4o (`gpt-4o`) | $10.00 | $2.50 | $1.2500 | 128,000 | 16,384 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/gpt-4o) |
| OpenAI | GPT-4o mini (`gpt-4o-mini`) | $0.60 | $0.15 | $0.0750 | 128,000 | 16,384 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/gpt-4o-mini) |
| OpenAI | OpenAI o3 (`o3`) | $8.00 | $2.00 | $0.5000 | 200,000 | 100,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/o3) |
| OpenAI | OpenAI o4-mini (`o4-mini`) | $4.40 | $1.10 | $0.2750 | 200,000 | 100,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/o4-mini) |
| OpenAI | OpenAI o1 (`o1`) | $60.00 | $15.00 | $7.5000 | 200,000 | 100,000 | yes | in: text,image / out: text | 2026-05-17 | [link](https://developers.openai.com/api/docs/models/o1) |
| Google | Gemini 2.5 Flash (`gemini-2.5-flash`) | $2.50 | $0.30 | $0.0300 | 1,048,576 | 65,536 | yes | in: text,image,audio,video / out: text | 2026-05-26 | [link](https://ai.google.dev/gemini-api/docs/pricing) |
| Google | Gemini 2.5 Flash-Lite (`gemini-2.5-flash-lite`) | $0.40 | $0.10 | $0.0100 | 1,048,576 | 65,536 | yes | in: text,image,audio,video / out: text | 2026-05-26 | [link](https://ai.google.dev/gemini-api/docs/pricing) |
| Google | Gemini 2.0 Flash (`gemini-2.0-flash`) | $0.40 | $0.10 | $0.0250 | 1,048,576 | 8,192 | yes | in: text,image,audio,video / out: text | 2026-05-26 | [link](https://ai.google.dev/gemini-api/docs/pricing) |
| Meta | Llama 4 Maverick (`llama-4-maverick`) | $0.85 | $0.27 | — | 1,048,576 | 8,192 | yes | in: text,image / out: text | 2026-05-26 | [link](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct) |
| Meta | Llama 4 Scout (`llama-4-scout`) | $0.34 | $0.11 | — | 10,485,760 | 8,192 | yes | in: text,image / out: text | 2026-05-26 | [link](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) |
| Meta | Llama 3.3 70B Instruct (`llama-3.3-70b`) | $0.79 | $0.59 | — | 131,072 | 8,192 | yes | in: text / out: text | 2026-05-26 | [link](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
| Mistral | Mistral Large 2 (24.11) (`mistral-large-2411`) | $6.00 | $2.00 | — | 131,072 | 8,192 | yes | in: text / out: text | 2026-05-18 | [link](https://mistral.ai/news/mistral-large-2407) |
| Mistral | Mistral Medium 3 (`mistral-medium-2505`) | $2.00 | $0.40 | — | 131,072 | 8,192 | yes | in: text / out: text | 2026-05-18 | [link](https://mistral.ai/news/mistral-medium-3) |
| Mistral | Mistral Small 3 (`mistral-small-2501`) | $0.30 | $0.10 | — | 32,768 | 8,192 | yes | in: text / out: text | 2026-05-18 | [link](https://mistral.ai/news/mistral-small-3) |
| Mistral | Codestral 25.08 (`codestral-2508`) | $0.90 | $0.30 | — | 262,144 | 8,192 | yes | in: text / out: text | 2026-05-18 | [link](https://mistral.ai/news/codestral-25-08) |
| Mistral | Pixtral Large (`pixtral-large-2411`) | $6.00 | $2.00 | — | 131,072 | 8,192 | yes | in: text,image / out: text | 2026-05-18 | [link](https://mistral.ai/news/pixtral-large) |
| Cohere | Command A (`command-a-03-2025`) | $10.00 | $2.50 | — | 256,000 | 8,000 | yes | in: text / out: text | 2026-05-18 | [link](https://docs.cohere.com/docs/command-a) |
| Cohere | Command R+ (`command-r-plus-08-2024`) | $10.00 | $2.50 | — | 128,000 | 4,000 | yes | in: text / out: text | 2026-05-18 | [link](https://cohere.com/pricing) |
| xAI | Grok 4 (`grok-4-0709`) | $15.00 | $3.00 | $0.7500 | 256,000 | 256,000 | yes | in: text,image / out: text | 2026-05-19 | [link](https://docs.x.ai/developers/migration/may-15-retirement) |
| xAI | Grok 3 (`grok-3`) | $15.00 | $3.00 | $0.7500 | 131,072 | 131,072 | yes | in: text / out: text | 2026-05-19 | [link](https://docs.x.ai/developers/migration/may-15-retirement) |
| xAI | Grok Code Fast 1 (`grok-code-fast-1`) | $1.50 | $0.20 | $0.0200 | 256,000 | 256,000 | yes | in: text / out: text | 2026-05-19 | [link](https://docs.x.ai/developers/migration/may-15-retirement) |
| xAI | Grok 4.3 (`grok-4.3`) | $2.50 | $1.25 | — | 1,000,000 | 1,000,000 | yes | in: text,image / out: text | 2026-05-19 | [link](https://docs.x.ai/docs/models) |
| Alibaba | Qwen3-Max (`qwen3-max`) | $6.00 | $1.20 | — | 262,144 | 32,768 | yes | in: text / out: text | 2026-05-19 | [link](https://www.alibabacloud.com/help/en/model-studio/model-pricing) |
| Alibaba | Qwen3-Coder-Plus (`qwen3-coder-plus`) | $5.00 | $1.00 | — | 1,048,576 | 65,536 | yes | in: text / out: text | 2026-05-19 | [link](https://www.alibabacloud.com/help/en/model-studio/model-pricing) |
| Perplexity | Sonar (`sonar`) | $1.00 | $1.00 | — | 128,000 | 8,000 | no | in: text / out: text | 2026-05-19 | [link](https://docs.perplexity.ai/getting-started/pricing) |
| Perplexity | Sonar Pro (`sonar-pro`) | $15.00 | $3.00 | — | 200,000 | 8,000 | no | in: text / out: text | 2026-05-19 | [link](https://docs.perplexity.ai/getting-started/pricing) |
| Perplexity | Sonar Reasoning Pro (`sonar-reasoning-pro`) | $8.00 | $2.00 | — | 128,000 | 8,000 | no | in: text / out: text | 2026-05-19 | [link](https://docs.perplexity.ai/getting-started/pricing) |
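The table gives base per-million-token rates. A rough per-request estimate at those rates is (uncached input × input rate + cache hits × cached-input rate + output × output rate) / 10^6. The sketch below assumes you copy the three rates from a row by hand; `LlmPrice` and `request_cost_usd` are hypothetical names. Per the notes below, thinking/reasoning tokens are billed at the output rate, so they belong in the output count. Long-context tiers, cache-write surcharges, and batch discounts described in the notes are deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlmPrice:
    """USD per million tokens, copied from one row of the table."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_per_mtok: Optional[float] = None  # None where the table shows "—"


def request_cost_usd(
    price: LlmPrice,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> float:
    """Estimate one request's cost at base (non-batch, non-tiered) rates.

    cached_input_tokens is the subset of input_tokens served as cache hits.
    output_tokens should already include any billed reasoning/thinking tokens.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    if cached_input_tokens and price.cached_input_per_mtok is None:
        raise ValueError("this model has no published cached-input rate")

    uncached = input_tokens - cached_input_tokens
    cached_rate = price.cached_input_per_mtok or 0.0
    total = (
        uncached * price.input_per_mtok
        + cached_input_tokens * cached_rate
        + output_tokens * price.output_per_mtok
    )
    return total / 1_000_000


# Rates copied from the Claude Sonnet 4.6 and Llama 3.3 70B rows above.
SONNET_4_6 = LlmPrice(input_per_mtok=3.00, output_per_mtok=15.00, cached_input_per_mtok=0.30)
LLAMA_3_3_70B = LlmPrice(input_per_mtok=0.59, output_per_mtok=0.79)

if __name__ == "__main__":
    cost = request_cost_usd(SONNET_4_6, 100_000, 5_000, cached_input_tokens=80_000)
    print(f"${cost:.3f}")
```

For Claude Sonnet 4.6, a 100k-token prompt with 80k cache hits and 5k output tokens works out to $0.06 (uncached input) + $0.024 (cache hits) + $0.075 (output) = $0.159. Rows with "—" in the cached column raise an error if cache hits are passed, rather than silently pricing them at zero.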

**Notes:**

- **Anthropic Claude Opus 4.7** — Cache hit pricing is 0.1x base input ($0.50/MTok); 5-minute cache write is 1.25x base ($6.25/MTok); 1-hour cache write is 2x ($10/MTok). Batch API discounts both input and output by 50%. Knowledge cutoff Jan 2026. Uses a new tokenizer vs prior Claude models (may use up to 35% more tokens for identical text). Supports adaptive thinking (no extended-thinking toggle); thinking output tokens are billed at the output rate.
- **Anthropic Claude Sonnet 4.6** — Cache hit pricing is 0.1x base input ($0.30/MTok); 5-minute cache write is 1.25x base ($3.75/MTok); 1-hour cache write is 2x ($6/MTok). Batch API discounts both input and output by 50%. Knowledge cutoff Aug 2025. Supports extended thinking and adaptive thinking; thinking output tokens are billed at the output rate.
- **OpenAI GPT-5** — Reasoning model with adjustable reasoning_effort; reasoning tokens are billed at the output rate. Context window 400k, max output 128k confirmed against developers.openai.com/api/docs/models/gpt-5 (previous row had max_output_tokens as low-confidence). Cached input at 10% of base ($0.125/MTok). Batch API at flat 50% off input and output. PDF input via the Files API; image input native; audio is NOT supported on this model_id. Knowledge cutoff Sept 2024. Not on the April 2026 deprecation list.
- **Google Gemini 2.5 Pro** — Input pricing shown is for prompts <=200k tokens; prompts >200k tokens are billed per pricing_tiers; cached input also tiers at 200k (the >200k rate is captured in pricing_tiers[0].cache_read_per_mtok_usd). Context window and max_output_tokens confirmed via ai.google.dev/gemini-api/docs/models/gemini-2.5-pro. Thinking is always on and cannot be disabled; thinking tokens are billed at the output rate. Knowledge cutoff January 2025. Batch Mode discount is a flat 50% off input/output. Audio input is billed at the standard input rate of $1.25/MTok (no separate audio premium, unlike 2.5 Flash/Flash-Lite/2.0 Flash); audio_input_per_mtok_usd omitted. Explicit context caching storage is captured in cache_storage_per_mtok_per_hour_usd. Free tier (AI Studio) is published as per-minute RPM/TPM only, not per-day; free_tier omitted. PDF input via the Files API; image, audio, and video native.
- **DeepSeek DeepSeek V4 Flash** — MoE architecture: 284B total parameters, 13B activated. Replaces deepseek-chat (V3-era alias). Cached input price ($0.0028/MTok) is 1/50th of base input, effective after DeepSeek reduced cache hit rates on 2026-04-26. Supports both non-thinking and thinking (default) modes; thinking output is billed at the same output rate, so reasoning_tokens_billed: true. deepseek-chat and deepseek-reasoner legacy aliases are scheduled for discontinuation on 2026-07-24; until then they route to V4 Flash non-thinking and thinking modes respectively.
- **DeepSeek DeepSeek V4 Pro** — Pro-tier sibling to deepseek-v4-flash, positioned for higher-quality responses at lower concurrency (500 vs Flash's 2500). Structured prices ($1.74 / $3.48 per MTok; cache hit $0.0145) are the list rates published by DeepSeek; a 75% launch promotion is in effect until 2026-05-31 15:59 UTC, during which the effective billed rates are one quarter of list: $0.435 input / $0.87 output / $0.003625 cache hit. The DeepSeek pricing page also mentions post-expiration rates of 1/4 the original; re-verify after the promo lapses to confirm whether the list rates or the quarter rates apply. Cache miss is billed at the base input rate. Supports both non-thinking and thinking modes with tool calls and JSON output; thinking output is billed at the output rate, so reasoning_tokens_billed: true. DeepSeek did not publish an explicit launch date; last_changed_at set to verification date.
- **DeepSeek DeepSeek V3** — Canonical API model_id for the DeepSeek V3 lineage (V3 launched 2024-12-26 as deepseek-chat; upgraded through V3-0324, V3.1, V3.1-Terminus, V3.2 by 2025-12-01). Deprecated 2026-04-24 when V4 launched; still callable until scheduled discontinuation 2026-07-24, currently routing to deepseek-v4-flash non-thinking mode (prices captured here reflect that routing). DeepSeek's pricing page no longer publishes V3-era historical rates; standalone deepseek-v3 model_id was never exposed by the API.
- **DeepSeek DeepSeek R1** — Canonical API model_id for the DeepSeek R1 reasoning lineage (R1 launched 2025-01-20 as deepseek-reasoner; R1-0528 update 2025-05-28). Deprecated 2026-04-24 when V4 launched; still callable until scheduled discontinuation 2026-07-24, currently routing to deepseek-v4-flash thinking mode (prices captured here reflect that routing). Reasoning output is billed at the standard output rate (reasoning_tokens_billed: true). DeepSeek's pricing page no longer publishes R1-era historical rates; standalone deepseek-r1 model_id was never exposed by the API.
- **Anthropic Claude Haiku 4.5** — Anthropic's fastest model with near-frontier intelligence; positioned for high-volume agentic workloads. Cache hit is 0.1x base input ($0.10/MTok); 5-minute cache write is 1.25x ($1.25/MTok); 1-hour cache write is 2x ($2/MTok). Batch API discounts both input and output by 50%. Supports extended thinking; thinking output tokens are billed at the output rate. Reliable knowledge cutoff Feb 2025; training data cutoff Jul 2025.
- **Anthropic Claude Sonnet 4.5** — Legacy listing in Anthropic's models overview but still active. Pricing identical to Sonnet 4.6, but 200k context window (vs 1M on Sonnet 4.6). Cache hit is 0.1x base input ($0.30/MTok); 5-minute cache write is 1.25x ($3.75/MTok); 1-hour cache write is 2x ($6/MTok). Batch API discounts both input and output by 50%. Supports extended thinking; thinking output tokens are billed at the output rate. Reliable knowledge cutoff Jan 2025.
- **Anthropic Claude Opus 4.5** — Legacy listing in Anthropic's models overview but still active. Pricing identical to Opus 4.6/4.7, but 200k context window (vs 1M on 4.6/4.7) and 64k max output (vs 128k). Cache hit is 0.1x base input ($0.50/MTok); 5-minute cache write is 1.25x ($6.25/MTok); 1-hour cache write is 2x ($10/MTok). Batch API discounts both input and output by 50%. Supports extended thinking; thinking output tokens are billed at the output rate. Reliable knowledge cutoff May 2025.
- **Anthropic Claude Sonnet 3.7** — RETIRED on the Claude API on 2026-02-19; still available on Amazon Bedrock and Google Vertex AI under partner retirement schedules. Anthropic's first reasoning model with extended thinking; can output up to 64k tokens in thinking mode (128k with the output-128k-2025-02-19 beta header). Prices are no longer listed on Anthropic's current pricing page; values sourced from OpenRouter and pricepertoken.com (confidence: medium). Cache and batch pricing inferred from Anthropic's standard multipliers (1.25x 5-min write, 0.1x cache read, 0.5x batch).
- **Anthropic Claude Haiku 3.5** — RETIRED on the Claude API on 2026-02-19; still listed on Anthropic's pricing page as available on Amazon Bedrock and Google Vertex AI only. No extended-thinking support. max_output_tokens=8192 sourced from Anthropic legacy model card and OpenRouter (confidence: medium — not present in current docs). Cache hit is 0.1x base input ($0.08/MTok); 5-minute cache write is 1.25x ($1/MTok); 1-hour cache write is 2x ($1.60/MTok). Batch API discounts both input and output by 50%.
- **OpenAI GPT-5 mini** — Faster, more cost-efficient GPT-5 variant for low-latency, high-volume workloads. Cached input at 10% of base ($0.025/MTok). Batch API at flat 50% off input and output. Reasoning model with adjustable reasoning_effort; reasoning tokens are billed at the output rate. PDF input via the Files API; image input native. Knowledge cutoff May 2024.
- **OpenAI GPT-5 nano** — Smallest GPT-5 variant; OpenAI model card lists 'Reasoning model: No' with 'Average' reasoning capability — reasoning_tokens_billed set to false on that basis (confidence: medium because other GPT-5 family members are reasoning models). Cached input at 10% of base ($0.005/MTok). Batch API at flat 50% off input and output. Knowledge cutoff May 2024.
- **OpenAI GPT-4.1** — Non-reasoning flagship with a ~1M-token context window (1,047,576). Cached input at 25% of base ($0.50/MTok). Batch API at flat 50% off input and output. PDF input via the Files API; image input native. Knowledge cutoff June 2024.
- **OpenAI GPT-4o** — Deprecated 2026-04-22 (dated alias gpt-4o-2024-05-13 scheduled for shutdown 2026-10-23 per OpenAI deprecations page); still serving on the API as of 2026-05-17. Audio input/output are NOT supported on this model_id — they live on a sibling gpt-4o-audio-preview model card with separate pricing (audio input $40/MTok, audio output $80/MTok); text-mode prices captured here. Cached input at 50% of base ($1.25/MTok). Batch API at flat 50% off input and output. Knowledge cutoff Oct 2023.
- **OpenAI GPT-4o mini** — Not on the April 2026 deprecation list; remains active. Audio input/output are NOT supported on this model_id — they live on a sibling gpt-4o-mini-audio-preview model card; text-mode prices captured here. Cached input at 50% of base ($0.075/MTok). Batch API at flat 50% off input and output. Knowledge cutoff Oct 2023.
- **OpenAI OpenAI o3** — Reasoning model for complex tasks; reasoning tokens are billed at the output rate. Bare 'o3' alias remains active (the dated o3-mini-2025-01-31 is deprecated, but it is a separate model). OpenAI's model card notes 'o3 is succeeded by GPT-5' but does not list o3 itself as deprecated. Cached input at 25% of base ($0.50/MTok). Batch API at flat 50% off input and output. Knowledge cutoff June 2024.
- **OpenAI OpenAI o4-mini** — Fast, cost-efficient reasoning model; reasoning tokens are billed at the output rate. Deprecated 2026-04-22 (dated alias o4-mini-2025-04-16 scheduled for shutdown 2026-10-23 per OpenAI deprecations page); still serving as of 2026-05-17. OpenAI's model card notes 'succeeded by GPT-5 mini'. Cached input at 25% of base ($0.275/MTok). Batch API at flat 50% off input and output. Knowledge cutoff June 2024.
- **OpenAI OpenAI o1** — First-generation reasoning model; reasoning tokens are billed at the output rate. Deprecated 2026-04-22 (dated alias o1-2024-12-17 scheduled for shutdown 2026-10-23 per OpenAI deprecations page); still serving as of 2026-05-17. OpenAI's recommended replacement, per its developer community post, is gpt-5.5; replaced_by_model_id is set to the in-file successor 'o3' (gpt-5.5 is not yet in the dataset). Cached input at 50% of base ($7.50/MTok). Batch API at flat 50% off input and output. Knowledge cutoff Oct 2023.
- **Google Gemini 2.5 Flash** — Hybrid reasoning model with dynamic thinking on by default; thinking can be disabled via thinkingBudget=0. When thinking is on, response pricing is the sum of output and thinking tokens (both billed at the output rate). Audio input is priced separately (captured in audio_input_per_mtok_usd); cached audio input is $0.10/MTok vs $0.03/MTok for text/image/video. Batch Mode at flat 50% off; audio batch input is $0.50/MTok. No long-context tier. Context window and max_output_tokens confirmed via ai.google.dev/gemini-api/docs/models/gemini-2.5-flash. Knowledge cutoff January 2025. Explicit context caching storage is captured in cache_storage_per_mtok_per_hour_usd. Free tier (AI Studio) is published as per-minute RPM/TPM only, not per-day; free_tier omitted. Cross-verified prompt/completion price against openrouter.ai/google/gemini-2.5-flash.
- **Google Gemini 2.5 Flash-Lite** — Hybrid reasoning model; thinking is OFF by default (unlike 2.5 Flash/Pro) but can be enabled by setting thinkingBudget. When thinking is enabled, response pricing is the sum of output and thinking tokens at the output rate, so reasoning_tokens_billed is true. Audio input is priced separately (captured in audio_input_per_mtok_usd); cached audio input is $0.03/MTok vs $0.01/MTok for text/image/video. Batch Mode at flat 50% off; audio batch input is $0.15/MTok. No long-context tier. Context window and max_output_tokens confirmed via ai.google.dev/gemini-api/docs/models/gemini-2.5-flash-lite. Knowledge cutoff January 2025. Explicit context caching storage is captured in cache_storage_per_mtok_per_hour_usd. Free tier (AI Studio) is published as per-minute RPM/TPM only, not per-day; free_tier omitted. Cross-verified prompt/completion price against openrouter.ai/google/gemini-2.5-flash-lite.
- **Google Gemini 2.0 Flash** — Deprecated; scheduled shutdown 2026-06-01. Google's documented migration target is a Gemini 3 Flash preview model not yet in this dataset; replaced_by_model_id points at gemini-2.5-flash as the closest in-file successor. Standard production 2.0 Flash does not support thinking (thinking exists only on Gemini 2.5+ and 3 series per ai.google.dev/gemini-api/docs/thinking); reasoning_tokens_billed=false. Audio input is priced separately (captured in audio_input_per_mtok_usd); cached audio input is $0.175/MTok vs $0.025/MTok for text/image/video. Batch Mode at flat 50% off. No long-context tier. supports_pdf=false since the 2.0 Flash model card lists supported inputs as audio/images/video/text (PDF not enumerated). Confidence medium because Vertex AI's published pricing for the same model name differs ($0.15 input / $0.60 output) from AI Studio's $0.10/$0.40; AI Studio primary value retained per spec, and OpenRouter (openrouter.ai/google/gemini-2.0-flash-001) cross-confirms $0.10/$0.40. Explicit context caching storage is captured in cache_storage_per_mtok_per_hour_usd. Free tier (AI Studio) is published as per-minute RPM/TPM only, not per-day; free_tier omitted.
- **Meta Llama 4 Maverick** — Multi-host pricing snapshot 2026-05-18: Together $0.27/$0.85 per MTok (meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8) is the row's structured price as the lowest direct-host rate. OpenRouter aggregator routes at $0.15/$0.60 (informational only; structured input/output stay at the lowest direct-host price per the PR4 convention) and is structured as aggregators["openrouter"]. Groq does not list Maverick on its public pricing page; Fireworks does not offer Maverick on serverless (on-demand deployments only). Context window is Meta's published 1M (1048576 tokens); OpenRouter advertises 1.05M but the HuggingFace model card spec is 1M. max_output_tokens not published on the model card; defaulted to 8192. 17B activated / 400B total MoE with 128 experts. deployment_options[] omits hosts whose canonical slug or full pricing could not be confirmed in a single primary source on this date (Fireworks no-serverless, Groq absent).
- **Meta Llama 4 Scout** — Multi-host pricing snapshot 2026-05-18: Together $0.18/$0.59 per MTok (meta-llama/Llama-4-Scout-17B-16E-Instruct); Fireworks $0.15/$0.60 (accounts/fireworks/models/llama4-scout-instruct-basic); Groq $0.11/$0.34 (meta-llama/llama-4-scout-17b-16e-instruct) is the lowest direct-host rate and is the row's structured price. OpenRouter aggregator routes at $0.08/$0.30 (informational only; structured input/output stay at the lowest direct-host price per the PR4 convention) and is structured as aggregators["openrouter"]. Context window is Meta's published 10M (10485760 tokens); most hosts cap below Meta's spec (e.g. Groq, Fireworks typically expose ~128K-1M at the API). max_output_tokens not published on the model card; defaulted to 8192. 17B activated / 109B total MoE with 16 experts.
- **Meta Llama 3.3 70B Instruct** — Multi-host pricing snapshot 2026-05-18: Together $0.88/$0.88 per MTok (meta-llama/Llama-3.3-70B-Instruct-Turbo); Groq $0.59/$0.79 (llama-3.3-70b-versatile) is the lowest direct-host rate and is the row's structured price. Fireworks publishes $0.90 input on accounts/fireworks/models/llama-v3p3-70b-instruct but output price was not captured from a single primary source on this date so Fireworks is omitted from deployment_options. OpenRouter aggregator routes at $0.10/$0.32 (informational only; structured input/output stay at the lowest direct-host price per the PR4 convention) and is structured as aggregators["openrouter"]. Context window 128K per Meta's spec (131072 tokens). Text-only; no vision. max_output_tokens not published on the model card; defaulted to 8192. Knowledge cutoff December 2023 per model card.
- **Mistral Mistral Large 2 (24.11)** — La Plateforme rates ($2.00 / $6.00 per MTok) are the row's structured price. Multi-host availability: Bedrock (mistral.mistral-large-2407-v1:0 in us-west-2), Vertex AI, Azure AI Foundry, IBM watsonx. Bedrock published the 24.07 build only, not 24.11. Deprecated on La Plateforme 2026-02-27 (retirement 2026-05-31 per Mistral's legacy table); `mistral-large-latest` now resolves to Mistral Large 3 (not in this dataset). Text-only; no vision (Pixtral Large is the multimodal sibling). max_output_tokens not published on the model card; defaulted to 8192. Batch API is a 50% discount where available but per-model availability is not confirmed from a single primary source on this date, so batch fields are unset. Knowledge cutoff not published by Mistral.
- **Mistral Mistral Medium 3** — La Plateforme rates ($0.40 / $2.00 per MTok) are the row's structured price. Mistral's launch post (2025-05-07) lists La Plateforme and Amazon SageMaker at GA with IBM watsonx, NVIDIA NIM, Azure AI Foundry, and Google Cloud Vertex as forthcoming; SageMaker is not in the deployment_options enum and Bedrock has not been confirmed, so deployment_options is restricted to native. Optimized for agentic and coding use cases. max_output_tokens not published on the model card; defaulted to 8192. Batch API discount per-model availability not confirmed on this date, so batch fields are unset. Knowledge cutoff not published by Mistral.
- **Mistral Mistral Small 3** — La Plateforme rates ($0.10 / $0.30 per MTok) are the row's structured price; per Mistral's launch post, half the price of the previous mistral-small ($0.20 / $0.60). 24B-parameter latency-optimized model under Apache 2.0; text-only. Context window 32K per Mistral's spec (33000 tokens rounded; 32768 used here). Deprecated on La Plateforme 2025-11-06 and retired 2025-11-30 per Mistral's legacy table; successor (mistral-small-2503 / 3.1 with vision and 128K context, and later Small 3.x builds) not yet in this dataset. max_output_tokens not published on the model card; defaulted to 8192. Batch API discount per-model availability not confirmed on this date, so batch fields are unset. Knowledge cutoff not published by Mistral.
- **Mistral Codestral 25.08** — La Plateforme rates ($0.30 / $0.90 per MTok) are the row's structured price. Code-specialized model optimized for fill-in-the-middle (FIM), code completion, code correction, and test generation; supports tool use and structured output per the 25.08 release. 256K context (262144 tokens). Also available on Google Cloud Vertex AI Model Garden as `codestral-2` under the `mistralai` publisher (Mistral Docs: Vertex AI cloud deployments page). max_output_tokens not published on the model card; defaulted to 8192. Batch API discount per-model availability not confirmed on this date, so batch fields are unset. Knowledge cutoff not published by Mistral.
- **Mistral Pixtral Large** — La Plateforme rates ($2.00 / $6.00 per MTok) are the row's structured price; pricing parity with Mistral Large 2 since Pixtral Large is the multimodal 124B-parameter open-weight model built on top of Mistral Large 2. Vision-capable: handles documents, charts, and natural images alongside text. Context window 128K (131072 tokens). Bedrock publishes the 25.02 refresh (`mistral.pixtral-large-2502-v1:0`, also routed via `us.mistral.pixtral-large-2502-v1:0`), not the 24.11 build. Deprecated on La Plateforme 2026-02-27 (retirement 2026-05-31 per Mistral's legacy table); Mistral's own news page now carries a "this model is deprecated" banner. Successor multimodal capability is absorbed by Mistral Large 3 (not in this dataset). max_output_tokens not published on the model card; defaulted to 8192. Vertex/Azure availability not confirmed for Pixtral Large on this date. Batch API discount per-model availability not confirmed on this date, so batch fields are unset. Knowledge cutoff not published by Mistral.
- **Cohere Command A** — Cohere's flagship 111B-parameter model: 256K context, text-only, optimized for tool use, RAG, agents, and 23-language multilingual workloads. Price ($2.50 / $10.00 per MTok) per artificialanalysis.ai citing Cohere's API; Command A is not listed on cohere.com/pricing as of 2026-05-18 (the public table still shows the older Command R/R+ tier), so confidence is medium. Cohere docs note AWS Bedrock availability as "Coming Soon" (no Bedrock SKU yet, so `bedrock` is omitted from deployment_options); Azure AI Foundry availability is published but uses per-deployment IDs, so no Azure alias is encoded. Oracle OCI exposes it as `cohere.command-a-03-2025` (kept as alias). Cache and batch pricing not published by Cohere. Knowledge cutoff not published on the Cohere model card.
- **Cohere Command R+** — Cohere Platform rates ($2.50 / $10.00 per MTok) are the row's structured price, listed on cohere.com/pricing as "Command R+ 08-2024". 128K context, text-only, optimized for complex RAG and multi-step tool use. Per Cohere's deprecations page, only the predecessor `command-r-plus-04-2024` was sunset (on 2025-09-15), and this 08-2024 build is named as the recommended replacement, so it remains active on Cohere Platform. Bedrock SKU `cohere.command-r-plus-v1:0` launched Aug 2024 with a Mar 2024 knowledge cutoff (per Bedrock model card), matching this row; Bedrock marks the model "Legacy" with an EOL of 2026-08-19, which is a Bedrock-side lifecycle marker, not a Cohere deprecation. Cohere publishes Azure AI Foundry availability, but Azure uses per-deployment IDs, so no Azure alias is encoded. Cache and batch pricing not published by Cohere.
- **xAI Grok 4** — xAI native rates ($3.00 / $15.00 per MTok, $0.75 cached input) are the row's structured price; prompts above 128K total tokens are billed at the higher pricing_tiers rate ($6.00 / $30.00) per xAI's documented long-context tiering. Grok 4 (snapshot `grok-4-0709`, released 2025-07-09) was xAI's flagship reasoning model: reasoning is always on (thinking tokens billed at the output rate, hence reasoning_tokens_billed: true), parallel tool calling and structured outputs supported, accepts text and image inputs. max_output_tokens of 256000 reflects xAI's documented "up to 256K tokens of output" within the shared 256K prompt+response context. Retired from the xAI API on 2026-05-15 12:00 PM PT alongside seven other legacy slugs; requests to `grok-4-0709` and `grok-4` continue to resolve but are now redirected to `grok-4.3` with `low` reasoning effort and billed at grok-4.3 rates. Successor is `grok-4.3`, captured in this dataset and referenced via `replaced_by_model_id`. xAI's API is native-only (not on Bedrock/Vertex/Azure). Batch API not published for this model.
- **xAI Grok 3** — xAI native rates ($3.00 / $15.00 per MTok, $0.75 cached input) are the row's structured price (xAI's pricing page and mem0/pricepertoken aggregator both report $3/$15; artificialanalysis.ai reports a higher $4/$20 — choosing xAI-aligned figures). Grok 3 (released 2025-02-19) was xAI's flagship non-reasoning chat model; text-only inputs, function calling and structured outputs supported, 131,072-token combined prompt+response context window. Not a reasoning model (direct responses, no extended chain-of-thought; the reasoning sibling was `grok-3-mini`, not in this dataset). max_output_tokens defaulted to the documented context cap; xAI does not publish a separate max-output limit beyond the shared 131,072-token window. Retired from the xAI API on 2026-05-15 12:00 PM PT; requests to `grok-3` continue to resolve but are now redirected to `grok-4.3` with `none` reasoning effort and billed at grok-4.3 rates. Successor is `grok-4.3`, captured in this dataset and referenced via `replaced_by_model_id`. xAI's API is native-only. Batch API not published for this model.
- **xAI Grok Code Fast 1** — xAI native rates ($0.20 / $1.50 per MTok, $0.02 cached input) are the row's structured price. Grok Code Fast 1 (released 2025-08-26) was xAI's speedy, economical coding-specialized reasoning model: 314B-parameter MoE architecture, 256K combined prompt+response context, agentic coding focus, visible reasoning traces (`reasoning_content` field in streaming responses), function calling and structured outputs supported, text-only. Reasoning is enabled by default so reasoning tokens are billed at the output rate. max_output_tokens defaulted to the documented 256K context cap; xAI does not publish a separate max-output limit beyond the shared window. Retired from the xAI API on 2026-05-15 12:00 PM PT; requests to `grok-code-fast-1` continue to resolve but are now redirected to `grok-4.3` with `low` reasoning effort and billed at grok-4.3 rates. Successor is `grok-4.3`, captured in this dataset and referenced via `replaced_by_model_id`. xAI's API is native-only. Batch API not published for this model. Knowledge cutoff not published by xAI.
- **xAI Grok 4.3** — xAI native rates ($1.25 / $2.50 per MTok) are the row's structured price, listed on docs.x.ai/docs/models and docs.x.ai/docs/pricing. Grok 4.3 is xAI's current flagship: positioned as "the most intelligent and fastest model" recommended for chat, coding, and general use across the Grok API. 1M-token combined prompt+response context window (a 4x expansion over the 256K window on Grok 4 / Grok Code Fast 1). Successor to `grok-4-0709`, `grok-3`, and `grok-code-fast-1`, all of which xAI retired on 2026-05-15 12:00 PM PT and now redirect to this SKU at varying default `reasoning_effort` levels (`low` for grok-4-0709 / grok-code-fast-1, `none` for grok-3). Thinking mode is exposed via the `reasoning_effort` parameter (`none` / `low` / `medium` / `high`); when reasoning is on, thinking tokens are billed at the output rate, hence reasoning_tokens_billed: true. Accepts text and image inputs, text output; parallel tool calling and structured outputs supported. max_output_tokens reflects the documented 1M-token shared window — xAI does not publish a separate max-output limit. Cached input price, knowledge cutoff, and release date are not published by xAI on the models or pricing pages as of last_verified; omitted rather than guessed. xAI's API is native-only (not on Bedrock / Vertex / Azure). Batch API not published for this model.
- **Alibaba Qwen3-Max** — Alibaba DashScope (International) tiered pricing by input-token bucket: 0-32K = $1.20 input / $6.00 output per MTok (base row rate); 32K-128K = $2.40 / $12.00; 128K-252K = $3.00 / $15.00 (captured in pricing_tiers). Batch API discounts both input and output by 50% ($0.60 / $3.00 at the base tier). Context window 262,144 tokens (DashScope publishes 252K as the top-tier pricing ceiling; Qwen team and OpenRouter publish the full 262,144 model context). Max output 32,768 tokens. Released 2025-09-23 (`qwen3-max-2025-09-23`); knowledge cutoff 2025-06-30. Hybrid thinking model: thinking mode disabled by default but available via `/think` (and disabled via `/no_think`); when enabled, thinking tokens are billed at the output rate, hence reasoning_tokens_billed: true. Text-only inputs and outputs (the Qwen3-VL family is a separate set of model_ids). Tool calling and structured outputs supported via the DashScope and OpenAI-compatible endpoints. Explicit context cache discounts cached input tokens to 10% of the standard rate, but DashScope does not publish a single cache_read figure across the tiered input rates, so cache_read_per_mtok_usd is omitted rather than guessed. Deployment via DashScope (Model Studio) only at last_verified; not on Bedrock, Vertex, Together, Fireworks, or Groq as a first-party offering.
- **Alibaba Qwen3-Coder-Plus** — Alibaba DashScope (International) tiered pricing by input-token bucket: 0-32K = $1.00 / $5.00 per MTok (base row rate); 32K-128K = $1.80 / $9.00; 128K-256K = $3.00 / $15.00; 256K-1M = $6.00 / $60.00 (captured in pricing_tiers). Batch API discounts both input and output by 50% ($0.50 / $2.50 at the base tier). 1,000,000-token context window with 65,536 max output tokens. Released 2025-09-23 (`qwen3-coder-plus-2025-09-23`). Built on the Qwen3-Coder 480B-A35B MoE base; positioned for agentic coding (robust tool calling and environment interaction). Not a thinking/reasoning SKU (no chain-of-thought billing semantics), so reasoning_tokens_billed is false. Text-only modalities. Explicit context cache discounts cached input to 10% of the standard rate; implicit cache to 20%; DashScope does not publish a single cache_read figure across tiered input rates, so cache_read_per_mtok_usd is omitted rather than guessed. Knowledge cutoff not published. Deployment via DashScope (Model Studio) only at last_verified.
- **Perplexity Sonar** — Perplexity native rates: $1.00 input / $1.00 output per MTok. Web search is built into the API as a first-class capability rather than a user-defined tool, so total cost per query = token costs + a per-request fee. per_request_usd captures the low-context tier ($5 / 1,000 requests = $0.005); the medium and high search-context tiers cost $8 and $12 per 1,000 requests respectively, in place of the low-tier fee (not captured structurally, since the field holds a single value). Perplexity does not publish a separate per-search fee for this SKU (per-search metering applies to `sonar-deep-research`), so per_search_usd is omitted. 128,000-token context window; max_output_tokens 8,000 per Perplexity's documented Sonar limits. Sonar (released January 2025) is built on a fine-tuned Llama 3.3 70B base optimized for web-grounded question answering with inline citations. Knowledge cutoff is intentionally omitted: Sonar fetches the live web at query time, so a static cutoff date does not meaningfully describe its answer space. Not a reasoning SKU (no chain-of-thought tokens), hence reasoning_tokens_billed is false. supports_tool_use is set conservatively to false because web search, the model's primary capability, is exposed as a built-in feature of the Sonar endpoint rather than as a user-defined tool; Perplexity recommends its separate Agent API for production tool-using agents. Structured outputs supported via response_format. Perplexity API is native-only (no Bedrock / Vertex / Azure / Together / Fireworks / Groq first-party deployment). Cache and batch APIs are not published.
- **Perplexity Sonar Pro** — Perplexity native rates: $3.00 input / $15.00 output per MTok. Web search is built into the API, not exposed as a user-defined tool; total cost per query = token costs + a per-request fee. per_request_usd captures the low-context tier ($6 / 1,000 requests = $0.006); the medium and high search-context tiers cost $10 and $14 per 1,000 requests respectively, in place of the low-tier fee (not captured structurally, since the field holds a single value). 200,000-token context window (largest of the Sonar family); max_output_tokens 8,000 per Perplexity's documented Sonar limits. Positioned as Perplexity's advanced search SKU for complex multi-source queries and follow-ups. Knowledge cutoff intentionally omitted: Sonar Pro fetches the live web at query time. Not a reasoning SKU, hence reasoning_tokens_billed is false. supports_tool_use set conservatively to false because Perplexity exposes web search as the built-in capability and recommends the separate Agent API for production tool-using agents. Structured outputs supported via response_format. Perplexity API is native-only (no Bedrock / Vertex / Azure / Together / Fireworks / Groq first-party deployment). Cache and batch APIs are not published.
- **Perplexity Sonar Reasoning Pro** — Perplexity native rates: $2.00 input / $8.00 output per MTok. Web search is built into the API; total cost per query = token costs + a per-request fee. per_request_usd captures the low-context tier ($6 / 1,000 requests = $0.006); the medium and high search-context tiers cost $10 and $14 per 1,000 requests respectively, in place of the low-tier fee (not captured structurally, since the field holds a single value). 128,000-token context window; max_output_tokens 8,000 per Perplexity's documented Sonar limits. Reasoning SKU built on DeepSeek R1 with Chain-of-Thought; responses include a leading `<think>` reasoning block followed by the answer, and those reasoning tokens are billed at the output rate, hence reasoning_tokens_billed: true. Knowledge cutoff intentionally omitted: Sonar Reasoning Pro fetches the live web at query time. supports_tool_use set conservatively to false because Perplexity exposes web search as the built-in capability and recommends the separate Agent API for production tool-using agents. Structured outputs supported via response_format. Perplexity API is native-only (no Bedrock / Vertex / Azure / Together / Fireworks / Groq first-party deployment). Cache and batch APIs are not published. Note: the older `sonar-reasoning` SKU is no longer listed in Perplexity's current model lineup at last_verified.
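The Qwen rows above are billed by input-token bucket, which is why no single cache_read figure can be stored for them. The following is a minimal Python sketch of that arithmetic under several assumptions that DashScope does not confirm here: the whole request is billed at the rate of the bucket its input-token count falls into, bucket upper bounds are inclusive, and "K" means 1,000 tokens. The tier tables are copied from the notes; `Tier`, `pick_tier`, `request_cost_usd`, and `cache_read_per_mtok` are illustrative names and are not part of the dataset schema.

```python
from __future__ import annotations

from dataclasses import dataclass

K = 1_000  # assumption: "32K" read as 32,000 tokens; DashScope may use 1,024


@dataclass(frozen=True)
class Tier:
    max_input_tokens: int   # inclusive upper bound of the input-token bucket (assumption)
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


# Tier tables transcribed from the Qwen3-Max and Qwen3-Coder-Plus notes.
QWEN3_MAX = [
    Tier(32 * K, 1.20, 6.00),
    Tier(128 * K, 2.40, 12.00),
    Tier(252 * K, 3.00, 15.00),
]
QWEN3_CODER_PLUS = [
    Tier(32 * K, 1.00, 5.00),
    Tier(128 * K, 1.80, 9.00),
    Tier(256 * K, 3.00, 15.00),
    Tier(1_000 * K, 6.00, 60.00),
]


def pick_tier(tiers: list[Tier], input_tokens: int) -> Tier:
    """Return the first bucket whose upper bound covers the request's input tokens."""
    for tier in tiers:
        if input_tokens <= tier.max_input_tokens:
            return tier
    raise ValueError(f"{input_tokens} input tokens is above the top published tier")


def request_cost_usd(tiers: list[Tier], input_tokens: int, output_tokens: int,
                     batch: bool = False) -> float:
    """Price one request at its bucket's rates; batch halves both sides per the notes."""
    tier = pick_tier(tiers, input_tokens)
    cost = (input_tokens * tier.input_per_mtok
            + output_tokens * tier.output_per_mtok) / 1_000_000
    return cost * 0.5 if batch else cost


def cache_read_per_mtok(tiers: list[Tier], input_tokens: int, ratio: float = 0.10) -> float:
    """Explicit-cache read rate (10% of standard input) for the bucket a request lands in."""
    return pick_tier(tiers, input_tokens).input_per_mtok * ratio


print(request_cost_usd(QWEN3_MAX, 10_000, 2_000))            # base bucket
print(request_cost_usd(QWEN3_MAX, 100_000, 1_000))           # 32K-128K bucket
print(request_cost_usd(QWEN3_CODER_PLUS, 300_000, 10_000))   # 256K-1M bucket
```

With these tables the explicit-cache read rate for Qwen3-Max comes out to $0.12/MTok in the base bucket but $0.30/MTok in the 128K-252K bucket, which is the tier dependence the note gives as the reason cache_read_per_mtok_usd is omitted. Qwen3-Max prompts between 252K and 262,144 tokens fall outside every published bucket, so the sketch raises rather than guessing a rate.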
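Because the Perplexity rows add a per-request fee to token costs, the fee can dominate short queries. The following is a minimal sketch assuming the dictionary keys `sonar`, `sonar-pro`, and `sonar-reasoning-pro` stand in for the three rows above (the dataset's model_ids are not shown in this section) and assuming each search-context tier's fee is charged instead of, not on top of, the low-tier fee. For Sonar Reasoning Pro, the leading `<think>` block's tokens belong in `output_tokens`, since they bill at the output rate.

```python
# Token rates (USD per 1M input, output tokens) from the Perplexity notes above.
TOKEN_RATES = {
    "sonar": (1.00, 1.00),
    "sonar-pro": (3.00, 15.00),
    "sonar-reasoning-pro": (2.00, 8.00),
}

# Request fee (USD per 1,000 requests) by search-context tier, from the same notes.
# Assumption: each tier's fee replaces the low-tier fee rather than adding to it.
REQUEST_FEE_PER_1K = {
    "sonar": {"low": 5.0, "medium": 8.0, "high": 12.0},
    "sonar-pro": {"low": 6.0, "medium": 10.0, "high": 14.0},
    "sonar-reasoning-pro": {"low": 6.0, "medium": 10.0, "high": 14.0},
}


def query_cost_usd(model: str, input_tokens: int, output_tokens: int,
                   search_context: str = "low") -> float:
    """Token cost plus one request fee.

    For sonar-reasoning-pro, output_tokens should include the <think> block,
    because reasoning tokens are billed at the output rate.
    """
    in_rate, out_rate = TOKEN_RATES[model]
    token_cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    request_fee = REQUEST_FEE_PER_1K[model][search_context] / 1_000
    return token_cost + request_fee


# A short Sonar query: the $0.005 request fee outweighs the $0.0015 of tokens.
print(query_cost_usd("sonar", 1_000, 500))
print(query_cost_usd("sonar-pro", 2_000, 1_000, search_context="high"))
```

This is why per_request_usd is tracked separately from the token rates: for a 1,000-in / 500-out Sonar query, roughly three quarters of the cost is the request fee.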
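The retired xAI slugs now resolve to `grok-4.3`, so cost estimates for legacy traffic should use grok-4.3 rates rather than the retired models' own prices. The following is a minimal sketch of that lookup, hand-transcribed from the notes above. The dataset records the successor in `replaced_by_model_id`, but the default reasoning effort is not a structured field in this section, so the mapping is an illustrative assumption. The sketch ignores cached-input pricing, which xAI does not publish for grok-4.3.

```python
from __future__ import annotations

# Retired slug -> (successor, default reasoning_effort applied on redirect), per the notes.
RETIRED_REDIRECTS = {
    "grok-4-0709": ("grok-4.3", "low"),
    "grok-4": ("grok-4.3", "low"),
    "grok-code-fast-1": ("grok-4.3", "low"),
    "grok-3": ("grok-4.3", "none"),
}

# grok-4.3 native rates, USD per 1M (input, output) tokens.
GROK_4_3_RATES = (1.25, 2.50)


def resolve(slug: str) -> tuple[str, str | None]:
    """Return the model that actually serves `slug` and the effort it runs at."""
    return RETIRED_REDIRECTS.get(slug, (slug, None))


def billed_cost_usd(slug: str, input_tokens: int, output_tokens: int,
                    reasoning_tokens: int = 0) -> float:
    """Estimate cost at grok-4.3 rates; thinking tokens bill at the output rate."""
    model, _effort = resolve(slug)
    if model != "grok-4.3":
        raise KeyError(f"no rates in this sketch for {slug!r}")
    in_rate, out_rate = GROK_4_3_RATES
    return (input_tokens * in_rate
            + (output_tokens + reasoning_tokens) * out_rate) / 1_000_000


print(resolve("grok-3"))
print(billed_cost_usd("grok-4-0709", 1_000, 500, reasoning_tokens=1_500))
```

For example, a workload of 1M input and 1M output tokens (in prompts under 128K) that cost $18.00 at Grok 4's base $3.00 / $15.00 rates now bills $3.75 at grok-4.3 rates, before any reasoning tokens.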


## Speech-to-text

| Provider | Model | $/min | $/min batch | Streaming | Realtime | Languages | Diarization | Verified | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Deepgram | Nova-3 Monolingual (`nova-3-monolingual`) | $0.004800 | $0.004300 | yes | yes | en | extra-cost | 2026-05-05 | [link](https://deepgram.com/pricing) |
| Deepgram | Nova-3 Multilingual (`nova-3-multilingual`) | $0.005800 | $0.005200 | yes | yes | 61+ | extra-cost | 2026-05-05 | [link](https://deepgram.com/pricing) |
| Deepgram | Nova-3 Medical (`nova-3-medical`) | $0.004800 | $0.004300 | yes | yes | en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ | extra-cost | 2026-05-19 | [link](https://deepgram.com/learn/introducing-nova-3-medical-speech-to-text-api) |
| AssemblyAI | Universal-2 (`universal-2`) | $0.002500 | — | no | no | 99+ | extra-cost | 2026-05-05 | [link](https://www.assemblyai.com/pricing) |
| AssemblyAI | Universal-Streaming (`universal-streaming`) | $0.002500 | — | yes | yes | en | extra-cost | 2026-05-05 | [link](https://www.assemblyai.com/pricing) |
| AssemblyAI | Universal-3 Pro (`universal-3-pro`) | $0.003500 | — | no | no | en, es, pt, de, fr, it | extra-cost | 2026-05-26 | [link](https://www.assemblyai.com/pricing) |
| AssemblyAI | Universal-3 Pro Streaming (`universal-3-pro-streaming`) | $0.007500 | — | yes | yes | en, es, pt, de, fr, it | extra-cost | 2026-05-26 | [link](https://www.assemblyai.com/pricing) |
| AssemblyAI | Universal-Streaming Multilingual (`universal-streaming-multilingual`) | $0.002500 | — | yes | yes | en, es, pt, de, fr, it | extra-cost | 2026-05-26 | [link](https://www.assemblyai.com/pricing) |
| AssemblyAI | Whisper-Streaming (`whisper-streaming`) | $0.005000 | — | yes | yes | 99+ | extra-cost | 2026-05-26 | [link](https://www.assemblyai.com/pricing) |
| Cartesia | Ink (`ink-1`) | $0.003000 | — | yes | yes | 42+ | unsupported | 2026-05-05 | [link](https://cartesia.ai/pricing) |
| OpenAI | Whisper (`whisper-1`) | $0.006000 | — | no | no | 99+ | unsupported | 2026-05-05 | [link](https://openai.com/api/pricing/) |
| Groq | Whisper V3 Large (Groq) (`whisper-large-v3`) | $0.001850 | — | no | no | 99+ | unsupported | 2026-05-05 | [link](https://groq.com/pricing/) |
| Groq | Whisper Large v3 Turbo (Groq) (`whisper-large-v3-turbo`) | $0.000667 | — | no | no | 99+ | unsupported | 2026-05-05 | [link](https://groq.com/pricing/) |
| Microsoft Azure | Azure Speech (Real-time) (`azure-speech-realtime`) | $0.016667 | — | yes | yes | 100+ | included | 2026-05-19 | [link](https://azure.microsoft.com/en-us/pricing/details/speech/) |
| Microsoft Azure | Azure Speech (Batch) (`azure-speech-batch`) | $0.006000 | — | no | no | 100+ | included | 2026-05-19 | [link](https://azure.microsoft.com/en-us/pricing/details/speech/) |
| Google | Chirp 2 (`chirp_2`) | $0.016000 | — | yes | yes | 20+ | included | 2026-05-19 | [link](https://cloud.google.com/speech-to-text/pricing) |
| Google | Chirp 3 (`chirp_3`) | $0.016000 | — | yes | yes | 98+ | included | 2026-05-19 | [link](https://cloud.google.com/speech-to-text/pricing) |
| Speechmatics | Speechmatics Enhanced (`enhanced`) | $0.004000 | — | yes | yes | 55+ | included | 2026-05-19 | [link](https://www.speechmatics.com/pricing) |
| Rev.ai | Rev.ai Whisper Fusion (`whisper-fusion`) | $0.005000 | — | yes | yes | en | included | 2026-05-19 | [link](https://www.rev.ai/pricing) |
| Rev.ai | Rev.ai Reverb (`reverb`) | $0.003300 | — | no | no | en | included | 2026-05-19 | [link](https://www.rev.ai/pricing) |
| Gladia | Gladia Solaria-1 (`solaria-1`) | $0.012500 | $0.010170 | yes | yes | 100+ | included | 2026-05-19 | [link](https://www.gladia.io/pricing) |
| Soniox | Soniox STT Real-time v4 (`stt-rt-v4`) | $0.002000 | — | yes | yes | 60+ | included | 2026-05-19 | [link](https://soniox.com/pricing) |
| Soniox | Soniox STT Async v4 (`stt-async-v4`) | $0.001670 | — | no | no | 60+ | included | 2026-05-19 | [link](https://soniox.com/pricing) |

**Notes:**

- **Deepgram Nova-3 Monolingual** — Pay-as-you-go tier (English). Growth tier (volume commitment) is $0.0042/min streaming, $0.0036/min pre-recorded. Diarization add-on is captured in diarization_per_minute_usd.
- **Deepgram Nova-3 Multilingual** — Pay-as-you-go tier (multilingual, ~61 languages). Growth tier is $0.0050/min streaming, $0.0043/min pre-recorded. Diarization add-on is captured in diarization_per_minute_usd.
- **Deepgram Nova-3 Medical** — Medical-tuned Nova-3 for clinical transcription. English variants only (en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ). Invoked via `model=nova-3-medical` in the Deepgram API. Pricing is not separately listed on the public pricing page; Deepgram's launch announcement quotes $0.0043/min pre-recorded, which matches the Nova-3 Monolingual batch rate. Streaming rate assumed equal to Nova-3 Monolingual ($0.0048/min PAYG); verify with Deepgram sales for production commitments.
- **AssemblyAI Universal-2** — Pre-recorded (file-based) only — broadest AssemblyAI language coverage. Published as $0.15/hr. Diarization add-on +$0.02/hr. For real-time use, see universal-streaming.
- **AssemblyAI Universal-Streaming** — English-only streaming model. Published as $0.15/hr. Higher-tier streaming (Universal-3 Pro Streaming) is $0.45/hr ($0.0075/min).
- **AssemblyAI Universal-3 Pro** — AssemblyAI's highest-accuracy pre-recorded model with native code-switching across EN/ES/PT/DE/FR/IT. Published as $0.21/hr. For real-time use see universal-3-pro-streaming.
- **AssemblyAI Universal-3 Pro Streaming** — Real-time streaming variant of Universal-3 Pro, positioned for voice agents. Multilingual with native code-switching across EN/ES/PT/DE/FR/IT. Published as $0.45/hr.
- **AssemblyAI Universal-Streaming Multilingual** — Multilingual streaming variant covering EN/ES/PT/DE/FR/IT at the same $0.15/hr rate as the English-only universal-streaming. Good balance of cost and latency for voice agents.
- **AssemblyAI Whisper-Streaming** — OpenAI Whisper served via AssemblyAI's streaming infrastructure with 99+ language coverage. Published as $0.30/hr.
- **Cartesia Ink** — Cartesia's STT model. Published as $12 per 4000 minutes ($0.003/min). Same provider as the Sonic TTS model (real-time focused).
- **OpenAI Whisper** — Billed to the nearest second. Pre-recorded only. openai.com pricing page returned 403 during automated fetch; rate cross-verified via OpenRouter (openai/whisper-1).
- **Groq Whisper V3 Large (Groq)** — Whisper V3 Large hosted on Groq. Published as $0.111/hr. Speed factor 217x realtime.
- **Groq Whisper Large v3 Turbo (Groq)** — Whisper Large v3 Turbo hosted on Groq — faster variant. Published as $0.04/hr. Speed factor 228x realtime.
- **Microsoft Azure Speech (Real-time)** — Azure Speech Standard (S0) real-time speech-to-text. Published as $1.00/hr pay-as-you-go (= $0.016667/min); the custom real-time endpoint is $1.20/hr ($0.02/min). Commitment tiers reduce the effective rate (2,000 hrs/mo $0.80/hr; 10,000 hrs/mo $0.65/hr; 50,000 hrs/mo $0.50/hr). Real-time diarization is included up to 240 min/session per the Microsoft Learn quotas/limits doc. Languages: 100+ per Azure language support docs. Cross-verified via https://learn.microsoft.com/en-us/answers/questions/2155625/speech-to-text-costing-1-hr-is-crazy-no-bulk-avail.
- **Microsoft Azure Speech (Batch)** — Azure Speech Standard (S0) batch transcription. Published as $0.36/hr (= $0.006/min); the custom batch endpoint is $0.45/hr ($0.0075/min). Fast transcription (REST API, sync) is a separate $0.66/hr ($0.011/min) tier, not modelled here. Batch diarization is included (up to 240 min/file). Cross-verified via https://learn.microsoft.com/en-us/answers/questions/2155625/speech-to-text-costing-1-hr-is-crazy-no-bulk-avail.
- **Google Chirp 2** — Google Cloud Speech-to-Text v2 multilingual model. Standard tier $0.016/min for both real-time and batch (down from v1's $0.024/min, per https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-speech-to-text-v2-api). Dynamic Batch tier (up to 24h SLA) is 75% lower at $0.004/min — not modelled as price_per_minute_batch_usd because standard batch is the same as real-time. Standard volume tiers can reduce effective rate to as low as $0.004/min. Supports StreamingRecognize (~20 languages), Recognize, and BatchRecognize (broadest language coverage) per https://docs.cloud.google.com/speech-to-text/docs/models/chirp-2. GA in us-central1, europe-west4, asia-southeast1.
- **Google Chirp 3** — Google Cloud Speech-to-Text v2 latest-generation generative ASR model. Standard tier $0.016/min for both real-time and batch; Dynamic Batch tier (up to 24h SLA) at $0.004/min (75% off) — not modelled as price_per_minute_batch_usd because standard batch is the same as real-time. Adds automatic language detection and diarization vs Chirp 2 per https://docs.cloud.google.com/speech-to-text/docs/models/chirp-3. 98+ languages and locales (24 GA + 74 preview); supports StreamingRecognize and BatchRecognize.
- **Speechmatics Enhanced** — Speechmatics offers two operating points, `enhanced` (highest accuracy) and `standard` (faster/cheaper), selected via the `operating_point` API parameter on both the Batch and Real-time APIs. The pricing page lists the Pro tier from $0.24/hr ($0.004/min) on PAYG, with volume discounts above 500 hrs/month; the same tier is used for both real-time and batch. The free plan includes 480 minutes/month (not modelled as `free_tier` because the schema expects per-day/per-token quotas). Confidence is medium because the pricing page exposes tier names rather than per-model SKU rates; verify against contract for production. Cross-verified product structure via https://docs.speechmatics.com/.
- **Rev.ai Whisper Fusion** — Rev.ai's streaming transcription product, branded `Whisper Fusion` on the pricing page at $0.005/min; the parallel Whisper Large streaming tier is also $0.005/min. Free credits equivalent to 5 hours of Reverb ASR (cross-applicable across products). Reverb (batch) is modelled separately; see https://docs.rev.ai/ for the full API surface. English-primary; foreign-language support is a distinct Reverb Foreign Language product line.
- **Rev.ai Reverb** — Rev.ai's async/batch ASR model, branded `Reverb`, at $0.20/hr ($0.0033/min). A `Reverb Turbo` tier exists at $0.10/hr ($0.0017/min) but is not modelled as a separate row, since it is a latency/quality dial on the same product; `Reverb Foreign Language` ($0.30/hr, $0.005/min, 57+ languages) is also priced separately and could be added if needed. Free credits equivalent to 5 hours of Reverb ASR. See https://docs.rev.ai/ for the async transcription API.
- **Gladia Solaria-1** — Gladia's first-generation universal STT model `solaria-1`, supporting 100+ languages with automatic language detection and code-switching. Starter (PAYG) pricing: real-time $0.75/hr ($0.0125/min), async $0.61/hr (~$0.01017/min). The Growth (committed) plan lowers real-time to $0.25/hr ($0.0042/min) and async to $0.20/hr ($0.0033/min). Sub-300ms streaming latency claimed on the pricing page. Speaker diarization and word-level timestamps included on all tiers. Starter includes 10 free hours per month. Model name confirmed via https://docs.gladia.io/.
- **Soniox STT Real-time v4** — Soniox real-time STT model `stt-rt-v4`. The primary billing metric is input audio tokens at $2.00 per 1M tokens; the vendor approximates this as ~$0.12/hour, which we use as $0.002/min for comparability. Aliased from `stt-rt-v3` (deprecated 2026-02-05; removed 2026-02-28 per https://soniox.com/docs/stt/models). 60+ languages with automatic language detection. Confidence medium because the per-minute figure approximates token-based pricing; actual cost varies with audio content density.
- **Soniox STT Async v4** — Soniox async (file) STT model `stt-async-v4`. The primary billing metric is input audio tokens at $1.50 per 1M tokens; the vendor approximates this as ~$0.10/hour, which we use as $0.00167/min for comparability. Aliased from `stt-async-v3` (deprecated 2026-02-05; removed 2026-02-28 per https://soniox.com/docs/stt/models). Supports up to 5 hours of audio per request. 60+ languages with automatic language detection. Confidence medium because the per-minute figure approximates token-based pricing.
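Most STT notes above quote hourly prices, and the $/min column is that figure divided by 60. The following short Python check uses only the hourly rates quoted in the notes; the dictionary keys are informal labels, not model_ids. It also shows the effective per-minute rate for Azure's real-time commitment tiers and for AssemblyAI Universal-2 with its +$0.02/hr diarization add-on.

```python
def per_minute(per_hour_usd: float) -> float:
    """Convert a published $/hr rate to $/min."""
    return per_hour_usd / 60


# Hourly rates quoted in the STT notes above (informal labels, not model_ids).
HOURLY_USD = {
    "AssemblyAI Universal-2": 0.15,
    "AssemblyAI Universal-3 Pro": 0.21,
    "Groq Whisper V3 Large": 0.111,
    "Groq Whisper Large v3 Turbo": 0.04,
    "Azure Speech real-time (PAYG)": 1.00,
    "Azure Speech batch": 0.36,
}

# Azure real-time commitment tiers from the note: hours/month -> $/hr.
AZURE_REALTIME_COMMITMENT = {2_000: 0.80, 10_000: 0.65, 50_000: 0.50}

for label, rate in HOURLY_USD.items():
    print(f"{label}: ${per_minute(rate):.6f}/min")

for hours, rate in AZURE_REALTIME_COMMITMENT.items():
    print(f"Azure real-time at {hours:,} hrs/mo: ${per_minute(rate):.6f}/min")

# Diarization is quoted as an hourly add-on, so add it before converting.
print(f"Universal-2 + diarization: ${per_minute(0.15 + 0.02):.6f}/min")
```

The first loop reproduces the $/min column for those rows (for example, $0.111/hr becomes $0.001850/min and $1.00/hr becomes $0.016667/min). The commitment loop shows the largest Azure tier bringing real-time transcription down to about $0.008333/min.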
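Soniox bills metered audio tokens, and the per-minute figures in its two rows are approximations. Dividing the vendor's approximate hourly cost by its per-token price gives the token rate that approximation implies. The following is a minimal sketch; the implied rate is derived here and is not published by Soniox, and real token counts vary with audio content. The 30-minute estimate at the end is hypothetical.

```python
def implied_tokens_per_minute(per_hour_usd: float, per_mtok_usd: float) -> float:
    """Audio tokens per minute implied by an approximate $/hr and a $/1M-token price."""
    per_minute_usd = per_hour_usd / 60
    return per_minute_usd / (per_mtok_usd / 1_000_000)


def cost_from_tokens(audio_tokens: int, per_mtok_usd: float) -> float:
    """Actual billing basis: metered audio tokens times the per-token price."""
    return audio_tokens * per_mtok_usd / 1_000_000


rt = implied_tokens_per_minute(0.12, 2.00)     # stt-rt-v4: ~$0.12/hr, $2.00/1M tokens
batch = implied_tokens_per_minute(0.10, 1.50)  # stt-async-v4: ~$0.10/hr, $1.50/1M tokens
print(round(rt), round(batch))

# Hypothetical 30-minute real-time call, assuming the implied token rate holds.
print(cost_from_tokens(round(30 * rt), 2.00))
```

The two rows imply roughly 1,000 and 1,111 audio tokens per minute. For billing reconciliation, `cost_from_tokens` applied to the token counts Soniox actually meters is the figure to trust; the per-minute column is for cross-vendor comparison only.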


## Text-to-speech

| Provider | Model | $/1M chars | Quality | Cloning | Languages | TTFB | SSML | Verified | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ElevenLabs | Eleven Flash v2.5 (`eleven_flash_v2_5`) | $50.00 | neural | yes | 32+ | 75ms | no | 2026-05-05 | [link](https://elevenlabs.io/pricing/api) |
| ElevenLabs | Eleven Multilingual v2 (`eleven_multilingual_v2`) | $100.00 | neural | yes | 29+ | — | no | 2026-05-19 | [link](https://elevenlabs.io/docs/overview/models) |
| ElevenLabs | Eleven v3 (`eleven_v3`) | $100.00 | neural | yes | 70+ | — | no | 2026-05-19 | [link](https://elevenlabs.io/docs/overview/models) |
| ElevenLabs | Eleven Turbo v2.5 (`eleven_turbo_v2_5`) | $50.00 | neural | yes | 32+ | — | no | 2026-05-19 | [link](https://elevenlabs.io/docs/overview/models) |
| OpenAI | TTS-1 (`tts-1`) | $15.00 | neural | no | en | — | no | 2026-05-05 | [link](https://openai.com/api/pricing/) |
| OpenAI | TTS-1 HD (`tts-1-hd`) | $30.00 | neural | no | en | — | no | 2026-05-05 | [link](https://openai.com/api/pricing/) |
| OpenAI | GPT-4o mini TTS (`gpt-4o-mini-tts`) | $20.00 | neural | no | en | — | no | 2026-05-19 | [link](https://developers.openai.com/api/docs/models/gpt-4o-mini-tts) |
| Cartesia | Sonic 3.5 (`sonic-3.5`) | $50.00 | neural | yes | 42+ | 90ms | no | 2026-05-05 | [link](https://cartesia.ai/pricing) |
| Cartesia | Sonic 2 (`sonic-2`) | $50.00 | neural | yes | 15+ | 90ms | no | 2026-05-19 | [link](https://docs.cartesia.ai/build-with-cartesia/tts-models/older-models) |
| Cartesia | Sonic Turbo (`sonic-turbo`) | $50.00 | neural | yes | 15+ | 40ms | no | 2026-05-19 | [link](https://docs.cartesia.ai/build-with-cartesia/tts-models/older-models) |
| Groq | Canopy Labs Orpheus English (Groq) (`canopy-labs-orpheus-english`) | $22.00 | neural | — | en | — | no | 2026-05-05 | [link](https://groq.com/pricing/) |
| Groq | Canopy Labs Orpheus Arabic Saudi (Groq) (`canopy-labs-orpheus-arabic-saudi`) | $40.00 | neural | — | ar-SA | — | no | 2026-05-05 | [link](https://groq.com/pricing/) |
| Google | Google Cloud TTS — Studio (`google-tts-studio`) | $160.00 | neural | no | 40+ | — | yes | 2026-05-19 | [link](https://cloud.google.com/text-to-speech/pricing) |
| Google | Google Cloud TTS — Neural2 (`google-tts-neural2`) | $16.00 | neural | no | 40+ | — | yes | 2026-05-19 | [link](https://cloud.google.com/text-to-speech/pricing) |
| Google | Google Cloud TTS — WaveNet (`google-tts-wavenet`) | $16.00 | neural | no | 40+ | — | yes | 2026-05-19 | [link](https://cloud.google.com/text-to-speech/pricing) |
| Google | Google Cloud TTS — Chirp 3: HD (`google-tts-chirp-3-hd`) | $30.00 | neural | no | 30+ | — | no | 2026-05-19 | [link](https://cloud.google.com/text-to-speech/pricing) |
| Microsoft Azure | Azure AI Speech — Neural (`azure-tts-neural`) | $16.00 | neural | no | 100+ | — | yes | 2026-05-19 | [link](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/) |
| Microsoft Azure | Azure AI Speech — Neural HD (DragonHD) (`azure-tts-hd`) | $22.00 | neural | no | en-US, zh-CN, de-DE, es-ES, fr-FR, ja-JP | — | no | 2026-05-19 | [link](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/) |
| Inworld | Inworld Realtime TTS-2 (`inworld-tts-2`) | $35.00 | neural | yes | 100+ | — | no | 2026-05-19 | [link](https://inworld.ai/pricing) |
| Smallest.ai | Lightning v3.1 (`lightning-v3.1`) | $25.00 | neural | yes | en, hi, es, mr, kn, ta, bn, gu, te, ml, pa, or | 200ms | no | 2026-05-19 | [link](https://smallest.ai/pricing) |
| Rime | Rime Mist v3 (`mistv3`) | $30.00 | neural | no | en | 100ms | no | 2026-05-19 | [link](https://rime.ai/pricing) |
| LMNT | LMNT Blizzard (`blizzard`) | $50.00 | neural | yes | 31+ | — | no | 2026-05-19 | [link](https://www.lmnt.com/pricing) |
| Deepgram | Deepgram Aura 2 (`aura-2`) | $30.00 | neural | no | en, es, de, fr, nl, it, ja | — | no | 2026-05-19 | [link](https://deepgram.com/pricing) |
| Resemble AI | Resemble Chatterbox Turbo (`chatterbox-turbo`) | — | neural | yes | en | — | no | 2026-05-26 | [link](https://www.resemble.ai/pricing) |
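Several rows in this table are not natively billed per character. GPT-4o mini TTS is token-priced with a published estimate of about $0.015 per minute of audio, and Cartesia Sonic is sold as $1 per 25 minutes of audio. The notes below convert both to $/1M chars assuming ~150 WPM (~750 chars/min). The following is a minimal sketch of that conversion. The 750 chars/min figure is the dataset's assumption; the 900 chars/min case is hypothetical and is included only to show how sensitive the result is to speech rate.

```python
CHARS_PER_MINUTE = 750  # ~150 WPM, the assumption used in the notes


def per_mchars_from_per_minute(per_minute_usd: float,
                               chars_per_minute: float = CHARS_PER_MINUTE) -> float:
    """Convert an audio-duration price ($/min) to $/1M input characters."""
    return per_minute_usd / chars_per_minute * 1_000_000


gpt_4o_mini_tts = per_mchars_from_per_minute(0.015)  # OpenAI's ~$0.015/min estimate
sonic = per_mchars_from_per_minute(1 / 25)           # $1 per 25 minutes = $0.04/min
faster_speech = per_mchars_from_per_minute(0.015, chars_per_minute=900)  # hypothetical

print(f"{gpt_4o_mini_tts:.2f} {sonic:.2f} {faster_speech:.2f}")
```

This yields about $20.00 for GPT-4o mini TTS and about $53.33 for Sonic (which the Sonic 3.5 note reports as ~$50). A faster 900 chars/min delivery drops the GPT-4o mini TTS figure to about $16.67, which is why these rows should be checked by sampling.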
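The Azure Neural note below states that Chinese characters count as 2 characters for billing, so the billable size of Chinese text can be close to double its length. The following is a rough Python sketch that assumes "Chinese characters" can be approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF). Azure's exact counting rules (other CJK blocks, punctuation, SSML markup) are not modelled, and the free tier is ignored, so treat the result as an estimate.

```python
def azure_billable_chars(text: str) -> int:
    """Approximate Azure TTS billable characters: CJK ideographs count double.

    Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF) is
    treated as "Chinese characters"; every other code point counts once.
    """
    return sum(2 if "\u4e00" <= ch <= "\u9fff" else 1 for ch in text)


def azure_neural_cost_usd(text: str, per_mchars_usd: float = 16.00) -> float:
    """Cost at the S0 Neural rate of $16 per 1M billable characters (free tier ignored)."""
    return azure_billable_chars(text) * per_mchars_usd / 1_000_000


print(azure_billable_chars("Hello"), azure_billable_chars("你好"), azure_billable_chars("Hi 你好"))
```

The same 1M-character budget therefore covers only about 500K Chinese ideographs on Azure. Per-character rates in this table are not directly comparable across vendors for Chinese text unless their counting rules are known.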

**Notes:**

- **ElevenLabs Eleven Flash v2.5** — Current low-latency flagship; eleven_turbo_v2_5 is deprecated and replaced by Flash v2.5. Pay-as-you-go rate $0.05/1K chars. ~10K voices available. Plain text input only (no SSML).
- **ElevenLabs Eleven Multilingual v2** — High-quality professional model for audiobooks, video narration, and rich emotional expression. 29 languages, max 10,000 chars per request. Pay-as-you-go billed at 1 credit per character, whereas Flash/Turbo v2.5 are billed at 0.5 credits/char; hence $100/1M chars, 2x the Flash rate of $50/1M chars. Higher latency than Flash; not recommended for real-time agents.
- **ElevenLabs Eleven v3** — Most expressive ElevenLabs TTS model (GA after alpha). 70+ languages, max 5,000 chars per request. Supports inline audio tags ([whispers], [sighs], [laughs], [happily]) for emotion/delivery control instead of SSML. Higher latency than Flash/Turbo v2.5 — ElevenLabs explicitly recommends v2.5 Flash/Turbo for real-time use. Pay-as-you-go billed at 1 credit/char (same multiplier as Multilingual v2). PVCs (professional voice clones) not yet fully optimized for v3.
- **ElevenLabs Eleven Turbo v2.5** — Deprecated per the ElevenLabs models page and superseded by eleven_flash_v2_5. Still callable but not recommended for new applications. No official sunset date published; deprecated_at reflects the verification date. Pay-as-you-go billed at 0.5 credits/char (same as Flash v2.5).
- **OpenAI TTS-1** — Standard quality. The newer gpt-4o-mini-tts model, which uses token-based pricing, is tracked as its own row.
- **OpenAI TTS-1 HD** — High-definition tier — 2x the price of tts-1 for higher quality output.
- **OpenAI GPT-4o mini TTS** — OpenAI's newer GPT-4o-based TTS. Native pricing is token-based — $0.60/1M text-input tokens + $12/1M audio-output tokens — not per character. OpenAI's published estimate is ~$0.015 per minute of audio; converted to ~$20/1M chars assuming ~150 WPM (~750 chars/min) for consistency with the Cartesia row. Actual $/1M chars varies with speech rate and language. Supports voice steering via natural-language instructions (style/emotion) instead of SSML. Latest snapshot gpt-4o-mini-tts-2025-12-15. Max input 2,000 tokens per request.
- **Cartesia Sonic 3.5** — Cartesia publishes Sonic pricing as $1 per 25 minutes of audio output (~$0.04/min). Converted to ~$50/1M chars assuming ~150 WPM (~750 chars/min). Actual $/1M chars varies with speech rate; verify by sampling. IVC voice cloning included (no clone fee). 90ms TTFB. SSML tags (speed/volume/break/spell/emotion) are documented on sonic-3 but temporarily disabled on sonic-3.5 per https://docs.cartesia.ai/build-with-cartesia/sonic-3/ssml-tags (checked 2026-05-16); flip ssml_supported back to true once upstream re-enables.
- **Cartesia Sonic 2** — Predecessor to sonic-3.5; still stable and callable, but Cartesia recommends sonic-3.5 for new builds. Latest snapshot sonic-2-2025-06-11. 8 core stable languages (en, fr, de, es, pt, zh, ja, ko); 7 additional languages reach EOL 2026-06-01. 90ms model latency. Higher-fidelity voice cloning capability. Pricing assumed equal to sonic-3.5 (15 credits/sec of audio); confidence medium because Cartesia did not publish a separate per-model rate. Verify by sampling if cost-critical.
- **Cartesia Sonic Turbo** — Lowest-latency Sonic variant (~40ms TTFB). Still stable and callable, but Cartesia recommends sonic-3.5 for new builds. Latest snapshot sonic-turbo-2025-06-04. 9 stable languages; 6 additional languages reach EOL 2026-06-01. Pricing assumed equal to sonic-3.5 (15 credits/sec of audio); confidence medium because Cartesia did not publish a separate per-model rate. Verify by sampling if cost-critical.
- **Groq Canopy Labs Orpheus English (Groq)** — Hosted on Groq. Output speed ~100 characters/second.
- **Groq Canopy Labs Orpheus Arabic Saudi (Groq)** — Hosted on Groq. Saudi Arabic variant. Output speed ~100 characters/second.
- **Google Google Cloud TTS — Studio** — Google's premium TTS tier for professional media production (long-form narration, advertising). Vendor price $0.000160/char = $160/1M chars; the single-speaker Studio class is GA and the multispeaker class is experimental per https://docs.cloud.google.com/text-to-speech/docs/voices. SSML supported except <mark>, <emphasis>, <prosody pitch>, and <lang>. Model_id is a Hail-coined tier slug (Google bills per-voice-tier rather than per API model name). Free tier: first 100K chars/month included (not representable as tokens_per_day in schema).
- **Google Google Cloud TTS — Neural2** — Google's recommended general-purpose neural tier; same per-char rate as WaveNet but newer architecture. Vendor price $0.000016/char = $16/1M chars. SSML fully supported. Model_id is a Hail-coined tier slug (Google bills per-voice-tier rather than per API model name). Free tier: first 1M chars/month included (not representable as tokens_per_day in schema).
- **Google Google Cloud TTS — WaveNet** — Original neural-net voice family from DeepMind; not deprecated as of 2026-05-19 per Cloud TTS release notes (https://docs.cloud.google.com/text-to-speech/docs/release-notes) but Google recommends Neural2 for new projects at the same $16/1M chars rate. SSML fully supported. Model_id is a Hail-coined tier slug. Free tier: first 1M chars/month included.
- **Google Google Cloud TTS — Chirp 3: HD** — Google's newest-generation TTS family with 30 voice styles in 30+ languages. Vendor price $0.000030/char = $30/1M chars. Per https://docs.cloud.google.com/text-to-speech/docs/chirp3-hd, Chirp 3: HD explicitly does NOT support SSML, speaking-rate adjustments, or pitch parameters; streaming synthesis IS supported. Model_id is a Hail-coined tier slug. Free tier: first 1M chars/month included.
- **Microsoft Azure AI Speech — Neural** — Azure's standard neural TTS tier (called 'Neural' on the pricing page and 'Standard voice' in the docs). 500+ prebuilt voices across 100+ locales per https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech. The S0 pay-as-you-go price is $16/1M chars for both real-time and batch synthesis (HD, AOAI, Custom Neural Voice, and Personal Voice are priced separately). Full SSML support. Each Chinese character counts as 2 characters for billing. Free tier (F0): 500K chars/month.
- **Microsoft Azure AI Speech — Neural HD (DragonHD)** — Azure's premium HD neural tier (DragonHD architecture, 30+ GA voices). Per https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/azure-speech-%E2%80%93-neural-hd-text-to-speech-recent-voice-updates/4505380, Azure reduced Neural HD pricing to $22/1M chars effective March 2026 (down from $30/1M). Latency under 300 ms; real-time synthesis only. SSML support is partial (no `<prosody>`, `<emphasis>`, or `<break>`), so we set `ssml_supported=false` because the elements most callers want are unsupported. Automatic emotion/sentiment detection drives delivery (`emotion_control_supported=true`). DragonHDOmni (700+ voices, mixed GA/preview) and DragonHDFlash (en-US/zh-CN only) are distinct models and would be tracked separately if added. Confidence is medium because the price was taken from a Tech Community blog recap rather than the Azure pricing page itself; verify it on the live pricing page before high-volume use.
- **Inworld Realtime TTS-2** — Inworld's newest TTS model (Research Preview), with natural-language steering and support for 100+ languages. Base PAYG rate $35/1M chars; Developer tier $30, Growth $25, and Enterprise as low as $10/1M. ~200 ms latency. Instant voice cloning, custom pronunciation, timestamp alignment, and zero data retention are included. The older `inworld-tts-1` and `inworld-tts-1-max` are deprecated per https://docs.inworld.ai/tts/tts-models; users of those models who are wary of TTS-2's research-preview status can migrate to `inworld-tts-1.5-max` or `inworld-tts-1.5-mini` instead.
- **Smallest.ai Lightning v3.1** — Smallest.ai's current TTS model (44.1 kHz native, ~200 ms TTFB). Vendor rate ~$0.25/10K chars = $25/1M chars. 217 voices across 12 languages: English, Hindi, Spanish, and 9 other Indian languages (Marathi, Kannada, Tamil, Bengali, Gujarati, Telugu, Malayalam, Punjabi, Odia). Lightning v2 and `lightning-large` are deprecated per https://docs.smallest.ai/waves/documentation/getting-started/models; new integrations should use `lightning-v3.1`. WebSocket streaming for real-time/conversational use. Instant and professional voice cloning supported. On-prem deployment available on the Enterprise plan.
- **Rime Mist v3** — Rime's current Mist-family flagship; English-only with sub-100 ms TTFB. Pricing-page rate $0.03/1K chars = $30/1M chars (the 'Mist' line on https://rime.ai/pricing). 94 voices. Custom pronunciation is available on `mistv2` but not yet on `mistv3` per https://docs.rime.ai/api-reference/models. Coda ($0.05/1K = $50/1M, 184 voices, 6 languages incl. ES/FR/PT/DE/JA) and Arcana ($0.04/1K = $40/1M, multilingual, 94 voices) are distinct higher-tier models and would be tracked separately if added. Voice cloning is not documented for Mist; custom voices are available via the Enterprise plan.
- **LMNT Blizzard** — LMNT's flagship Blizzard 2.0 model (canonical `model_id` `blizzard` per https://docs.lmnt.com/models/overview). 31 languages with accent control, word timestamps, streaming, voice cloning, and speech sessions. Confidence is medium because LMNT publishes plan-bundled pricing (Indie $10/mo for 200K chars + $0.05/1K overage; Pro $49/mo + $0.045/1K overage; Premium $199/mo + $0.035/1K overage) rather than a standalone PAYG per-char rate. The $50/1M shown here is the Indie-tier overage rate; Premium overage works out to $35/1M, so large customers should benchmark on their own plan. The free tier includes 15K characters/month with no overage rate (not representable as `tokens_per_day` in the schema).
- **Deepgram Aura 2** — Deepgram's current TTS family, addressed as `aura-2-<voice>-<lang>` (e.g., `aura-2-thalia-en`) per https://developers.deepgram.com/docs/tts-models. Vendor rate $0.030/1K chars = $30/1M chars on Pay-As-You-Go; Growth tier $0.027/1K = $27/1M. Voice counts by language: en 40+ (incl. Aura 1 legacy), es 15+ (Early Access), nl 8, fr 2, de 7, it 10, ja 4. The free tier provides $200 of signup credit usable across all products (not representable as `tokens_per_day` in the schema). Aura 1 voices remain callable, but Deepgram recommends Aura 2 for new integrations.
- **Resemble AI Chatterbox Turbo** — Resemble bills TTS per second of generated audio at a flat $0.0005/sec on its Flex (pay-as-you-go) plan; the rate is not split by model. Chatterbox Turbo is Resemble's flagship English TTS model per https://www.resemble.ai (also open-sourced at https://github.com/resemble-ai/chatterbox as 'SoTA open-source TTS'). Confidence is medium because the pricing page lists the rate under the generic 'Text-to-speech' category rather than naming Chatterbox Turbo, and the API's exact `model` parameter slug was not verified against live docs. The multilingual ('Chatterbox Multilingual') and dramatic-read ('DramaBox') variants are marketed separately; this entry models only the English flagship.
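
The entries above quote vendor prices in several units ($/char, $/1K chars, $/10K chars). A minimal Python sketch of the normalization to $/1M chars, assuming the rates exactly as quoted in those entries; `per_million_chars` and the `quotes` labels are illustrative names, not part of the Hail dataset or its schema. `Decimal` keeps small rates such as $0.000016 exact.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def per_million_chars(price: str, per_chars: int) -> Decimal:
    """Convert a quote of `price` USD per `per_chars` characters to USD per 1M characters."""
    return Decimal(price) * PER_MILLION / Decimal(per_chars)


# Vendor quotes as stated in the entries above: (USD, characters covered by that price).
quotes = {
    "Google Cloud TTS Studio": ("0.000160", 1),
    "Google Cloud TTS Neural2": ("0.000016", 1),
    "Google Cloud TTS Chirp 3: HD": ("0.000030", 1),
    "Smallest.ai Lightning v3.1": ("0.25", 10_000),
    "Rime Mist v3": ("0.03", 1_000),
    "Deepgram Aura 2 (PAYG)": ("0.030", 1_000),
    "Deepgram Aura 2 (Growth)": ("0.027", 1_000),
    "LMNT Blizzard (Indie overage)": ("0.05", 1_000),
}

for name, (price, unit) in quotes.items():
    print(f"{name}: ${per_million_chars(price, unit):.2f}/1M chars")
```

Each printed figure should match the $/1M value stated in the corresponding entry, which makes this a quick consistency check when a vendor changes the unit it quotes in.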
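
The Google Cloud TTS entries note monthly free allowances that the schema cannot represent. A rough monthly-cost sketch in Python, assuming the allowance is subtracted per tier before billing and that usage below it costs nothing; the tier keys are hypothetical labels, not the Hail `model_id` slugs, and real invoices may round differently.

```python
# Hypothetical tier labels -> (USD per 1M chars, free chars per month), from the entries above.
GOOGLE_TTS_TIERS = {
    "studio": (160, 100_000),
    "neural2": (16, 1_000_000),
    "wavenet": (16, 1_000_000),
    "chirp3-hd": (30, 1_000_000),
}


def google_monthly_cost(tier: str, chars: int) -> float:
    """Estimate one month's bill for `chars` characters on a single tier."""
    usd_per_million, free_chars = GOOGLE_TTS_TIERS[tier]
    billable = max(0, chars - free_chars)  # the free allowance is consumed first
    return billable * usd_per_million / 1_000_000


print(google_monthly_cost("neural2", 3_000_000))  # -> 32.0
print(google_monthly_cost("studio", 1_100_000))   # -> 160.0
```

The Studio allowance is a tenth of the others, so at low volume Studio is free only up to 100K chars, while Neural2, WaveNet, and Chirp 3: HD stay free up to 1M.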
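
The Azure AI Speech Neural entry notes that each Chinese character counts as two characters for billing. A minimal sketch of a billable-character count in Python, assuming "Chinese character" means a code point in CJK Unified Ideographs (U+4E00 to U+9FFF) or Extension A (U+3400 to U+4DBF) and that only plain text is counted. Azure's exact rule, including how it treats SSML markup and other scripts, is not covered by this entry, so `azure_billable_chars` is a hypothetical helper.

```python
def is_cjk_ideograph(ch: str) -> bool:
    """Assumed definition: CJK Unified Ideographs or Extension A."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def azure_billable_chars(text: str) -> int:
    """Count characters, weighting each assumed Chinese character as 2."""
    return sum(2 if is_cjk_ideograph(ch) else 1 for ch in text)


def azure_neural_cost(text: str, usd_per_million: float = 16.0) -> float:
    """Estimated S0 Neural cost for `text` at the listed $16/1M chars."""
    return azure_billable_chars(text) * usd_per_million / 1_000_000


print(azure_billable_chars("Hello 你好"))  # -> 10 (6 Latin/space chars + 2 ideographs x 2)
```

For mostly-Chinese text this roughly doubles the effective per-character price relative to the listed $16/1M.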
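
The LMNT entry explains that $50/1M is the Indie plan's overage rate rather than an effective per-character price. A sketch of the Indie plan arithmetic in Python, assuming the $10/mo fee, 200K included characters, and $0.05/1K overage quoted there. Pro and Premium are left out because the entry does not state how many characters they include, and `BundledPlan` is a hypothetical helper, not an LMNT API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BundledPlan:
    name: str
    monthly_fee: Decimal     # USD per month
    included_chars: int      # characters covered by the fee
    overage_per_1k: Decimal  # USD per 1K characters beyond the allowance

    def monthly_cost(self, chars: int) -> Decimal:
        overage_chars = max(0, chars - self.included_chars)
        return self.monthly_fee + Decimal(overage_chars) / 1000 * self.overage_per_1k

    def effective_per_million(self, chars: int) -> Decimal:
        """Total monthly cost spread over the characters actually used."""
        if chars <= 0:
            raise ValueError("chars must be positive")
        return self.monthly_cost(chars) / Decimal(chars) * 1_000_000


INDIE = BundledPlan("Indie", Decimal("10"), 200_000, Decimal("0.05"))

for chars in (50_000, 200_000, 1_000_000):
    print(chars, INDIE.monthly_cost(chars), INDIE.effective_per_million(chars))
```

Because the $10 fee equals 200K characters at the $0.05/1K overage rate, the Indie plan works out to exactly $50/1M at any usage of 200K characters or more, and to more than that below the allowance.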
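
Because Resemble bills per second of generated audio, its rate can only be compared with the per-character entries under an assumed speaking rate. A sketch in Python, assuming a hypothetical 15 characters/second (roughly 150 words per minute at about 6 characters per word, including spaces); actual throughput varies by voice, language, and text, so treat the result as a rough comparison only.

```python
from decimal import Decimal

RESEMBLE_USD_PER_SECOND = Decimal("0.0005")  # Flex plan rate from the entry above


def resemble_usd_per_million_chars(chars_per_second: Decimal) -> Decimal:
    """Convert the per-second audio rate to USD per 1M input characters,
    assuming speech is rendered at `chars_per_second`."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    audio_seconds = Decimal(1_000_000) / chars_per_second
    return audio_seconds * RESEMBLE_USD_PER_SECOND


ASSUMED_CHARS_PER_SECOND = Decimal(15)  # hypothetical speaking rate
estimate = resemble_usd_per_million_chars(ASSUMED_CHARS_PER_SECOND)
print(estimate.quantize(Decimal("0.01")))  # -> 33.33
print(RESEMBLE_USD_PER_SECOND * 60)        # USD per minute of audio
```

At that assumed rate the estimate is comparable to the $25 to $35/1M entries above, but the result is inversely proportional to the speaking rate: halving it doubles the cost per 1M characters.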


---

_This page was generated from the JSON datasets at 2026-06-03T14:44:15.590Z. Verify any price against the provider's official pricing page before making billing decisions; the dataset is updated via [GitHub PRs](https://github.com/hail-hq/hail/tree/main/costs)._
