| Output $/MTok | $0.60 |
|---|---|
| Input $/MTok | $0.15 |
| Cached input $/MTok | $0.075 |
| Context window | 128,000 |
| Output cap | 16,384 |
| Modalities | in: text, image / out: text |
| Tool use | ✓ |
| Structured output | ✓ |
| Family | GPT-4o |
| Knowledge cutoff | 2023-10-01 |
| Verified | 2026-05-17 |
| Source | provider page ↗ |
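
The per-token rates above translate into a per-request cost with simple arithmetic. The sketch below is an estimate, not the provider's billing logic: `estimate_cost` is a hypothetical helper, and it assumes cached tokens are a subset of the input tokens and are billed at the cached rate instead of the regular input rate.

```python
# Rates copied from the table above, in USD per million tokens (MTok).
INPUT_PER_MTOK = 0.15
CACHED_INPUT_PER_MTOK = 0.075
OUTPUT_PER_MTOK = 0.60


def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached tokens are part of input_tokens and replace the
    regular input rate for that portion of the prompt.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    fresh_input_tokens = input_tokens - cached_input_tokens
    total_micro_usd = (
        fresh_input_tokens * INPUT_PER_MTOK
        + cached_input_tokens * CACHED_INPUT_PER_MTOK
        + output_tokens * OUTPUT_PER_MTOK
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # 10,000 prompt tokens (4,000 of them cached) and 1,000 completion tokens.
    print(f"${estimate_cost(10_000, 1_000, cached_input_tokens=4_000):.4f}")  # prints $0.0018
```

Under these assumptions, one million uncached input tokens plus one million output tokens come to $0.75.
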

| $/1M chars | $25.00 | $50.00 |
|---|---|---|
| Voice quality | neural | neural |
| Voice cloning | ✓ included | ✓ included |
| Voice count | 217 | — |
| Languages | en, hi, es, mr, kn, ta, bn, gu, te, ml, pa, or | 42+ |
| SSML support | — | — |
| TTFB | 200ms | 90ms |
| Output formats | pcm, wav, mp3, mulaw | raw/pcm_f32le, raw/pcm_s16le, raw/pcm_mulaw, raw/pcm_alaw, wav, mp3 |
| Verified | 2026-05-19 | 2026-05-05 (stale 32d) |
| Source | provider page ↗ | provider page ↗ |
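
The `$/1M chars` row can be turned into a cost per synthesis job in the same way. This is a rough sketch: the table does not name the two providers, so the dictionary keys are placeholder labels, `tts_cost` is a hypothetical helper, and it assumes billing counts every character of the input (including whitespace) as reported by Python's `len`. A provider may count characters differently.

```python
# Rates copied from the table above, in USD per million characters.
# The keys are placeholder labels; the table does not name the providers.
RATES_PER_MILLION_CHARS = {
    "first_column": 25.00,
    "second_column": 50.00,
}


def tts_cost(text: str, rate_per_million_chars: float) -> float:
    """Estimate the USD cost of synthesizing `text`.

    Assumption: every character, including whitespace, is billable.
    """
    return len(text) * rate_per_million_chars / 1_000_000


if __name__ == "__main__":
    script = "a" * 20_000  # stand-in for a 20,000-character script
    for label, rate in RATES_PER_MILLION_CHARS.items():
        print(f"{label}: ${tts_cost(script, rate):.2f}")
    # prints:
    # first_column: $0.50
    # second_column: $1.00
```
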