OpenAI: GPT-4o Audio
Analysis Summary
GPT-4o Audio is OpenAI's audio-native variant of GPT-4o, supporting text and audio in both directions with tool use and function calling across a 128K context window. The audio modality is a genuine differentiator for voice-first applications, customer service automation, and real-time transcription workflows. Reasoning and coding benchmarks are modest, with an intelligence index of 12.8 and a coding index of 13.1.
For businesses building voice interfaces, call centre automation, or audio content pipelines, this model fills a specific gap that text-only models cannot address. The instruction-following and general reasoning capability is adequate for conversational tasks, though it is not suited for complex analytical or coding workloads.
At $2.50/1M input and $10/1M output, it is expensive for what the benchmarks suggest in terms of raw reasoning power. Adoption is best justified where the audio modality is a core requirement, not as a general-purpose reasoning model.
Assessed June 6, 2026
Editorial notes
GPT-4o Audio from OpenAI adds native audio input and output to the GPT-4o family, with tool use and function calling, but reasoning and coding benchmarks are modest and pricing is high.
Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?
Performance Profile
How OpenAI: GPT-4o Audio compares
OpenAI: GPT-4o Audio ranks #270 of 378 AI models we track for overall intelligence, #209 of 315 for coding. Its 128K-token context window is larger than 43% of the models we list. At $2.50 per million input tokens it is cheaper than 13% of comparable models.
About OpenAI: GPT-4o Audio
The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs..
Capabilities
Performance Indices
Source: Artificial Analysis
Benchmark Scores
Intelligence
Technical
Benchmark data from Artificial Analysis and Hugging Face
How does OpenAI: GPT-4o Audio stack up?
Compare side-by-side with other professional models.
Model Information
Pricing
| Token Type | Cost per 1M tokens | Cost per 1K tokens |
|---|---|---|
| Input | $2.50 | $0.002500 |
| Output | $10.00 | $0.010000 |
Leaderboard Categories
External Resources
Explore Related Models
Frequently asked questions about OpenAI: GPT-4o Audio
How much does OpenAI: GPT-4o Audio cost?
OpenAI: GPT-4o Audio costs $2.50 per million input tokens and $10.00 per million output tokens.
What is the context window of OpenAI: GPT-4o Audio?
OpenAI: GPT-4o Audio has a context window of 128,000 tokens (128K).
Is OpenAI: GPT-4o Audio good for coding?
On our coding benchmark index, OpenAI: GPT-4o Audio ranks #209 of 315 models, placing it in the broader range of the field for code generation and debugging.
What can OpenAI: GPT-4o Audio do?
OpenAI: GPT-4o Audio supports tool use and function calling.
Who created OpenAI: GPT-4o Audio?
OpenAI: GPT-4o Audio is developed by OpenAI and was released on August 15, 2025.
Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team.
Last updated: June 11, 2026 8:38 pm