OpenAI: GPT-4o Audio

OpenAI: GPT-4o Audio

openai · Released Aug 15, 2025 Professional
Intelligence #10 / 576
82.0 Our Score
Speed #253 / 271
32.9 tokens / sec
Input #504 / 577
$2.50 per 1M tokens
Output #493 / 577
$10.00 per 1M tokens
Context #328 / 577
128,000 tokens

Analysis Summary

GPT-4o Audio is OpenAI's audio-native variant of GPT-4o, supporting text and audio in both directions with tool use and function calling across a 128K context window. The audio modality is a genuine differentiator for voice-first applications, customer service automation, and real-time transcription workflows. Reasoning and coding benchmarks are modest, with an intelligence index of 12.8 and a coding index of 13.1.

For businesses building voice interfaces, call centre automation, or audio content pipelines, this model fills a specific gap that text-only models cannot address. The instruction-following and general reasoning capability is adequate for conversational tasks, though it is not suited for complex analytical or coding workloads.

At $2.50/1M input and $10/1M output, it is expensive for what the benchmarks suggest in terms of raw reasoning power. Adoption is best justified where the audio modality is a core requirement, not as a general-purpose reasoning model.

Assessed June 6, 2026

Editorial notes

GPT-4o Audio from OpenAI adds native audio input and output to the GPT-4o family, with tool use and function calling, but reasoning and coding benchmarks are modest and pricing is high.

Rankings consider pricing, capabilities, benchmarks, and real-world applicability and are refreshed as new models launch. Feedback?

Performance Profile

Intelligence2.6Technical2.5Value6.3Content0
Intelligence 2.6/10
Technical 2.5/10
Content 0/10
Value 6.3/10

How OpenAI: GPT-4o Audio compares

OpenAI: GPT-4o Audio ranks #270 of 378 AI models we track for overall intelligence, #209 of 315 for coding. Its 128K-token context window is larger than 43% of the models we list. At $2.50 per million input tokens it is cheaper than 13% of comparable models.

About OpenAI: GPT-4o Audio

The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs..

Capabilities

Tool Use Function Calling

Performance Indices

Source: Artificial Analysis

12.8 Intelligence Index
13.1 Coding Index
6 Math Index

Benchmark Scores

Intelligence

GPQA Diamond 54.3% Graduate-level scientific reasoning
HLE 3.3% Humanity's Last Exam
MMLU Pro 74.8% Multi-task language understanding
MATH 500 75.9% Mathematical problem-solving
AIME 15% Competition mathematics
SciCode 33.3% Scientific computing

Technical

LiveCodeBench 30.9% Live coding evaluation

Benchmark data from Artificial Analysis and Hugging Face

How does OpenAI: GPT-4o Audio stack up?

Compare side-by-side with other professional models.

Compare Models

Model Information

OpenRouter ID openai/gpt-4o-audio-preview
Provideropenai
Model FamilyGPT-4o
Release Date August 15, 2025
Context Length128,000 tokens
Max Completion16,384 tokens
Status Active

Pricing

Token Type Cost per 1M tokens Cost per 1K tokens
Input $2.50 $0.002500
Output $10.00 $0.010000

Leaderboard Categories

Frequently asked questions about OpenAI: GPT-4o Audio

How much does OpenAI: GPT-4o Audio cost?

OpenAI: GPT-4o Audio costs $2.50 per million input tokens and $10.00 per million output tokens.

What is the context window of OpenAI: GPT-4o Audio?

OpenAI: GPT-4o Audio has a context window of 128,000 tokens (128K).

Is OpenAI: GPT-4o Audio good for coding?

On our coding benchmark index, OpenAI: GPT-4o Audio ranks #209 of 315 models, placing it in the broader range of the field for code generation and debugging.

What can OpenAI: GPT-4o Audio do?

OpenAI: GPT-4o Audio supports tool use and function calling.

Who created OpenAI: GPT-4o Audio?

OpenAI: GPT-4o Audio is developed by OpenAI and was released on August 15, 2025.