openai / whisper

Transcribe or translate speech audio into text across 99 languages, with segment/word timestamps and optional speaker diarization.

Model Input

`audio_url`: URL of the audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
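Only the listed formats are accepted, so a caller might reject other URLs before sending a request. The hypothetical `isSupportedAudioUrl` helper below is not part of the ModelRunner client. It is a sketch that looks only at the file extension in the URL path, so it cannot detect a mislabeled file or handle a URL that has no extension.

```javascript
// Hypothetical helper: not part of the ModelRunner client.
// Checks only the file extension in the URL path against the formats
// listed for this model; it does not inspect the audio content itself.
const SUPPORTED_EXTENSIONS = new Set([
  "mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm",
]);

function isSupportedAudioUrl(audioUrl) {
  let pathname;
  try {
    // URL parsing separates out any query string or fragment (e.g. "?dl=1").
    pathname = new URL(audioUrl).pathname;
  } catch {
    return false; // not an absolute URL
  }
  const lastDot = pathname.lastIndexOf(".");
  if (lastDot === -1 || lastDot < pathname.lastIndexOf("/")) {
    return false; // no extension in the final path segment
  }
  const ext = pathname.slice(lastDot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(ext);
}

console.log(isSupportedAudioUrl("https://media.modelrunner.ai/iuneUX0YY4AtcsceV9HHp.mp3")); // true
console.log(isSupportedAudioUrl("https://example.com/talk.flac?dl=1")); // false
```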

Additional Settings

`task`: Whether to transcribe the audio in its spoken language (`transcribe`, the default) or translate it into English text (`translate`).

`language`: ISO code of the spoken language. Leave unset to auto-detect.

`diarize`: Annotate which speaker said each chunk. Adds processing time (and therefore cost).

`chunk_level`: Timestamp granularity, one of `none`, `segment` (default), or `word`.

`batch_size`: Internal batch size for inference. Min: 1, max: 64.

`prompt`: Optional text hint to bias transcription toward specific terms or spelling.

`num_speakers`: Expected number of speakers. Only used when `diarize` is true; leave unset to auto-detect.

Model Output

Breaking news this evening, scientists at the Coastal Research Station have recorded the highest ocean temperatures in more than 40 years, raising fresh concerns about the health of nearby coral reefs.

Generated in 1.675 seconds
Model Details

Whisper Large v3 is OpenAI's open-source, multilingual speech recognition model. Pass a URL to an audio file (mp3, mp4, mpeg, mpga, m4a, wav, or webm) and it returns the full transcript as plain text. It handles 99 languages with automatic language detection, can translate speech in any of those languages into English text, and can return segment- or word-level timestamps plus optional speaker diarization. It is a robust, general-purpose default for turning recordings into searchable, usable text.

## Best for

- Transcribing meetings, interviews, podcasts, lectures, and voice notes into text
- Multilingual transcription where the spoken language is unknown or mixed (99 languages, auto-detected)
- Translating non-English audio directly into English text in one call (`task: "translate"`)
- Generating timestamped segments or word-level chunks for captioning and subtitling
- Speaker-attributed transcripts of multi-person recordings via diarization
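For the captioning use case, segment-level chunks can be converted into SubRip (`.srt`) cues. This page does not document the shape of the timestamped output, so the sketch below assumes, purely for illustration, that each chunk looks like `{ timestamp: [startSeconds, endSeconds], text }`. The helper names and the sample chunks (including their timestamps) are made up.

```javascript
// Sketch: turn timestamped chunks into SubRip (.srt) captions.
// ASSUMPTION: each chunk is { timestamp: [startSec, endSec], text }.
// The response shape is not documented on this page; adapt the field
// names to whatever the API actually returns.

function toSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  // SRT uses a comma before the milliseconds: HH:MM:SS,mmm
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

function chunksToSrt(chunks) {
  return chunks
    .map((chunk, i) => {
      const [start, end] = chunk.timestamp;
      // Each cue: 1-based index, time range, trimmed text.
      return `${i + 1}\n${toSrtTime(start)} --> ${toSrtTime(end)}\n${chunk.text.trim()}\n`;
    })
    .join("\n"); // a blank line separates consecutive cues
}

// Made-up chunks for illustration only.
const exampleChunks = [
  { timestamp: [0, 2.5], text: " First caption." },
  { timestamp: [2.5, 61.75], text: " Second caption." },
];
console.log(chunksToSrt(exampleChunks));
```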

## Choose another model when

- You want to generate speech from text rather than transcribe it: use a text-to-speech model.
- You need live, streaming transcription of an in-progress call: this model processes a complete uploaded file and returns a finished transcript.
- You want maximum-accuracy English transcription with built-in audio-event tagging (laughter, applause): consider `elevenlabs/scribe-v1`.

## Tips

- Leave `language` unset to auto-detect the spoken language; set an ISO code (e.g. `en`, `es`, `fr`, `de`, `ja`) only when the language is known, to skip detection.
- Set `task` to `translate` to force English output regardless of the source language; keep `transcribe` (default) to output in the spoken language.
- Use `chunk_level` to control timestamp granularity: `segment` (default) returns sentence-level chunks, `word` returns per-word timestamps, and `none` skips timestamp tokens for a small speed-up.
- Enable `diarize` only for multi-speaker audio; it labels who spoke each chunk but adds processing time (and therefore cost, since billing tracks compute time).
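The tips above reduce to a few decisions per recording. The sketch below encodes them, assuming a hypothetical `inputForScenario` helper whose option names are invented for illustration; only the keys it returns (`audio_url`, `task`, `language`, `chunk_level`, `diarize`, `num_speakers`) come from this page.

```javascript
// Hypothetical helper: `inputForScenario` and its option names are
// illustrative. Only the keys it returns are documented on this page.
function inputForScenario({
  audioUrl,
  knownLanguage,          // ISO code such as "es"; leave undefined to auto-detect
  wantEnglish = false,    // true selects task "translate"
  timestamps = "segment", // "none" | "segment" | "word"
  multiSpeaker = false,   // true turns on diarization
  speakerCount,           // optional hint, sent only when diarizing
}) {
  const input = {
    audio_url: audioUrl,
    task: wantEnglish ? "translate" : "transcribe",
    chunk_level: timestamps,
  };
  // Omitting `language` lets the model detect it; set it only when known.
  if (knownLanguage) input.language = knownLanguage;
  // Diarization adds processing time, so request it only for multi-speaker audio.
  if (multiSpeaker) {
    input.diarize = true;
    if (speakerCount !== undefined) input.num_speakers = speakerCount;
  }
  return input;
}

// Spanish interview with two speakers, returned as English text with word timings.
console.log(
  inputForScenario({
    audioUrl: "https://example.com/interview.m4a",
    knownLanguage: "es",
    wantEnglish: true,
    timestamps: "word",
    multiSpeaker: true,
    speakerCount: 2,
  }),
);
```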

## Advanced Configuration

- `task` (default `transcribe`): `transcribe` keeps the spoken language; `translate` outputs English.
- `language` (default auto-detect): ISO code of the spoken language; set it only when the language is known.
- `chunk_level` (default `segment`): timestamp granularity, one of `none`, `segment`, or `word`.
- `diarize` (boolean, default `false`): annotate which speaker said each chunk. Requires more compute time.
- `num_speakers` (default auto): hint the expected speaker count; only used when `diarize` is `true`.
- `prompt` (default empty): a text hint to bias transcription toward specific terms or spelling.
- `batch_size` (default 64, range 1 to 64): internal batching; leave at the default unless tuning throughput.
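The defaults and limits above can be checked locally before a request is sent. This is a minimal sketch under stated assumptions: `resolveSettings` is a hypothetical helper, the 1 to 64 range for `batch_size` comes from the settings form, and the service's own defaults and validation may behave differently.

```javascript
// Hypothetical local pre-check: `resolveSettings` is not part of any
// ModelRunner client. The service applies its own defaults and
// validation, which may behave differently from this sketch.
const DEFAULTS = {
  task: "transcribe",
  chunk_level: "segment",
  diarize: false,
  prompt: "",
  batch_size: 64,
  // `language` and `num_speakers` have no default: unset means auto-detect.
};

const ALLOWED_VALUES = {
  task: ["transcribe", "translate"],
  chunk_level: ["none", "segment", "word"],
};

function resolveSettings(input) {
  // Ignore keys explicitly set to undefined so they fall back to defaults.
  const provided = Object.fromEntries(
    Object.entries(input).filter(([, value]) => value !== undefined),
  );
  const settings = { ...DEFAULTS, ...provided };

  for (const [key, allowed] of Object.entries(ALLOWED_VALUES)) {
    if (!allowed.includes(settings[key])) {
      throw new RangeError(`${key} must be one of: ${allowed.join(", ")}`);
    }
  }
  const { batch_size } = settings;
  if (!Number.isInteger(batch_size) || batch_size < 1 || batch_size > 64) {
    throw new RangeError("batch_size must be an integer from 1 to 64");
  }
  // num_speakers is only used when diarize is true, so drop it otherwise.
  if (settings.diarize !== true) delete settings.num_speakers;
  return settings;
}

console.log(resolveSettings({ audio_url: "https://example.com/call.wav", chunk_level: "word" }));
```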

Note on cost: billing tracks compute time, so options that add processing time, such as `diarize`, also add cost.

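To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module").
const result = await modelrunner.subscribe("openai/whisper", {
  input: {
    audio_url: "https://media.modelrunner.ai/iuneUX0YY4AtcsceV9HHp.mp3",
    task: "transcribe",     // or "translate" for English output
    chunk_level: "segment", // "none" | "segment" | "word"
    // `language` is omitted, so the spoken language is auto-detected.
  },
});

// Inspect the full response.
console.log(result);
```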