ElevenLabs Scribe v1 API

Transcribe speech audio into accurate text with word-level timestamps, speaker labels, and audio-event tags across 99 languages.

Output

{
  "error": "",
  "inferenceTime": 3961,
  "output": "And so my fellow Americans ask not what your country can do for you, ask what you can do for your country. (applause)",
  "input": {
    "diarize": true,
    "audio_url": "https://media.modelrunner.ai/CQfRLPYP1JYJ7QkD-jfk_speech.wav",
    "tag_audio_events": true
  },
  "logs": "Generated 1 output(s)"
}


ElevenLabs Scribe v1 is a speech-to-text AI model by ElevenLabs. On ModelRunner you can call it through a REST API or via MCP from any AI assistant, at $0.03 per response.

POST https://queue.modelrunner.run/elevenlabs/scribe-v1

cURL

# Submit a request to the queue. Input fields go at the top level of the
# body. The optional reserved "metadata" object holds your own flat string
# tags — stored on the request, never sent to the model; filter later with
# GET https://queue.modelrunner.run/requests?metadata=<url-encoded JSON>.
curl -X POST https://queue.modelrunner.run/elevenlabs/scribe-v1 \
  -H "Authorization: Key $MRUN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "diarize": true,
    "audio_url": "https://media.modelrunner.ai/CQfRLPYP1JYJ7QkD-jfk_speech.wav",
    "tag_audio_events": true,
    "metadata": {
      "project": "my-project"
    }
  }'
# → { "request_id": "...", "status_url": "...", "response_url": "..." }

# Poll status_url until "COMPLETED", then fetch the result
curl "https://queue.modelrunner.run/elevenlabs/scribe-v1/requests/$REQUEST_ID/status" \
  -H "Authorization: Key $MRUN_API_KEY"
curl "https://queue.modelrunner.run/elevenlabs/scribe-v1/requests/$REQUEST_ID" \
  -H "Authorization: Key $MRUN_API_KEY"
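
The comments above mention filtering stored requests with `GET https://queue.modelrunner.run/requests?metadata=<url-encoded JSON>`. A minimal Python sketch of building that URL follows. The helper name `metadata_filter_url` is hypothetical, and the use of compact JSON with percent-encoding is an assumption about what the endpoint accepts, not something this page specifies.

```python
import json
from urllib.parse import quote, urlencode

# Listing endpoint quoted in the cURL comments above.
REQUESTS_URL = "https://queue.modelrunner.run/requests"


def metadata_filter_url(tags):
    """Build the metadata filter URL (hypothetical helper).

    `tags` must be a flat dict of string keys and string values, matching
    the "flat string tags" wording used for the metadata object.
    """
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("metadata tags must be flat string-to-string pairs")
    # Assumption: compact JSON, percent-encoded (spaces become %20, not "+").
    payload = json.dumps(tags, separators=(",", ":"))
    return f"{REQUESTS_URL}?{urlencode({'metadata': payload}, quote_via=quote)}"


print(metadata_filter_url({"project": "my-project"}))
```

Send the resulting URL with the same `Authorization: Key ...` header used for the other requests.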

JavaScript

import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("elevenlabs/scribe-v1", {
  input: {
    "diarize": true,
    "audio_url": "https://media.modelrunner.ai/CQfRLPYP1JYJ7QkD-jfk_speech.wav",
    "tag_audio_events": true
  },
});
console.log(result);

Python

import os
import requests

headers = {"Authorization": f"Key {os.environ['MRUN_API_KEY']}"}

submitted = requests.post(
    "https://queue.modelrunner.run/elevenlabs/scribe-v1",
    headers=headers,
    json={
        # Python booleans are True/False; lowercase `true` raises a NameError.
        "diarize": True,
        "audio_url": "https://media.modelrunner.ai/CQfRLPYP1JYJ7QkD-jfk_speech.wav",
        "tag_audio_events": True,
    },
).json()

# Poll submitted["status_url"] until "COMPLETED", then:
result = requests.get(submitted["response_url"], headers=headers).json()
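
The polling step in the snippet above is left as a comment. One way to fill it in is sketched below, assuming the status endpoint returns JSON with a `"status"` key that eventually equals `"COMPLETED"`. Only the `"COMPLETED"` value appears on this page; the key name, the helper `wait_for_completion`, and the timeout values are assumptions. Failure states are not documented here, so the sketch only stops on completion or timeout.

```python
import time


def wait_for_completion(fetch_status, interval_s=1.0, timeout_s=120.0, sleep=time.sleep):
    """Poll until the request is COMPLETED (hypothetical helper).

    fetch_status: zero-argument callable returning the decoded status JSON.
    Assumption: that JSON carries a "status" key.
    Returns the final status payload, or raises TimeoutError.
    """
    waited = 0.0
    while True:
        payload = fetch_status()
        if payload.get("status") == "COMPLETED":
            return payload
        if waited >= timeout_s:
            raise TimeoutError(f"request not completed after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s


# Usage with the snippet above:
# wait_for_completion(
#     lambda: requests.get(submitted["status_url"], headers=headers).json()
# )
# result = requests.get(submitted["response_url"], headers=headers).json()
```

Passing the fetcher in as a callable keeps the loop independent of `requests`, which makes it easy to test with canned responses.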

Input parameters

Name	Type	Required	Description
audio_url	string (uri)	yes	URL of the audio file to transcribe. Supported formats: mp3, wav, m4a, ogg, aac.
language_code	string	no	ISO-639 language code of the audio (e.g. eng, spa, fra, deu, jpn). Leave unset to auto-detect the spoken language.
tag_audio_events	boolean	no	Tag non-speech audio events like laughter and applause inline in the transcript. Default: true.
diarize	boolean	no	Annotate which speaker said each word. Default: true.
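
As an illustration of how the parameters in the table combine, the sketch below assembles a request body in Python. The helper name `build_input` and the `http(s)` scheme check are assumptions for illustration. The defaults mirror the table, and `language_code` is omitted when unset so that the language is auto-detected.

```python
def build_input(audio_url, language_code=None, tag_audio_events=True, diarize=True):
    """Assemble the JSON body for elevenlabs/scribe-v1 (hypothetical helper)."""
    # Assumption: the audio must be reachable over http(s); the table only
    # says the field is a URI string.
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be an http(s) URL")
    body = {
        "audio_url": audio_url,
        "tag_audio_events": tag_audio_events,
        "diarize": diarize,
    }
    # Leave language_code out entirely to request auto-detection.
    if language_code is not None:
        body["language_code"] = language_code
    return body


print(build_input("https://media.modelrunner.ai/CQfRLPYP1JYJ7QkD-jfk_speech.wav"))
```

The returned dict can be passed as the `json=` argument of `requests.post`.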

Machine-readable: OpenAPI schema · llms.txt

Use ElevenLabs Scribe v1 from Claude & Cursor (MCP)

Point Claude Code, Claude Desktop, Cursor, or any MCP client at the ModelRunner MCP server and ElevenLabs Scribe v1 becomes a tool your assistant can call directly — it authorizes via OAuth (no API key in config) and runs this model with the run_model tool using the endpoint elevenlabs/scribe-v1.

MCP client config (Claude Desktop, Cursor)

{
  "mcpServers": {
    "modelrunner": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.modelrunner.run/mcp"]
    }
  }
}

Claude Code

claude mcp add --transport http modelrunner https://mcp.modelrunner.run/mcp

Then ask your assistant, for example: “Run elevenlabs/scribe-v1 on ModelRunner to transcribe this audio URL”. See the MCP setup guide.

Model Details

ElevenLabs Scribe v1 turns spoken-audio files into accurate written text. Pass a URL to an audio recording (mp3, wav, m4a, ogg, or aac) and it returns the full transcript as a plain string, with state-of-the-art accuracy across 99 languages. It auto-detects the spoken language, can label who is speaking (diarization), and tags non-speech audio events like laughter and applause — making it a strong default for turning recordings into usable, searchable text.
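
Because the transcript comes back as a plain string with event tags inline, downstream code may want to separate the two. The sketch below assumes tags are rendered in parentheses, as in the sample output `(applause)` shown on this page. The helper `split_audio_events` is hypothetical, and parenthesized text that is genuinely part of the speech would also be matched.

```python
import re

# Assumption: audio-event tags look like "(applause)" or "(laughter)".
_EVENT_TAG = re.compile(r"\s*\(([^()]+)\)")


def split_audio_events(transcript):
    """Return (speech_only_text, list_of_event_tags) for a transcript string."""
    events = _EVENT_TAG.findall(transcript)
    clean = _EVENT_TAG.sub("", transcript).strip()
    return clean, events


text, events = split_audio_events(
    "ask what you can do for your country. (applause)"
)
print(text)
print(events)
```

If only clean text is needed, setting `tag_audio_events` to `false` on the request avoids this post-processing step.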

## Best for

- Transcribing meetings, interviews, podcasts, and voice notes into text
- Captioning and subtitling source audio with reliable word boundaries
- Multilingual transcription where the spoken language is unknown or mixed (99 languages, auto-detected)
- Speaker-attributed transcripts of multi-person conversations using diarization
- Building searchable archives or downstream NLP from spoken-audio content

## Choose another model when

- You want to generate speech from text rather than transcribe it: use a text-to-speech model
- You need to translate audio into a different language's text: this model transcribes in the spoken language and is not a speech translator
- You need live, streaming transcription of an in-progress call: this model processes a complete uploaded file and returns a finished transcript

## Tips

- Leave `language_code` unset to auto-detect the spoken language; set it to an ISO-639 code (e.g. `eng`, `spa`, `fra`, `deu`, `jpn`) only when you already know the language and want to skip detection.
- Keep `diarize` enabled (default) for multi-speaker recordings; the model attributes each word to a speaker. Set it to `false` for single-speaker audio to skip speaker labeling.
- Keep `tag_audio_events` enabled (default) to mark non-speech sounds (laughter, applause) inline; set it to `false` for a clean speech-only transcript.
- Use clear, reasonably loud source audio; heavy background noise and overlapping speech reduce accuracy.

## Advanced Configuration

- `language_code` (default auto-detect): an ISO-639 language code that forces the transcription language instead of detecting it. Useful when the audio is short or the language is known in advance.
- `tag_audio_events` (boolean, default `true`): when `true`, non-speech events such as laughter and applause are tagged inline in the transcript.
- `diarize` (boolean, default `true`): when `true`, annotates which speaker said each word.

To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("elevenlabs/scribe-v1", {
  input: {
    audio_url: "https://storage.googleapis.com/falserverless/web-examples/elevenlabs/sample.mp3",
    diarize: true,
    tag_audio_events: true,
  },
});
```
