## Model Details

Stable Audio 2.5 turns a written description into a finished audio clip (a piece of music, a soundscape, ambience, or a sound effect) up to 190 seconds long, returned as a WAV file. Write what you want in the `prompt` ("driving synthwave with a punchy kick and arpeggiated bass", "gentle rain on a window with distant thunder", "upbeat corporate acoustic bed, no vocals") and set `seconds_total` to control the length. Its standout strength is long-form generation: unlike short sound-effect models capped at a few seconds, a single call can produce minutes-long tracks suitable for full backing music. It was also trained on a fully licensed dataset for commercially safe output.
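As a minimal sketch, the `prompt` and `seconds_total` fields can be assembled and checked before a request is sent. `buildTextToAudioInput` is a hypothetical helper, not part of the ModelRunner client; it only enforces a non-empty prompt and the 1–190 second range listed under Tips, and makes no other assumptions about what the service accepts.

```javascript
// Hypothetical helper (not part of the ModelRunner client): builds the
// `input` object for a text-to-audio request and enforces the documented
// 1–190 second range for `seconds_total`.
const MIN_SECONDS = 1;
const MAX_SECONDS = 190;

function buildTextToAudioInput(prompt, secondsTotal) {
  // Reject missing or whitespace-only prompts.
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new TypeError("prompt must be a non-empty string");
  }
  // Reject NaN, Infinity, non-numbers, and values outside 1–190.
  if (!Number.isFinite(secondsTotal) || secondsTotal < MIN_SECONDS || secondsTotal > MAX_SECONDS) {
    throw new RangeError(`seconds_total must be between ${MIN_SECONDS} and ${MAX_SECONDS}`);
  }
  return { prompt: prompt.trim(), seconds_total: secondsTotal };
}

// Example: a two-minute ambience bed.
const input = buildTextToAudioInput("gentle rain on a window with distant thunder", 120);
console.log(input);
```

The returned object has the same shape as the `input` passed to `modelrunner.subscribe` in the client example at the end of this page.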

## Best for

- Background music and instrumental beds for videos, ads, podcasts, and games
- Long-form tracks and loops up to about three minutes from a single text prompt
- Ambience and soundscapes (rain, cafe noise, forest, room tone) for scenes
- One-off sound effects and foley described in plain language
- Royalty-conscious audio where a commercially safe, licensed-data model matters

## Choose another model when

- You want to transform or restyle an existing audio clip rather than generate from text: use an audio-to-audio model
- You need natural spoken narration or a specific voice: use a text-to-speech model
- You only need a very short one-shot effect and want per-second billing on tiny clips: a per-second sound-effect model may be cheaper
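To make the billing trade-off concrete: this model charges a flat rate per generation (see Tips), while a per-second model's cost grows with clip length, so per-second billing only wins below a break-even length of flat price ÷ per-second rate. The sketch below uses hypothetical prices (`FLAT_CENTS`, `PER_SECOND_CENTS`); neither figure comes from this page, so substitute current rates before drawing conclusions.

```javascript
// Compare a flat per-generation price with per-second billing for one clip.
// Prices are in cents so that whole-second clips compare exactly.
function cheaperBilling(seconds, flatCents, perSecondCents) {
  const perSecondTotal = seconds * perSecondCents;
  if (perSecondTotal < flatCents) return "per-second";
  if (perSecondTotal > flatCents) return "flat";
  return "tie";
}

// Clip length at which both billing schemes cost the same.
function breakEvenSeconds(flatCents, perSecondCents) {
  return flatCents / perSecondCents;
}

// Hypothetical prices for illustration only; check current pricing.
const FLAT_CENTS = 20;      // assumed flat price per Stable Audio 2.5 generation
const PER_SECOND_CENTS = 1; // assumed per-second rate of a sound-effect model

for (const seconds of [3, 20, 60]) {
  console.log(`${seconds}s clip: ${cheaperBilling(seconds, FLAT_CENTS, PER_SECOND_CENTS)}`);
}
console.log(`break-even: ${breakEvenSeconds(FLAT_CENTS, PER_SECOND_CENTS)}s`);
```

With these assumed prices, a 3-second effect is cheaper on per-second billing, a 60-second bed is cheaper at the flat rate, and the two meet at 20 seconds.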

## Tips

- `seconds_total` accepts 1–190 seconds; billing is a flat rate per generation, so longer clips cost the same as short ones
- Describe genre, instrumentation, mood, and tempo in the prompt for music; describe the source, environment, and materials for sound effects
- Say "no vocals" or "instrumental" in the prompt when you want a clean music bed
- Raise `num_inference_steps` (up to 8) for a quality bump; raise `guidance_scale` for stricter prompt adherence
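One way to apply the prompt-writing tips consistently is a small template. `composeMusicPrompt` below is a hypothetical helper (not a ModelRunner API) that combines mood, genre, instrumentation, and tempo into one description and appends "instrumental, no vocals" when a clean bed is wanted. The phrasing it produces is one reasonable pattern, not a format the model requires.

```javascript
// Hypothetical helper: joins a list as "a and b" or "a, b, and c".
function joinWithAnd(items) {
  if (items.length <= 2) return items.join(" and ");
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

// Hypothetical helper: builds a music prompt from the elements the Tips
// recommend describing. Empty fields are simply left out.
function composeMusicPrompt({ mood = "", genre = "", instruments = [], tempo = "", instrumental = false } = {}) {
  const segments = [];
  let head = [mood, genre].filter(Boolean).join(" ");
  if (instruments.length > 0) {
    const list = joinWithAnd(instruments);
    head = head ? `${head} with ${list}` : list;
  }
  if (head) segments.push(head);
  if (tempo) segments.push(tempo);
  // Spell out the request for a clean music bed when vocals are unwanted.
  if (instrumental) segments.push("instrumental", "no vocals");
  return segments.join(", ");
}

const prompt = composeMusicPrompt({
  mood: "driving",
  genre: "synthwave",
  instruments: ["a punchy kick", "arpeggiated bass"],
  tempo: "around 120 BPM",
  instrumental: true,
});
console.log(prompt);
// → driving synthwave with a punchy kick and arpeggiated bass, around 120 BPM, instrumental, no vocals
```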

## Limitations

- Output is a single WAV clip per call; there is no multi-track or stem separation
- Very short durations can produce less musically developed results than longer clips

To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

// Send a text-to-audio request and await the result.
const result = await modelrunner.subscribe("stability-ai/stable-audio-2.5/text-to-audio", {
  input: {
    prompt: "upbeat lofi hip hop instrumental with a warm vinyl texture",
    seconds_total: 60, // clip length in seconds (1–190)
  },
});
```
