meta / musicgen

A fast, controllable auto-regressive Transformer for high-fidelity music generation.

Model Input

- Model to use for generation.
- A description of the music you want to generate.
- An audio file that will influence the generated music. If `continuation` is `True`, the generated music will be a continuation of the audio file. Otherwise, the generated music will mimic the audio file's melody.
- Duration of the generated audio in seconds.
- If `True`, generated music will continue from `input_audio`. Otherwise, generated music will mimic `input_audio`'s melody.
- Start time of the audio file to use for continuation (minimum: 0).
- End time of the audio file to use for continuation (minimum: 0). If -1 or None, defaults to the end of the audio clip.
- If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion. Only works with non-stereo models.
- Strategy for normalizing audio.
- Reduces sampling to the k most likely tokens.
- Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), top_k sampling is used.
- Controls the "conservativeness" of the sampling process. Higher temperature means more diversity.
- Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
- Output format for generated audio.
- Seed for the random number generator. If None or -1, a random seed will be used.
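Taken together, the inputs above form a request payload. A minimal sketch of assembling and submitting one through Replicate's Python client follows; the parameter names (`prompt`, `duration`, `top_k`, and so on) and default values are assumptions inferred from the descriptions above, not field names confirmed by this page, so check the model's API schema before relying on them.

```python
# Sketch only: parameter names and defaults below are assumptions
# inferred from the input descriptions above.

def build_input(prompt, duration=8, top_k=250, top_p=0.0,
                temperature=1.0, seed=-1):
    """Assemble a request payload; top_p=0 means top_k sampling is used."""
    return {
        "prompt": prompt,            # description of the music to generate
        "duration": duration,        # seconds of audio to generate
        "top_k": top_k,              # restrict sampling to the k most likely tokens
        "top_p": top_p,              # nucleus sampling threshold; 0 disables it
        "temperature": temperature,  # higher values give more diverse output
        "seed": seed,                # -1 -> a random seed is chosen
    }

if __name__ == "__main__":
    import replicate  # needs REPLICATE_API_TOKEN set in the environment
    output = replicate.run("meta/musicgen", input=build_input("upbeat synthwave"))
```

Keeping the payload construction in a small helper makes the sampling knobs (`top_k`, `top_p`, `temperature`) easy to sweep without repeating the full dictionary.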


Model Details

### MusicGen Model Description

MusicGen by AudioCraft is a single-stage, auto-regressive Transformer that generates high-quality music at 32 kHz using a 4-codebook EnCodec tokenizer. Trained on 20,000 hours of licensed tracks, it predicts all codebooks in parallel with only 50 autoregressive steps per second of audio, with no separate semantic embeddings required.
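The numbers above imply a simple generation budget: at 50 autoregressive steps per second, with 4 EnCodec codebooks predicted in parallel at each step, a clip of d seconds costs 50·d steps and 200·d tokens. A quick sanity check:

```python
FRAME_RATE = 50  # autoregressive steps per second of generated audio
CODEBOOKS = 4    # EnCodec codebooks predicted in parallel at each step

def token_budget(seconds):
    """Return (autoregressive steps, total EnCodec tokens) for a clip."""
    steps = FRAME_RATE * seconds
    return steps, steps * CODEBOOKS

# A 30-second clip costs 1500 steps and 6000 tokens in total.
```

Parallel codebook prediction is what keeps the step count at 50/s rather than 200/s, which is the source of the efficiency claim below.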

### Models & Demos

- **Scales:** small, medium, large, melody, stereo
- **Demos:** Hugging Face Space, Colab notebook, local Gradio app, Jupyter examples

### Key Benefits

- **Efficiency:** Parallel codebook prediction for faster inference
- **Flexibility:** Text-only or text+melody conditioning across multiple model sizes
- **Accessibility:** Available via 🤗 Transformers (v4.31.0+) with minimal dependencies
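The 🤗 Transformers route mentioned above can be sketched as follows, using the library's documented `AutoProcessor` and `MusicgenForConditionalGeneration` classes. The `facebook/musicgen-small` checkpoint name is an assumption here (it is the standard small checkpoint on the Hub), and the duration-to-token conversion relies on the 50 Hz frame rate described earlier.

```python
def max_new_tokens_for(seconds, frame_rate=50):
    """Convert a target duration into a generation budget at 50 frames/s."""
    return int(seconds * frame_rate)

def generate_clip(prompt, seconds=5, checkpoint="facebook/musicgen-small"):
    """Text-conditioned generation; downloads the checkpoint on first use."""
    # Heavy imports kept inside the function so the helper above stays light.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = MusicgenForConditionalGeneration.from_pretrained(checkpoint)
    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    return model.generate(**inputs, max_new_tokens=max_new_tokens_for(seconds))

if __name__ == "__main__":
    audio = generate_clip("90s rock song with loud guitars")
```

Swapping the checkpoint name for a melody-capable variant would enable the text+melody conditioning listed under Flexibility.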