
meta / musicgen

A fast, controllable auto-regressive Transformer for high-fidelity music generation.

Model Input


Model to use for generation.

A description of the music you want to generate.

Duration of the generated audio in seconds.

If `True`, generated music will continue from `input_audio`. Otherwise, generated music will mimic `input_audio`'s melody.

Min: 0

Start time of the audio file to use for continuation.

Min: 0

End time of the audio file to use for continuation. If -1 or None, will default to the end of the audio clip.

If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion. Only works with non-stereo models.

Strategy for normalizing audio.

Reduces sampling to the k most likely tokens.

Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), top_k sampling is used.

Controls the 'conservativeness' of the sampling process. Higher temperature means more diversity.

Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
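A minimal sketch of how the sampling controls above (temperature, top-k, top-p) typically interact, including the documented fallback where `top_p = 0` means top-k sampling is used. This illustrates standard sampling logic under those documented defaults, not MusicGen's actual implementation; the function name is hypothetical.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=250, top_p=0.0, rng=None):
    """Pick one token id from `logits` using temperature, top-k, and top-p.

    Hypothetical sketch of the documented sampling controls, not the
    model's actual code. top_p == 0 (the documented default) falls back
    to top-k sampling.
    """
    rng = rng or np.random.default_rng()
    # Temperature scales the logits: higher values flatten the
    # distribution (more diversity), lower values sharpen it.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()

    if top_p > 0.0:
        # Nucleus (top-p) sampling: keep the smallest set of tokens whose
        # cumulative probability reaches top_p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
    else:
        # Fallback: keep only the k most likely tokens.
        keep = np.argsort(probs)[::-1][:top_k]

    masked = np.zeros_like(probs)
    masked[keep] = probs[keep]
    masked /= masked.sum()
    return int(rng.choice(len(probs), p=masked))
```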

Output format for generated audio.

Seed for random number generator. If None or -1, a random seed will be used.
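The seed and continuation-window rules described above can be sketched as small helpers. These are hypothetical functions implementing only the documented behavior (non-negative bounds, `-1`/`None` sentinels), not the model's actual code.

```python
import random

def resolve_seed(seed=None):
    """If the seed is None or -1, draw a fresh random seed (as documented);
    otherwise use the given value as-is. Hypothetical helper."""
    if seed is None or seed == -1:
        return random.randrange(2**32)
    return seed

def resolve_clip_window(start, end, clip_length):
    """Apply the documented continuation rules: start and end are clamped
    to be non-negative, and an end of -1 or None defaults to the end of
    the input clip. Hypothetical helper."""
    start = max(0.0, start if start is not None else 0.0)
    if end is None or end == -1:
        end = clip_length
    return start, min(max(0.0, end), clip_length)
```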

Model Output


Model Readme


### MusicGen Model Description

MusicGen by AudioCraft is a single-stage, auto-regressive Transformer that generates high-quality music at 32 kHz using a 4-codebook EnCodec tokenizer. Trained on 20,000 hours of licensed tracks, it predicts all codebooks in parallel with only 50 autoregressive steps per second—no separate semantic embeddings required.

### Models & Demos

- **Scales:** small, medium, large, melody, stereo
- **Demos:** Hugging Face Space, Colab notebook, local Gradio app, Jupyter examples

### Key Benefits

- **Efficiency:** Parallel codebook prediction for faster inference
- **Flexibility:** Text-only or text+melody conditioning across multiple model sizes
- **Accessibility:** Available via 🤗 Transformers (v4.31.0+) with minimal dependencies
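The 🤗 Transformers integration mentioned above can be sketched as follows. The generation call follows the Transformers MusicGen API for the `facebook/musicgen-small` checkpoint; `duration_to_tokens` is a hypothetical helper based on the readme's "50 autoregressive steps per second" figure, and the checkpoint download is deferred so the helper is usable on its own.

```python
def duration_to_tokens(seconds: float, frame_rate: int = 50) -> int:
    # MusicGen takes roughly 50 autoregressive steps per second of audio,
    # so the max_new_tokens budget is simply seconds * frame_rate.
    return int(seconds * frame_rate)

def generate(prompt: str, seconds: float = 8.0):
    # Imported lazily: loading the checkpoint is heavy, and the helper
    # above works without it. Requires transformers >= 4.31.0.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
    model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    # Sampling on; returns 32 kHz audio as a (batch, channels, samples) tensor.
    return model.generate(**inputs, do_sample=True,
                          max_new_tokens=duration_to_tokens(seconds))
```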