
meta / musicgen

A fast, controllable auto-regressive Transformer for high-fidelity music generation.

Model Input


Model to use for generation.

A description of the music you want to generate.

Duration of the generated audio in seconds.

If `True`, generated music will continue from `input_audio`. Otherwise, generated music will mimic `input_audio`'s melody.

Min: 0

Start time of the audio file to use for continuation.

Min: 0

End time of the audio file to use for continuation. If -1 or None, will default to the end of the audio clip.

If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion. Only works with non-stereo models.

Strategy for normalizing audio.

Reduces sampling to the k most likely tokens.

Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), top_k sampling is used.

Controls the 'conservativeness' of the sampling process. Higher temperature means more diversity.

Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
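A minimal sketch of how the sampling controls above (temperature, top-k, top-p) typically interact, including the documented fallback where `top_p = 0` means top-k sampling is used. This illustrates standard sampling logic under those documented defaults, not MusicGen's actual implementation; the function name is hypothetical.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=250, top_p=0.0, rng=None):
    """Pick one token id from `logits` using temperature, top-k, and top-p.

    Hypothetical sketch of the documented sampling controls, not the
    model's actual code. top_p == 0 (the documented default) falls back
    to top-k sampling.
    """
    rng = rng or np.random.default_rng()
    # Temperature scales the logits: higher values flatten the
    # distribution (more diversity), lower values sharpen it.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()

    if top_p > 0.0:
        # Nucleus (top-p) sampling: keep the smallest set of tokens whose
        # cumulative probability reaches top_p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
    else:
        # Fallback: keep only the k most likely tokens.
        keep = np.argsort(probs)[::-1][:top_k]

    masked = np.zeros_like(probs)
    masked[keep] = probs[keep]
    masked /= masked.sum()
    return int(rng.choice(len(probs), p=masked))
```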

Output format for generated audio.

Seed for random number generator. If None or -1, a random seed will be used.
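The seed and continuation-window rules described above can be sketched as small helpers. These are hypothetical functions implementing only the documented behavior (non-negative bounds, `-1`/`None` sentinels), not the model's actual code.

```python
import random

def resolve_seed(seed=None):
    """If the seed is None or -1, draw a fresh random seed (as documented);
    otherwise use the given value as-is. Hypothetical helper."""
    if seed is None or seed == -1:
        return random.randrange(2**32)
    return seed

def resolve_clip_window(start, end, clip_length):
    """Apply the documented continuation rules: start and end are clamped
    to be non-negative, and an end of -1 or None defaults to the end of
    the input clip. Hypothetical helper."""
    start = max(0.0, start if start is not None else 0.0)
    if end is None or end == -1:
        end = clip_length
    return start, min(max(0.0, end), clip_length)
```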

Model Output


Model Readme


### MusicGen Model Description

MusicGen by AudioCraft is a single-stage, auto-regressive Transformer that generates high-quality music at 32 kHz using a 4-codebook EnCodec tokenizer. Trained on 20,000 hours of licensed tracks, it predicts all codebooks in parallel with only 50 autoregressive steps per second—no separate semantic embeddings required.

### Models & Demos

- **Scales:** small, medium, large, melody, stereo
- **Demos:** Hugging Face Space, Colab notebook, local Gradio app, Jupyter examples

### Key Benefits

- **Efficiency:** Parallel codebook prediction for faster inference
- **Flexibility:** Text-only or text+melody conditioning across multiple model sizes
- **Accessibility:** Available via 🤗 Transformers (v4.31.0+) with minimal dependencies
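The 🤗 Transformers integration mentioned above can be sketched as follows. The generation call follows the Transformers MusicGen API for the `facebook/musicgen-small` checkpoint; `duration_to_tokens` is a hypothetical helper based on the readme's "50 autoregressive steps per second" figure, and the checkpoint download is deferred so the helper is usable on its own.

```python
def duration_to_tokens(seconds: float, frame_rate: int = 50) -> int:
    # MusicGen takes roughly 50 autoregressive steps per second of audio,
    # so the max_new_tokens budget is simply seconds * frame_rate.
    return int(seconds * frame_rate)

def generate(prompt: str, seconds: float = 8.0):
    # Imported lazily: loading the checkpoint is heavy, and the helper
    # above works without it. Requires transformers >= 4.31.0.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
    model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    # Sampling on; returns 32 kHz audio as a (batch, channels, samples) tensor.
    return model.generate(**inputs, do_sample=True,
                          max_new_tokens=duration_to_tokens(seconds))
```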