meta / musicgen

A fast, controllable auto-regressive Transformer for high-fidelity music generation.

Model Input

- Model to use for generation.
- A description of the music you want to generate.
- An audio file that will influence the generated music. If `continuation` is `True`, the generated music will be a continuation of the audio file. Otherwise, the generated music will mimic the audio file's melody.
- Duration of the generated audio in seconds.
- If `True`, generated music will continue from `input_audio`. Otherwise, generated music will mimic `input_audio`'s melody.
- Start time of the audio file to use for continuation (minimum: 0).
- End time of the audio file to use for continuation (minimum: 0). If -1 or None, defaults to the end of the audio clip.
- If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion. Only works with non-stereo models.
- Strategy for normalizing audio.
- Reduces sampling to the k most likely tokens.
- Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), top_k sampling is used.
- Controls the "conservativeness" of the sampling process. Higher temperature means more diversity.
- Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
- Output format for generated audio.
- Seed for the random number generator. If None or -1, a random seed will be used.
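Taken together, the inputs above form a request payload. A minimal sketch of assembling and submitting one through Replicate's Python client follows; the parameter names (`prompt`, `duration`, `top_k`, and so on) and default values are assumptions inferred from the descriptions above, not field names confirmed by this page, so check the model's API schema before relying on them.

```python
# Sketch only: parameter names and defaults below are assumptions
# inferred from the input descriptions above.

def build_input(prompt, duration=8, top_k=250, top_p=0.0,
                temperature=1.0, seed=-1):
    """Assemble a request payload; top_p=0 means top_k sampling is used."""
    return {
        "prompt": prompt,            # description of the music to generate
        "duration": duration,        # seconds of audio to generate
        "top_k": top_k,              # restrict sampling to the k most likely tokens
        "top_p": top_p,              # nucleus sampling threshold; 0 disables it
        "temperature": temperature,  # higher values give more diverse output
        "seed": seed,                # -1 -> a random seed is chosen
    }

if __name__ == "__main__":
    import replicate  # needs REPLICATE_API_TOKEN set in the environment
    output = replicate.run("meta/musicgen", input=build_input("upbeat synthwave"))
```

Keeping the payload construction in a small helper makes the sampling knobs (`top_k`, `top_p`, `temperature`) easy to sweep without repeating the full dictionary.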


Model Details

### MusicGen Model Description

MusicGen by AudioCraft is a single-stage, auto-regressive Transformer that generates high-quality music at 32 kHz using a 4-codebook EnCodec tokenizer. Trained on 20,000 hours of licensed tracks, it predicts all codebooks in parallel with only 50 autoregressive steps per second of audio, with no separate semantic embeddings required.
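The numbers above imply a simple generation budget: at 50 autoregressive steps per second, with 4 EnCodec codebooks predicted in parallel at each step, a clip of d seconds costs 50·d steps and 200·d tokens. A quick sanity check:

```python
FRAME_RATE = 50  # autoregressive steps per second of generated audio
CODEBOOKS = 4    # EnCodec codebooks predicted in parallel at each step

def token_budget(seconds):
    """Return (autoregressive steps, total EnCodec tokens) for a clip."""
    steps = FRAME_RATE * seconds
    return steps, steps * CODEBOOKS

# A 30-second clip costs 1500 steps and 6000 tokens in total.
```

Parallel codebook prediction is what keeps the step count at 50/s rather than 200/s, which is the source of the efficiency claim below.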

### Models & Demos

- **Scales:** small, medium, large, melody, stereo
- **Demos:** Hugging Face Space, Colab notebook, local Gradio app, Jupyter examples

### Key Benefits

- **Efficiency:** Parallel codebook prediction for faster inference
- **Flexibility:** Text-only or text+melody conditioning across multiple model sizes
- **Accessibility:** Available via 🤗 Transformers (v4.31.0+) with minimal dependencies
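The 🤗 Transformers route mentioned above can be sketched as follows, using the library's documented `AutoProcessor` and `MusicgenForConditionalGeneration` classes. The `facebook/musicgen-small` checkpoint name is an assumption here (it is the standard small checkpoint on the Hub), and the duration-to-token conversion relies on the 50 Hz frame rate described earlier.

```python
def max_new_tokens_for(seconds, frame_rate=50):
    """Convert a target duration into a generation budget at 50 frames/s."""
    return int(seconds * frame_rate)

def generate_clip(prompt, seconds=5, checkpoint="facebook/musicgen-small"):
    """Text-conditioned generation; downloads the checkpoint on first use."""
    # Heavy imports kept inside the function so the helper above stays light.
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = MusicgenForConditionalGeneration.from_pretrained(checkpoint)
    inputs = processor(text=[prompt], padding=True, return_tensors="pt")
    return model.generate(**inputs, max_new_tokens=max_new_tokens_for(seconds))

if __name__ == "__main__":
    audio = generate_clip("90s rock song with loud guitars")
```

Swapping the checkpoint name for a melody-capable variant would enable the text+melody conditioning listed under Flexibility.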