Mel reconstruction loss (top) and Fréchet Audio Distance (FAD, bottom) for VampNet samples generated with varying numbers of sampling steps, using a periodic prompt with period P=16. The samples were generated by decompressing tokens at an extremely low bitrate (50 bps), which effectively produces musical variations of the input signals.
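A periodic prompt keeps only a sparse, regular subset of the input's tokens and leaves the rest for the model to generate. The sketch below illustrates the idea under stated assumptions: tokens form a grid of codebooks × timesteps, every P-th timestep is kept, and only the coarsest codebook(s) are kept. The function names `periodic_prompt_mask` and `prompt_bitrate` and the parameters `kept_codebooks`, `frame_rate_hz`, and `bits_per_token` are hypothetical. They are not VampNet's API, and no particular codec frame rate or codebook size is implied.

```python
from typing import List


def periodic_prompt_mask(num_codebooks: int, num_timesteps: int,
                         period: int, kept_codebooks: int = 1) -> List[List[bool]]:
    """Build a [codebook][timestep] mask where True marks a kept prompt token.

    Hypothetical helper: every `period`-th timestep is kept, but only in the
    first `kept_codebooks` (assumed coarsest) codebooks. Every other position
    is left for the model to generate.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    if not 0 <= kept_codebooks <= num_codebooks:
        raise ValueError("kept_codebooks must be between 0 and num_codebooks")
    return [
        [c < kept_codebooks and t % period == 0 for t in range(num_timesteps)]
        for c in range(num_codebooks)
    ]


def prompt_bitrate(frame_rate_hz: float, period: int,
                   kept_codebooks: int, bits_per_token: float) -> float:
    """Bits per second needed to store only the kept prompt tokens (assumed model)."""
    return frame_rate_hz / period * kept_codebooks * bits_per_token


if __name__ == "__main__":
    mask = periodic_prompt_mask(num_codebooks=4, num_timesteps=32, period=16)
    # Only timesteps 0 and 16 of codebook 0 are kept in this toy grid.
    print([t for t, kept in enumerate(mask[0]) if kept])
```

Under these assumptions the prompt bitrate scales as 1/P, so doubling the period halves the number of bits that must be stored, which is one way a sparse periodic prompt can reach a very low bitrate.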

Input

Input used as prompt for the model.

in.mp3

1 Sampling Step

Output after 1 step of sampling.

1.wav

4 Sampling Steps

Output after 4 steps of sampling.

4.wav

12 Sampling Steps

Output after 12 steps of sampling.

12.wav

36 Sampling Steps

Output after 36 steps of sampling.

36.wav

64 Sampling Steps

Output after 64 steps of sampling.

64.wav
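The samples above differ only in the number of sampling steps. To show what that number controls, the sketch below assumes a MaskGIT-style iterative decoder with a cosine masking schedule γ(r) = cos(πr/2). This schedule is an assumption and is not specified in this section. The sketch counts how many tokens remain masked after each step and how many are committed at each step. With a single step, every masked token is predicted in one parallel pass. With more steps, tokens are committed gradually over several passes. The helper names `masked_counts` and `unmasked_per_step` are hypothetical.

```python
import math
from typing import List


def masked_counts(num_masked: int, num_steps: int) -> List[int]:
    """Tokens still masked after each step under an assumed cosine schedule."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    counts = []
    for step in range(1, num_steps + 1):
        # gamma(r) = cos(pi/2 * r) falls from 1 toward 0 as r goes from 0 to 1;
        # floor() makes the final step (r = 1) leave zero tokens masked.
        ratio = math.cos(math.pi / 2 * step / num_steps)
        counts.append(math.floor(num_masked * ratio))
    return counts


def unmasked_per_step(num_masked: int, num_steps: int) -> List[int]:
    """Tokens committed (unmasked) at each step of the assumed schedule."""
    counts = masked_counts(num_masked, num_steps)
    previous = [num_masked] + counts[:-1]
    return [p - c for p, c in zip(previous, counts)]


if __name__ == "__main__":
    for steps in (1, 4, 12):
        print(steps, unmasked_per_step(100, steps))
```

For 100 masked tokens, one step commits all 100 at once, while four steps commit 8, 22, 32, and 38 tokens. Under this schedule, later steps can condition on more of the tokens committed in earlier steps.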