Mel reconstruction loss (top) and Fréchet Audio Distance (FAD, bottom) for VampNet samples generated with varying numbers of sampling steps, using a periodic prompt with P = 16. The samples were generated by decompressing tokens at an extremely low bitrate (50 bps), effectively producing musical variations of the input signals.
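As a rough illustration of what a periodic prompt with P = 16 means, the sketch below builds a time-axis mask in which every 16th token timestep is kept from the input as the prompt and all other timesteps are left for the model to generate. This is a minimal sketch that assumes a single token sequence along the time axis; the function name `periodic_prompt_mask` is hypothetical and not taken from VampNet's code.

```python
import numpy as np

def periodic_prompt_mask(n_timesteps: int, period: int) -> np.ndarray:
    """Hypothetical helper: boolean mask over token timesteps.

    True  = masked (to be generated by the model)
    False = kept from the input signal as part of the prompt
    Every `period`-th timestep (starting at 0) is kept.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    mask = np.ones(n_timesteps, dtype=bool)  # start with everything masked
    mask[::period] = False                   # unmask every period-th timestep
    return mask

# Usage: 64 timesteps with P = 16 keeps timesteps 0, 16, 32 and 48.
mask = periodic_prompt_mask(64, 16)
print(np.flatnonzero(~mask))
```

Larger values of P keep fewer input tokens, which lowers the effective bitrate of the prompt and gives the model more freedom to produce variations.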
Input used as prompt for the model.
Output after 1 step of sampling.
Output after 4 steps of sampling.
Output after 12 steps of sampling.
Output after 36 steps of sampling.
Output after 64 steps of sampling.