Multiscale Mel-spectrogram loss (top) and Fréchet Audio Distance (FAD, bottom) for VampNet samples generated with different types of prompts.
Here, we can examine the effect of different token prompting techniques on the outputs generated by VampNet.
Input used as prompt to the model.
P=16
Here, a periodic prompt with a period of 16 is used to condition the model. P=16 means that one in every 16 tokens in the sequence is left unmasked, so about 6% of the tokens are unmasked and the remaining 94% are masked.
Masked Prompt
Output
P=32
Here, a periodic prompt with a period of 32 is used to condition the model. P=32 means that one in every 32 tokens in the sequence is left unmasked, so about 3% of the tokens are unmasked and the remaining 97% are masked.
Masked Prompt
Output
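The periodic prompts above can be illustrated with a short sketch. This is not VampNet's own code: the function name `periodic_mask`, the sequence length of 512, and the choice to keep the first token of each period are all assumptions made for illustration, and the sequence is treated as a flat list of token positions.

```python
def periodic_mask(seq_len: int, period: int) -> list[bool]:
    """Return a list where True marks a masked position.

    Hypothetical helper: positions 0, period, 2*period, ... are kept
    unmasked (False); every other position is masked (True).
    """
    if period < 1:
        raise ValueError("period must be a positive integer")
    return [position % period != 0 for position in range(seq_len)]


# Compare the fraction of unmasked tokens for the two periods shown above.
for period in (16, 32):
    mask = periodic_mask(seq_len=512, period=period)
    unmasked_fraction = mask.count(False) / len(mask)
    print(f"P={period}: {unmasked_fraction:.3%} of tokens unmasked")
```

With a sequence length that is a multiple of the period, the unmasked fraction is exactly 1/P, which gives the roughly 6% and 3% figures quoted for P=16 and P=32.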