@drewharwell I find this bit just perplexing:

> For VALL-E to generate a good result, the voice in the three-second sample must closely match a voice in the training data.

It’s not that far off saying “I can play any tune on the piano, as long as it sounds just like Chopsticks.”

10 Jan 2023 at 16:41