VCTK speaker : Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
VCTK speaker : The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
LJSpeech speaker : Especially as no more time is occupied or cost incurred in casting setting or printing beautiful letters.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
LJSpeech speaker : Printing in the only sense with which we are at present concerned differs from most if not from all the arts and crafts represented in the exhibition.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
LibriTTS speaker : And so, howsoever reluctantly, she had gone.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
LibriTTS speaker : All that I am doing is to use its logical tenability as a help in the analysis of what occurs when we remember.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
Ablation Studies
Audio Samples for the Ablation Study on VCTK
VCTK speaker : There is, according to legend, a boiling pot of gold at one end.
AdaSpeech
AdaSpeech w/o CLN
AdaSpeech w/o PL-ACM
AdaSpeech w/o UL-ACM
VCTK speaker : Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
AdaSpeech
AdaSpeech w/o CLN
AdaSpeech w/o PL-ACM
AdaSpeech w/o UL-ACM
Audio Samples for the Utterance-level Visualization Analysis
Pink Point in Brown Circle
You little scamp!
Well! why do you not enter?
Blue Point in Brown Circle
The Fairy.
Audio Samples for Finetune CLN and Finetune Other Decoder Parameters
VCTK speaker : Ask her to bring these things with her from the store.
Finetune CLN
Finetune Other Decoder Parameters
Audio Samples for Varying Adaptation Data on AdaSpeech
LJSpeech speaker : Especially as no more time is occupied or cost incurred in casting setting or printing beautiful letters.
1 Adaptation Sample
2 Adaptation Samples
5 Adaptation Samples
10 Adaptation Samples
20 Adaptation Samples
VCTK speaker : Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
1 Adaptation Sample
2 Adaptation Samples
5 Adaptation Samples
10 Adaptation Samples
20 Adaptation Samples
Demo Audio for ICLR 2021 Response
[speaker embedding with the utterance-level vector extracted from a different speaker] for AnonReviewer5
VCTK speaker : Ask her to bring these things with her from the store.
speaker embedding 306F with reference speech 306F
speaker embedding 306F with reference speech 361F
speaker embedding 306F with reference speech 345M
reference speech 306F
reference speech 361F
reference speech 345M
VCTK speaker : She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.
speaker embedding 345M with reference speech 345M
speaker embedding 345M with reference speech 360M
speaker embedding 345M with reference speech 306F
reference speech 345M
reference speech 360M
reference speech 306F
[exp1] for AnonReviewer5
VCTK speaker : Ask her to bring these things with her from the store.
AdaSpeech With Noisy Reference Speech
Noisy Reference Speech
AdaSpeech With Clean Reference Speech
Clean Reference Speech
VCTK speaker : We also need a small plastic snake and a big toy frog for the kids.
AdaSpeech With Noisy Reference Speech
Noisy Reference Speech
AdaSpeech With Clean Reference Speech
Clean Reference Speech
[exp2] for AnonReviewer5
Finetune Dataset
Speech 1
Speech 2
VCTK speaker : Ask her to bring these things with her from the store.
AdaSpeech With Clean Reference Speech
Clean Reference Speech
VCTK speaker : When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow.
AdaSpeech With Clean Reference Speech
Clean Reference Speech
[exp3] for AnonReviewer5
VCTK speaker : Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
With Phoneme-level
Without Phoneme-level
LibriTTS speaker : It is not logically necessary to the existence of a memory belief that the event remembered should have occurred, or even that the past should have existed at all.
With Phoneme-level
Without Phoneme-level
[Some speakers don't sound so good] for AnonReviewer2
VCTK speaker : People look, but no one ever finds it.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
VCTK speaker : Please call Stella.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
VCTK speaker : Throughout the centuries people have explained the rainbow in various ways.
GT
GT mel + MelGAN
Baseline (Spk Emb)
Baseline (Decoder)
AdaSpeech
VCTK speaker : Some have accepted it as a miracle without physical explanation.