MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion

Zero-Shot Voice Conversion Samples

LibriTTS→LibriTTS Samples

Sample Set 0

Source

Target

Diff-HierVC

FACodec

MaskGCT-S2A

FreeVC

GenVC

MaskVCT-All

MaskVCT-Spk

Sample Set 1

Source

Target

Diff-HierVC

FACodec

MaskGCT-S2A

FreeVC

GenVC

MaskVCT-All

MaskVCT-Spk

Sample Set 2

Source

Target

Diff-HierVC

FACodec

MaskGCT-S2A

FreeVC

GenVC

MaskVCT-All

MaskVCT-Spk

Sample Set 3

Source

Target

Diff-HierVC

FACodec

MaskGCT-S2A

FreeVC

GenVC

MaskVCT-All

MaskVCT-Spk

Sample Set 4

Source

Target

Diff-HierVC

FACodec

MaskGCT-S2A

FreeVC

GenVC

MaskVCT-All

MaskVCT-Spk

LibriTTS→L2 Samples

Sample Set 0

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 1

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 2

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 3

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 4

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

L2→L2 Samples

Sample Set 0

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 1

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 2

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 3

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk

Sample Set 4

Source

Target

FACodec

MaskGCT-S2A

FreeVC

MaskVCT-Spk