OD-SLA-5.10

Towards Unseen Speakers Zero-Shot Voice Conversion with Generative Adversarial Networks

WeiRui Lu, Xiaofen Xing, Xiangmin Xu, South China University of Technology, China; Weibin Zhang, Shenzhen VoiceAI Technology Co. Ltd., China

Session:
Speech Synthesis and Voice Conversion

Track:
Speech, Language, and Audio (SLA)

Session Time:
Thu, 16 Dec, 16:20 - 18:20 Japan Standard Time (UTC +9)
Thu, 16 Dec, 07:20 - 09:20 Coordinated Universal Time
Thu, 16 Dec, 02:20 - 04:20 Eastern Standard Time (UTC -5)
Wed, 15 Dec, 23:20 - 01:20 Pacific Standard Time (UTC -8)

Session Chair:
Takuma Okamoto, NICT
Session OD-SLA-5
TH3.OD-A.1: EMOTION-CONTROLLABLE SPEECH SYNTHESIS USING EMOTION SOFT LABELS AND FINE-GRAINED PROSODY FACTORS
Xuan Luo, Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, Hiroshi Saruwatari, The University of Tokyo, Japan
TH3.OD-A.2: CA-VC: A NOVEL ZERO-SHOT VOICE CONVERSION METHOD WITH CHANNEL ATTENTION
Ruitong Xiao, Xiaofen Xing, Jichen Yang, Xiangmin Xu, South China University of Technology, China
TH3.OD-A.3: CONDITIONAL DEEP HIERARCHICAL VARIATIONAL AUTOENCODER FOR VOICE CONVERSION
Kei Akuzawa, The University of Tokyo, Japan; Kotaro Onishi, The University of Electro-Communications, Tokyo, Japan; Keisuke Takiguchi, Kohki Mametani, Koichiro Mori, DeNA Co., Ltd., Japan
TH3.OD-A.4: NOISY-TO-NOISY VOICE CONVERSION FRAMEWORK WITH DENOISING MODEL
Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda, Nagoya University, Japan
TH3.OD-A.5: ACOUSTIC SIMULATION OF BODY-CONDUCTED SPEECH AND ITS USE TO CONVERT ONE'S RECORDED VOICES TO ONE'S OWN VOICES
Ruiyan Chen, Tazuko Nishimura, Nobuaki Minematsu, Daisuke Saito, The University of Tokyo, Japan
TH3.OD-A.6: SPEECH RECONSTRUCTION FROM THE LARYNX VIBRATION FEATURE CAPTURED BY LASER-DOPPLER VIBROMETER SENSOR
Yi-Chieh Lin, Ji-Yan Han, Yu-Min Lin, Wei-Zhong Zheng, Ying-Hui Lai, National Yang Ming Chiao Tung University, Taiwan; Shuenn-Tsong Young, MacKay Medical College, Taiwan
TH3.OD-A.7: StarGAN-based Emotional Voice Conversion for Japanese Phrases
Asuka Moritani, Shoki Sakamoto, Ryo Ozaki, Tadahiro Taniguchi, Ritsumeikan University, Japan; Hirokazu Kameoka, NTT Corporation, Japan
TH3.OD-A.8: Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks
Peter Wu, Paul Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, Louis-Philippe Morency, Carnegie Mellon University, United States of America
TH3.OD-A.9: MULTI-SPEAKER TTS SYSTEM FOR LOW-RESOURCE LANGUAGE USING CROSS-LINGUAL TRANSFER LEARNING AND DATA AUGMENTATION
Zolzaya Byambadorj, Ryota Nishimura, Tokushima University, Japan; Altangerel Ayush, Mongolian University of Science and Technology, Mongolia; Kengo Ohta, National Institute of Technology, Anan College, Japan; Norihide Kitaoka, Toyohashi University of Technology, Japan
TH3.OD-A.10: Towards Unseen Speakers Zero-Shot Voice Conversion with Generative Adversarial Networks
WeiRui Lu, Xiaofen Xing, Xiangmin Xu, South China University of Technology, China; Weibin Zhang, Shenzhen VoiceAI Technology Co. Ltd., China
TH3.OD-A.11: LOW-RESOURCE MANDARIN PROSODIC STRUCTURE PREDICTION USING SELF-TRAINING
Xingrui Wang, Bowen Zhang, Takahiro Shinozaki, Tokyo Institute of Technology, Japan
TH3.OD-A.12: SPTTS: PARALLEL SPEECH SYNTHESIS WITHOUT EXTRA ALIGNER MODEL
Zeqing Zhao, Xi Chen, Hui Liu, XuYang Wang, Lin Yang, Junjie Wang, Lenovo Research, China
TH3.OD-A.13: INVESTIGATION OF TEXT-TO-SPEECH-BASED SYNTHETIC PARALLEL DATA FOR SEQUENCE-TO-SEQUENCE NON-PARALLEL VOICE CONVERSION
Ding Ma, Wen-Chin Huang, Tomoki Toda, Nagoya University, Japan