Technical Program

SPE-P3: Machine Learning for Speech Synthesis I

Session Type: Poster
Time: Tuesday, 5 May, 16:30 - 18:30
Location: On-Demand
Session Chairs: Jianhua Tao, Chinese Academy of Sciences and Thomas Drugman, Amazon
 
 SPE-P3.1: SCALABLE MULTILINGUAL FRONTEND FOR TTS
         Alistair Conkie; Apple
         Andrew Finch; Apple
 
 SPE-P3.2: A UNIFIED SEQUENCE-TO-SEQUENCE FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS
         Junjie Pan; ByteDance
         Xiang Yin; ByteDance
         Zhiling Zhang; Shanghai Jiao Tong University
         Shichao Liu; ByteDance
         Yang Zhang; ByteDance
         Zejun Ma; ByteDance
         Yuxuan Wang; ByteDance
 
 SPE-P3.3: A HYBRID TEXT NORMALIZATION SYSTEM USING MULTI-HEAD SELF-ATTENTION FOR MANDARIN
         Junhui Zhang; ByteDance
         Junjie Pan; ByteDance
         Xiang Yin; ByteDance
         Chen Li; ByteDance
         Shichao Liu; ByteDance
         Yang Zhang; ByteDance
         Yuxuan Wang; ByteDance
         Zejun Ma; ByteDance
 
 SPE-P3.4: GENERATING DIVERSE AND NATURAL TEXT-TO-SPEECH SAMPLES USING A QUANTIZED FINE-GRAINED VAE AND AUTOREGRESSIVE PROSODY PRIOR
         Guangzhi Sun; Cambridge University
         Yu Zhang; Google
         Ron Weiss; Google
         Yuan Cao; Google
         Heiga Zen; Google
         Andrew Rosenberg; Google
         Bhuvana Ramabhadran; Google
         Yonghui Wu; Google
 
 SPE-P3.5: IMPROVING PROSODY WITH LINGUISTIC AND BERT DERIVED FEATURES IN MULTI-SPEAKER BASED MANDARIN CHINESE NEURAL TTS
         Yujia Xiao; Microsoft China
         Lei He; Microsoft China
         Huaiping Ming; Microsoft China
         Frank K. Soong; Microsoft Research Asia
 
 SPE-P3.6: FOCUSING ON ATTENTION: PROSODY TRANSFER AND ADAPTATIVE OPTIMIZATION STRATEGY FOR MULTI-SPEAKER END-TO-END SPEECH SYNTHESIS
         Ruibo Fu; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
         Jianhua Tao; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
         Zhengqi Wen; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
         Jiangyan Yi; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
         Tao Wang; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
 
 SPE-P3.7: ALIGNTTS: EFFICIENT FEED-FORWARD TEXT-TO-SPEECH SYSTEM WITHOUT EXPLICIT ALIGNMENT
         Zhen Zeng; Ping An Technology (Shenzhen) Co., Ltd.
         Jianzong Wang; Ping An Technology (Shenzhen) Co., Ltd.
         Ning Cheng; Ping An Technology (Shenzhen) Co., Ltd.
         Tian Xia; Ping An Technology (Shenzhen) Co., Ltd.
         Jing Xiao; Ping An Technology (Shenzhen) Co., Ltd.
 
 SPE-P3.8: GRAPHTTS: GRAPH-TO-SEQUENCE MODELLING IN NEURAL TEXT-TO-SPEECH
         Aolan Sun; Ping An Technology (Shenzhen) Co., Ltd.
         Jianzong Wang; Ping An Technology (Shenzhen) Co., Ltd.
         Ning Cheng; Ping An Technology (Shenzhen) Co., Ltd.
         Huayi Peng; Ping An Technology (Shenzhen) Co., Ltd.
         Zhen Zeng; Ping An Technology (Shenzhen) Co., Ltd.
         Jing Xiao; Ping An Technology (Shenzhen) Co., Ltd.
 
 SPE-P3.9: EFFECT OF CHOICE OF PROBABILITY DISTRIBUTION, RANDOMNESS, AND SEARCH METHODS FOR ALIGNMENT MODELING IN SEQUENCE-TO-SEQUENCE TEXT-TO-SPEECH SYNTHESIS USING HARD ALIGNMENT
         Yusuke Yasuda; National Institute of Informatics
         Xin Wang; National Institute of Informatics
         Junichi Yamagishi; National Institute of Informatics
 
 SPE-P3.10: TRANSFORMER-BASED TEXT-TO-SPEECH WITH WEIGHTED FORCED ATTENTION
         Takuma Okamoto; National Institute of Information and Communications Technology (NICT)
         Tomoki Toda; Nagoya University
         Yoshinori Shiga; National Institute of Information and Communications Technology (NICT)
         Hisashi Kawai; National Institute of Information and Communications Technology (NICT)
 
 SPE-P3.11: IMPROVING END-TO-END SPEECH SYNTHESIS WITH LOCAL RECURRENT NEURAL NETWORK ENHANCED TRANSFORMER
         Yibin Zheng; Tencent
         Xin-Hui Li; Tencent
         Fenglong Xie; Tencent
         Li Lu; Tencent
 
 SPE-P3.12: AN EFFECTIVE STYLE TOKEN WEIGHT CONTROL TECHNIQUE FOR END-TO-END EMOTIONAL SPEECH SYNTHESIS
         Ohsung Kwon; Naver Corporation
         Inseon Jang; Electronics and Telecommunications Research Institute (ETRI)
         ChungHyun Ahn; Electronics and Telecommunications Research Institute (ETRI)
         Hong-Goo Kang; Yonsei University