SPE-P3: Machine Learning for Speech Synthesis I |
Session Type: Poster |
Time: Tuesday, 5 May, 16:30 - 18:30 |
Location: On-Demand |
Session Chairs: Jianhua Tao, Chinese Academy of Sciences; Thomas Drugman, Amazon |
|
SPE-P3.1: SCALABLE MULTILINGUAL FRONTEND FOR TTS |
Alistair Conkie; Apple |
Andrew Finch; Apple |
|
SPE-P3.2: A UNIFIED SEQUENCE-TO-SEQUENCE FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS |
Junjie Pan; ByteDance |
Xiang Yin; ByteDance |
Zhiling Zhang; Shanghai Jiao Tong University |
Shichao Liu; ByteDance |
Yang Zhang; ByteDance |
Zejun Ma; ByteDance |
Yuxuan Wang; ByteDance |
|
SPE-P3.3: A HYBRID TEXT NORMALIZATION SYSTEM USING MULTI-HEAD SELF-ATTENTION FOR MANDARIN |
Junhui Zhang; ByteDance |
Junjie Pan; ByteDance |
Xiang Yin; ByteDance |
Chen Li; ByteDance |
Shichao Liu; ByteDance |
Yang Zhang; ByteDance |
Yuxuan Wang; ByteDance |
Zejun Ma; ByteDance |
|
SPE-P3.4: GENERATING DIVERSE AND NATURAL TEXT-TO-SPEECH SAMPLES USING A QUANTIZED FINE-GRAINED VAE AND AUTOREGRESSIVE PROSODY PRIOR |
Guangzhi Sun; Cambridge University |
Yu Zhang; Google |
Ron Weiss; Google |
Yuan Cao; Google |
Heiga Zen; Google |
Andrew Rosenberg; Google |
Bhuvana Ramabhadran; Google |
Yonghui Wu; Google |
|
SPE-P3.5: IMPROVING PROSODY WITH LINGUISTIC AND BERT DERIVED FEATURES IN MULTI-SPEAKER BASED MANDARIN CHINESE NEURAL TTS |
Yujia Xiao; Microsoft China |
Lei He; Microsoft China |
Huaiping Ming; Microsoft China |
Frank K. Soong; Microsoft Research Asia |
|
SPE-P3.6: FOCUSING ON ATTENTION: PROSODY TRANSFER AND ADAPTATIVE OPTIMIZATION STRATEGY FOR MULTI-SPEAKER END-TO-END SPEECH SYNTHESIS |
Ruibo Fu; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
Jianhua Tao; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
Zhengqi Wen; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
Jiangyan Yi; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
Tao Wang; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
|
SPE-P3.7: ALIGNTTS: EFFICIENT FEED-FORWARD TEXT-TO-SPEECH SYSTEM WITHOUT EXPLICIT ALIGNMENT |
Zhen Zeng; Ping An Technology (Shenzhen) Co., Ltd. |
Jianzong Wang; Ping An Technology (Shenzhen) Co., Ltd. |
Ning Cheng; Ping An Technology (Shenzhen) Co., Ltd. |
Tian Xia; Ping An Technology (Shenzhen) Co., Ltd. |
Jing Xiao; Ping An Technology (Shenzhen) Co., Ltd. |
|
SPE-P3.8: GRAPHTTS: GRAPH-TO-SEQUENCE MODELLING IN NEURAL TEXT-TO-SPEECH |
Aolan Sun; Ping An Technology (Shenzhen) Co., Ltd. |
Jianzong Wang; Ping An Technology (Shenzhen) Co., Ltd. |
Ning Cheng; Ping An Technology (Shenzhen) Co., Ltd. |
Huayi Peng; Ping An Technology (Shenzhen) Co., Ltd. |
Zhen Zeng; Ping An Technology (Shenzhen) Co., Ltd. |
Jing Xiao; Ping An Technology (Shenzhen) Co., Ltd. |
|
SPE-P3.9: EFFECT OF CHOICE OF PROBABILITY DISTRIBUTION, RANDOMNESS, AND SEARCH METHODS FOR ALIGNMENT MODELING IN SEQUENCE-TO-SEQUENCE TEXT-TO-SPEECH SYNTHESIS USING HARD ALIGNMENT |
Yusuke Yasuda; National Institute of Informatics |
Xin Wang; National Institute of Informatics |
Junichi Yamagishi; National Institute of Informatics |
|
SPE-P3.10: TRANSFORMER-BASED TEXT-TO-SPEECH WITH WEIGHTED FORCED ATTENTION |
Takuma Okamoto; National Institute of Information and Communications Technology (NICT) |
Tomoki Toda; Nagoya University |
Yoshinori Shiga; National Institute of Information and Communications Technology (NICT) |
Hisashi Kawai; National Institute of Information and Communications Technology (NICT) |
|
SPE-P3.11: IMPROVING END-TO-END SPEECH SYNTHESIS WITH LOCAL RECURRENT NEURAL NETWORK ENHANCED TRANSFORMER |
Yibin Zheng; Tencent |
Xin-Hui Li; Tencent |
Fenglong Xie; Tencent |
Li Lu; Tencent |
|
SPE-P3.12: AN EFFECTIVE STYLE TOKEN WEIGHT CONTROL TECHNIQUE FOR END-TO-END EMOTIONAL SPEECH SYNTHESIS |
Ohsung Kwon; Naver Corporation |
Inseon Jang; Electronics and Telecommunications Research Institute (ETRI) |
ChungHyun Ahn; Electronics and Telecommunications Research Institute (ETRI) |
Hong-Goo Kang; Yonsei University |
|