TU2.I: Machine Learning for Speech Synthesis I |
| Session Type: Poster |
| Time: Tuesday, 5 May, 16:30 - 18:30 |
| Location: On-Demand |
| Session Chairs: Jianhua Tao, Chinese Academy of Sciences and Thomas Drugman, Amazon |
| |
| TU2.I.1: SCALABLE MULTILINGUAL FRONTEND FOR TTS |
| Alistair Conkie; Apple |
| Andrew Finch; Apple |
| |
| TU2.I.2: A UNIFIED SEQUENCE-TO-SEQUENCE FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS |
| Junjie Pan; ByteDance |
| Xiang Yin; ByteDance |
| Zhiling Zhang; Shanghai Jiao Tong University |
| Shichao Liu; ByteDance |
| Yang Zhang; ByteDance |
| Zejun Ma; ByteDance |
| Yuxuan Wang; ByteDance |
| |
| TU2.I.3: A HYBRID TEXT NORMALIZATION SYSTEM USING MULTI-HEAD SELF-ATTENTION FOR MANDARIN |
| Junhui Zhang; ByteDance |
| Junjie Pan; ByteDance |
| Xiang Yin; ByteDance |
| Chen Li; ByteDance |
| Shichao Liu; ByteDance |
| Yang Zhang; ByteDance |
| Yuxuan Wang; ByteDance |
| Zejun Ma; ByteDance |
| |
| TU2.I.4: GENERATING DIVERSE AND NATURAL TEXT-TO-SPEECH SAMPLES USING A QUANTIZED FINE-GRAINED VAE AND AUTOREGRESSIVE PROSODY PRIOR |
| Guangzhi Sun; Cambridge University |
| Yu Zhang; Google |
| Ron Weiss; Google |
| Yuan Cao; Google |
| Heiga Zen; Google |
| Andrew Rosenberg; Google |
| Bhuvana Ramabhadran; Google |
| Yonghui Wu; Google |
| |
| TU2.I.5: IMPROVING PROSODY WITH LINGUISTIC AND BERT DERIVED FEATURES IN MULTI-SPEAKER BASED MANDARIN CHINESE NEURAL TTS |
| Yujia Xiao; Microsoft China |
| Lei He; Microsoft China |
| Huaiping Ming; Microsoft China |
| Frank K. Soong; Microsoft Research Asia |
| |
| TU2.I.6: FOCUSING ON ATTENTION: PROSODY TRANSFER AND ADAPTATIVE OPTIMIZATION STRATEGY FOR MULTI-SPEAKER END-TO-END SPEECH SYNTHESIS |
| Ruibo Fu; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
| Jianhua Tao; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
| Zhengqi Wen; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
| Jiangyan Yi; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
| Tao Wang; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences |
| |
| TU2.I.7: ALIGNTTS: EFFICIENT FEED-FORWARD TEXT-TO-SPEECH SYSTEM WITHOUT EXPLICIT ALIGNMENT |
| Zhen Zeng; Ping An Technology (Shenzhen) Co., Ltd. |
| Jianzong Wang; Ping An Technology (Shenzhen) Co., Ltd. |
| Ning Cheng; Ping An Technology (Shenzhen) Co., Ltd. |
| Tian Xia; Ping An Technology (Shenzhen) Co., Ltd. |
| Jing Xiao; Ping An Technology (Shenzhen) Co., Ltd. |
| |
| TU2.I.8: GRAPHTTS: GRAPH-TO-SEQUENCE MODELLING IN NEURAL TEXT-TO-SPEECH |
| Aolan Sun; Ping An Technology (Shenzhen) Co., Ltd. |
| Jianzong Wang; Ping An Technology (Shenzhen) Co., Ltd. |
| Ning Cheng; Ping An Technology (Shenzhen) Co., Ltd. |
| Huayi Peng; Ping An Technology (Shenzhen) Co., Ltd. |
| Zhen Zeng; Ping An Technology (Shenzhen) Co., Ltd. |
| Jing Xiao; Ping An Technology (Shenzhen) Co., Ltd. |
| |
| TU2.I.9: EFFECT OF CHOICE OF PROBABILITY DISTRIBUTION, RANDOMNESS, AND SEARCH METHODS FOR ALIGNMENT MODELING IN SEQUENCE-TO-SEQUENCE TEXT-TO-SPEECH SYNTHESIS USING HARD ALIGNMENT |
| Yusuke Yasuda; National Institute of Informatics |
| Xin Wang; National Institute of Informatics |
| Junichi Yamagishi; National Institute of Informatics |
| |
| TU2.I.10: TRANSFORMER-BASED TEXT-TO-SPEECH WITH WEIGHTED FORCED ATTENTION |
| Takuma Okamoto; National Institute of Information and Communications Technology (NICT) |
| Tomoki Toda; Nagoya University |
| Yoshinori Shiga; National Institute of Information and Communications Technology (NICT) |
| Hisashi Kawai; National Institute of Information and Communications Technology (NICT) |
| |
| TU2.I.11: IMPROVING END-TO-END SPEECH SYNTHESIS WITH LOCAL RECURRENT NEURAL NETWORK ENHANCED TRANSFORMER |
| Yibin Zheng; Tencent |
| Xin-Hui Li; Tencent |
| Fenglong Xie; Tencent |
| Li Lu; Tencent |
| |
| TU2.I.12: AN EFFECTIVE STYLE TOKEN WEIGHT CONTROL TECHNIQUE FOR END-TO-END EMOTIONAL SPEECH SYNTHESIS |
| Ohsung Kwon; Naver Corporation |
| Inseon Jang; Electronics and Telecommunications Research Institute (ETRI) |
| ChungHyun Ahn; Electronics and Telecommunications Research Institute (ETRI) |
| Hong-Goo Kang; Yonsei University |
| |