SPE-P12: Machine Learning for Speech Synthesis II |
Session Type: Poster |
Time: Thursday, 7 May, 11:30 - 13:30 |
Location: On-Demand |
Session Chairs: Tomoki Toda, Nagoya University and Zhiyong Wu, Tsinghua University
|
SPE-P12.1: EFFICIENT SHALLOW WAVENET VOCODER USING MULTIPLE SAMPLES OUTPUT BASED ON LAPLACIAN DISTRIBUTION AND LINEAR PREDICTION |
Patrick Lumban Tobing; Nagoya University |
Yi-Chiao Wu; Nagoya University |
Tomoki Hayashi; Nagoya University |
Kazuhiro Kobayashi; Nagoya University |
Tomoki Toda; Nagoya University |
|
SPE-P12.2: FLOW-TTS: A NON-AUTOREGRESSIVE NETWORK FOR TEXT TO SPEECH BASED ON FLOW |
Chenfeng Miao; Ping An Technology (Shenzhen) Co., Ltd. |
Shuang Liang; Ping An Technology (Shenzhen) Co., Ltd. |
Minchuan Chen; Ping An Technology (Shenzhen) Co., Ltd. |
Jun Ma; Ping An Technology (Shenzhen) Co., Ltd. |
Shaojun Wang; Ping An Technology (Shenzhen) Co., Ltd. |
Jing Xiao; Ping An Technology (Shenzhen) Co., Ltd. |
|
SPE-P12.3: WAVEFFJORD: FFJORD-BASED VOCODER FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS |
Ning-Qian Wu; University of Science and Technology of China |
Zhen-Hua Ling; University of Science and Technology of China |
|
SPE-P12.4: IMPROVING LPCNET-BASED TEXT-TO-SPEECH WITH LINEAR PREDICTION-STRUCTURED MIXTURE DENSITY NETWORK |
Min-Jae Hwang; Yonsei University |
Eunwoo Song; Naver Corporation |
Ryuichi Yamamoto; LINE Corporation |
Frank K. Soong; Microsoft Research Asia |
Hong-Goo Kang; Yonsei University |
|
SPE-P12.5: DISENTANGLING TIMBRE AND SINGING STYLE WITH MULTI-SINGER SINGING SYNTHESIS SYSTEM |
Juheon Lee; Seoul National University |
Hyeong-Seok Choi; Seoul National University |
Junghyun Koo; Seoul National University |
Kyogu Lee; Seoul National University |
|
SPE-P12.6: SEQUENCE-TO-SEQUENCE SINGING SYNTHESIS USING THE FEED-FORWARD TRANSFORMER |
Merlijn Blaauw; Universitat Pompeu Fabra |
Jordi Bonada; Universitat Pompeu Fabra |
|
SPE-P12.7: KOREAN SINGING VOICE SYNTHESIS BASED ON AUTO-REGRESSIVE BOUNDARY EQUILIBRIUM GAN |
Soonbeom Choi; Korea Advanced Institute of Science and Technology (KAIST) |
Wonil Kim; Korea Advanced Institute of Science and Technology (KAIST) |
Saebyul Park; Korea Advanced Institute of Science and Technology (KAIST) |
Sangeon Yong; Korea Advanced Institute of Science and Technology (KAIST) |
Juhan Nam; Korea Advanced Institute of Science and Technology (KAIST) |
|
SPE-P12.8: FAST AND HIGH-QUALITY SINGING VOICE SYNTHESIS SYSTEM BASED ON CONVOLUTIONAL NEURAL NETWORKS |
Kazuhiro Nakamura; Techno-Speech |
Shinji Takaki; Techno-Speech |
Kei Hashimoto; Techno-Speech |
Keiichiro Oura; Techno-Speech |
Yoshihiko Nankaku; Nagoya Institute of Technology |
Keiichi Tokuda; Techno-Speech |
|
SPE-P12.9: HYBRID NEURAL-PARAMETRIC F0 MODEL FOR SINGING SYNTHESIS |
Jordi Bonada; Universitat Pompeu Fabra |
Merlijn Blaauw; Universitat Pompeu Fabra |
|
SPE-P12.10: UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED SPEECH SYNTHESIS USING SIMPLE RECURRENT UNIT |
Tomoki Koriyama; University of Tokyo |
Hiroshi Saruwatari; University of Tokyo |
|
SPE-P12.11: EMOTIONAL SPEECH SYNTHESIS WITH RICH AND GRANULARIZED CONTROL |
Se-Yun Um; Yonsei University |
Sangshin Oh; Yonsei University |
Kyungguen Byun; Yonsei University |
Inseon Jang; Electronics and Telecommunications Research Institute (ETRI) |
Chunghyun Ahn; Electronics and Telecommunications Research Institute (ETRI) |
Hong-Goo Kang; Yonsei University |
|
SPE-P12.12: TOWARDS UNSUPERVISED SPEECH RECOGNITION AND SYNTHESIS WITH QUANTIZED SPEECH REPRESENTATION LEARNING |
Alexander H. Liu; National Taiwan University |
Tao Tu; National Taiwan University |
Hung-yi Lee; National Taiwan University |
Lin-shan Lee; National Taiwan University |
|