SPE-P19: Machine Learning for Speech Synthesis III |
| Session Type: Poster |
| Time: Friday, 8 May, 11:45 - 13:45 |
| Location: On-Demand |
| Session Chairs: Yu Zhang, Google and Mounya Elhilali, Johns Hopkins University |
| SPE-P19.1: END-TO-END CODE-SWITCHING TTS WITH CROSS-LINGUAL LANGUAGE MODEL |
| Xuehao Zhou; National University of Singapore |
| Xiaohai Tian; National University of Singapore |
| Grandee Lee; National University of Singapore |
| Rohan Kumar Das; National University of Singapore |
| Haizhou Li; National University of Singapore |
| SPE-P19.2: CODE-SWITCHED SPEECH SYNTHESIS USING BILINGUAL PHONETIC POSTERIORGRAM WITH ONLY MONOLINGUAL CORPORA |
| Yuewen Cao; Chinese University of Hong Kong |
| Songxiang Liu; Chinese University of Hong Kong |
| Xixin Wu; Chinese University of Hong Kong |
| Shiyin Kang; Tencent |
| Peng Liu; Tencent |
| Zhiyong Wu; Tsinghua University |
| Xunying Liu; Chinese University of Hong Kong |
| Dan Su; Tencent |
| Dong Yu; Tencent |
| Helen Meng; Chinese University of Hong Kong |
| SPE-P19.3: GENERATING MULTILINGUAL VOICES USING SPEAKER SPACE TRANSLATION BASED ON BILINGUAL SPEAKER DATA |
| Soumi Maiti; City University of New York |
| Erik Marchi; Apple |
| Alistair Conkie; Apple |
| SPE-P19.4: SPEAKER ADAPTATION OF A MULTILINGUAL ACOUSTIC MODEL FOR CROSS-LANGUAGE SYNTHESIS |
| Ivan Himawan; ObEN |
| Sandesh Aryal; ObEN |
| Iris Ouyang; ObEN |
| Sam Kang; ObEN |
| Pierre Lanchantin; ObEN |
| Simon King; University of Edinburgh |
| SPE-P19.5: SEMI-SUPERVISED SPEAKER ADAPTATION FOR END-TO-END SPEECH SYNTHESIS WITH PRETRAINED MODELS |
| Katsuki Inoue; Okayama University |
| Sunao Hara; Okayama University |
| Masanobu Abe; Okayama University |
| Tomoki Hayashi; Nagoya University |
| Ryuichi Yamamoto; LINE Corporation |
| Shinji Watanabe; Johns Hopkins University |
| SPE-P19.6: BOFFIN TTS: FEW-SHOT SPEAKER ADAPTATION BY BAYESIAN OPTIMIZATION |
| Henry Moss; Lancaster University |
| Vatsal Aggarwal; Amazon, Inc. |
| Nishant Prateek; Amazon, Inc. |
| Javier Gonzalez; Amazon, Inc. |
| Roberto Barra-Chicote; Amazon, Inc. |
| SPE-P19.7: SEMI-SUPERVISED LEARNING BASED ON HIERARCHICAL GENERATIVE MODELS FOR END-TO-END SPEECH SYNTHESIS |
| Takato Fujimoto; Nagoya Institute of Technology |
| Shinji Takaki; Nagoya Institute of Technology |
| Kei Hashimoto; Nagoya Institute of Technology |
| Keiichiro Oura; Nagoya Institute of Technology |
| Yoshihiko Nankaku; Nagoya Institute of Technology |
| Keiichi Tokuda; Nagoya Institute of Technology |
| SPE-P19.8: BREATHING AND SPEECH PLANNING IN SPONTANEOUS SPEECH SYNTHESIS |
| Éva Székely; KTH Royal Institute of Technology |
| Gustav Eje Henter; KTH Royal Institute of Technology |
| Jonas Beskow; KTH Royal Institute of Technology |
| Joakim Gustafson; KTH Royal Institute of Technology |
| SPE-P19.9: ESPNET-TTS: UNIFIED, REPRODUCIBLE, AND INTEGRATABLE OPEN SOURCE END-TO-END TEXT-TO-SPEECH TOOLKIT |
| Tomoki Hayashi; Nagoya University |
| Ryuichi Yamamoto; LINE Corporation |
| Katsuki Inoue; Okayama University |
| Takenori Yoshimura; Nagoya University |
| Shinji Watanabe; Johns Hopkins University |
| Tomoki Toda; Nagoya University |
| Kazuya Takeda; Nagoya University |
| Yu Zhang; Google AI |
| Xu Tan; Microsoft Research Asia |
| SPE-P19.10: EXTRACTING UNIT EMBEDDINGS USING SEQUENCE-TO-SEQUENCE ACOUSTIC MODELS FOR UNIT SELECTION SPEECH SYNTHESIS |
| Xiao Zhou; University of Science and Technology of China |
| Zhen-Hua Ling; University of Science and Technology of China |
| Li-Rong Dai; University of Science and Technology of China |
| SPE-P19.11: AUDIO-ASSISTED IMAGE INPAINTING FOR TALKING FACES |
| Alexandros Koumparoulis; University of Thessaly |
| Gerasimos Potamianos; University of Thessaly |
| Samuel Thomas; IBM |
| Edmilson da Silva Morais; IBM |