Tue PM1.P.6
W2N-AVSC: AUDIOVISUAL EXTENSION FOR WHISPER-TO-NORMAL SPEECH CONVERSION
Shogo Seki, NTT Corporation, Japan; Kanami Imamura, The University of Tokyo, Japan; Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Noboru Harada, NTT Corporation, Japan
Session:
Tue PM1.P: Conversion and Transformation of Audio and Speech Poster
Track:
ASMSP - Acoustic, Speech and Music Signal Processing
Location:
Fennia Foyer
Presentation Time:
Tue, 5 Sep, 14:30 - 16:10 Finland Time (UTC +3)
Session Chair:
Tom Bäckström, Aalto University
Session Tue PM1.P
Tue PM1.P.1: Adapting pretrained models for adult to child voice conversion
Protima Nomo Sudro, Anton Ragni, Thomas Hain, University of Sheffield, United Kingdom
Tue PM1.P.2: NEAR-END INTELLIGIBILITY IMPROVEMENT THROUGH VOICE TRANSFORMATION IN TRANSFER LEARNING FRAMEWORK
Ritujoy Biswas, Karan Nathwani, Indian Institute of Technology Jammu, India; Vinayak Abrol, Indraprastha Institute of Information Technology Delhi, India
Tue PM1.P.3: Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang, Yonsei University, Korea (South)
Tue PM1.P.4: Spotting Parodies: Detecting Alignment Collapse Between Lyrics and Singing Voice
Tomoki Ariga, Yosuke Higuchi, Waseda University, Japan; Mitsunori Kanno, Rie Shigyo, Takato Mizuguchi, Naoki Okamoto, DAIICHIKOSHO CO., LTD., Japan; Tetsuji Ogawa, Waseda University, Japan
Tue PM1.P.5: Deep Learning-based F0 Synthesis for Speaker Anonymization
Ünal Ege Gaznepoglu, Nils Peters, FAU Erlangen/Nürnberg, Germany
Tue PM1.P.6: W2N-AVSC: AUDIOVISUAL EXTENSION FOR WHISPER-TO-NORMAL SPEECH CONVERSION
Shogo Seki, NTT Corporation, Japan; Kanami Imamura, The University of Tokyo, Japan; Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Noboru Harada, NTT Corporation, Japan
Tue PM1.P.7: AUDIO DATA AUGMENTATION FOR ACOUSTIC-TO-ARTICULATORY SPEECH INVERSION
Yashish M. Siriwardena, Ahmed Adel Attia, University of Maryland College Park, United States; Ganesh Sivaraman, Pindrop, United States; Carol Espy-Wilson, University of Maryland College Park, United States
Tue PM1.P.8: SPECTRAL WINDOWING FOR ENHANCED TEMPORAL NOISE SHAPING ANALYSIS IN TRANSFORM AUDIO CODECS
Richard Füg, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany