MLSP-P2: Applications in Speech and Audio
Session Type: Poster
Time: Tuesday, 5 May, 11:30 - 13:30
Location: On-Demand
Session Chair: Ritwik Giri, Amazon Web Services (AWS)
MLSP-P2.1: TOWARDS BLIND QUALITY ASSESSMENT OF CONCERT AUDIO RECORDINGS USING DEEP NEURAL NETWORKS
Nikonas Simou; University of Crete
Yannis Mastorakis; Foundation for Research and Technology-Hellas (FORTH)
Nikolaos Stefanakis; Foundation for Research and Technology-Hellas (FORTH)

MLSP-P2.3: MULTI-LABEL SOUND EVENT RETRIEVAL USING A DEEP LEARNING-BASED SIAMESE STRUCTURE WITH A PAIRWISE PRESENCE MATRIX
Jianyu Fan; Simon Fraser University
Eric Nichols; Microsoft
Daniel Tompkins; Microsoft
Ana Elisa Méndez Méndez; New York University
Benjamin Elizalde; Carnegie Mellon University
Philippe Pasquier; Simon Fraser University

MLSP-P2.4: SPEECH-DRIVEN FACIAL ANIMATION USING POLYNOMIAL FUSION OF FEATURES
Triantafyllos Kefalas; Imperial College London
Konstantinos Vougioukas; Imperial College London
Yannis Panagakis; Imperial College London
Stavros Petridis; Imperial College London and Samsung AI Centre Cambridge
Jean Kossaifi; Imperial College London and Samsung AI Centre Cambridge
Maja Pantic; Imperial College London and Samsung AI Centre Cambridge

MLSP-P2.5: SED-MDD: TOWARDS SENTENCE DEPENDENT END-TO-END MISPRONUNCIATION DETECTION AND DIAGNOSIS
Yiqing Feng; Harbin Institute of Technology
Guanyu Fu; Harbin Institute of Technology
Qingcai Chen; Harbin Institute of Technology
Kai Chen; Harbin Institute of Technology

MLSP-P2.6: GENERATIVE PRE-TRAINING FOR SPEECH WITH AUTOREGRESSIVE PREDICTIVE CODING
Yu-An Chung; Massachusetts Institute of Technology
James Glass; Massachusetts Institute of Technology

MLSP-P2.7: STARGAN FOR EMOTIONAL SPEECH CONVERSION: VALIDATED BY DATA AUGMENTATION OF END-TO-END EMOTION RECOGNITION
Georgios Rizos; Imperial College London
Alice Baird; University of Augsburg
Max Elliott; Imperial College London
Björn Schuller; Imperial College London

MLSP-P2.8: MULTIMODAL TRANSFORMER FUSION FOR CONTINUOUS EMOTION RECOGNITION
Jian Huang; Institute of Automation, Chinese Academy of Sciences
Jianhua Tao; Institute of Automation, Chinese Academy of Sciences
Bin Liu; Institute of Automation, Chinese Academy of Sciences
Zheng Lian; Institute of Automation, Chinese Academy of Sciences
Mingyue Niu; Institute of Automation, Chinese Academy of Sciences

MLSP-P2.9: HKA: A HIERARCHICAL KNOWLEDGE ATTENTION MECHANISM FOR MULTI-TURN DIALOGUE SYSTEM
Jian Song; Tsinghua University
Kailai Zhang; Tsinghua University
Xuesi Zhou; Tsinghua University
Ji Wu; Tsinghua University

MLSP-P2.10: SUBMODULAR RANK AGGREGATION ON SCORE-BASED PERMUTATIONS FOR DISTRIBUTED AUTOMATIC SPEECH RECOGNITION
Jun Qi; Georgia Institute of Technology
Chao-Han Huck Yang; Georgia Institute of Technology
Javier Tejedor; Universidad San Pablo-CEU, CEU Universities

MLSP-P2.11: BRIDGING MIXTURE DENSITY NETWORKS WITH META-LEARNING FOR AUTOMATIC SPEAKER IDENTIFICATION
Ruirui Li; University of California, Los Angeles
Jyun-Yu Jiang; University of California, Los Angeles
Xian Wu; University of Notre Dame
Hongda Mao; Amazon, Inc.
Chu-Cheng Hsieh; Amazon, Inc.
Wei Wang; University of California, Los Angeles

MLSP-P2.12: PITCH ESTIMATION VIA SELF-SUPERVISION
Beat Gfeller; Google
Christian Frank; Google
Dominik Roblek; Google
Matt Sharifi; Google
Marco Tagliasacchi; Google
Mihajlo Velimirovic; Google