MLSP-P2: Applications in Speech and Audio

Session Type: Poster
Time: Tuesday, 5 May, 11:30 - 13:30
Location: On-Demand
Session Chair: Ritwik Giri, Amazon Web Services (AWS)

MLSP-P2.1: TOWARDS BLIND QUALITY ASSESSMENT OF CONCERT AUDIO RECORDINGS USING DEEP NEURAL NETWORKS
Nikonas Simou; University of Crete
Yannis Mastorakis; Foundation for Research and Technology-Hellas (FORTH)
Nikolaos Stefanakis; Foundation for Research and Technology-Hellas (FORTH)

MLSP-P2.3: MULTI-LABEL SOUND EVENT RETRIEVAL USING A DEEP LEARNING-BASED SIAMESE STRUCTURE WITH A PAIRWISE PRESENCE MATRIX
Jianyu Fan; Simon Fraser University
Eric Nichols; Microsoft
Daniel Tompkins; Microsoft
Ana Elisa Méndez Méndez; New York University
Benjamin Elizalde; Carnegie Mellon University
Philippe Pasquier; Simon Fraser University

MLSP-P2.4: SPEECH-DRIVEN FACIAL ANIMATION USING POLYNOMIAL FUSION OF FEATURES
Triantafyllos Kefalas; Imperial College London
Konstantinos Vougioukas; Imperial College London
Yannis Panagakis; Imperial College London
Stavros Petridis; Imperial College London and Samsung AI Centre Cambridge
Jean Kossaifi; Imperial College London and Samsung AI Centre Cambridge
Maja Pantic; Imperial College London and Samsung AI Centre Cambridge

MLSP-P2.5: SED-MDD: TOWARDS SENTENCE DEPENDENT END-TO-END MISPRONUNCIATION DETECTION AND DIAGNOSIS
Yiqing Feng; Harbin Institute of Technology
Guanyu Fu; Harbin Institute of Technology
Qingcai Chen; Harbin Institute of Technology
Kai Chen; Harbin Institute of Technology

MLSP-P2.6: GENERATIVE PRE-TRAINING FOR SPEECH WITH AUTOREGRESSIVE PREDICTIVE CODING
Yu-An Chung; Massachusetts Institute of Technology
James Glass; Massachusetts Institute of Technology

MLSP-P2.7: STARGAN FOR EMOTIONAL SPEECH CONVERSION: VALIDATED BY DATA AUGMENTATION OF END-TO-END EMOTION RECOGNITION
Georgios Rizos; Imperial College London
Alice Baird; University of Augsburg
Max Elliott; Imperial College London
Björn Schuller; Imperial College London

MLSP-P2.8: MULTIMODAL TRANSFORMER FUSION FOR CONTINUOUS EMOTION RECOGNITION
Jian Huang; Institute of Automation, Chinese Academy of Sciences
Jianhua Tao; Institute of Automation, Chinese Academy of Sciences
Bin Liu; Institute of Automation, Chinese Academy of Sciences
Zheng Lian; Institute of Automation, Chinese Academy of Sciences
Mingyue Niu; Institute of Automation, Chinese Academy of Sciences

MLSP-P2.9: HKA: A HIERARCHICAL KNOWLEDGE ATTENTION MECHANISM FOR MULTI-TURN DIALOGUE SYSTEM
Jian Song; Tsinghua University
Kailai Zhang; Tsinghua University
Xuesi Zhou; Tsinghua University
Ji Wu; Tsinghua University

MLSP-P2.10: SUBMODULAR RANK AGGREGATION ON SCORE-BASED PERMUTATIONS FOR DISTRIBUTED AUTOMATIC SPEECH RECOGNITION
Jun Qi; Georgia Institute of Technology
Chao-Han Huck Yang; Georgia Institute of Technology
Javier Tejedor; Universidad San Pablo-CEU, CEU Universities

MLSP-P2.11: BRIDGING MIXTURE DENSITY NETWORKS WITH META-LEARNING FOR AUTOMATIC SPEAKER IDENTIFICATION
Ruirui Li; University of California, Los Angeles
Jyun-Yu Jiang; University of California, Los Angeles
Xian Wu; University of Notre Dame
Hongda Mao; Amazon, Inc.
Chu-Cheng Hsieh; Amazon, Inc.
Wei Wang; University of California, Los Angeles

MLSP-P2.12: PITCH ESTIMATION VIA SELF-SUPERVISION
Beat Gfeller; Google
Christian Frank; Google
Dominik Roblek; Google
Matt Sharifi; Google
Marco Tagliasacchi; Google
Mihajlo Velimirovic; Google