| SPE-P5: Deep Speaker Recognition Models |
| Session Type: Poster |
| Time: Wednesday, 6 May, 09:00 - 11:00 |
| Location: On-Demand |
| Session Chair: Kong-Aik Lee, NEC Corporation |
| SPE-P5.1: FREQUENCY AND TEMPORAL CONVOLUTIONAL ATTENTION FOR TEXT-INDEPENDENT SPEAKER RECOGNITION |
| Sarthak Yadav; Staqu Technologies |
| Atul Rai; Staqu Technologies |
| SPE-P5.2: FRAME-LEVEL PHONEME-INVARIANT SPEAKER EMBEDDING FOR TEXT-INDEPENDENT SPEAKER RECOGNITION ON EXTREMELY SHORT UTTERANCES |
| Naohiro Tawara; NTT Communication Science Laboratories |
| Atsunori Ogawa; NTT Communication Science Laboratories |
| Tomoharu Iwata; NTT Communication Science Laboratories |
| Marc Delcroix; NTT Communication Science Laboratories |
| Tetsuji Ogawa; Waseda University |
| SPE-P5.3: PROTOTYPICAL NETWORKS FOR SMALL FOOTPRINT TEXT-INDEPENDENT SPEAKER VERIFICATION |
| Tom Ko; Southern University of Science and Technology |
| Yangbin Chen; City University of Hong Kong |
| Qing Li; Hong Kong Polytechnic University |
| SPE-P5.4: TDMF: TASK-DRIVEN MULTILEVEL FRAMEWORK FOR END-TO-END SPEAKER VERIFICATION |
| Chen Chen; Harbin Institute of Technology |
| Jiqing Han; Harbin Institute of Technology |
| SPE-P5.5: AN IMPROVED DEEP NEURAL NETWORK FOR MODELING SPEAKER CHARACTERISTICS AT DIFFERENT TEMPORAL SCALES |
| Bin Gu; University of Science and Technology of China |
| Wu Guo; University of Science and Technology of China |
| Li-Rong Dai; University of Science and Technology of China |
| Jun Du; University of Science and Technology of China |
| SPE-P5.6: PARTIAL AUC OPTIMIZATION BASED DEEP SPEAKER EMBEDDINGS WITH CLASS-CENTER LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION |
| Zhongxin Bai; Northwestern Polytechnical University |
| Xiao-Lei Zhang; Northwestern Polytechnical University |
| Jingdong Chen; Northwestern Polytechnical University |
| SPE-P5.7: KNOWLEDGE DISTILLATION AND RANDOM ERASING DATA AUGMENTATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION |
| Victoria Mingote; University of Zaragoza |
| Antonio Miguel; University of Zaragoza |
| Dayana Ribas; University of Zaragoza |
| Alfonso Ortega; University of Zaragoza |
| Eduardo Lleida; University of Zaragoza |
| SPE-P5.8: DISENTANGLED SPEECH EMBEDDINGS USING CROSS-MODAL SELF-SUPERVISION |
| Arsha Nagrani; Oxford University |
| Joon Son Chung; Oxford University |
| Samuel Albanie; Oxford University |
| Andrew Zisserman; Oxford University |
| SPE-P5.9: IMPROVING DEEP CNN NETWORKS WITH LONG TEMPORAL CONTEXT FOR TEXT-INDEPENDENT SPEAKER VERIFICATION |
| Yong Zhao; Microsoft Corporation |
| Tianyan Zhou; Microsoft Corporation |
| Zhuo Chen; Microsoft Corporation |
| Jian Wu; Microsoft Corporation |
| SPE-P5.10: MULTI-LEVEL DEEP NEURAL NETWORK ADAPTATION FOR SPEAKER VERIFICATION USING MMD AND CONSISTENCY REGULARIZATION |
| Weiwei Lin; Hong Kong Polytechnic University |
| Man-Wai Mak; Hong Kong Polytechnic University |
| Na Li; Tencent AI Lab |
| Dan Su; Tencent AI Lab |
| Dong Yu; Tencent AI Lab |
| SPE-P5.11: MULTI-TASK LEARNING FOR SPEAKER VERIFICATION AND VOICE TRIGGER DETECTION |
| Siddharth Sigtia; Apple |
| Erik Marchi; Apple |
| Sachin Kajarekar; Apple |
| Devang Naik; Apple |
| John Bridle; Apple |
| SPE-P5.12: STATISTICS POOLING TIME DELAY NEURAL NETWORK BASED ON X-VECTOR FOR SPEAKER VERIFICATION |
| Qian-Bei Hong; National Cheng Kung University and Academia Sinica |
| Chung-Hsien Wu; National Cheng Kung University and Academia Sinica |
| Hsin-Min Wang; National Cheng Kung University and Academia Sinica |
| Chien-Lin Huang; Ping An Technology (Shenzhen) Co., Ltd. |