The following is the list of accepted ASRU 2019 papers, sorted by paper title. You can use the search feature of your web browser to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at asru2019@cmsworkshops.com.
1209 | A COMPARATIVE STUDY ON END-TO-END SPEECH TO TEXT TRANSLATION |
1193 | A COMPARATIVE STUDY ON TRANSFORMER VS RNN IN SPEECH APPLICATIONS |
1294 | A COMPARISON OF END-TO-END MODELS FOR LONG-FORM SPEECH RECOGNITION |
1013 | A COMPARISON OF TRANSFORMER AND LSTM ENCODER DECODER MODELS FOR ASR |
1069 | A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION |
1185 | A DENSITY RATIO APPROACH TO LANGUAGE MODEL FUSION IN END-TO-END AUTOMATIC SPEECH RECOGNITION |
1017 | A DROPOUT-BASED SINGLE MODEL COMMITTEE APPROACH FOR ACTIVE LEARNING IN ASR |
1094 | A MODULARIZED NEURAL NETWORK WITH LANGUAGE-SPECIFIC OUTPUT LAYERS FOR CROSS-LINGUAL VOICE CONVERSION |
1273 | A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: the DeepMine Database |
1090 | A UNIFIED ENDPOINTER USING MULTITASK AND MULTIDOMAIN TRAINING |
1298 | ACOUSTIC MODEL ADAPTATION FROM RAW WAVEFORMS WITH SINCNET |
1352 | ADAPTING PRETRAINED TRANSFORMER TO LATTICES FOR SPOKEN LANGUAGE UNDERSTANDING |
1133 | ADDITIONAL SHARED DECODER ON SIAMESE MULTI-VIEW ENCODERS FOR LEARNING ACOUSTIC WORD EMBEDDINGS |
1389 | Advances in Online Audio-Visual Meeting Transcription |
1095 | ADVERSARIAL ATTACKS ON SPOOFING COUNTERMEASURES OF AUTOMATIC SPEAKER VERIFICATION |
1053 | AN INVESTIGATION INTO THE EFFECTIVENESS OF ENHANCEMENT IN ASR TRAINING AND TEST FOR CHIME-5 DINNER PARTY TRANSCRIPTION |
1270 | AN INVESTIGATION OF LSTM-CTC BASED JOINT ACOUSTIC MODEL FOR INDIAN LANGUAGE IDENTIFICATION |
1318 | Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition |
1370 | ATTENTION BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH LARGE SPEECH CORPUS |
1201 | ATTENTION-BASED SPEECH RECOGNITION USING GAZE INFORMATION |
1258 | BAYESIAN ADVERSARIAL LEARNING FOR SPEAKER RECOGNITION |
1242 | BOOTSTRAPPING NON-PARALLEL VOICE CONVERSION FROM SPEAKER-ADAPTIVE TEXT-TO-SPEECH |
1368 | Character-Aware Attention-Based End-to-End Speech Recognition |
1319 | CNN WITH PHONETIC ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION |
1228 | CONTROLLING EMOTION STRENGTH WITH RELATIVE ATTRIBUTE FOR END-TO-END SPEECH SYNTHESIS |
1225 | DATA AUGMENTATION BASED ON VOWEL STRETCH FOR IMPROVING CHILDREN'S SPEECH RECOGNITION |
1098 | Detecting Deception in Political Debates Using Acoustic and Textual Features |
1143 | DEVELOPMENT OF VOICE SPOOFING DETECTION SYSTEMS FOR 2019 EDITION OF AUTOMATIC SPEAKER VERIFICATION AND COUNTERMEASURES CHALLENGE |
1271 | DIALOGUE ENVIRONMENTS ARE DIFFERENT FROM GAMES: INVESTIGATING VARIANTS OF DEEP Q-NETWORKS FOR DIALOGUE POLICY |
1374 | Domain Adaptation via Teacher-Student Learning for End-To-End Speech Recognition |
1107 | DOMAIN EXPANSION IN DNN-BASED ACOUSTIC MODELS FOR ROBUST SPEECH RECOGNITION |
1348 | DOVER: A METHOD FOR COMBINING DIARIZATION OUTPUTS |
1257 | EFFICIENT FREE KEYWORD DETECTION BASED ON CNN AND END-TO-END CONTINUOUS DP-MATCHING |
1325 | EFFICIENT SEMI-SUPERVISED LEARNING FOR NATURAL LANGUAGE UNDERSTANDING BY OPTIMIZING DIVERSITY |
1215 | EMBEDDINGS FOR DNN SPEAKER ADAPTIVE TRAINING |
1191 | EMOCEPTION: AN INCEPTION INSPIRED EFFICIENT SPEECH EMOTION RECOGNITION NETWORK |
1376 | END-TO-END CODE-SWITCHING ASR FOR LOW-RESOURCED LANGUAGE PAIRS |
1054 | END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION |
1205 | END-TO-END OVERLAPPED SPEECH DETECTION AND SPEAKER COUNTING WITH RAW WAVEFORM |
1409 | End-to-end Training of a Large Vocabulary End-to-end Speech Recognition System |
1055 | ENHANCED BERT-BASED RANKING MODELS FOR SPOKEN DOCUMENT RETRIEVAL |
1154 | ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT |
1312 | EXPLICIT ALIGNMENT OF TEXT AND SPEECH ENCODINGS FOR ATTENTION-BASED END-TO-END SPEECH RECOGNITION |
1037 | Exploring Effective Data Augmentation with TDNN-LSTM Neural Network Embedding for Speaker Recognition |
1268 | EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION |
1342 | FASNET: LOW-LATENCY ADAPTIVE BEAMFORMING FOR MULTI-MICROPHONE AUDIO PROCESSING |
1194 | FROM SENONES TO CHENONES: TIED CONTEXT-DEPENDENT GRAPHEMES FOR HYBRID SPEECH RECOGNITION |
1134 | GANS FOR CHILDREN: A GENERATIVE DATA AUGMENTATION STRATEGY FOR CHILDREN SPEECH RECOGNITION |
1395 | GENERALIZED LARGE-CONTEXT LANGUAGE MODELS BASED ON FORWARD-BACKWARD HIERARCHICAL RECURRENT ENCODER-DECODER MODELS |
1351 | Hierarchical Transformers for Long Document Classification |
1061 | HIGHLY EFFICIENT NEURAL NETWORK LANGUAGE MODEL COMPRESSION USING SOFT BINARIZATION TRAINING |
1066 | IMPROVED MULTI-STAGE TRAINING OF ONLINE ATTENTION-BASED ENCODER-DECODER MODELS |
1284 | IMPROVING FUNDAMENTAL FREQUENCY GENERATION IN EMG-TO-SPEECH CONVERSION USING A QUANTIZATION APPROACH |
1192 | IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES |
1245 | Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias |
1108 | IMPROVING RNN TRANSDUCER MODELING FOR END-TO-END SPEECH RECOGNITION |
1059 | IMPROVING SPEECH ENHANCEMENT WITH PHONETIC EMBEDDING FEATURES |
1097 | IMPROVING SPEECH-BASED END-OF-TURN DETECTION VIA CROSS-MODAL REPRESENTATION LEARNING WITH PUNCTUATED TEXT DATA |
1114 | Incorporating Prior Knowledge Into Speaker Diarization and Linking for Identifying Common Speaker |
1011 | INCREMENTAL LATTICE DETERMINIZATION FOR WFST DECODERS |
1049 | INTEGRATING SOURCE-CHANNEL MODEL WITH ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION |
1326 | IN-THE-WILD END-TO-END DETECTION OF SPEECH AFFECTING DISEASES |
1126 | INVESTIGATION OF SHALLOW WAVENET VOCODER WITH LAPLACIAN DISTRIBUTION OUTPUT |
1237 | JOINT DISTRIBUTION LEARNING IN THE FRAMEWORK OF VARIATIONAL AUTOENCODERS FOR FAR-FIELD SPEECH ENHANCEMENT |
1249 | JOINT LEARNING OF WORD AND LABEL EMBEDDINGS FOR SEQUENCE LABELLING IN SPOKEN LANGUAGE UNDERSTANDING |
1031 | JOINT OPTIMIZATION OF CLASSIFICATION AND CLUSTERING FOR DEEP SPEAKER EMBEDDING |
1112 | KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION |
1219 | LANGUAGE MODEL BOOTSTRAPPING USING NEURAL MACHINE TRANSLATION FOR CONVERSATIONAL SPEECH RECOGNITION |
1173 | Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks |
1068 | LEAD2GOLD: TOWARDS EXPLOITING THE FULL POTENTIAL OF NOISY TRANSCRIPTIONS FOR SPEECH RECOGNITION |
1080 | LEARNING BETWEEN DIFFERENT TEACHER AND STUDENT MODELS IN ASR |
1131 | LEARNING HIERARCHICAL REPRESENTATIONS FOR EXPRESSIVE SPEAKING STYLE IN END-TO-END SPEECH SYNTHESIS |
1335 | Leveraging language ID in multilingual end-to-end speech recognition |
1203 | LISTENING WHILE SPEAKING AND VISUALIZING: IMPROVING ASR THROUGH MULTIMODAL CHAIN |
1195 | LOGISTIC SIMILARITY METRIC LEARNING VIA AFFINITY MATRIX FOR TEXT-INDEPENDENT SPEAKER VERIFICATION |
1364 | LONG RANGE ACOUSTIC AND DEEP FEATURES PERSPECTIVE ON ASVSPOOF 2019 |
1304 | Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs |
1264 | MARKOV RECURRENT NEURAL NETWORK LANGUAGE MODEL |
1101 | MGB-5: Arabic dialect identification across 17 dialects and Moroccan speech recognition |
1167 | MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition |
1234 | Mixed bandwidth acoustic modeling leveraging knowledge distillation |
1344 | MONOTONIC RECURRENT NEURAL NETWORK TRANSDUCER AND DECODING STRATEGIES |
1124 | MULTILINGUAL BOTTLENECK FEATURES FOR QUERY BY EXAMPLE SPOKEN TERM DETECTION |
1004 | MULTILINGUAL END-TO-END SPEECH TRANSLATION |
1302 | NATIVE LANGUAGE IDENTIFICATION FROM RAW WAVEFORMS USING DEEP CONVOLUTIONAL NEURAL NETWORKS WITH ATTENTIVE POOLING |
1218 | NEURAL MACHINE TRANSLATION WITH ACOUSTIC EMBEDDING |
1153 | NOVEL ENHANCED TEAGER ENERGY BASED CEPSTRAL COEFFICIENTS FOR REPLAY SPOOF DETECTION |
1256 | ON TEMPORAL CONTEXT INFORMATION FOR HYBRID BLSTM-BASED PHONEME RECOGNITION |
1081 | On the Study of Generative Adversarial Networks for Cross-lingual Voice Conversion |
1360 | ONE-TO-MANY MULTILINGUAL END-TO-END SPEECH TRANSLATION |
1286 | ONLINE BATCH NORMALIZATION ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION |
1330 | OPTIMIZING NEURAL NETWORK EMBEDDINGS USING PAIR-WISE LOSS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION |
1076 | ORTHOGONALITY CONSTRAINED MULTI-HEAD ATTENTION FOR KEYWORD SPOTTING |
1387 | Paraphrase Generation based on VAE and Pointer-Generator Networks |
1020 | PERSONALIZATION OF END-TO-END SPEECH RECOGNITION ON MOBILE DEVICES FOR NAMED ENTITIES |
1410 | Power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition |
1320 | PROBING THE INFORMATION ENCODED IN X-VECTORS |
1269 | QUERY-BY-EXAMPLE ON-DEVICE KEYWORD SPOTTING |
1333 | RDI-CU SYSTEM FOR THE 2019 ARABIC MULTI-GENRE BROADCAST CHALLENGE |
1329 | RECOGNIZING LONG-FORM SPEECH USING STREAMING END-TO-END MODELS |
1305 | RECURRENT NEURAL NETWORK TRANSDUCER FOR AUDIO-VISUAL SPEECH RECOGNITION |
1070 | ROBUST BELIEF STATE SPACE REPRESENTATION FOR STATISTICAL DIALOGUE MANAGERS USING DEEP AUTOENCODERS |
1313 | Scalable Neural Dialogue State Tracking |
1152 | Second Language Transfer Learning in Humans and Machines Using Image Supervision |
1210 | SELF-ADAPTIVE SOFT VOICE ACTIVITY DETECTION USING DEEP NEURAL NETWORKS FOR ROBUST SPEAKER VERIFICATION |
1285 | SEMI-SUPERVISED TRAINING AND DATA AUGMENTATION FOR ADAPTATION OF AUTOMATIC BROADCAST NEWS CAPTIONING SYSTEMS |
1120 | SHORT UTTERANCE COMPENSATION IN SPEAKER VERIFICATION VIA COSINE-BASED TEACHER-STUDENT LEARNING OF SPEAKER EMBEDDINGS |
1113 | SIMPLE GATED CONVNET FOR SMALL FOOTPRINT ACOUSTIC MODELING |
1327 | SIMPLIFIED LSTMS FOR SPEECH RECOGNITION |
1027 | SIMULTANEOUS SPEECH RECOGNITION AND SPEAKER DIARIZATION FOR MONAURAL DIALOGUE RECORDINGS WITH TARGET-SPEAKER ACOUSTIC MODELS |
1279 | SLU FOR VOICE COMMAND IN SMART HOME: COMPARISON OF PIPELINE AND END-TO-END APPROACHES |
1276 | SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK |
1355 | SPATIO-TEMPORAL CONTEXT MODELLING FOR SPEECH EMOTION CLASSIFICATION |
1288 | SPEAKER ADAPTIVE TRAINING USING MODEL AGNOSTIC META-LEARNING |
1224 | SPEAKER AND LANGUAGE AWARE TRAINING FOR END-TO-END ASR |
1322 | SPEAKER VERIFICATION WITH APPLICATION-AWARE BEAMFORMING |
1012 | SPEAKER-AWARE SPEECH-TRANSFORMER |
1317 | SPEECH RECOGNITION WITH AUGMENTED SYNTHESIZED SPEECH |
1283 | SPEECH REVEALS FUTURE RISK OF DEVELOPING DEMENTIA: PREDICTIVE DEMENTIA SCREENING FROM BIOGRAPHIC INTERVIEWS |
1086 | SPEECH SEPARATION USING SPEAKER INVENTORY |
1390 | SPEECH-TO-SPEECH TRANSLATION BETWEEN UNTRANSCRIBED UNKNOWN LANGUAGES |
1243 | SphereDiar: an effective speaker diarization system for meeting data |
1099 | SPOKEN LANGUAGE IDENTIFICATION USING BIDIRECTIONAL LSTM BASED LID SEQUENTIAL SENONES |
1016 | SPOKEN MULTIPLE-CHOICE QUESTION ANSWERING USING MULTIMODAL CONVOLUTIONAL NEURAL NETWORKS |
1281 | SPOOF DETECTION USING TIME-DELAY SHALLOW NEURAL NETWORK AND FEATURE SWITCHING |
1057 | STATE-OF-THE-ART SPEECH RECOGNITION USING MULTI-STREAM SELF-ATTENTION WITH DILATED 1D CONVOLUTIONS |
1336 | STREAMING END-TO-END SPEECH RECOGNITION WITH JOINT CTC-ATTENTION BASED MODELS |
1165 | SYLLABLE-DEPENDENT DISCRIMINATIVE LEARNING FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION |
1358 | TACOTRON-BASED ACOUSTIC MODEL USING PHONEME ALIGNMENT FOR PRACTICAL NEURAL TEXT-TO-SPEECH SYSTEMS |
1244 | Time Domain Audio Visual Speech Separation |
1104 | Time-domain speaker extraction network |
1266 | TOPIC-AWARE POINTER-GENERATOR NETWORKS FOR SUMMARIZING SPOKEN CONVERSATIONS |
1334 | TOWARDS CONTROLLING FALSE ALARM --- MISS TRADE-OFF IN PERCEPTUAL SPEAKER COMPARISON VIA NON-NEUTRAL LISTENING TASK FRAMING |
1363 | TOWARDS REAL-TIME MISPRONUNCIATION DETECTION IN KIDS’ SPEECH |
1156 | TRAINING LANGUAGE MODELS FOR LONG-SPAN CROSS-SENTENCE EVALUATION |
1043 | TRANSFER LEARNING FOR CONTEXT-AWARE SPOKEN LANGUAGE UNDERSTANDING |
1178 | TRANSFORMER ASR WITH CONTEXTUAL BLOCK PROCESSING |
1408 | Unsupervised Adaptation of Acoustic Models for ASR using Utterance-level Embeddings from Squeeze and Excitation Networks |
1014 | USING VERY DEEP CONVOLUTIONAL NEURAL NETWORKS TO AUTOMATICALLY DETECT PLAGIARIZED SPOKEN RESPONSES |
1091 | VERIFYING DEEP KEYWORD SPOTTING DETECTION WITH ACOUSTIC WORD EMBEDDINGS |
1065 | Virtual Adversarial Training for DS-CNN based Small-Footprint Keyword Spotting |
1089 | WAVENET FACTORIZATION WITH SINGULAR VALUE DECOMPOSITION FOR VOICE CONVERSION |
1375 | ZERO-SHOT CODE-SWITCHING ASR AND TTS WITH MULTILINGUAL MACHINE SPEECH CHAIN |
1377 | Zero-Shot Pronunciation Lexicons for Cross-Language Acoustic Model Transfer |