List of Accepted Papers

The following is the list of accepted ASRU 2019 papers, sorted by paper title. You can use your web browser's search feature to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at asru2019@cmsworkshops.com.

1209 A COMPARATIVE STUDY ON END-TO-END SPEECH TO TEXT TRANSLATION
1193 A COMPARATIVE STUDY ON TRANSFORMER VS RNN IN SPEECH APPLICATIONS
1294 A COMPARISON OF END-TO-END MODELS FOR LONG-FORM SPEECH RECOGNITION
1013 A COMPARISON OF TRANSFORMER AND LSTM ENCODER DECODER MODELS FOR ASR
1069 A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION
1185 A DENSITY RATIO APPROACH TO LANGUAGE MODEL FUSION IN END-TO-END AUTOMATIC SPEECH RECOGNITION
1017 A DROPOUT-BASED SINGLE MODEL COMMITTEE APPROACH FOR ACTIVE LEARNING IN ASR
1094 A MODULARIZED NEURAL NETWORK WITH LANGUAGE-SPECIFIC OUTPUT LAYERS FOR CROSS-LINGUAL VOICE CONVERSION
1273 A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: the DeepMine Database
1090 A UNIFIED ENDPOINTER USING MULTITASK AND MULTIDOMAIN TRAINING
1298 ACOUSTIC MODEL ADAPTATION FROM RAW WAVEFORMS WITH SINCNET
1352 ADAPTING PRETRAINED TRANSFORMER TO LATTICES FOR SPOKEN LANGUAGE UNDERSTANDING
1133 ADDITIONAL SHARED DECODER ON SIAMESE MULTI-VIEW ENCODERS FOR LEARNING ACOUSTIC WORD EMBEDDINGS
1389 Advances in Online Audio-Visual Meeting Transcription
1095 ADVERSARIAL ATTACKS ON SPOOFING COUNTERMEASURES OF AUTOMATIC SPEAKER VERIFICATION
1053 AN INVESTIGATION INTO THE EFFECTIVENESS OF ENHANCEMENT IN ASR TRAINING AND TEST FOR CHIME-5 DINNER PARTY TRANSCRIPTION
1270 AN INVESTIGATION OF LSTM-CTC BASED JOINT ACOUSTIC MODEL FOR INDIAN LANGUAGE IDENTIFICATION
1318 Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition
1370 ATTENTION BASED ON-DEVICE STREAMING SPEECH RECOGNITION WITH LARGE SPEECH CORPUS
1201 ATTENTION-BASED SPEECH RECOGNITION USING GAZE INFORMATION
1258 BAYESIAN ADVERSARIAL LEARNING FOR SPEAKER RECOGNITION
1242 BOOTSTRAPPING NON-PARALLEL VOICE CONVERSION FROM SPEAKER-ADAPTIVE TEXT-TO-SPEECH
1368 Character-Aware Attention-Based End-to-End Speech Recognition
1319 CNN WITH PHONETIC ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
1228 CONTROLLING EMOTION STRENGTH WITH RELATIVE ATTRIBUTE FOR END-TO-END SPEECH SYNTHESIS
1225 DATA AUGMENTATION BASED ON VOWEL STRETCH FOR IMPROVING CHILDREN'S SPEECH RECOGNITION
1098 Detecting Deception in Political Debates Using Acoustic and Textual Features
1143 DEVELOPMENT OF VOICE SPOOFING DETECTION SYSTEMS FOR 2019 EDITION OF AUTOMATIC SPEAKER VERIFICATION AND COUNTERMEASURES CHALLENGE
1271 DIALOGUE ENVIRONMENTS ARE DIFFERENT FROM GAMES: INVESTIGATING VARIANTS OF DEEP Q-NETWORKS FOR DIALOGUE POLICY
1374 Domain Adaptation via Teacher-Student Learning for End-To-End Speech Recognition
1107 DOMAIN EXPANSION IN DNN-BASED ACOUSTIC MODELS FOR ROBUST SPEECH RECOGNITION
1348 DOVER: A METHOD FOR COMBINING DIARIZATION OUTPUTS
1257 EFFICIENT FREE KEYWORD DETECTION BASED ON CNN AND END-TO-END CONTINUOUS DP-MATCHING
1325 EFFICIENT SEMI-SUPERVISED LEARNING FOR NATURAL LANGUAGE UNDERSTANDING BY OPTIMIZING DIVERSITY
1215 EMBEDDINGS FOR DNN SPEAKER ADAPTIVE TRAINING
1191 EMOCEPTION: AN INCEPTION INSPIRED EFFICIENT SPEECH EMOTION RECOGNITION NETWORK
1376 END-TO-END CODE-SWITCHING ASR FOR LOW-RESOURCED LANGUAGE PAIRS
1054 END-TO-END NEURAL SPEAKER DIARIZATION WITH SELF-ATTENTION
1205 END-TO-END OVERLAPPED SPEECH DETECTION AND SPEAKER COUNTING WITH RAW WAVEFORM
1409 End-to-end Training of a Large Vocabulary End-to-end Speech Recognition System
1055 ENHANCED BERT-BASED RANKING MODELS FOR SPOKEN DOCUMENT RETRIEVAL
1154 ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT
1312 EXPLICIT ALIGNMENT OF TEXT AND SPEECH ENCODINGS FOR ATTENTION-BASED END-TO-END SPEECH RECOGNITION
1037 Exploring Effective Data Augmentation with TDNN-LSTM Neural Network Embedding for Speaker Recognition
1268 EXPLORING MODEL UNITS AND TRAINING STRATEGIES FOR END-TO-END SPEECH RECOGNITION
1342 FASNET: LOW-LATENCY ADAPTIVE BEAMFORMING FOR MULTI-MICROPHONE AUDIO PROCESSING
1194 FROM SENONES TO CHENONES: TIED CONTEXT-DEPENDENT GRAPHEMES FOR HYBRID SPEECH RECOGNITION
1134 GANS FOR CHILDREN: A GENERATIVE DATA AUGMENTATION STRATEGY FOR CHILDREN SPEECH RECOGNITION
1395 GENERALIZED LARGE-CONTEXT LANGUAGE MODELS BASED ON FORWARD-BACKWARD HIERARCHICAL RECURRENT ENCODER-DECODER MODELS
1351 Hierarchical Transformers for Long Document Classification
1061 HIGHLY EFFICIENT NEURAL NETWORK LANGUAGE MODEL COMPRESSION USING SOFT BINARIZATION TRAINING
1066 IMPROVED MULTI-STAGE TRAINING OF ONLINE ATTENTION-BASED ENCODER-DECODER MODELS
1284 IMPROVING FUNDAMENTAL FREQUENCY GENERATION IN EMG-TO-SPEECH CONVERSION USING A QUANTIZATION APPROACH
1192 IMPROVING GRAPHEME-TO-PHONEME CONVERSION BY INVESTIGATING COPYING MECHANISM IN RECURRENT ARCHITECTURES
1245 Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias
1108 IMPROVING RNN TRANSDUCER MODELING FOR END-TO-END SPEECH RECOGNITION
1059 IMPROVING SPEECH ENHANCEMENT WITH PHONETIC EMBEDDING FEATURES
1097 IMPROVING SPEECH-BASED END-OF-TURN DETECTION VIA CROSS-MODAL REPRESENTATION LEARNING WITH PUNCTUATED TEXT DATA
1114 Incorporating Prior Knowledge Into Speaker Diarization and Linking for Identifying Common Speaker
1011 INCREMENTAL LATTICE DETERMINIZATION FOR WFST DECODERS
1049 INTEGRATING SOURCE-CHANNEL MODEL WITH ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
1326 IN-THE-WILD END-TO-END DETECTION OF SPEECH AFFECTING DISEASES
1126 INVESTIGATION OF SHALLOW WAVENET VOCODER WITH LAPLACIAN DISTRIBUTION OUTPUT
1237 JOINT DISTRIBUTION LEARNING IN THE FRAMEWORK OF VARIATIONAL AUTOENCODERS FOR FAR-FIELD SPEECH ENHANCEMENT
1249 JOINT LEARNING OF WORD AND LABEL EMBEDDINGS FOR SEQUENCE LABELLING IN SPOKEN LANGUAGE UNDERSTANDING
1031 JOINT OPTIMIZATION OF CLASSIFICATION AND CLUSTERING FOR DEEP SPEAKER EMBEDDING
1112 KNOWLEDGE DISTILLATION FROM BERT IN PRE-TRAINING AND FINE-TUNING FOR POLYPHONE DISAMBIGUATION
1219 LANGUAGE MODEL BOOTSTRAPPING USING NEURAL MACHINE TRANSLATION FOR CONVERSATIONAL SPEECH RECOGNITION
1173 Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks
1068 LEAD2GOLD: TOWARDS EXPLOITING THE FULL POTENTIAL OF NOISY TRANSCRIPTIONS FOR SPEECH RECOGNITION
1080 LEARNING BETWEEN DIFFERENT TEACHER AND STUDENT MODELS IN ASR
1131 LEARNING HIERARCHICAL REPRESENTATIONS FOR EXPRESSIVE SPEAKING STYLE IN END-TO-END SPEECH SYNTHESIS
1335 Leveraging language ID in multilingual end-to-end speech recognition
1203 LISTENING WHILE SPEAKING AND VISUALIZING: IMPROVING ASR THROUGH MULTIMODAL CHAIN
1195 LOGISTIC SIMILARITY METRIC LEARNING VIA AFFINITY MATRIX FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
1364 LONG RANGE ACOUSTIC AND DEEP FEATURES PERSPECTIVE ON ASVSPOOF 2019
1304 Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs
1264 MARKOV RECURRENT NEURAL NETWORK LANGUAGE MODEL
1101 MGB-5: Arabic dialect identification across 17 dialects and Moroccan speech recognition
1167 MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition
1234 Mixed bandwidth acoustic modeling leveraging knowledge distillation
1344 MONOTONIC RECURRENT NEURAL NETWORK TRANSDUCER AND DECODING STRATEGIES
1124 MULTILINGUAL BOTTLENECK FEATURES FOR QUERY BY EXAMPLE SPOKEN TERM DETECTION
1004 MULTILINGUAL END-TO-END SPEECH TRANSLATION
1302 NATIVE LANGUAGE IDENTIFICATION FROM RAW WAVEFORMS USING DEEP CONVOLUTIONAL NEURAL NETWORKS WITH ATTENTIVE POOLING
1218 NEURAL MACHINE TRANSLATION WITH ACOUSTIC EMBEDDING
1153 NOVEL ENHANCED TEAGER ENERGY BASED CEPSTRAL COEFFICIENTS FOR REPLAY SPOOF DETECTION
1256 ON TEMPORAL CONTEXT INFORMATION FOR HYBRID BLSTM-BASED PHONEME RECOGNITION
1081 On the Study of Generative Adversarial Networks for Cross-lingual Voice Conversion
1360 ONE-TO-MANY MULTILINGUAL END-TO-END SPEECH TRANSLATION
1286 ONLINE BATCH NORMALIZATION ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION
1330 OPTIMIZING NEURAL NETWORK EMBEDDINGS USING PAIR-WISE LOSS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
1076 ORTHOGONALITY CONSTRAINED MULTI-HEAD ATTENTION FOR KEYWORD SPOTTING
1387 Paraphrase Generation based on VAE and Pointer-Generator Networks
1020 PERSONALIZATION OF END-TO-END SPEECH RECOGNITION ON MOBILE DEVICES FOR NAME ENTITIES
1410 Power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
1320 PROBING THE INFORMATION ENCODED IN X-VECTORS
1269 QUERY-BY-EXAMPLE ON-DEVICE KEYWORD SPOTTING
1333 RDI-CU SYSTEM FOR THE 2019 ARABIC MULTI-GENRE BROADCAST CHALLENGE
1329 RECOGNIZING LONG-FORM SPEECH USING STREAMING END-TO-END MODELS
1305 RECURRENT NEURAL NETWORK TRANSDUCER FOR AUDIO-VISUAL SPEECH RECOGNITION
1070 ROBUST BELIEF STATE SPACE REPRESENTATION FOR STATISTICAL DIALOGUE MANAGERS USING DEEP AUTOENCODERS
1313 Scalable Neural Dialogue State Tracking
1152 Second Language Transfer Learning in Humans and Machines Using Image Supervision
1210 SELF-ADAPTIVE SOFT VOICE ACTIVITY DETECTION USING DEEP NEURAL NETWORKS FOR ROBUST SPEAKER VERIFICATION
1285 SEMI-SUPERVISED TRAINING AND DATA AUGMENTATION FOR ADAPTATION OF AUTOMATIC BROADCAST NEWS CAPTIONING SYSTEMS
1120 SHORT UTTERANCE COMPENSATION IN SPEAKER VERIFICATION VIA COSINE-BASED TEACHER-STUDENT LEARNING OF SPEAKER EMBEDDINGS
1113 SIMPLE GATED CONVNET FOR SMALL FOOTPRINT ACOUSTIC MODELING
1327 SIMPLIFIED LSTMS FOR SPEECH RECOGNITION
1027 SIMULTANEOUS SPEECH RECOGNITION AND SPEAKER DIARIZATION FOR MONAURAL DIALOGUE RECORDINGS WITH TARGET-SPEAKER ACOUSTIC MODELS
1279 SLU FOR VOICE COMMAND IN SMART HOME: COMPARISON OF PIPELINE AND END-TO-END APPROACHES
1276 SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK
1355 SPATIO-TEMPORAL CONTEXT MODELLING FOR SPEECH EMOTION CLASSIFICATION
1288 SPEAKER ADAPTIVE TRAINING USING MODEL AGNOSTIC META-LEARNING
1224 SPEAKER AND LANGUAGE AWARE TRAINING FOR END-TO-END ASR
1322 SPEAKER VERIFICATION WITH APPLICATION-AWARE BEAMFORMING
1012 SPEAKER-AWARE SPEECH-TRANSFORMER
1317 SPEECH RECOGNITION WITH AUGMENTED SYNTHESIZED SPEECH
1283 SPEECH REVEALS FUTURE RISK OF DEVELOPING DEMENTIA: PREDICTIVE DEMENTIA SCREENING FROM BIOGRAPHIC INTERVIEWS
1086 SPEECH SEPARATION USING SPEAKER INVENTORY
1390 SPEECH-TO-SPEECH TRANSLATION BETWEEN UNTRANSCRIBED UNKNOWN LANGUAGES
1243 SphereDiar: an effective speaker diarization system for meeting data
1099 SPOKEN LANGUAGE IDENTIFICATION USING BIDIRECTIONAL LSTM BASED LID SEQUENTIAL SENONES
1016 SPOKEN MULTIPLE-CHOICE QUESTION ANSWERING USING MULTIMODAL CONVOLUTIONAL NEURAL NETWORKS
1281 SPOOF DETECTION USING TIME-DELAY SHALLOW NEURAL NETWORK AND FEATURE SWITCHING
1057 STATE-OF-THE-ART SPEECH RECOGNITION USING MULTI-STREAM SELF-ATTENTION WITH DILATED 1D CONVOLUTIONS
1336 STREAMING END-TO-END SPEECH RECOGNITION WITH JOINT CTC-ATTENTION BASED MODELS
1165 SYLLABLE-DEPENDENT DISCRIMINATIVE LEARNING FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
1358 TACOTRON-BASED ACOUSTIC MODEL USING PHONEME ALIGNMENT FOR PRACTICAL NEURAL TEXT-TO-SPEECH SYSTEMS
1244 Time Domain Audio Visual Speech Separation
1104 Time-domain speaker extraction network
1266 TOPIC-AWARE POINTER-GENERATOR NETWORKS FOR SUMMARIZING SPOKEN CONVERSATIONS
1334 TOWARDS CONTROLLING FALSE ALARM–MISS TRADE-OFF IN PERCEPTUAL SPEAKER COMPARISON VIA NON-NEUTRAL LISTENING TASK FRAMING
1363 TOWARDS REAL-TIME MISPRONUNCIATION DETECTION IN KIDS’ SPEECH
1156 TRAINING LANGUAGE MODELS FOR LONG-SPAN CROSS-SENTENCE EVALUATION
1043 TRANSFER LEARNING FOR CONTEXT-AWARE SPOKEN LANGUAGE UNDERSTANDING
1178 TRANSFORMER ASR WITH CONTEXTUAL BLOCK PROCESSING
1408 Unsupervised Adaptation of Acoustic Models for ASR using Utterance-level Embeddings from Squeeze and Excitation Networks
1014 USING VERY DEEP CONVOLUTIONAL NEURAL NETWORKS TO AUTOMATICALLY DETECT PLAGIARIZED SPOKEN RESPONSES
1091 VERIFYING DEEP KEYWORD SPOTTING DETECTION WITH ACOUSTIC WORD EMBEDDINGS
1065 Virtual Adversarial Training for DS-CNN based Small-Footprint Keyword Spotting
1089 WAVENET FACTORIZATION WITH SINGULAR VALUE DECOMPOSITION FOR VOICE CONVERSION
1375 ZERO-SHOT CODE-SWITCHING ASR AND TTS WITH MULTILINGUAL MACHINE SPEECH CHAIN
1377 Zero-Shot Pronunciation Lexicons for Cross-Language Acoustic Model Transfer