AUD-P2: Deep Learning for Speech and Audio
Session Type: Poster
Time: Tuesday, 5 May, 11:30 - 13:30
Location: On-Demand
Session Chair: Shoko Araki, NTT
AUD-P2.1: WAWENETS: A NO-REFERENCE CONVOLUTIONAL WAVEFORM-BASED APPROACH TO ESTIMATING NARROWBAND AND WIDEBAND SPEECH QUALITY
Andrew Catellier; Institute for Telecommunication Sciences
Stephen Voran; Institute for Telecommunication Sciences
AUD-P2.2: A NEURAL NETWORK FOR MONAURAL INTRUSIVE SPEECH INTELLIGIBILITY PREDICTION
Mathias Bach Pedersen; Aalborg University
Asger Heidemann Andersen; Oticon A/S
Søren Holdt Jensen; Aalborg University
Jesper Jensen; Aalborg University
AUD-P2.3: SOURCE CODING OF AUDIO SIGNALS WITH A GENERATIVE MODEL
Roy Fejgin; Dolby Laboratories
Janusz Klejsa; Dolby Sweden AB
Lars Villemoes; Dolby Sweden AB
Cong Zhou; Dolby Laboratories
AUD-P2.4: FULL-REFERENCE SPEECH QUALITY ESTIMATION WITH ATTENTIONAL SIAMESE NEURAL NETWORKS
Gabriel Mittag; Technische Universität Berlin
Sebastian Möller; Technische Universität Berlin
AUD-P2.5: ENHANCED METHOD OF AUDIO CODING USING CNN-BASED SPECTRAL RECOVERY WITH ADAPTIVE STRUCTURE
Seong-Hyeon Shin; Kwangwoon University
Seung Kwon Beack; Electronics and Telecommunications Research Institute (ETRI)
Wootaek Lim; Electronics and Telecommunications Research Institute (ETRI)
Hochong Park; Kwangwoon University
AUD-P2.6: AUDIO CODEC ENHANCEMENT WITH GENERATIVE ADVERSARIAL NETWORKS
Arijit Biswas; Dolby Germany GmbH
Dai Jia; Dolby Laboratories
AUD-P2.7: EFFICIENT AND SCALABLE NEURAL RESIDUAL WAVEFORM CODING WITH COLLABORATIVE QUANTIZATION
Kai Zhen; Indiana University
Mi Suk Lee; Electronics and Telecommunications Research Institute (ETRI)
Jongmo Sung; Electronics and Telecommunications Research Institute (ETRI)
Seungkwon Beack; Electronics and Telecommunications Research Institute (ETRI)
Minje Kim; Indiana University
AUD-P2.8: A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT
Kai Zhen; Indiana University
Mi Suk Lee; Electronics and Telecommunications Research Institute (ETRI)
Minje Kim; Indiana University
AUD-P2.9: A RECURRENT VARIATIONAL AUTOENCODER FOR SPEECH ENHANCEMENT
Simon Leglaive; CentraleSupélec, IETR
Xavier Alameda-Pineda; Inria Grenoble Rhône-Alpes
Laurent Girin; Univ. Grenoble Alpes, Grenoble INP, GIPSA-lab
Radu Horaud; Inria Grenoble Rhône-Alpes
AUD-P2.10: SPEAKERFILTER: DEEP LEARNING-BASED TARGET SPEAKER EXTRACTION USING ANCHOR SPEECH
ShuLin He; Inner Mongolia University
Hao Li; Inner Mongolia University
XueLiang Zhang; Inner Mongolia University
AUD-P2.11: TACKLING REAL NOISY REVERBERANT MEETINGS WITH ALL-NEURAL SOURCE SEPARATION, COUNTING, AND DIARIZATION SYSTEM
Keisuke Kinoshita; NTT Corporation
Marc Delcroix; NTT Corporation
Shoko Araki; NTT Corporation
Tomohiro Nakatani; NTT Corporation
AUD-P2.12: TIME-DOMAIN AUDIO SOURCE SEPARATION BASED ON WAVE-U-NET COMBINED WITH DISCRETE WAVELET TRANSFORM
Tomohiko Nakamura; University of Tokyo
Hiroshi Saruwatari; University of Tokyo