SLP-P58.8

VSTYLE: A BENCHMARK FOR VOICE STYLE ADAPTATION WITH SPOKEN INSTRUCTIONS

Jun Zhan, Fudan University, Alibaba Group, China; Mingyang Han, Alibaba Group, China; Yuxuan Xie, Fudan University, China; Chen Wang, Alibaba Group, China; Dong Zhang, Kexin Huang, Fudan University, China; Haoxiang Shi, Dongxiao Wang, Tengtao Song, Alibaba Group, China; Qinyuan Cheng, Shimin Li, Fudan University, China; Jun Song, Alibaba Group, China; Xipeng Qiu, Fudan University, China; Bo Zheng, Alibaba Group, China

Session:
SLP-P58: Paralinguistic Datasets, Benchmarks & Resources Poster

Track:
Speech and Language Processing [SL]

Location:
Poster Area 42

Presentation Time:
Fri, 8 May, 14:00 - 16:00

Session SLP-P58
SLP-P58.1: SUMMARY ON THE MULTILINGUAL CONVERSATIONAL SPEECH LANGUAGE MODEL CHALLENGE: DATASETS, TASKS, BASELINES, AND METHODS
Bingshen Mu, Pengcheng Guo, Zhaokai Sun, Northwestern Polytechnical University, China; Shuai Wang, Nanjing University, China; Hexin Liu, Nanyang Technological University, Singapore; Mingchen Shao, Lei Xie, Northwestern Polytechnical University, China; Eng Siong Chng, Nanyang Technological University, Singapore; Longshuai Xiao, Huawei Technologies, China; Qiangze Feng, Daliang Wang, Nexdata Technology Inc., United States of America
SLP-P58.2: FULL-DUPLEX-BENCH V1.5: EVALUATING OVERLAP HANDLING FOR FULL-DUPLEX SPEECH MODELS
Guan-Ting Lin, Shih-Yun Shan Kuan, National Taiwan University, Taiwan; Qirui Wang, University of Washington, United States of America; Jiachen Lian, Tingle Li, UC Berkeley, United States of America; Shinji Watanabe, Carnegie Mellon University, United States of America; Hung-yi Lee, National Taiwan University, Taiwan
SLP-P58.3: EMU: EMOTION UNDERSTANDING - A NATURALISTIC MULTIMODAL DATASET AND BENCHMARK
Marie S. Newman, Darshana Priyasad, Simon Denman, Queensland University of Technology, Australia; Bouchra Senadji, James Cook University, Australia; Tharindu Fernando, Katherine M. White, Clinton Fookes, Queensland University of Technology, Australia
SLP-P58.4: CAMEO: Collection of Multilingual Emotional Speech Corpora
Iwona Christop, Maciej Czajka, Adam Mickiewicz University, Poland
SLP-P58.5: MSBENCH: CAN SPEECH LANGUAGE MODELS GENERATE MULTI-SPEAKER DIALOGUES IN ONE PASS?
Zichao Xu, University of Electronic Science and Technology of China, China; Ting Liu, China Mobile (Suzhou) Software Technology Co., Ltd, China; Haiqiang Shen, Minhao Liu, Lixin Duan, University of Electronic Science and Technology of China, China
SLP-P58.6: MANGAVOX: DATASET OF ACTED VOICES ALIGNED WITH MANGA IMAGES TOWARDS COMPUTER UNDERSTANDING OF AUDIO COMICS
Shinnosuke Takamichi, Keio University, Japan; Tomohiko Nakamura, Hitoshi Suda, Satoru Fukayama, Jun Ogata, National Institute of Advanced Industrial Science and Technology, Japan
SLP-P58.7: HASAP: HIERARCHICAL ACOUSTIC-SEMANTIC ANNOTATION PIPELINE FOR SCRIPTED SPEECH DATA
Kehan Li, Runchuan Ye, Yixuan Zhou, Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; Xin Liu, ModelBest Inc, Beijing, China; Zhiyong Wu, Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
SLP-P58.8: VSTYLE: A BENCHMARK FOR VOICE STYLE ADAPTATION WITH SPOKEN INSTRUCTIONS
Jun Zhan, Fudan University, Alibaba Group, China; Mingyang Han, Alibaba Group, China; Yuxuan Xie, Fudan University, China; Chen Wang, Alibaba Group, China; Dong Zhang, Kexin Huang, Fudan University, China; Haoxiang Shi, Dongxiao Wang, Tengtao Song, Alibaba Group, China; Qinyuan Cheng, Shimin Li, Fudan University, China; Jun Song, Alibaba Group, China; Xipeng Qiu, Fudan University, China; Bo Zheng, Alibaba Group, China
SLP-P58.9: ISSE: AN INSTRUCTION-GUIDED SPEECH STYLE EDITING DATASET AND BENCHMARK
Yun Chen, University of Surrey, United Kingdom of Great Britain and Northern Ireland; Qi Chen, ByteDance, China; Zheqi Dai, The Chinese University of Hong Kong, China; Arshdeep Singh, Philip J.B. Jackson, Mark D. Plumbley, University of Surrey, United Kingdom of Great Britain and Northern Ireland
SLP-P58.10: SS-JDSC: SINGLE-SPEAKER JAPANESE DYSARTHRIC SPEECH CORPUS
Asahi Ogasawara, Iwate University, Japan; Shinnosuke Takamichi, Keio University, Japan; Jianing Yang, The University of Tokyo, Japan; Go Suenaga, Independent researcher, Japan; Yiyu Tan, Iwate University, Japan
SLP-P58.11: NORD-PARL-TTS: FINNISH AND SWEDISH TTS DATASET FROM PARLIAMENT SPEECH
Zirui Li, Aalto University, Finland; Jens Edlund, KTH Royal Institute of Technology, Sweden; Yicheng Gu, The Chinese University of Hong Kong, China; Nhan Phan, Lauri Juvela, Mikko Kurimo, Aalto University, Finland
SLP-P58.12: FROM PRETRAINING TO ROBUSTNESS: BENCHMARKING SSL MODELS FOR NOISE-ROBUST SPEECH EMOTION RECOGNITION
Sofia Eleftheriou, AUEB, Greece; Theodoros Giannakopoulos, NCSR, Demokritos, Greece; Themos Stafylakis, Ion Androutsopoulos, AUEB, Greece
SLP-P58.13: A COCKTAIL-PARTY BENCHMARK: MULTI-MODAL DATASET AND COMPARATIVE EVALUATION RESULTS
Thai-Binh Nguyen, Karlsruhe Institute of Technology, Germany; Katerina Zmolikova, Pingchuan Ma, Meta AI, United Kingdom of Great Britain and Northern Ireland; Ngoc Quan Pham, Carnegie Mellon University, United States of America; Christian Fuegen, Meta AI, United Kingdom of Great Britain and Northern Ireland; Alexander Waibel, Carnegie Mellon University, United States of America
SLP-P58.14: WENETSPEECH-CHUAN: A LARGE-SCALE SICHUANESE CORPUS WITH RICH ANNOTATION FOR DIALECTAL SPEECH PROCESSING
Yuhang Dai, Ziyu Zhang, Northwestern Polytechnical University, China; Shuai Wang, Nanjing University, China; Longhao Li, Zhao Guo, Tianlun Zuo, Shuiyuan Wang, Hongfei Xue, Chengyou Wang, Northwestern Polytechnical University, China; Qing Wang, Institute of Artificial Intelligence (TeleAI), China Telecom, China; Xin Xu, Hui Bu, Beijing AISHELL Technology Co., Ltd., China; Jie Li, Jian Kang, Institute of Artificial Intelligence (TeleAI), China Telecom, China; Binbin Zhang, WeNet Open Source Community, China; Lei Xie, Northwestern Polytechnical University, China
SLP-P58.15: A long-form single-speaker real-time MRI speech dataset and benchmark
Sean Foley, Jihwan Lee, Kevin Huang, Xuan Shi, Yoonjeong Lee, Louis Goldstein, Shrikanth Narayanan, University of Southern California, United States of America