SLP-L20: Streaming and Efficient TTS Systems
Fri, 8 May, 14:00 - 16:00
Location: Room 114
Session Type: Oral
Track: Speech and Language Processing [SL]
Fri, 8 May, 14:00 - 14:20

SLP-L20.1: VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency

Nikita Torgashov, Gustav Eje Henter, Gabriel Skantze, KTH Royal Institute of Technology, Sweden
Fri, 8 May, 14:20 - 14:40

SLP-L20.2: SyncSpeech: Efficient and Low-Latency Text-to-Speech Based on Temporal Masked Transformer

Zhengyan Sheng, University of Science and Technology of China, China; Zhihao Du, Shiliang Zhang, Zhijie Yan, Independent Researcher, China; Liping Chen, University of Science and Technology of China, China
Fri, 8 May, 14:40 - 15:00

SLP-L20.3: Retrieval-Based Speculative Decoding for Autoregressive Speech Synthesis

Alan Chi-Man Lee, The Chinese University of Hong Kong, Hong Kong; Wing-Sun Cheng, RISKSIS, Hong Kong; Calvin Chun-Kit Chan, The Chinese University of Hong Kong, Hong Kong
Fri, 8 May, 15:00 - 15:20

SLP-L20.4: Principled Coarse-Grained Acceptance for Speculative Decoding in Speech

Moran Yanuka, Apple, Tel-Aviv University, Israel; Paul Dixon, Eyal Finkelshtein, Daniel Rotman, Apple, Switzerland; Raja Giryes, Tel-Aviv University, Israel
Fri, 8 May, 15:20 - 15:40

SLP-L20.5: SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS

Tan Dat Nguyen, Jaehun Kim, Ji-Hoon Kim, Korea Advanced Institute of Science and Technology, Korea, Republic of; Shukjae Choi, Youshin Lim, 42dot Inc, Korea, Republic of; Joon Son Chung, Korea Advanced Institute of Science and Technology, Korea, Republic of
Fri, 8 May, 15:40 - 16:00

SLP-L20.6: T-CACHE: Fast Inference for Masked Generative Transformer-Based TTS via Prompt-Aware Feature Caching

Obed Irihose, Le Zhang, University of Electronic Science and Technology of China, China